We have all had health issues personally or know someone close to us who has been affected by a health concern. Advances in technology, especially machine learning and AI, hold enormous potential to improve healthcare. We are witnessing clinical breakthroughs in diagnostics, alternative treatments, and personalized medicine, as well as transformations in business endeavors. Yet we live in an increasingly “datafied” world, and too often we hear stories about algorithms built on biased data and data science teams that do not even know whether the most fundamental data used to train their machine learning models are accurate. We must ask ourselves, “What if we knew better?”
After years of working with healthcare data in its many and increasing forms, we recognized that this data is inherently messy. We also knew that, if improved, it could lead to great things for patients, individuals, populations, and those serving these groups. What was missing was transparency about the data's shortcomings. We took truth in analytics as our mission and spent years uncovering and then correcting problems in healthcare data. Starting with a database of over 310 million patients, our team of data scientists, clinicians, statisticians, and subject matter experts uncovered errors and then developed provable solutions to correct, enhance, and complete the data.
Real-world data is collected from a variety of sources, including claims databases, EHRs, labs, registries, and apps. Both the sources and the amount of data being collected are rapidly expanding, as are the applications. The problem is that not all of this data is “answer-ready,” and it may have inherent limitations on how it can be used. For example, before using claims data, it is important to remember that while claims contain important information, they exist as a reimbursement mechanism. As a result, the information in claims can be divorced from the real-world context of conditions and comorbidities, and other data may be missing.
If we want to be part of the solution to what ails healthcare, we must be transparent about the strengths and weaknesses, capabilities and biases of the data, and have honest conversations with our partners who are currently using data to uncover better care delivery, develop personalized medicine, understand ways to control costs, and provide life-enhancing and life-saving pharmaceuticals to those who need them.
When it comes to machine learning and AI, having clean data sets to train models is never something to take for granted. It is critical because flawed and biased data leads to flawed and biased analytics. There are many reasons data ends up biased and flawed, and I will write about that another time. For now, our guiding principles include asking fundamental questions such as: what if we could improve outcomes by delivering better, more complete, and more accurate data?
We listen to what our partners want to accomplish and have honest conversations about what our data and technology can do to assist them in their discovery. It’s a collaborative approach, and one we hope leads to better outcomes, no matter what the endeavor. Truth and transparency are necessary ingredients for building analytics, for adopting machine learning and AI, and for strong partner relationships.
We will continue to ask how we can make healthcare data better and easier to access and adopt for those who rely on it. Delivering accurate and complete data in a platform that speeds time to adoption and innovation is a guiding principle for us, because we know that we are not the experts in designing healthcare delivery solutions, finding medical breakthroughs and life-saving drugs, or other noble pursuits. What we can do is serve the healthcare industry's experts so they have the best chance of success.