Why We Built Our Platform On Databricks

Healthcare organizations of all types and sizes are leveraging real-world data (RWD) to achieve their strategic objectives

Kythera Labs is a technology company that accelerates the insights generated by life sciences and healthcare enterprises by unifying data with machine learning to make information more accurate and accessible. Kythera Labs’ all-in-one, cloud-based, data science-enabled platform, Wayfinder, is built on Databricks and takes almost any healthcare source data and refines it into higher-quality assets. We created Wayfinder to reduce the burdens typically encountered when accessing and using healthcare data, such as poor data quality, missing data, and inherent biases, so that data users can have more confidence in the underlying data.

Wayfinder, powered by Databricks, is a highly efficient way to use data at scale and enable faster time to insights

Kythera Labs was an early adopter of Databricks, and we are a founding member of their Data Lake Technology Council. After years of experience and direct evaluation of the competition, we knew Databricks was the right solution for us. We were so convinced of its benefits that we became Databricks' first OEM partner for healthcare.

Before selecting Databricks, Kythera evaluated several solutions, including Snowflake. The Databricks technology offered many apparent benefits. It supports more languages, and we found its notebook interface better suited to batch pipeline development. We also felt that scalable clusters with spot price management would be more cost-effective in the long run. Real-world data comes in many formats, including structured, semi-structured, and unstructured, and in our experience Snowflake was not as capable at processing semi-structured data.

Wayfinder Powered by Databricks

Since data is the essential ingredient for solving healthcare challenges, Wayfinder needs to process our own proprietary data as well as data from external sources and our clients’ own data. We needed the capabilities found in data lakes, such as low-cost storage, support for data in different formats, and scalability. We were also impressed by the parallelization, speed, and access to data provided by Delta and Apache Spark™, and the Delta platform has continually improved in speed and efficiency thanks to its large community of contributors.



Recent research from the Barcelona Supercomputing Center found that Databricks was 2.7x faster and 12x better in terms of price performance.

Kythera Labs chose to power its platform, Wayfinder, on Databricks because it allows users of real-world data and other healthcare data to access large amounts of data efficiently and cost-effectively. For example, Kythera Labs’ own data assets include over 40 billion healthcare events, from which we build case assets such as surgical cases, hospital encounters, and other use case-specific refined data products. The recent research on Databricks' record-setting performance is consistent with what we have seen since our early adoption of Databricks SQL. Wayfinder’s accessibility allows users to select the data they need without having to manage any pipelines or move any data sets. Unlike traditional warehousing, Wayfinder does not require the data to be loaded into a database before users build business intelligence insights and dashboards. Not having to move data from platform to platform (for example, into BI data layers), and being able to work with it in place for analytics and BI, has been a tremendous improvement in getting to insights quickly. Users can run their queries or build BI right on the platform in a matter of seconds or minutes.

Performance was on display at the Nashville Analytics Summit in October, where Clayton Severson, Kythera’s Director of Analytics, created a visualization using SQL Analytics (Databricks' proprietary business intelligence and data visualization tool). Instead of waiting minutes for the visualization to generate, he was able to take six years of claims history, comprising 33 billion rows of data, and create visuals and distinct counts in under a minute directly on the platform.


Kythera Labs was an early adopter of SQL Analytics and Photon. We use these tools when working with our own data products to improve the quality of our data assets. Our data scientists work in a variety of programming languages and APIs that are natively supported by Databricks, including Spark, Python, Scala, Java, R, and SQL. While the notebook interface is a useful tool for data scientists and analysts, I sometimes choose to build my own libraries in IDEs like IntelliJ. Databricks has also made it very easy to develop and distribute our own extensions. Finally, being able to offer our internal team and our customers a choice of language is an important benefit.


As a Databricks Council Member, we are always eager to work with new, powerful features as they are released in private preview. For example, we are excited to be working with the new serverless processing layer for Databricks SQL and to provide feedback that helps get it ready for general availability to our clients. The reduced startup time and improved cost-effectiveness will make the Wayfinder platform even more efficient for our clients.


Matt Ryan

Chief Technology Officer

Matt leads the data engineering team and is responsible for architecture, engineering, and technical operations at Kythera Labs. He has over 30 years of experience in software development and enterprise architecture, including big data environments for healthcare, finance, and telecommunications. Matt has been recognized by Databricks as one of their top 10 innovators.