Using RWD in Life Sciences Research: A View from Two Perspectives

Real-world data (RWD), combined with additional healthcare data sources, is an essential ingredient for many functions in the Life Sciences industry. From helping to inform the design of clinical studies to making data-driven commercialization decisions and informing an understanding of patient experiences in the real world through a health outcomes lens (e.g., HEOR studies), access to high-quality RWD can be a powerful research accelerator for discovering and making new medicines available to a wide range of patients.

A Life Sciences Perspective

Importance of Data Readiness for Life Sciences Research

In a previous role at a large pharmaceutical company, I had the opportunity to work with several types of data, such as data from wearable devices, clinical study patient questionnaires, and physician reports of patients’ functioning during clinical trial participation. Leveraging data to understand the patient experience, whether from an efficacy, safety, or treatment access and compliance perspective, is of paramount importance in drug discovery and development. However, there are known challenges in the field of Life Sciences that are specifically related to collecting, sharing, and efficiently using large amounts of seemingly disparate patient or physician data.

Since clinical trials are often complex by design (and conducted globally by multiple partners), the data collection and standardization for analysis are not trivial. This complexity poses particular challenges in leveraging data, state-of-the-art machine learning, and statistical models to derive real-time patient insights and solutions. The amount of time spent waiting to access data and preparing data for analysis is significant and requires considerable engineering, statistical, and data science resources, which can limit opportunities to develop data-driven patient or physician solutions quickly and at scale.

Machine learning and AI bring immense potential to accelerate Life Sciences research by improving accuracy in disease identification, personalized treatment solutions, and patient outcomes through automated analysis of large healthcare datasets such as genomics, electronic health records (EHR), and medical claims. However, developing meaningful and trustworthy data science and AI solutions in the Healthcare and Life Sciences industries depends heavily on the readiness and fidelity of available data. When working with patient-level data from clinical trials, for example, it is critical to maintain a high level of fidelity, trust, and protection to prevent the exposure of private or sensitive patient information. The same applies to real-world data sources, such as medical claims, EHR, and lab data, where the de-identification and protection of patient-level data are crucial. For these reasons, even within the same Healthcare or Life Sciences organization, data sources might be siloed across therapy areas or individual sectors to reduce the risk of unauthorized access or misuse.

Adhering to the FAIR (findable, accessible, interoperable, and reusable) data principles has become increasingly important in Life Sciences research and development (R&D). Reducing data silos and the time spent on data preparation underpins important discoveries for patients that leverage the latest machine learning and AI approaches.

Data Readiness is a Collaborative and Sometimes Costly Effort

Setting up data ingestion and formatting pipelines, and ensuring that data is regularly refreshed, aren’t trivial tasks for most Life Sciences companies. Data collection, processing, storage, and governance often require the contributions of multi-expert teams to ensure data is ready and safe to use.

In Life Sciences R&D, once data for a clinical trial or a specific exploratory project is prepared and approved for analysis, multiple stakeholders, including data engineers, statisticians, and data scientists, interact with the data. This process isn’t always easily accomplished, as disparate technical infrastructures and data silos can cause bottlenecks in data sharing and movement. For Life Sciences companies, any delays in deriving value from patient data can negatively impact the speed with which new medicines reach patients in need. While a team might ultimately be focused on, for example, developing a machine learning algorithm that predicts the likelihood of experiencing an adverse reaction to a new treatment, the data readiness steps need to take place first.

Data Access and Readiness at Kythera

Since joining Kythera, working with large datasets has been incredibly efficient and intuitive. When I first joined Kythera, I had the usual trepidation before getting my hands on the data, especially since our dataset contains billions of claims and other data sources, like EHR, drug, and lab data. The very first thing I noticed was that I had access to FAIR RWD on day one, and that this was true not only for me as a new employee but also for all of our customers.

The Wayfinder platform is built on Databricks, so Kythera employees and customers alike have access to all the data tools built into the platform with the added benefit of using all the functionality of Wayfinder, like data confirmation, standardization, de-identification, and tokenization. This makes accessing regularly refreshed and standardized data extremely intuitive and easy. In addition, Wayfinder is highly interoperable and extensible, which means our customers can easily connect Kythera data and technology with their own.

Reducing Uncertainty in Healthcare Data

The focus of Kythera has always been reducing uncertainty in healthcare data by empowering Healthcare and Life Sciences organizations to rapidly integrate, access, and analyze healthcare data with scale and speed. Our data science enhances data quality through a medallion architecture. Our architecture and processing technology create specific, progressive improvements to large volumes of Healthcare RWD as it moves through the Bronze, Silver, and Gold layers. We restructure and reformat data to produce data assets with consistent and unified structures, remove incorrect information, impute correct information, and infer the existence of missing healthcare events to increase confidence that our data is correct and complete. Life Sciences organizations can take advantage of the improvements in the Bronze, Silver, and Gold layers, including data cleaning, standardizing, de-identifying, uplifting, curating, and joining functionality to integrate data and analyze robust, holistic, and unique datasets at scale.
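To make the layered idea concrete, here is a minimal sketch of a Bronze, Silver, and Gold flow in plain Python. It is an illustration only, not Kythera's pipeline: the record fields, the sample rows, and the `to_silver` and `to_gold` helpers are all hypothetical, and the sketch covers only deduplication, format standardization, and aggregation (it does not attempt imputation or inference of missing events).

```python
from datetime import date

# Hypothetical raw claim records as they might land in a Bronze layer.
BRONZE = [
    {"claim_id": "c1", "patient": "p1", "dx": " c50.911 ", "service_date": "2024-01-05"},
    {"claim_id": "c1", "patient": "p1", "dx": " c50.911 ", "service_date": "2024-01-05"},  # duplicate
    {"claim_id": "c2", "patient": "p2", "dx": "E11.9", "service_date": "not-a-date"},  # unparseable date
    {"claim_id": "c3", "patient": "p1", "dx": "C50.911", "service_date": "2024-02-10"},
]


def to_silver(rows):
    """Standardize formats; drop duplicate claims and rows with unparseable dates."""
    seen, silver = set(), []
    for row in rows:
        try:
            service_date = date.fromisoformat(row["service_date"])
        except ValueError:
            continue  # remove records whose date cannot be trusted
        if row["claim_id"] in seen:
            continue  # keep only the first copy of each claim
        seen.add(row["claim_id"])
        silver.append({
            "claim_id": row["claim_id"],
            "patient": row["patient"],
            "dx": row["dx"].strip().upper(),  # consistent diagnosis code format
            "service_date": service_date,
        })
    return silver


def to_gold(rows):
    """Aggregate cleaned claims into one consumption-ready summary per patient."""
    gold = {}
    for row in rows:
        summary = gold.setdefault(
            row["patient"],
            {"claims": 0, "first": row["service_date"], "last": row["service_date"]},
        )
        summary["claims"] += 1
        summary["first"] = min(summary["first"], row["service_date"])
        summary["last"] = max(summary["last"], row["service_date"])
    return gold


silver = to_silver(BRONZE)
gold = to_gold(silver)
```

Each layer only reads from the one before it, so a fix applied in the Silver step automatically benefits every Gold data product built on top of it.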

For example, within the Silver layer, we join data to healthcare industry dimensional data to denormalize this relational information, speeding up queries by reducing the number of expensive “join” operations in downstream analyses. Patient token transformation tasks are automatically performed at scale so that the data conforms with HIPAA privacy rules. Within the Gold layer, data is organized into consumption-ready analytics for various projects such as building patient cohorts, drug commercialization, and market intelligence assessments.
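The two Silver-layer steps described above, joining claims to dimensional data once and replacing patient identifiers with tokens, can be sketched as follows. Everything here is an assumption made for illustration: the provider dimension, the field names, the `tokenize` and `denormalize` helpers, and the use of a keyed HMAC-SHA256 hash as the token. Keyed hashing alone does not make a dataset HIPAA-compliant, and this is not a description of how Wayfinder tokenizes data.

```python
import hashlib
import hmac

# Hypothetical secret; a real system would load this from a managed key store.
SECRET_KEY = b"example-key-not-for-production"

# Hypothetical provider dimension table keyed by provider identifier.
PROVIDER_DIM = {
    "1234567890": {"provider_name": "Example Clinic", "specialty": "Oncology"},
}


def tokenize(patient_id: str) -> str:
    """Map a patient identifier to a stable token using a keyed hash."""
    return hmac.new(SECRET_KEY, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def denormalize(claims, provider_dim):
    """Attach provider attributes to each claim so later queries need no join."""
    wide = []
    for claim in claims:
        provider = provider_dim.get(claim["npi"], {})  # unknown providers yield None fields
        wide.append({
            "patient_token": tokenize(claim["patient_id"]),  # raw identifier is not carried forward
            "dx": claim["dx"],
            "npi": claim["npi"],
            "provider_name": provider.get("provider_name"),
            "specialty": provider.get("specialty"),
        })
    return wide


claims = [
    {"patient_id": "p1", "npi": "1234567890", "dx": "C50.911"},
    {"patient_id": "p1", "npi": "0000000000", "dx": "C50.911"},
]
wide_claims = denormalize(claims, PROVIDER_DIM)
```

Because the same identifier always yields the same token, records for one patient can still be linked after the raw identifier is dropped, and the provider lookup is paid once at write time rather than in every analysis.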

Analysis-Ready Data on Day 1

Kythera’s expertise in Extract, Transform, Load (ETL) on large datasets enables straightforward data integration using our pre-built pipeline orchestration, reducing data engineering effort and time. Once I was ready to work with data, I was honestly surprised by its quality. Kythera’s RWD is curated and organized in use-case-friendly data products, so I don’t have to spend time formatting and standardizing the data or doing a lot of complex data joins. Even as someone who wasn’t familiar with claims data, I could easily get to work using data assets that are already curated. For example, I can quickly track a patient's journey for a specific type of breast cancer because of how our RWD is curated and organized.
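As a rough illustration of why curated data makes a patient journey quick to trace, the sketch below orders events per patient for a diagnosis code prefix. The event records, field names, and the `patient_journeys` helper are hypothetical; the only external fact relied on is that ICD-10-CM codes beginning with C50 denote malignant neoplasms of the breast.

```python
from collections import defaultdict
from datetime import date

# Hypothetical curated events, already tokenized and standardized.
EVENTS = [
    {"patient_token": "t1", "date": date(2024, 3, 1), "event": "chemotherapy", "dx": "C50.911"},
    {"patient_token": "t1", "date": date(2024, 1, 5), "event": "diagnosis", "dx": "C50.911"},
    {"patient_token": "t2", "date": date(2024, 2, 2), "event": "diagnosis", "dx": "E11.9"},
    {"patient_token": "t1", "date": date(2024, 2, 10), "event": "surgery", "dx": "C50.911"},
]


def patient_journeys(events, dx_prefix):
    """Return each patient's events in date order, limited to one diagnosis family."""
    journeys = defaultdict(list)
    for record in sorted(events, key=lambda r: r["date"]):
        if record["dx"].startswith(dx_prefix):
            journeys[record["patient_token"]].append(record["event"])
    return dict(journeys)


journeys = patient_journeys(EVENTS, "C50")
```

With consistently formatted codes and dates, the journey reduces to a filter and a sort; most of the real effort sits in the curation that happens before this step.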

The quality and thoroughness of Kythera’s RWD enable scientific exploration across a wide range of use cases in the Life Sciences industry and, importantly, make it possible to pursue opportunities to create products and applications that have a tangible impact on both the Life Sciences industry and patients.

Another big benefit is having all the data located in one place and organized in a user-friendly data catalog that follows FAIR data principles. It is quite easy to select data to explore (through Unity Catalog), check exactly when the data was last updated, and track the data lineage. In addition, embedded metadata in Unity Catalog tables facilitates getting familiar with new data without having to reference a data dictionary off platform. Finally, access controls can easily be put in place to safeguard sensitive data on a per-user basis, even for users in the same organization.

Conclusion

When it comes to accessing and using big healthcare datasets, my experience at Kythera has been an eye-opener. I am thrilled with the efficiencies and the results of using clean data and Wayfinder. I like the fact that I can open up Wayfinder and get right to the important work of harnessing the power of healthcare data to create solutions that have a measurable positive impact. Kythera’s data and technology truly allow for democratizing healthcare data access.

At Kythera Labs, we recognize that many Life Sciences customers have invested in and adopted systems and infrastructures. Our data and technology are interoperable by default, allowing our customers to integrate with their existing solutions to meet their unique data access needs.

Kythera’s expanding Life Sciences product line is driven by a unique combination of deep Healthcare and Life Sciences expertise, scientific rigor in our approach, and a genuine ambition to help our Life Sciences clients accelerate their R&D with powerful data and technology solutions. At Kythera, we help our clients solve their most challenging scientific and business needs with the same tenacity, professionalism, and integrity with which we would approach leveraging healthcare data to help a loved one in need.

If you’d like to find out more about our technology and data products, connect with me on LinkedIn.

About Aleks

Aleksandra (Aleks) Petkova, Senior Product Manager for Life Sciences Data & AI, joined Kythera Labs one year ago. Aleks’s background in implementing digital solutions for clinical trials at a large international pharmaceutical company and conducting human subject studies in academic and clinical settings brings a deep familiarity with both pain points and opportunities when it comes to accessing and deriving value from various types of healthcare data.

Published:

June 7, 2024
