Using RWD in Life Sciences Research: A View from Two Perspectives

Real-world data (RWD), combined with additional healthcare data sources, is an essential ingredient for many functions in the Life Sciences industry. From informing the design of clinical studies to making data-driven commercialization decisions and understanding patient experiences in the real world through a health outcomes lens (e.g., HEOR studies), access to high-quality RWD can be a powerful accelerator for discovering new medicines and making them available to a wide range of patients. Aleksandra (Aleks) Petkova, Senior Product Manager for Life Sciences Data & AI, joined Kythera Labs one year ago. Her background in implementing digital solutions for clinical trials at a large international pharmaceutical company, and in conducting human subject studies in academic and clinical settings, gives her a deep familiarity with both the pain points and the opportunities in accessing and deriving value from various types of healthcare data. In this blog, Aleks shares her observations about working with different types of RWD, drug, and clinical trial data since joining Kythera.

A Life Sciences Perspective

Importance of Data Readiness for Life Sciences Research 

In a previous role at a large pharmaceutical company, I had the opportunity to work with several types of data, such as data from wearable devices, clinical study patient questionnaires, and physician reports of patients’ functioning during clinical trial participation. Leveraging data to understand patient experience, whether from an efficacy, safety, or treatment access and compliance perspective, is of paramount importance in drug discovery and development. However, the Life Sciences field faces well-known challenges in collecting, sharing, and efficiently using large amounts of seemingly disparate patient or physician data. Because clinical trials are often complex (conducted globally by multiple partners), collecting and standardizing data for analysis is not trivial, which makes it particularly hard to apply state-of-the-art machine learning models and statistical methods to derive real-time patient insights and solutions. The time spent waiting to access data and preparing it for analysis is significant and consumes considerable engineering, statistical, and data science resources, which can limit opportunities to develop data-driven patient or physician solutions quickly and at scale.

Data science and AI bring immense potential to accelerate Life Sciences research by improving the accuracy of disease identification, enabling personalized treatment, and improving patient outcomes through automated analysis of large healthcare datasets such as genomics, electronic health records (EHR), and medical claims. However, developing meaningful and trustworthy data science and AI solutions in the Healthcare and Life Sciences industries depends heavily on the readiness and fidelity of the available data. When working with patient-level data from clinical trials, for example, it is critical to maintain a high level of fidelity, trust, and protection to prevent the exposure of private or sensitive patient information. The same applies to real-world data sources, such as medical claims, EHR, and lab data, where de-identifying and protecting patient-level data is crucial. For these reasons, even within the same Healthcare or Life Sciences organization, data sources might be siloed across therapy areas or individual sectors to reduce the risk of unauthorized access or misuse.

Adhering to the FAIR (findable, accessible, interoperable, and reusable) data principles has become increasingly important in Life Sciences research and development (R&D), because reducing data silos and shortening the time spent on data preparation are prerequisites for making important discoveries for patients and for leveraging the latest data science and AI approaches.

Data Readiness is a Collaborative and Sometimes Costly Effort

Setting up data ingestion and formatting pipelines, and ensuring that data is regularly refreshed, aren’t trivial tasks, but getting them right can open up unique opportunities to make an immense impact on Life Sciences research. Data collection, processing, storage, and governance often require contributions from multi-expert teams to ensure data is ready and safe to use when, for example, developing a machine learning algorithm that predicts the likelihood of an adverse reaction to a new treatment.

In Life Sciences R&D, once data for a clinical trial or a specific exploratory project is prepared and approved for analysis, multiple stakeholders, including data engineers, statisticians, and data scientists, interact with the data. This collaboration isn’t always easy, because disparate technical infrastructures and data silos can create bottlenecks in data sharing and movement. For Life Sciences companies, any delay in deriving value from patient data can slow the speed with which new medicines reach patients in need.

Data Access and Readiness at Kythera

Since joining Kythera, I have found working with large datasets incredibly efficient and intuitive. At first, I had the usual trepidation before getting my hands on the data, especially since our dataset contains billions of claims as well as other data sources, such as EHR, drug, and lab data. The very first thing I noticed was that I had access to FAIR RWD on day one, and that this was true not only for me as a new employee but also for all of our customers.

The Wayfinder platform is built on Databricks, so Kythera employees and customers alike have access to all the data tools built into Databricks, with the added benefit of Wayfinder’s own functionality, such as data confirmation, standardization, de-identification, and tokenization. This makes accessing regularly refreshed, standardized data intuitive and easy. In addition, Wayfinder is highly interoperable and extensible, which means our customers can easily connect Kythera data and technology with their own.

Reducing Uncertainty in Healthcare Data

The focus of Kythera has always been reducing uncertainty in healthcare data by empowering Healthcare and Life Sciences organizations to rapidly integrate, access, and analyze healthcare data with scale and speed. Our data science enhances data quality through a medallion architecture. Our architecture and processing technology create specific, progressive improvements to large volumes of Healthcare RWD as it moves through the Bronze, Silver, and Gold layers. We restructure and reformat data to produce data assets with consistent and unified structures, remove incorrect information, impute correct information, and infer the existence of missing healthcare events to increase confidence that our data is correct and complete. Life Sciences organizations can take advantage of the improvements in the Bronze, Silver, and Gold layers, including data cleaning, standardizing, de-identifying, uplifting, curating, and joining functionality to integrate data and analyze robust, holistic, and unique datasets at scale. 
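
To make the layered idea concrete, here is a minimal, hypothetical sketch in plain Python of how records might move from a raw Bronze layer to a standardized Silver layer and a consumption-ready Gold layer. The field names, accepted date formats, and validity rules are illustrative assumptions, not Kythera's actual pipeline, which runs at far larger scale on Databricks.

```python
from __future__ import annotations

from datetime import datetime

# Hypothetical raw (Bronze) claim records with inconsistent formatting.
BRONZE = [
    {"patient_id": "P001", "service_date": "2023-01-15", "dx_code": "c50.911"},
    {"patient_id": "P001", "service_date": "02/20/2023", "dx_code": "C50.911 "},
    {"patient_id": "", "service_date": "2023-03-01", "dx_code": "E11.9"},
    {"patient_id": "P002", "service_date": "2023-03-05", "dx_code": "e11.9"},
]

# Assumed set of source date formats the Silver step accepts.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(raw: str) -> str | None:
    """Return the date in ISO format, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def to_silver(bronze: list[dict]) -> list[dict]:
    """Standardize formats and drop records that fail basic validity checks."""
    silver = []
    for rec in bronze:
        date = parse_date(rec["service_date"])
        if not rec["patient_id"] or date is None:
            continue  # exclude records missing a patient ID or a usable date
        silver.append({
            "patient_id": rec["patient_id"],
            "service_date": date,
            "dx_code": rec["dx_code"].strip().upper(),  # unify code casing/whitespace
        })
    return silver


def to_gold(silver: list[dict]) -> dict[str, list[tuple[str, str]]]:
    """Organize standardized records into a per-patient, date-ordered timeline."""
    timelines: dict[str, list[tuple[str, str]]] = {}
    # ISO dates sort correctly as strings, so a plain sort orders each journey.
    for rec in sorted(silver, key=lambda r: (r["patient_id"], r["service_date"])):
        timelines.setdefault(rec["patient_id"], []).append(
            (rec["service_date"], rec["dx_code"])
        )
    return timelines


gold = to_gold(to_silver(BRONZE))
print(gold)
```

Running the script prints a timeline for two patients; the record without a patient ID is dropped in the Silver step. Imputation and inference of missing healthcare events are deliberately left out of this sketch.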

For example, within the Silver layer, we join data to healthcare industry dimensional data to denormalize this relational information, which speeds up queries by reducing the number of expensive “join” operations that downstream analyses would otherwise need. Patient token transformation is performed automatically at scale so that the data conforms with HIPAA privacy rules. Within the Gold layer, data is organized into consumption-ready analytics assets for projects such as building patient cohorts, supporting drug commercialization, and assessing market intelligence.
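
The two Silver-layer steps described above, denormalization and patient tokenization, can be illustrated with a small, hypothetical Python sketch: each claim is joined once to a drug dimension table so later analyses can read drug attributes directly from the row, and the raw patient identifier is replaced with a keyed-hash token. The table contents, drug codes, `SECRET_KEY`, and the choice of HMAC-SHA256 are assumptions for illustration only; this is not Kythera's tokenization method, and keyed hashing on its own does not make a dataset HIPAA-compliant.

```python
from __future__ import annotations

import hashlib
import hmac

# Hypothetical secret; a real system would keep this in a managed secrets store.
SECRET_KEY = b"example-key-do-not-use"

# Hypothetical dimensional data: drug attributes keyed by an illustrative drug code.
DRUG_DIM = {
    "D001": {"drug_name": "Drug A", "drug_class": "Oncology"},
    "D002": {"drug_name": "Drug B", "drug_class": "Cardiology"},
}

# Hypothetical claims rows that still carry a raw patient identifier.
CLAIMS = [
    {"patient_id": "P001", "drug_code": "D001", "fill_date": "2023-01-20"},
    {"patient_id": "P002", "drug_code": "D002", "fill_date": "2023-02-11"},
    {"patient_id": "P001", "drug_code": "D999", "fill_date": "2023-03-02"},
]


def tokenize(patient_id: str) -> str:
    """Replace an identifier with a deterministic keyed hash (HMAC-SHA256)."""
    return hmac.new(SECRET_KEY, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def denormalize(claims: list[dict], drug_dim: dict[str, dict]) -> list[dict]:
    """Join each claim to the drug dimension once, so later queries skip the join."""
    unknown = {"drug_name": None, "drug_class": None}
    rows = []
    for claim in claims:
        attrs = drug_dim.get(claim["drug_code"], unknown)  # left join: keep unmatched claims
        rows.append({
            "patient_token": tokenize(claim["patient_id"]),  # raw ID is not carried forward
            "drug_code": claim["drug_code"],
            "fill_date": claim["fill_date"],
            **attrs,
        })
    return rows


rows = denormalize(CLAIMS, DRUG_DIM)
for row in rows:
    print(row["patient_token"][:12], row["drug_name"], row["fill_date"])
```

Because the token is deterministic for a given key, the same patient receives the same token on every row, which keeps records linkable for patient-journey analyses without exposing the original identifier.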

Analysis-Ready Data on Day 1

Kythera’s expertise in Extract, Transform, Load (ETL) on large datasets enables straightforward data integration using our pre-built pipeline orchestration (Airflow), reducing data engineering effort and time. Once I was ready to work with the data, I was honestly surprised by its quality. Kythera’s RWD is curated and organized in use-case-friendly data products, so I don’t have to spend time formatting and standardizing the data or doing a lot of complex data joins. As someone who wasn’t familiar with using claims data, I could easily get to work using data assets that were already curated. Now, I can efficiently track a patient's journey for a specific type of breast cancer, for example, because of how our RWD is curated and organized. The quality and thoroughness of our RWD enable scientific exploration across a wide range of use cases in the Life Sciences industry and, importantly, allow us to pursue opportunities to create products and applications that have a tangible impact on both the Life Sciences industry and patients.

Another big benefit is having all the data located in one place and organized in a user-friendly data catalog that follows FAIR data principles. It is quite easy to select the data I want to explore (through Unity Catalog), see exactly when the data was last updated, and track the data lineage. What is particularly helpful is the embedded metadata (in Unity Catalog), so I don’t have to look at a data dictionary that is separate from the data…it's right there in my table. And when I want to share the data and work collaboratively, I can define who can access the data right in Wayfinder.


When it comes to accessing and using big healthcare datasets, my experience at Kythera has been an eye-opener. I am thrilled with the efficiencies and the results of using clean data and Wayfinder. I like the fact that I can open up Wayfinder and get right to the important work of harnessing the power of healthcare data to create solutions that have a measurable positive impact. Kythera’s data and technology truly allow for democratizing healthcare data access. 

We recognize that many Life Sciences customers have already invested in and adopted their own systems and infrastructure. Building our data and technology to be interoperable by default allows customers to integrate them with their existing solutions to meet their unique data access needs. Our expanding data offerings and product line are driven by a unique combination of deep Healthcare and Life Sciences expertise and lots of curiosity and humility. We leverage data and technology solutions to help our clients solve their most challenging scientific and business needs with the same tenacity, professionalism, and integrity with which we would analyze data and develop new technology to help a loved one in need.

If you’d like to find out more about our technology and data products, connect with me on LinkedIn.

Aleksandra Petkova, Ph.D.

Senior Product Manager

Aleks is a Product Management leader at Kythera Labs, creating AI and data products for Life Sciences leveraging clinical, biological, and real-world data. With a digital health background in pharma R&D, Aleks is passionate about accelerating drug development and enhancing patient experience. A certified yoga instructor, she finds inspiration, focus, and rest on the yoga mat.