AI-Driven Personalized Data Dictionaries

When working with large amounts of data, it is imperative to understand what data you have, where it resides, and how to access it. A data dictionary catalogs all the information about the data; essentially, it is a guide to databases, tables, columns, and fields. Data dictionaries are more than data descriptions: they show relationships among data elements and serve as a single source of truth. They standardize the data format, no matter the data type, to minimize misunderstanding about data attributes. That makes them a critical tool for effectively using complex data and enormous datasets (like healthcare claims).

Kythera Labs maintains a database of over 44 billion rows of healthcare claims data, including our enriched datasets and other real-world data types. Our data products, accessed through Wayfinder, our data transformation platform, are curated and configured for use cases confronted by Life Sciences organizations and Healthcare Providers. We know that solving for these use cases and answering unasked questions requires many data inputs and access to more than just dashboards. We intentionally provide access to, and encourage our customers to use, the data best fit for their purpose. This deep, detailed access is why we incorporated AI capabilities into developing an enriched data dictionary. This dictionary not only describes the data but also helps users reap the most value from the data for their analysis.

Defining Personas and Market Segments

The first step in enriching a data dictionary involves defining personas and understanding market segments. In the Healthcare industry, we need to know who is buying the data and what they care about. What questions will be asked and answered by the data? How do they want to use this data? This process involves a deep dive into user profiles, market research, and talking with our customers to understand their needs.

Providing Context

Once we clearly understand our users’ needs, we delve into the data itself. This deep dive involves developing a detailed description of each column in our dataset, providing context on what the data represents. For example, in medical claims data, the "diagnosis_code_1" column is a string value that contains the diagnosis code associated with a specific medical claim. This code identifies the medical condition or disease for which the claim was submitted. It provides a standardized way to categorize and track different diagnoses within the dataset and is a common language for healthcare professionals and researchers when communicating and analyzing data. Adding context is crucial: it reshapes the data from a string of characters into a rich source of information, with each column telling a story or revealing a trend.
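To make the idea of a column-level entry concrete, here is a minimal sketch of how one dictionary record might be represented. The `ColumnEntry` class, its field names, and the `medical_claims` table name are illustrative assumptions, not the structure Wayfinder actually uses; only the column name and its description come from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnEntry:
    """One data dictionary record: a column plus its base description."""
    table: str
    column: str
    data_type: str
    description: str
    # Persona-specific comments keyed by persona name (empty until enriched).
    persona_comments: dict[str, str] = field(default_factory=dict)


# Hypothetical table name; the description paraphrases the article.
diagnosis_code_1 = ColumnEntry(
    table="medical_claims",
    column="diagnosis_code_1",
    data_type="string",
    description=(
        "Diagnosis code associated with a specific medical claim; identifies "
        "the medical condition or disease for which the claim was submitted."
    ),
)

print(f"{diagnosis_code_1.table}.{diagnosis_code_1.column}: "
      f"{diagnosis_code_1.description}")
```

Keeping persona comments in a separate mapping leaves the base description untouched, so the same record could serve several audiences.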

Personalizing Data

With a clear understanding of our users, their use cases, and our data, we can ask our AI to provide details of a specific column in the context of a specific market segment or persona. This is where the magic happens. We are not just providing raw data; we are providing personalized, meaningful insights. This process doesn't just enrich our data dictionary; it transforms it into a powerful tool that can speed time to insights.
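As a rough illustration of what "asking our AI" could look like, the sketch below assembles a prompt from a column description and a persona. The `build_persona_prompt` function and the prompt wording are assumptions made for this example; the article does not describe the actual prompts or model, so no model call is shown.

```python
def build_persona_prompt(column: str, description: str,
                         persona: str, focus_areas: list[str]) -> str:
    """Assemble a prompt asking a model to explain a column for one persona."""
    focus = ", ".join(focus_areas)
    return (
        "You are documenting a healthcare claims dataset.\n"
        f"Column: {column}\n"
        f"Base description: {description}\n"
        f"Audience: {persona}, who focuses on {focus}.\n"
        "Explain how this audience can use the column in their analysis."
    )


# Hypothetical persona definition, using the focus areas named in this article.
prompt = build_persona_prompt(
    column="diagnosis_code_1",
    description="Diagnosis code associated with a specific medical claim.",
    persona="Healthcare Provider Analyst",
    focus_areas=["referral patterns", "market opportunities",
                 "landscape assessments", "competitive analysis"],
)
print(prompt)
```

The returned text would then be sent to whichever language model is in use, and the response stored as the persona-specific comment for that column.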

Let's look at a few examples of how Life Sciences and Healthcare Provider users may benefit from our personalized approach to the data dictionary.

Healthcare Providers

As an analyst at a Healthcare Provider, your role includes analyzing data to reveal insights about referral patterns, market opportunities, landscape assessments, and competitive analysis. The field "diagnosis_code_1" can be useful in understanding and categorizing patient diagnoses. By analyzing this field, you can identify diagnosis patterns, track the frequency of specific conditions or diseases, and analyze the utilization of various healthcare services based on these diagnoses. This information can be valuable in assessing patient needs, identifying areas for improvement in healthcare delivery, and understanding the market dynamics.

By combining the "diagnosis_code_1" field with other relevant data, such as patient demographics, provider information, and referral data, you can gain insights into patient treatment patterns, identify the doctors patients visit, and estimate the overall market share of competing healthcare providers. This analysis can help inform decision-making processes, optimize resource allocation, and enhance the quality of patient care.
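The kind of analysis described above can be sketched in a few lines. The sample rows, the `provider` field, and the `provider_share` helper are invented for illustration; only "diagnosis_code_1" is a field named in this article, and the codes assume ICD-10-CM style values.

```python
from collections import Counter

# Invented sample rows, not real claims data.
claims = [
    {"claim_id": 1, "diagnosis_code_1": "E11.9", "provider": "Provider A"},
    {"claim_id": 2, "diagnosis_code_1": "E11.9", "provider": "Provider B"},
    {"claim_id": 3, "diagnosis_code_1": "I10", "provider": "Provider A"},
    {"claim_id": 4, "diagnosis_code_1": "E11.9", "provider": "Provider A"},
]

# How often each diagnosis appears across all claims.
diagnosis_counts = Counter(row["diagnosis_code_1"] for row in claims)


def provider_share(rows, code):
    """Return each provider's fraction of the claims carrying `code`."""
    matching = [r for r in rows if r["diagnosis_code_1"] == code]
    if not matching:
        return {}
    counts = Counter(r["provider"] for r in matching)
    return {name: n / len(matching) for name, n in counts.items()}


share = provider_share(claims, "E11.9")
print(diagnosis_counts.most_common())
print(share)
```

Claim counts are only one possible proxy for market share; patient counts or billed amounts could be substituted in the same structure.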

This illustration shows how refining the comment column with AI adds contextual information specific to a persona, in this case, a Healthcare Provider Analyst.

Life Sciences

As a Life Sciences data analyst focused on researching patient outcomes, post-clinical trial tracking, and patient cohorts by disease, you can utilize the "diagnosis_code_1" field to analyze and track patients based on their diagnosed conditions. Examining the diagnosis codes allows you to identify patients with specific diseases or conditions of interest and study their outcomes, treatment responses, and disease progression.

As you work with the "diagnosis_code_1" field, you can apply various data analysis techniques to gain insights into patient outcomes. For example, you can create patient cohorts based on specific diagnosis codes to compare their response to different treatments or interventions. Additionally, you can track patient outcomes over time, examining factors such as disease progression, treatment efficacy, and overall survival rates.
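Cohort building by diagnosis code can be sketched as follows. The sample rows, the `patient_id` field, the cohort names, and the `build_cohorts` helper are assumptions for this example, and the prefixes assume ICD-10-CM style codes.

```python
from collections import defaultdict

# Invented sample rows; a patient can appear on more than one claim.
claims = [
    {"patient_id": "P1", "diagnosis_code_1": "E11.9"},
    {"patient_id": "P2", "diagnosis_code_1": "E11.65"},
    {"patient_id": "P3", "diagnosis_code_1": "I10"},
    {"patient_id": "P1", "diagnosis_code_1": "E11.9"},
]


def build_cohorts(rows, code_prefixes):
    """Group unique patient IDs into cohorts by diagnosis code prefix."""
    cohorts = defaultdict(set)
    for row in rows:
        for name, prefix in code_prefixes.items():
            if row["diagnosis_code_1"].startswith(prefix):
                cohorts[name].add(row["patient_id"])
    return dict(cohorts)


# Hypothetical cohort definitions: cohort name -> code prefix.
cohorts = build_cohorts(
    claims, {"type_2_diabetes": "E11", "hypertension": "I10"}
)
print({name: sorted(ids) for name, ids in cohorts.items()})
```

Using sets keeps each patient in a cohort once, regardless of how many matching claims they have. Comparing outcomes between cohorts would require additional fields, such as treatment and service dates, that are not modeled here.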

By refining the comment column with AI, we add contextual, persona-specific information, in this case for a Life Sciences Data Analyst.

Conclusion

Applications for AI in Healthcare data analytics grow daily, and a well-designed data dictionary is vital to ensuring you are accessing the correct data and inputs.

Kythera Labs uses AI to make our data and analytic products more efficient. In this case, enriching and personalizing our data dictionary based on the end user is a powerful approach to data management, ensuring that our data is comprehensive, accurate, relevant, and meaningful to those who use it. Want to learn more about our approach to getting work done faster and smarter? Get in touch at grant.stubblefield@kytheralabs.com or connect with me on LinkedIn.

Published:

April 17, 2024

Grant Stubblefield

Director of Product

Grant is a Product Management leader at Kythera Labs, focusing on developing products that deliver insights from public and private data. Having supported 30+ US payers to solve challenges in claims adjudication, member enrollment, and provider data management, Grant has deep experience and understanding of the technical challenges facing the Healthcare industry.
