Health systems today operate in an environment defined by margin uncertainty, referral instability, site-of-care migration, and expanding downside risk. The data to manage these forces already exists in claims feeds, EHR data, and operational systems.
The challenge isn’t how much data exists. It’s the lack of coherence across it. Claims data is high-volume and high-entropy: fragmented across clearinghouses and payers, refreshed at different cadences, filled with inconsistent identifiers, and constantly shifting due to enrollment churn. Even when claims are technically correct, billing artifacts can obscure the true clinical interaction, requiring additional layers of normalization and event reconstruction before meaningful analysis is possible. If you layer AI or dashboards on top of that instability, you don’t get intelligence. You get amplified confusion.
At Kythera, we take a different approach. Before AI. Before dashboards. Before analytics. We fix the foundation. Our foundation is built on the Databricks Lakehouse, enabling something far more powerful than reporting: agentic AI.
AI agents can reason only as well as the governed data layer that supplies their context. Agentic AI doesn’t just retrieve information; it reasons across datasets, investigates anomalies, and guides action through conversational workflows. For that to work in healthcare, the underlying data must be governed, versioned, and reproducible.
That’s why Kythera built its platform on Databricks. Spark provides distributed compute to process billions of claims records efficiently. Delta Lake provides ACID-compliant storage, time travel, and versioning, which is essential when transformation logic evolves. Spark Declarative Pipelines allow us to define modular, recomputable transformation layers that can be updated without destabilizing downstream systems. Without this architecture, agentic AI becomes guesswork. With it, AI becomes operational.
Open claims frequently represent the same patient in multiple ways across clearinghouses and payer systems. Enrollment files may be missing. Payer classification can drift. Provider hierarchies change mid-year. Before AI agents can reason about referral leakage or risk performance, the system must reduce entropy. Inside the Databricks Lakehouse, Kythera ingests raw claims into Delta tables, normalizes identifiers, and executes Spark-based patient mastering to construct a unified patient spine. Because this logic runs inside Spark Declarative Pipelines, it is versioned, governed, and reproducible. When matching rules improve or payer classification logic evolves, we recompute deterministically.
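In production this mastering runs as Spark jobs over governed Delta tables, but the core deterministic-matching idea can be sketched in plain Python. The field names and the match key below are illustrative only, not Kythera’s actual matching rules:

```python
def normalize(claim):
    """Normalize raw identifier fields so equivalent values compare equal."""
    return {
        "member_id": claim["member_id"].strip().upper(),
        "dob": claim["dob"],
        "source": claim["source"],
    }

def master_patients(claims):
    """Assign one stable spine ID to every claim whose normalized
    (member_id, dob) key matches, regardless of source system."""
    spine, out, next_id = {}, [], 1
    for claim in map(normalize, claims):
        key = (claim["member_id"], claim["dob"])
        if key not in spine:
            spine[key] = f"P{next_id:04d}"
            next_id += 1
        out.append({**claim, "spine_id": spine[key]})
    return out

claims = [
    {"member_id": " abc123 ", "dob": "1970-01-01", "source": "clearinghouse_a"},
    {"member_id": "ABC123",   "dob": "1970-01-01", "source": "payer_b"},
    {"member_id": "XYZ999",   "dob": "1980-05-05", "source": "clearinghouse_a"},
]
mastered = master_patients(claims)
```

Because the matching is a pure function of its inputs, rerunning it after a rule change yields the same spine everywhere, which is exactly the deterministic-recompute property the pipeline layer depends on.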
Another critical layer in this foundation is the event layer, which resolves fragmented claims artifacts into analytically meaningful healthcare events and reconstructs coherent patient care journeys.
A single clinical encounter can generate multiple billing artifacts across facilities, professional claims, clearinghouses, and payer systems. Provider identifiers may reflect billing entities rather than the clinically relevant referring physician, and site-of-care indicators may reflect billing structure rather than the true care setting.
Within the Lakehouse, Kythera resolves these artifacts into normalized care events that represent the underlying clinical interaction. This process consolidates fragmented claims, reconciles conflicting metadata, and applies contextual logic to determine the most analytically relevant attributes such as referring provider, servicing provider, and site of care.
By resolving claims into events before downstream analytics, the platform removes a significant reasoning burden from the AI agent. Without this layer, agents would need to interpret claims billing logic and reconcile conflicting information during analysis—tasks that typically require deep claims-processing expertise. The event layer ensures that AI agents reason over coherent clinical journeys rather than fragmented billing artifacts, resulting in more reliable strategy insights.
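A minimal sketch of the event-resolution step, assuming a deliberately simplified schema (the claim fields and the prefer-the-professional-claim rule are illustrative, not the platform’s full contextual logic):

```python
from collections import defaultdict

def resolve_events(claim_lines):
    """Collapse fragmented claim lines into one care event per
    (patient, service_date), reconciling conflicting attributes."""
    groups = defaultdict(list)
    for line in claim_lines:
        groups[(line["patient_id"], line["service_date"])].append(line)

    events = []
    for (patient_id, service_date), lines in sorted(groups.items()):
        # Prefer the professional claim when choosing the referring provider:
        # facility claims often carry billing entities rather than the
        # clinically relevant referrer.
        prof = next((l for l in lines if l["claim_type"] == "professional"),
                    lines[0])
        events.append({
            "patient_id": patient_id,
            "service_date": service_date,
            "referring_provider": prof["referring_provider"],
            "source_lines": len(lines),
        })
    return events

lines = [
    {"patient_id": "P0001", "service_date": "2024-03-01",
     "claim_type": "facility", "referring_provider": "BILLING_GROUP_LLC"},
    {"patient_id": "P0001", "service_date": "2024-03-01",
     "claim_type": "professional", "referring_provider": "DR_SMITH"},
]
events = resolve_events(lines)
```

Two billing artifacts for the same encounter come out as one event, with the referring provider taken from the claim most likely to reflect clinical reality.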
Traditional healthcare analytics looks like this: data flows into dashboards, where it waits for manual analysis and delayed action. Agentic AI flips that model: data flows into the governed Lakehouse, where AI agents power conversational workflows that generate immediate strategic guidance.
Instead of reviewing referral leakage in a quarterly report, a strategy leader can ask:
“Which independent PCPs have shifted cardiology referrals in the past 60 days, and what is the projected financial impact under our MA contracts?”
The AI agent, operating within Databricks, does not simply retrieve a precomputed metric. It reconstructs patient journeys, validating coverage continuity before calculating leakage; maps referral edges using identity-resolved provider hierarchies; and explains the drivers of change in natural language. This is not a static BI query but dynamic reasoning over governed data products.
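Stripped of the conversational layer, the leakage portion of such a query reduces to aggregating referring-to-servicing edges over resolved events and filtering for out-of-network destinations. A toy sketch, with hypothetical provider names and network set:

```python
def leakage_edges(events, in_network):
    """Aggregate referring->servicing edges from resolved care events and
    keep the edges whose servicing provider sits outside the network."""
    edges = {}
    for e in events:
        key = (e["referring_provider"], e["servicing_provider"])
        edges[key] = edges.get(key, 0) + 1
    out = [
        {"edge": k, "events": n}
        for k, n in edges.items()
        if k[1] not in in_network
    ]
    # Largest leakage edges first, so the agent can lead with them.
    return sorted(out, key=lambda r: -r["events"])

events = [
    {"referring_provider": "PCP_A", "servicing_provider": "CARDIO_IN"},
    {"referring_provider": "PCP_A", "servicing_provider": "CARDIO_OUT"},
    {"referring_provider": "PCP_A", "servicing_provider": "CARDIO_OUT"},
]
leaks = leakage_edges(events, in_network={"CARDIO_IN"})
```

The real computation runs over identity-resolved events and contract-aware financial models; the point of the sketch is that by this stage the agent is traversing a clean graph, not parsing claims.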
Because the underlying Lakehouse preserves lineage and versioning, every answer can be traced back to source data and transformation logic. In healthcare, that auditability matters.
Referral instability is one of the most financially sensitive signals in a health system, but traditional reporting often surfaces leakage months after behavior shifts. Within Databricks, agentic AI can detect referral shifts as they emerge, quantify their projected financial impact, and trace them to the specific providers and contracts involved.
The agent doesn’t just highlight a metric; it investigates why the metric moved.
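One simple version of that investigation is comparing each provider’s recent referral volume against their own baseline. A sketch under illustrative assumptions (the window and threshold here are arbitrary, not Kythera’s detection logic):

```python
def flag_referral_shifts(volumes, window=2, threshold=0.3):
    """Flag providers whose referral volume over the last `window` periods
    fell more than `threshold` below their earlier baseline average."""
    flagged = []
    for provider, counts in volumes.items():
        baseline_counts = counts[:-window]
        if not baseline_counts:
            continue  # not enough history to establish a baseline
        baseline = sum(baseline_counts) / len(baseline_counts)
        recent = sum(counts[-window:]) / window
        if baseline > 0 and (baseline - recent) / baseline > threshold:
            flagged.append(provider)
    return flagged

volumes = {
    "PCP_A": [12, 11, 12, 4, 3],   # sharp recent drop: candidate leakage
    "PCP_B": [10, 10, 10, 10, 9],  # stable referral pattern
}
shifted = flag_referral_shifts(volumes)
```

Flagging is only the trigger; the agent then drills into the flagged edges, coverage changes, and contract terms to explain the movement.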
Agentic AI requires three things: scalable compute, reliable data integrity, and modular, recomputable transformation logic. Databricks delivers all three. Spark allows distributed computation across multi-year claims histories and complex referral graph structures, enabling analysis across billions of healthcare transactions.
Delta Lake ensures ACID guarantees, schema enforcement, and time travel so AI reasoning is grounded in stable datasets. Spark Declarative Pipelines enable modular data products, separating identity mastering, coverage modeling, referral computation, and financial validation into governed layers.
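The recomputability that modular pipelines provide can be illustrated in miniature with pure transformation layers in plain Python. This is a conceptual toy, not the Databricks API: the point is that because each layer is a deterministic function of the previous layer’s output, revising one layer and re-running recomputes everything downstream exactly.

```python
def run_pipeline(raw, layers):
    """Run named transformation layers in order; each layer is a pure
    function of the previous layer's output."""
    data = raw
    for name, fn in layers:
        data = fn(data)
    return data

# Hypothetical layers standing in for governed Delta table stages.
layers_v1 = [
    ("identity_mastering",
     lambda claims: [{**c, "spine_id": c["member_id"].upper()} for c in claims]),
    ("patient_rollup",
     lambda claims: {c["spine_id"] for c in claims}),
]

raw = [{"member_id": "abc"}, {"member_id": "ABC"}]
patients_v1 = run_pipeline(raw, layers_v1)  # case-insensitive: one patient

# Swap in a stricter (case-sensitive) mastering rule and recompute:
# the downstream rollup updates in lockstep, with no manual patching.
layers_v2 = [
    ("identity_mastering",
     lambda claims: [{**c, "spine_id": c["member_id"]} for c in claims]),
    layers_v1[1],
]
patients_v2 = run_pipeline(raw, layers_v2)  # case-sensitive: two patients
```

Delta’s versioning and time travel add what this sketch lacks: the ability to pin an AI agent’s answer to the exact table versions it reasoned over.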
An additional governed layer within this architecture is the event resolution layer. This layer reconciles discrepancies across billing artifacts, provider identifiers, and site-of-care indicators to construct a consistent representation of clinical activity.
The value of this layer is not merely technical cleanliness; it directly improves the reasoning capability of agentic AI. By pre-resolving claims into coherent care events, Kythera ensures that agents reason over structured clinical narratives rather than raw billing transactions. This dramatically improves analytical reliability and reduces the implicit requirement that the AI agent possess deep claims-adjudication expertise. In effect, the Lakehouse transforms technically correct but analytically misleading claims artifacts into trustworthy data products that support strategic reasoning.
This modularity is essential. It allows AI agents to reason over curated, purpose-built data products rather than raw feeds. In other words, Databricks turns claims chaos into structured, trustworthy context, and context is what makes agentic AI viable.
Many health systems rely on separate SaaS tools for referral reporting, risk analysis, and leakage detection. Each adds cost and complexity and duplicates data pipelines. By centralizing ingestion, mastering, coverage modeling, and referral computation inside Databricks, Kythera enables AI/BI capabilities to operate directly on harmonized data.
This reduces reliance on siloed platforms while increasing revenue visibility. The Lakehouse becomes the single governed substrate, and agentic AI becomes the interface.
The most meaningful shift is cultural. When a governed Lakehouse architecture meets agentic AI, data teams move from reactive reporting to proactive orchestration. Instead of building reports after revenue erosion appears, AI agents investigate emerging patterns continuously: surfacing early M&A exposure gaps, identifying revenue recovery opportunities, and explaining referral instability in plain language. And because the logic is built on versioned, lineage-aware Delta tables, the answers are defensible.
At Kythera, we’re not layering AI on broken pipelines. We’re building AI agents on top of a harmonized, governed Lakehouse architecture. In healthcare, intelligence doesn’t emerge from data alone; it emerges from the architecture that makes that data understandable.
Want to learn more? Book a meeting with the Kythera team!