Predictive Underwriting Models: How Health Screening Data Feeds AI
How predictive underwriting models use health screening data and machine learning to reshape life insurance risk assessment and mortality pricing.

Predictive underwriting models have gone from experimental side projects to production systems that handle real policy decisions at scale. The shift happened faster than most actuaries expected. Five years ago, a carrier using machine learning in underwriting was doing something novel. Today, a carrier not using it is falling behind. What changed is the data. Specifically, health screening data from digital sources — electronic health records, prescription histories, lab databases, and increasingly, contactless biometric capture — gave these models something meaningful to learn from. Without rich input data, even a sophisticated algorithm produces mediocre risk predictions. With it, the math starts to work.
"Each digital underwriting evidence source individually reduced mortality slippage, with EHRs showing the largest single-source impact. Combined sources produced greater mortality improvement than any source alone." — RGA, "Assessing Mortality Impact of Digital Underwriting Evidence" (2025)
How predictive underwriting models actually work
The term "predictive model" gets used loosely in insurance marketing, so it's worth being specific about what's happening under the hood. At its core, a predictive underwriting model is a supervised learning system trained on historical policy data where the outcomes (claims, lapses, mortality events) are known. The model learns which combinations of input variables — health data, demographics, behavioral signals — correlate with specific risk outcomes.
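To make "supervised learning on historical policy data" concrete, here is a minimal sketch: a logistic regression (the GLM family discussed below) fit by plain gradient descent to synthetic policy records. Everything here is invented for illustration — the coefficients, the data-generating rule, and the feature scaling — and a production model would be trained on real historical outcomes with far more variables.

```python
import math
import random

random.seed(7)

# Roughly centered/scaled inputs so gradient descent behaves well.
def featurize(age, bp, smoker):
    return [(age - 47) / 15, (bp - 125) / 15, (1.0 if smoker else 0.0) - 0.2]

# Synthetic historical policies with a hidden mortality rule plus noise.
def make_policy():
    age = random.randint(25, 70)
    bp = random.gauss(125, 15)
    smoker = random.random() < 0.2
    logit = -9.0 + 0.08 * age + 0.02 * bp + 1.2 * smoker
    label = 1 if random.random() < 1 / (1 + math.exp(-logit)) else 0
    return featurize(age, bp, smoker), label

data = [make_policy() for _ in range(2000)]

# Fit a logistic regression (a GLM) by full-batch gradient descent.
w, b, lr = [0.0, 0.0, 0.0], 0.0, 0.5
for _ in range(200):
    gw, gb = [0.0, 0.0, 0.0], 0.0
    for x, y in data:
        p = 1 / (1 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
        err = p - y
        for i in range(3):
            gw[i] += err * x[i]
        gb += err
    for i in range(3):
        w[i] -= lr * gw[i] / len(data)
    b -= lr * gb / len(data)

def risk_score(age, bp, smoker):
    z = sum(wi * xi for wi, xi in zip(w, featurize(age, bp, smoker))) + b
    return 1 / (1 + math.exp(-z))

older_smoker = risk_score(65, 150, True)
young_nonsmoker = risk_score(30, 115, False)
```

The point of the sketch is the setup, not the algorithm: outcomes are known for the training cohort, and the model learns which input combinations correlate with them.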
RGA published a primer on predictive modeling for life underwriting that breaks this down clearly. The traditional approach uses generalized linear models (GLMs), which have been actuarial workhorses for decades. They're interpretable, well-understood, and regulators are comfortable with them. But GLMs struggle with non-linear relationships and high-dimensional feature interactions, which is exactly what health screening data produces.
Machine learning methods like gradient-boosted trees and random forests handle this better. They can find patterns across hundreds of variables simultaneously without the modeler needing to specify each interaction in advance. RGA's research on multivariable mortality modeling, published in the Longevity Bulletin, describes how survival analysis frameworks are being extended with these methods to produce more granular mortality estimates than traditional table-based approaches.
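Gradient boosting is easier to see than to describe: each new weak tree fits the residuals the ensemble hasn't yet explained. The toy below, using depth-1 stumps on a synthetic threshold relationship (the kind a single linear term cannot represent), is a bare-bones sketch of the mechanism behind libraries like XGBoost, not their actual implementation.

```python
import random

random.seed(0)

# A sharply non-linear relationship: synthetic risk jumps once a
# biomarker crosses a threshold. A linear model can't capture this.
xs = [random.uniform(0, 10) for _ in range(300)]
ys = [1.0 if x > 6 else 0.1 for x in xs]

def fit_stump(xs, residuals):
    """Depth-1 tree: pick the single split minimizing squared error."""
    best = None
    for split in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= split]
        right = [r for x, r in zip(xs, residuals) if x > split]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, split, lm, rm)
    _, split, lm, rm = best
    return lambda x: lm if x <= split else rm

# Boosting loop: each stump fits what the ensemble hasn't explained yet.
shrinkage = 0.5
ensemble = []
preds = [0.0] * len(xs)
for _ in range(20):
    residuals = [y - p for y, p in zip(ys, preds)]
    stump = fit_stump(xs, residuals)
    ensemble.append(stump)
    preds = [p + shrinkage * stump(x) for p, x in zip(preds, xs)]

def predict(x):
    return shrinkage * sum(s(x) for s in ensemble)
```

After a handful of rounds the ensemble recovers the step function almost exactly — the threshold behavior emerges from the data, without the modeler specifying it in advance.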
The catch is interpretability. A gradient-boosted tree might produce excellent predictions, but explaining to a regulator or applicant why a particular decision was made requires additional tooling. Explainable AI methods like SHAP values have become standard practice for bridging this gap.
| Model type | Strengths | Weaknesses | Regulatory acceptance |
|---|---|---|---|
| Generalized linear models (GLMs) | Interpretable, well-understood, stable | Limited with non-linear relationships | High — long track record |
| Gradient-boosted trees (XGBoost, LightGBM) | Handles complex interactions, high accuracy | Less interpretable without SHAP/LIME | Growing — requires explainability layer |
| Random forests | Robust to overfitting, captures non-linearity | Can be slow at scale, hard to explain | Moderate — used more in research |
| Neural networks | Can model very complex patterns | Black box, requires large data, overfitting risk | Low — mostly experimental in insurance |
| Survival analysis + ML hybrid | Combines actuarial rigor with ML flexibility | Complex to implement and validate | Emerging — RGA and reinsurers exploring |
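The explainability layer mentioned above is easier to trust once you see what a Shapley value actually is. The sketch below computes exact Shapley values by brute force over feature orderings for a toy three-feature risk model — the model, baseline, and applicant values are invented for illustration, and production SHAP libraries approximate this computation efficiently for real models.

```python
from itertools import permutations

# A toy additive risk model (illustrative, not any carrier's model).
def model(age, bmi, smoker):
    return 0.01 * age + 0.02 * bmi + 0.25 * smoker

baseline = {"age": 40, "bmi": 25, "smoker": 0}   # reference applicant
applicant = {"age": 60, "bmi": 31, "smoker": 1}  # decision being explained
features = list(baseline)

def evaluate(present):
    """Score the model with absent features held at baseline values."""
    args = {f: (applicant[f] if f in present else baseline[f]) for f in features}
    return model(**args)

# Exact Shapley value: average marginal contribution over all orderings.
shapley = {f: 0.0 for f in features}
perms = list(permutations(features))
for order in perms:
    present = set()
    for f in order:
        before = evaluate(present)
        present.add(f)
        shapley[f] += (evaluate(present) - before) / len(perms)
```

The attributions sum exactly to the difference between the applicant's score and the baseline score — that additivity is what makes the method useful when explaining a specific decision to a regulator or applicant.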
What health screening data goes into these models
The quality of predictions depends entirely on what data feeds the model. Traditional underwriting relied on a narrow set: age, gender, smoking status, build (height/weight), and whatever a paramedical exam or attending physician statement revealed. Predictive models can use all of that plus substantially more.
Electronic health records provide diagnosed conditions, lab values, vital signs over time, medication lists, and visit frequency. LexisNexis Health Intelligence, which acquired Human API's health data platform in early 2023, now delivers formatted EHR summaries specifically designed for underwriting workflows. Their Medical Insights product extracts targeted health attributes — vitals, labs, flagged conditions — so underwriters and models don't have to parse raw clinical records.
Prescription drug histories from pharmacy benefit managers reveal medication patterns. Someone on three blood pressure medications is a different risk profile than someone on one. The data is consistently structured and widely available, making it one of the first data types most carriers integrate into predictive models.
Medical claims data captures procedure codes, diagnosis codes, and healthcare utilization patterns. It's less clinically detailed than EHRs but broader in coverage.
Then there's the newer category: contactless biometric data from remote photoplethysmography (rPPG). A smartphone camera captures heart rate, respiratory rate, and blood pressure indicators in real time, without any physical equipment. This data is interesting for predictive models because it's captured at the point of application — it reflects the applicant's current physiological state, not a historical record that may be months or years old.
| Data source | Variables for modeling | Temporal coverage | Integration complexity |
|---|---|---|---|
| Electronic health records | Diagnoses, labs, vitals, medications, visit history | Years of history | Moderate — requires parsing, normalization |
| Prescription histories (Rx) | Drug names, dosages, fill dates, refill patterns | 5-10 years typically | Low — structured, standardized |
| Medical claims | Procedure codes, diagnosis codes, utilization | 3-7 years | Low to moderate |
| Paramedical exam | Point-in-time vitals, blood/urine labs | Single snapshot | Low — familiar format |
| Contactless biometric (rPPG) | HR, respiratory rate, BP indicators, HRV | Real-time capture | Low — API-based |
| Wearable data | Continuous HR, activity, sleep patterns | Weeks to months | High — consent, device variability |
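As an illustration of how a structured source like prescription history becomes model input, the sketch below turns raw fill records into underwriting features (the "three blood pressure medications" signal mentioned above). The record layout and field names are invented for the example — real PBM feeds vary by vendor.

```python
from collections import defaultdict
from datetime import date

# Simplified fill records: (drug, therapeutic_class, fill_date).
fills = [
    ("lisinopril", "antihypertensive", date(2024, 1, 5)),
    ("lisinopril", "antihypertensive", date(2024, 2, 4)),
    ("amlodipine", "antihypertensive", date(2024, 2, 4)),
    ("metoprolol", "antihypertensive", date(2024, 3, 1)),
    ("atorvastatin", "statin", date(2024, 1, 5)),
]

def rx_features(fills):
    drugs_by_class = defaultdict(set)
    for drug, cls, _ in fills:
        drugs_by_class[cls].add(drug)
    distinct_drugs = {drug for drug, _, _ in fills}
    return {
        # Distinct antihypertensives: proxy for how hard BP is to control.
        "n_antihypertensives": len(drugs_by_class["antihypertensive"]),
        "n_distinct_drugs": len(distinct_drugs),
        # Polypharmacy flag: several distinct medications concurrently.
        "polypharmacy": len(distinct_drugs) >= 4,
    }

feats = rx_features(fills)
```

Counting distinct drugs rather than raw fills is the key design choice here: refills of the same medication signal adherence, not escalating treatment.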
The mortality slippage problem predictive models are solving
Carriers adopted accelerated underwriting programs to speed up the application process, but skipping the paramedical exam created a known problem: mortality slippage. Some applicants who would have been rated up or declined under full underwriting slip through the accelerated path at standard rates.
Gen Re's 2024 U.S. Individual Life Accelerated Underwriting Survey, covering 38 carriers, found that 82% of participating carriers had implemented accelerated underwriting. The average throughput rate sat around 59%. Those numbers represent a massive volume of policies being issued without traditional medical evidence.
Predictive models address slippage by replacing the missing exam data with richer digital data. RGA's 2025 research tested three digital underwriting evidence sources — medical claims, their proprietary LabPiQture database, and EHRs. Each source reduced slippage individually. EHRs had the largest single-source impact. But combining all three sources outperformed any single source, which makes intuitive sense: each data type captures different aspects of health risk.
The practical implication for carriers is that predictive model performance is directly tied to data breadth. A model built on prescription data alone misses risk that EHR data or biometric data would catch. Carriers investing in multi-source data pipelines are seeing measurably better mortality outcomes in their accelerated programs.
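The intuition that combined sources beat any single source can be demonstrated with a stylized simulation — this is not RGA's methodology, just a sketch of why independent noisy views of the same underlying risk combine well. Each "evidence source" observes a hidden true risk with independent noise, and discrimination is measured by AUC.

```python
import random

random.seed(42)

def auc(scores, labels):
    """Probability a random positive outscores a random negative."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((sp > sn) + 0.5 * (sp == sn) for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))

# Hidden true risk; each evidence source sees it through independent noise.
n = 1000
true_risk = [random.gauss(0, 1) for _ in range(n)]
labels = [r > 0.8 for r in true_risk]
sources = [[r + random.gauss(0, 1.2) for r in true_risk] for _ in range(3)]
combined = [sum(s[i] for s in sources) / 3 for i in range(n)]

single_aucs = [auc(s, labels) for s in sources]
combined_auc = auc(combined, labels)
```

Averaging three independent noisy views cuts the noise standard deviation by roughly √3, so the combined score separates high and low risk better than any single source — the same logic that makes multi-source underwriting evidence outperform any one feed.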
Where the SOA and industry research stand
The Society of Actuaries has been tracking how predictive analytics are changing insurance through multiple channels. At the 2024 SOA Health Meeting, Session 1E focused specifically on AI-driven predictive analytics and their implications for stop-loss underwriting, examining how these models perform in group coverage contexts where individual risk assessment is different from individual life.
The SOA's Product Development Section published research in August 2024 on accelerated underwriting mortality slippage monitoring trends. The key finding: carriers are moving past simple pass/fail audits of their accelerated programs. Instead, they're building monitoring frameworks that track mortality outcomes by data source, underwriting path, and applicant characteristics. That shift matters. Carriers aren't just deploying predictive models anymore. They're measuring whether the models actually work.
Munich Re's late 2024 research noted that accelerated underwriting program structures have stabilized, but the data inputs keep expanding. Programs aren't getting redesigned annually anymore. What's changing is the volume and variety of health data feeding into existing frameworks.
Roots Automation's insurance AI analysis for 2026 predicts that AI-driven underwriting will move from decision-support to autonomous decision-making in lower-complexity cases. That's an important distinction. For straightforward applications where the data is clean and the risk profile is clear, the model makes the decision. For complex cases, the model flags and routes to a human underwriter. This tiered approach is where most carriers are heading.
Real-time health data as a model input
One area gaining traction is incorporating real-time physiological data at the point of application. Traditional health screening data is historical — it tells you what happened months or years ago. Real-time biometric capture through rPPG technology adds a present-tense data point.
For predictive models, this is useful because it fills a gap. An applicant's EHR might show normal blood pressure readings from a doctor visit eight months ago, but their current cardiovascular state could be different. A 30-second camera scan at the point of application captures heart rate variability, respiratory rate, and blood pressure indicators that reflect right now, not last year.
The modeling challenge is calibration. Real-time data from a single measurement has more noise than longitudinal clinical data. But when combined with historical sources, it adds signal that the model can use — particularly for applicants with sparse medical histories.
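One standard way to formalize that calibration is inverse-variance (precision-weighted) blending: a noisy one-off reading is pulled toward a stable clinical history, while a sparse history lets the live reading dominate. The numbers below are illustrative, not calibrated to any device.

```python
def fuse_readings(historical_mean, historical_var, live_reading, live_var):
    """Precision-weighted (inverse-variance) blend of two estimates."""
    w_hist = 1 / historical_var
    w_live = 1 / live_var
    fused = (w_hist * historical_mean + w_live * live_reading) / (w_hist + w_live)
    fused_var = 1 / (w_hist + w_live)
    return fused, fused_var

# Systolic BP in mmHg: a 140 rPPG reading against a 120 clinical history.
rich_history = fuse_readings(120.0, 4.0, 140.0, 64.0)    # tight history wins
sparse_history = fuse_readings(120.0, 400.0, 140.0, 64.0)  # live reading wins
```

Note that the fused variance is always smaller than either input variance — even a noisy real-time reading adds signal, which is exactly the argument for capturing it.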
Behavioral and lifestyle signals
Beyond clinical health data, some carriers are experimenting with behavioral variables: credit-based insurance scores, motor vehicle records, and even consumer purchasing patterns. These are controversial from a fairness perspective, but they do correlate with mortality in actuarial studies.
The more interesting development is passive behavioral data from connected devices. Sleep patterns from a smartwatch, daily step counts, resting heart rate trends — this longitudinal behavioral data can reveal health trajectories that clinical snapshots miss. The problem is consent and consistency. Not everyone wears a fitness tracker, and the people who do may be systematically healthier, introducing selection bias into the model.
Current research and evidence
The academic side of predictive underwriting is producing useful work, though it tends to lag industry practice by a year or two. A 2024 paper in the Journal of Machine Learning for Health Data Sciences, titled "Machine Learning for Predictive Modeling in Life Insurance," examined how gradient-boosted models performed against traditional GLMs using EHR and claims data. The ML models consistently outperformed on discrimination (AUC) metrics, particularly for substandard risk identification.
RGA's multivariable mortality modeling research, published through the Institute and Faculty of Actuaries' Longevity Bulletin, provides a technical foundation for actuaries looking to understand how survival analysis and machine learning intersect. The paper walks through Cox proportional hazards models, their extensions, and how tree-based methods can capture interactions that Cox models miss.
Duck Creek Technologies' 2026 insurance analytics report highlights that predictive models in underwriting are moving toward real-time data streams rather than batch processing. Instead of scoring an application once with static data, carriers are building systems that can incorporate new data as it arrives — a lab result, a prescription fill, a biometric reading — and update the risk assessment dynamically.
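One simple way to sketch that dynamic updating is a running log-odds score that each new piece of evidence shifts by an assumed likelihood ratio (a naive-Bayes-style update). The class name, prior, and likelihood ratios below are invented for illustration; a production system would derive them from calibrated models.

```python
import math

class DynamicRiskScore:
    """Running risk estimate updated as evidence arrives: each item
    shifts the log-odds by its (assumed) likelihood ratio."""

    def __init__(self, prior=0.05):
        self.log_odds = math.log(prior / (1 - prior))

    def update(self, likelihood_ratio):
        # LR > 1: evidence raises risk; LR < 1: lowers it.
        self.log_odds += math.log(likelihood_ratio)
        return self.probability()

    def probability(self):
        return 1 / (1 + math.exp(-self.log_odds))

score = DynamicRiskScore(prior=0.05)
p0 = score.probability()
p1 = score.update(3.0)   # an adverse lab result arrives (assumed LR of 3)
p2 = score.update(0.5)   # a reassuring biometric reading (assumed LR of 0.5)
```

The appeal of working in log-odds is that updates are order-independent and cheap — the score can be recomputed the moment a lab result, prescription fill, or biometric reading lands, rather than in a batch rescoring job.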
VisioneerIT's 2026 analysis of data analytics in insurance projects that AI-driven analytics will move from decision-support to autonomous decision-making in lower-risk underwriting within the next two years. The report notes that carriers processing high volumes of term life applications are the most likely early adopters of fully automated underwriting decisions.
The future of predictive underwriting models
More data sources, faster processing, more automation for routine cases. That much is obvious. The harder questions are about the constraints.
Regulatory frameworks haven't caught up with the technology. Most state insurance departments evaluate underwriting on actuarial soundness and unfair discrimination standards written for traditional methods. How regulators handle black-box models, even with explainability layers, remains uncertain. Colorado's AI governance law, which took effect in 2024, represents one approach, but there's no federal standard.
Data privacy is another constraint. As models incorporate more personal health data, the consent and governance requirements grow. HIPAA covers clinical data held by covered entities, but what about rPPG readings captured by a phone app? The legal frameworks are still taking shape.
The biggest practical challenge is data quality and standardization. EHR data formats vary across health systems. Lab units differ between providers. Prescription histories from different PBMs have different structures. Carriers building predictive models spend more time on data engineering than on the actual modeling — a fact that surprises outsiders but is well-known inside the industry.
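A small example of that data engineering work: normalizing lab values reported in different units to a canonical unit per analyte. The conversion factors below (glucose mmol/L × 18.016 = mg/dL; cholesterol mmol/L × 38.67 = mg/dL) are standard clinical conversions, but the record layout is illustrative, not any vendor's schema.

```python
# Conversion factors to a canonical unit per analyte.
CANONICAL = {
    "glucose": ("mg/dL", {"mg/dL": 1.0, "mmol/L": 18.016}),
    "total_cholesterol": ("mg/dL", {"mg/dL": 1.0, "mmol/L": 38.67}),
}

def normalize_lab(analyte, value, unit):
    """Convert a lab result to the canonical unit for its analyte."""
    canonical_unit, factors = CANONICAL[analyte]
    if unit not in factors:
        raise ValueError(f"unknown unit {unit!r} for {analyte}")
    return round(value * factors[unit], 1), canonical_unit

# Two feeds report the same glucose result in different units.
a = normalize_lab("glucose", 100.0, "mg/dL")
b = normalize_lab("glucose", 5.55, "mmol/L")
```

Multiply this by hundreds of analytes, dozens of EHR formats, and inconsistent analyte naming, and the "more data engineering than modeling" observation stops being surprising.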
Companies like Circadify are working on the real-time data side of this problem, developing contactless biometric capture that can feed standardized physiological data directly into underwriting models through an API integration. As those pipelines mature, predictive models get access to physiological data that's hours old instead of months old.
Frequently asked questions
What is a predictive underwriting model?
A predictive underwriting model is a machine learning system trained on historical insurance data to estimate risk outcomes — mortality, morbidity, or lapse — based on applicant characteristics and health data. Unlike rule-based systems that follow explicit if-then logic, predictive models learn statistical patterns from large datasets and apply them to new applications.
How does health screening data improve predictive models?
Health screening data provides the input variables that models learn from. Richer data — EHRs, lab results, prescription histories, biometric readings — gives the model more signal to distinguish between risk levels. RGA's 2025 research showed that combining multiple digital health data sources produced better mortality outcomes than any single source, because each type captures different dimensions of health risk.
Are predictive underwriting models replacing human underwriters?
Not entirely, and likely not soon for complex cases. The industry is moving toward a tiered model: automated decisions for straightforward applications where the data is clean and the risk is clear, human review for complex cases with ambiguous data or unusual risk profiles. The SOA's 2024 research on accelerated underwriting monitoring reflects this hybrid approach.
What are the regulatory concerns with AI in underwriting?
The primary concerns are fairness (ensuring models don't discriminate based on protected characteristics), transparency (explaining how decisions are made), and data governance (managing sensitive health information appropriately). Colorado's 2024 AI governance law requires insurers to test for unfair discrimination in algorithmic decisions. Other states are watching that implementation closely before writing their own rules.
