Triage insurance claims¶

This page outlines a use case that assesses claim complexity and severity as early as possible to optimize claim routing, ensure the appropriate level of attention, and improve claimant communications. This use case is captured in:

A Jupyter notebook that you can download and execute.
A UI-based business accelerator.

Business problem¶

Claim payments and claim adjustment are typically an insurance company’s largest expenses. For long-tail lines of business, such as workers’ compensation (which covers medical expenses and lost wages for injured workers), the true cost of a claim may not be known for many years, until it is paid in full. However, claim adjustment activities start as soon as the insurer is made aware of a claim. Typically, when an employee is injured at work (Accident Date), the employer (the insured) files a claim with its insurance company (Report Date), and a claim record is created in the insurer's claim system with all information available at the time of reporting. The claim is then assigned to a claim adjuster. This assignment can be purely random or based on roughly defined business rules. During the life cycle of a claim, the assignment may be re-evaluated multiple times and the claim re-assigned to a different adjuster. This process, however, has costly consequences:

It is well-known in insurance that 20% of claims account for 80% of the total claim payouts. Randomly assigning claims wastes resources.
Early intervention is critical to optimal claim results. Without the appropriate assignment of resources as early as possible, seemingly mild claims can become substantial.
Claims of low severity and complexity must wait to be processed alongside all other claims, often leading to a poor customer experience.
A typical claim adjuster can receive several hundred new claims every month, in addition to any existing open claims. When a claim adjuster is overloaded, it is unlikely they can process every assigned claim. If too much time passes, the claimant is more likely to obtain an attorney to assist in the process, driving up the cost of the claim unnecessarily.

Solution value¶

Challenge: Help insurers assess claim complexity and severity as early as possible so that:
- Claims of low severity and low complexity are routed to straight-through-processing, avoiding the wait and improving the customer experience.
- Claims of high complexity get the required attention of experienced claim adjusters and nurse case managers.
- The improved communication between claimants and the insured leads to minimized attorney involvement.
- The transfer of knowledge between experienced and junior adjusters is improved.
Desired outcome:
- Reduce loss adjustment expenses by allocating claim resources more efficiently.
- Reduce claim costs by assigning nurse case managers and experienced adjusters to the claims they can impact the most.
How can DataRobot help?
- Machine learning models using claim- and policy-level attributes at First Notice of Loss (FNOL) can help you understand the complicated relationship between claim severity and various policy attributes at an early stage of a claim's life cycle. Model predictions are used to rank new claims from least severe to most severe. The business can set thresholds based on what it considers low, medium, and high severity, on the volume of claims that claim adjusters have the bandwidth to handle, or on a combination of the two. Use these thresholds and model predictions to route claims efficiently.
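
As a rough illustration of routing with a combination of severity thresholds and adjuster bandwidth, the sketch below ranks predicted severities and assigns each claim a route. The function name `route_claims`, the route labels, the threshold values, and the capacity figure are all hypothetical and not part of the use case materials.

```python
def route_claims(predictions, low_threshold, high_threshold, expert_capacity):
    """Assign a route to each claim from its predicted severity.

    predictions: dict mapping claim ID -> predicted total payment.
    Thresholds and capacity are business-defined (hypothetical here).
    """
    # Rank from most to least severe so the costliest claims
    # are first in line for the limited experienced adjusters.
    ranked = sorted(predictions.items(), key=lambda kv: kv[1], reverse=True)
    routes = {}
    experts_used = 0
    for claim_id, severity in ranked:
        if severity >= high_threshold and experts_used < expert_capacity:
            routes[claim_id] = "experienced adjuster"
            experts_used += 1
        elif severity < low_threshold:
            routes[claim_id] = "straight-through processing"
        else:
            routes[claim_id] = "standard handling"
    return routes


# Made-up predictions for five new claims.
predicted = {"A": 300, "B": 52000, "C": 8000, "D": 75000, "E": 120}
routes = route_claims(predicted, low_threshold=1000,
                      high_threshold=50000, expert_capacity=1)
```

In this sketch, a high-severity claim that arrives after expert capacity is exhausted falls back to standard handling; a real deployment might queue or escalate it instead.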

Topic	Description
Use case type	Insurance / Claim Triage
Target audience	Claim adjusters
Metrics / KPIs	False positive/negative rate; total expense savings (in terms of both labor and more accurate adjudication of claims); customer satisfaction
Sample dataset	Download here

Problem framing¶

A machine learning model learns complex patterns from historically observed data. Those patterns can be used to make predictions on new data. In this use case, historical insurance claim data is used to build the model. When a new claim is reported, the model makes a prediction on it. Depending on how the problem is framed, the prediction can have different meanings.

The goal of this claim triage use case is to have a model evaluate workers' compensation claim severity as early as possible, ideally at the moment a claim is reported (the first notice of loss, or FNOL). The target feature is related to the total payment for a claim, and the modeling unit is the individual claim.

When the total payment for a claim is treated as the target, the use case is framed as a regression problem because you are predicting a quantity. The predicted total payment can then be compared with business-defined thresholds for low and high severity, which classifies each claim as low-, medium-, or high-severity.

Alternatively, you can frame this use case as a classification problem. To do so, apply the same thresholds to the total claim payment first, converting it to a categorical feature with levels "Low", "Medium", and "High". You can then build a classification model that uses this categorical variable as the target. The model then predicts the probability that a claim is going to be low-, medium-, or high-severity. Regardless of how the problem is framed, the ultimate goal is to route a claim appropriately.
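
A minimal sketch of the classification framing, assuming two dollar thresholds chosen by the business: the helper below converts a claim's total payment into a "Low", "Medium", or "High" label that could serve as a categorical target. The function name and the threshold values (1,000 and 50,000) are illustrative assumptions only.

```python
def severity_label(total_payment, low_threshold, high_threshold):
    """Map a total claim payment to a severity class."""
    if total_payment < low_threshold:
        return "Low"
    if total_payment < high_threshold:
        return "Medium"
    return "High"


# Hypothetical historical total payments for three claims.
payments = [500, 12000, 90000]
labels = [severity_label(p, low_threshold=1000, high_threshold=50000)
          for p in payments]
```

The same function can be applied to a regression model's predicted total payment, which is how the two framings end up producing comparable routing decisions.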

ROI estimation¶

For this use case, direct return on investment (ROI) comes from improved claim handling results and expense savings. Indirect ROI stems from improved customer experience which in turn increases customer loyalty. The steps below focus on the direct ROI calculation based on the following assumptions:

10,000 claims every month
Category I: 30% (3000) of claims are routed to straight through processing (STP)
Category II: 60% (6000) of claims are handled normally
Category III: 10% (1000) of claims are handled by experienced claim adjusters
Average Category I claim severity is 250 without the model; 275 with the model
Average Category III claim severity is 10,000 without the model; 9,500 with the model
Saved labor: 3 full-time employees with an average annual salary of 65000

Total annual ROI = 65,000 x 3 + [3,000 x (250 - 275) + 1,000 x (10,000 - 9,500)] x 12 = $5,295,000
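
The arithmetic can be reproduced with a few lines of Python, which also makes it easy to substitute your own assumptions. The function name `annual_roi` and the tuple layout are hypothetical; the numbers are the ones listed above.

```python
def annual_roi(saved_fte, avg_salary, monthly_segments):
    """Direct annual ROI: labor savings plus 12 months of claim savings.

    monthly_segments: iterable of
        (claims_per_month, severity_without_model, severity_with_model).
    """
    monthly_savings = sum(
        count * (without_model - with_model)
        for count, without_model, with_model in monthly_segments
    )
    return saved_fte * avg_salary + monthly_savings * 12


roi = annual_roi(
    saved_fte=3,
    avg_salary=65000,
    monthly_segments=[
        (3000, 250, 275),      # STP claims cost slightly more with the model
        (1000, 10000, 9500),   # claims handled by experienced adjusters
    ],
)
print(f"Total annual ROI: ${roi:,}")
```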

Working with data¶

The sample data for this use case is a synthetic dataset from a workers' compensation insurer's claims database, organized at the individual claim level. Most claim databases in an insurance company contain transactional data, i.e., one claim may have multiple records in the claims database. When a claim is first reported, it is entered in the claims system along with the initial information about it. Depending on the insurer's practice, a case reserve may be set up. The case reserve is adjusted when claim payments are made or when additional information indicates that it needs to change.

Policy-level information can be predictive as well. This type of information includes class, industry, job description, employee tenure, size of the employer, and whether there is a return-to-work program. Policy attributes should be joined with the claims data to form the modeling dataset, although they are ignored in this example.

When it comes to claim triage, insurers would like to know as early as possible how severe a claim potentially is, ideally at the moment a claim is reported (FNOL). However, an accurate estimate of a claim's severity may not be feasible at FNOL due to insufficient information. Therefore, in practice, a series of claim triage models is needed to predict the severity of a claim at different stages of that claim's life cycle, e.g., FNOL, 30 days, 60 days, 90 days, etc.

For each of the models, the goal is to predict the severity of a claim; therefore, the target feature is the total payment on a claim. The features included in the training data are the claim attributes and policy attributes at the corresponding snapshot. For example, for an FNOL model, features are limited to what is known about a claim at FNOL. For insurers still using legacy systems that may not record the true FNOL data, a snapshot taken between 0 and 30 days after reporting is often used as an approximation.
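
To make the snapshot idea concrete, here is a minimal sketch that collapses transactional records into one row per claim as of a given number of days after the report date. The record layout (`claim_id`, `report_date`, `txn_date`, `case_reserve`, `payment`) and the function name `build_snapshot` are assumptions for illustration; they are not the schema of the sample dataset. Features use only what is known at the snapshot, while the target sums every payment on the claim.

```python
from datetime import date, timedelta

# Hypothetical transactional records: one claim can have many rows.
transactions = [
    {"claim_id": "C1", "report_date": date(2020, 1, 10),
     "txn_date": date(2020, 1, 10), "case_reserve": 5000.0, "payment": 0.0},
    {"claim_id": "C1", "report_date": date(2020, 1, 10),
     "txn_date": date(2020, 2, 1), "case_reserve": 5000.0, "payment": 1200.0},
    {"claim_id": "C1", "report_date": date(2020, 1, 10),
     "txn_date": date(2020, 3, 20), "case_reserve": 8000.0, "payment": 3000.0},
    {"claim_id": "C2", "report_date": date(2020, 2, 1),
     "txn_date": date(2020, 2, 1), "case_reserve": 1000.0, "payment": 0.0},
    {"claim_id": "C2", "report_date": date(2020, 2, 1),
     "txn_date": date(2020, 2, 15), "case_reserve": 1000.0, "payment": 400.0},
]


def build_snapshot(transactions, days_after_report):
    """Return one modeling row per claim as of the snapshot date."""
    rows = {}
    for txn in sorted(transactions, key=lambda t: t["txn_date"]):
        row = rows.setdefault(txn["claim_id"], {
            "claim_id": txn["claim_id"],
            "case_reserve": None,   # latest reserve known at the snapshot
            "paid_to_date": 0.0,    # payments made up to the snapshot
            "total_payment": 0.0,   # target: all payments, any date
        })
        row["total_payment"] += txn["payment"]
        cutoff = txn["report_date"] + timedelta(days=days_after_report)
        if txn["txn_date"] <= cutoff:
            row["case_reserve"] = txn["case_reserve"]
            row["paid_to_date"] += txn["payment"]
    return list(rows.values())


fnol_rows = build_snapshot(transactions, days_after_report=0)
day30_rows = build_snapshot(transactions, days_after_report=30)
```

Keeping later transactions out of the feature columns is what prevents the model from seeing information that would not be available when it scores a claim at that stage.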

Features overview¶

The following table outlines the prominent features in the sample training dataset.

Feature Name	Data Type	Description	Data Source
ReportingDelay	Numeric	Number of days between the accident date and report date	Claims
AccidentHour	Numeric	Time of day that the accident occurred	Claims
Age	Numeric	Age of claimant	Claims
Weekly Rate	Numeric	Weekly salary	Claims
Gender	Categorical	Gender of the claimant	Claims
Marital Status	Categorical	Whether the claimant is married or not	Claims
HoursWorkedPerWeek	Numeric	The usual number of hours worked per week by the claimant	Claims
DependentChildren	Numeric	Claimant's number of dependent children	Claims
DependentsOther	Numeric	Claimant's number of dependents who are not children	Claims
PartTimeFullTime	Numeric	Whether the claimant works part time or full time	Claims
DaysWorkedPerWeek	Numeric	Number of days per week worked by the claimant	Claims
DateOfAccident	Date	Date that the accident occurred	Claims
ClaimDescription	Text	Text description of the accident and injury	Claims
ReportedDay	Numeric	Day of the week that the claim was reported to the insurer	Claims
InitialCaseEstimate	Numeric	Initial case estimate set by claim staff	Claims
Incurred	Numeric	Target: final cost of the claim (all payments made by the insurer)	Claims
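
Several of the features in the table are derived from dates. The sketch below shows one way such values could be computed from an accident timestamp and a report timestamp. The function name and the day-of-week encoding (0 = Monday, as returned by Python's `weekday()`) are assumptions; the encoding used in the sample dataset may differ.

```python
from datetime import datetime


def derive_date_features(accident_at, reported_at):
    """Derive date-based claim features from two timestamps."""
    return {
        # Whole days between the accident date and the report date.
        "ReportingDelay": (reported_at.date() - accident_at.date()).days,
        # Hour of day (0-23) at which the accident occurred.
        "AccidentHour": accident_at.hour,
        # Day of week the claim was reported; 0 = Monday (assumed encoding).
        "ReportedDay": reported_at.weekday(),
    }


features = derive_date_features(
    accident_at=datetime(2021, 3, 1, 14, 30),  # a Monday afternoon
    reported_at=datetime(2021, 3, 5, 9, 0),    # the following Friday
)
```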

Demo¶

See the notebook outlining this use case here.

Updated April 15, 2025
