Triage insurance claims

This page outlines a use case that assesses claim complexity and severity as early as possible to optimize claim routing, ensure the appropriate level of attention, and improve claimant communications. This use case is captured in:

Business problem

Claim payments and claim adjustment are typically an insurance company’s largest expenses. For long-tail lines of business, such as workers’ compensation (which covers medical expenses and lost wages for injured workers), the true cost of a claim may not be known for many years, until it is paid in full. However, claim adjustment activities start as soon as the insurer is made aware of a claim.

Typically, when an employee is injured at work (Accident Date), the employer (the insured) files a claim with its insurance company (Report Date), and a claim record is created in the insurer's claim system with all information available at the time of reporting. The claim is then assigned to a claim adjuster. This assignment could be purely random or based on roughly defined business rules. During the life cycle of a claim, the assignment may be re-evaluated multiple times and the claim reassigned to a different claim adjuster.

This process, however, has costly consequences:

  • It is well known in insurance that 20% of claims account for 80% of total claim payouts, so assigning claims randomly wastes resources.

  • Early intervention is critical to optimal claim results. Without the appropriate assignment of resources as early as possible, seemingly mild claims can become substantial.

  • Claims of low severity and complexity must wait to be processed alongside all other claims, often leading to a poor customer experience.

  • A typical claim adjuster can receive several hundred new claims every month, in addition to any existing open claims. When a claim adjuster is overloaded, it is unlikely they can process every assigned claim. If too much time passes, the claimant is more likely to obtain an attorney to assist in the process, driving up the cost of the claim unnecessarily.

Solution value

  • Challenge: Help insurers assess claim complexity and severity as early as possible so that:

    • Claims of low severity and low complexity are routed to straight-through-processing, avoiding the wait and improving the customer experience.
    • Claims of high complexity get the required attention of experienced claim adjusters and nurse case managers.
    • The improved communication between claimants and the insured leads to minimized attorney involvement.
    • The transfer of knowledge between experienced and junior adjusters is improved.
  • Desired Outcome

    • Reduce loss adjustment expenses by more efficiently allocating claim resources.
    • Reduce claim costs by assigning nurse case managers and experienced adjusters to the claims where they can have the most impact.
  • How can DataRobot help?

    • Machine learning models using claim- and policy-level attributes at First Notice of Loss (FNOL) can help you understand the complicated relationship between claim severity and various policy attributes at an early stage of a claim's life cycle. Model predictions are used to rank new claims from least severe to most severe. The business can set thresholds based on its own definition of low-, medium-, and high-severity claims, on the volume of claims that adjusters have the bandwidth to handle, or on a combination of the two. Use these thresholds and the model predictions to route claims efficiently.
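
The volume-based thresholding described above can be sketched as follows. This is a minimal illustration, not DataRobot functionality: the function name `route_by_rank`, the queue names, the `stp_share` and `expert_capacity` parameters, and the sample predictions are all hypothetical.

```python
from typing import Dict


def route_by_rank(predicted: Dict[str, float],
                  stp_share: float,
                  expert_capacity: int) -> Dict[str, str]:
    """Rank claims by predicted severity and assign each one a routing queue."""
    # Claim IDs ordered from least severe to most severe prediction.
    ranked = sorted(predicted, key=predicted.get)
    # The lowest-ranked share of claims goes to straight-through processing.
    n_stp = int(len(ranked) * stp_share)
    # Experienced adjusters take as many of the top claims as capacity allows.
    n_expert = min(expert_capacity, len(ranked) - n_stp)

    routes = {}
    for position, claim_id in enumerate(ranked):
        if position < n_stp:
            routes[claim_id] = "straight-through"
        elif position >= len(ranked) - n_expert:
            routes[claim_id] = "experienced adjuster"
        else:
            routes[claim_id] = "standard"
    return routes


# Hypothetical model predictions (predicted total payment per claim).
predictions = {"C1": 180.0, "C2": 52000.0, "C3": 950.0, "C4": 7400.0, "C5": 310.0}
routes = route_by_rank(predictions, stp_share=0.4, expert_capacity=1)
for claim_id, queue in routes.items():
    print(claim_id, queue)
```

Because the cutoffs are expressed as a share and a headcount rather than as payment amounts, they can be adjusted when adjuster bandwidth changes without retraining the model.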

| Topic | Description |
| --- | --- |
| Use case type | Insurance / Claim Triage |
| Target audience | Claim adjusters |
| Metrics / KPIs | False positive/negative rate; total expense savings (in terms of both labor and more accurate adjudication of claims); customer satisfaction |
| Sample dataset | Download here |

Problem framing

A machine learning model learns complex patterns from historically observed data. Those patterns can be used to make predictions on new data. In this use case, historical insurance claim data is used to build the model. When a new claim is reported, the model makes a prediction on it.

Depending on how the problem is framed, the prediction can have different meanings. The goal of this claim triage use case is to have a model evaluate the workers' compensation claim severity as early as possible, ideally at the moment a claim is reported (the first notice of loss, or FNOL). The target feature is related to the total payment for a claim and the modeling unit is each individual claim.

When the total payment for a claim is treated as the target, the use case is framed as a regression problem because you are predicting a quantity. The predicted total payment can then be compared with thresholds for low and high severity claims defined by business need, which classifies each claim as low-, medium-, or high-severity.

Alternatively, you can frame this use case as a classification problem. To do so, first apply the aforementioned thresholds to the total claim payment to convert it into a categorical feature with the levels "Low", "Medium", and "High". You can then build a classification model that uses this categorical feature as the target. Instead of a payment amount, the model predicts the probability that a claim will be low-, medium-, or high-severity.
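
As a rough sketch of the thresholding step, the snippet below maps a total payment to one of the three severity levels. The same mapping could be applied to historical payments (to build a classification target) or to predicted payments (to bucket regression output). The threshold values `MEDIUM_MIN` and `HIGH_MIN` are hypothetical placeholders; in practice they are defined by business need.

```python
from bisect import bisect_right

# Hypothetical thresholds, for illustration only.
MEDIUM_MIN = 1_000.0   # payments at or above this amount are at least "Medium"
HIGH_MIN = 25_000.0    # payments at or above this amount are "High"
LABELS = ["Low", "Medium", "High"]


def severity_label(total_payment: float) -> str:
    """Convert a total claim payment into a severity level."""
    # bisect_right counts how many thresholds are <= total_payment,
    # which is exactly the index of the matching label.
    return LABELS[bisect_right([MEDIUM_MIN, HIGH_MIN], total_payment)]


for payment in (250.0, 1_000.0, 9_500.0, 40_000.0):
    print(payment, severity_label(payment))
```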

Regardless of how the problem is framed, the ultimate goal is to route a claim appropriately.

ROI estimation

For this use case, direct return on investment (ROI) comes from improved claim handling results and expense savings. Indirect ROI stems from improved customer experience which in turn increases customer loyalty. The steps below focus on the direct ROI calculation based on the following assumptions:

  • 10,000 claims every month
  • Category I: 30% (3,000) of claims are routed to straight-through processing (STP)
  • Category II: 60% (6,000) of claims are handled normally
  • Category III: 10% (1,000) of claims are handled by experienced claim adjusters
  • Average Category I claim severity is $250 without the model; $275 with the model
  • Average Category III claim severity is $10,000 without the model; $9,500 with the model
  • Saved labor: 3 full-time employees with an average annual salary of $65,000

Total annual ROI = 65,000 x 3 + [3,000 x (250 - 275) + 1,000 x (10,000 - 9,500)] x 12 = $5,295,000
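
The arithmetic above can be checked with a few lines of Python. The variable names are illustrative; the figures are the assumptions listed above, with the second severity term applied to the 1,000 claims handled by experienced adjusters, as in the formula.

```python
# Assumptions taken from the list above.
stp_claims_per_month = 3_000        # claims routed to straight-through processing
expert_claims_per_month = 1_000     # claims handled by experienced adjusters

# Average severity without and with the model.
stp_severity_without, stp_severity_with = 250, 275
expert_severity_without, expert_severity_with = 10_000, 9_500

# Labor savings: 3 full-time employees at an average annual salary of 65,000.
annual_labor_savings = 3 * 65_000

# Monthly change in claim cost. The STP term is negative because
# straight-through claims cost slightly more on average with the model.
monthly_claim_savings = (
    stp_claims_per_month * (stp_severity_without - stp_severity_with)
    + expert_claims_per_month * (expert_severity_without - expert_severity_with)
)

total_annual_roi = annual_labor_savings + 12 * monthly_claim_savings
print(f"${total_annual_roi:,}")  # $5,295,000
```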

Working with data

The sample data for this use case is a synthetic dataset from a workers' compensation insurer's claims database, organized at the individual claim level. Most claim databases in an insurance company contain transactional data, i.e., one claim may have multiple records in the claims database. When a claim is first reported, it is entered in the claims system along with the initial information about it. Depending on the insurer's practice, a case reserve may be set up. The case reserve is then adjusted when claim payments are made or when additional information indicates that it needs to change.
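
Because the source data is transactional, it usually has to be rolled up to one row per claim before modeling. The sketch below shows one way to compute a claim-level total payment. The record layout (`claim_id`, `type`, `amount`) and the sample records are hypothetical and do not reflect any particular claims system.

```python
from collections import defaultdict

# Hypothetical transactional records: several rows per claim.
transactions = [
    {"claim_id": "C1", "type": "reserve", "amount": 5000.0},
    {"claim_id": "C1", "type": "payment", "amount": 500.0},
    {"claim_id": "C1", "type": "payment", "amount": 800.0},
    {"claim_id": "C2", "type": "payment", "amount": 250.0},
    {"claim_id": "C1", "type": "reserve", "amount": -1000.0},
]


def total_paid_per_claim(records):
    """Sum payment transactions per claim, ignoring case reserve changes."""
    totals = defaultdict(float)
    for record in records:
        if record["type"] == "payment":
            totals[record["claim_id"]] += record["amount"]
    return dict(totals)


totals = total_paid_per_claim(transactions)
print(totals)
```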

Policy-level information can be predictive as well. This type of information includes class, industry, job description, employee tenure, size of the employer, and whether there is a return-to-work program. Policy attributes should normally be joined with the claims data to form the modeling dataset, although they are not used in this example.

When it comes to claim triage, insurers would like to know as early as possible how severe a claim is likely to be, ideally at the moment the claim is reported (FNOL). However, an accurate estimate of a claim's severity may not be feasible at FNOL due to insufficient information. Therefore, in practice, a series of claim triage models is needed to predict the severity of a claim at different stages of its life cycle, e.g., FNOL, 30 days, 60 days, 90 days, etc.

For each of the models, the goal is to predict the severity of a claim; therefore, the target feature is the total payment on a claim. The features included in the training data are the claim attributes and policy attributes as of the corresponding snapshot. For example, for an FNOL model, features are limited to what is known about a claim at FNOL. For insurers still using legacy systems that may not record the true FNOL data, the FNOL snapshot is often approximated with the data available between 0 and 30 days after reporting.
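
One way to build such snapshots is to keep only the records that existed on or before the snapshot date, so that no later information leaks into the features. The following is a minimal sketch under assumed data: the `snapshot` helper, the event layout, and the sample dates are hypothetical.

```python
from datetime import date, timedelta


def snapshot(events, report_date, days_after_report):
    """Keep only the events recorded on or before the snapshot date."""
    cutoff = report_date + timedelta(days=days_after_report)
    return [event for event in events if event["recorded_on"] <= cutoff]


# Hypothetical history for a single claim.
events = [
    {"recorded_on": date(2022, 1, 10), "field": "InitialCaseEstimate", "value": 1500.0},
    {"recorded_on": date(2022, 1, 25), "field": "payment", "value": 400.0},
    {"recorded_on": date(2022, 3, 20), "field": "payment", "value": 2600.0},
]
report_date = date(2022, 1, 10)

fnol = snapshot(events, report_date, 0)     # what is known at FNOL
day30 = snapshot(events, report_date, 30)   # what is known 30 days after reporting
day90 = snapshot(events, report_date, 90)   # what is known 90 days after reporting
print(len(fnol), len(day30), len(day90))
```

Each snapshot feeds its own model, while the target (the final total payment) stays the same across all of them.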

Features overview

The following table outlines the prominent features in the sample training dataset.

| Feature Name | Data Type | Description | Data Source |
| --- | --- | --- | --- |
| ReportingDelay | Numeric | Number of days between the accident date and report date | Claims |
| AccidentHour | Numeric | Time of day that the accident occurred | Claims |
| Age | Numeric | Age of claimant | Claims |
| Weekly Rate | Numeric | Weekly salary | Claims |
| Gender | Categorical | Gender of the claimant | Claims |
| Marital Status | Categorical | Whether the claimant is married or not | Claims |
| HoursWorkedPerWeek | Numeric | The usual number of hours worked per week by the claimant | Claims |
| DependentChildren | Numeric | Claimant's number of dependent children | Claims |
| DependentsOther | Numeric | Claimant's number of dependents who are not children | Claims |
| PartTimeFullTime | Numeric | Whether the claimant works part time or full time | Claims |
| DaysWorkedPerWeek | Numeric | Number of days per week worked by the claimant | Claims |
| DateOfAccident | Date | Date that the accident occurred | Claims |
| ClaimDescription | Text | Text description of the accident and injury | Claims |
| ReportedDay | Numeric | Day of the week that the claim was reported to the insurer | Claims |
| InitialCaseEstimate | Numeric | Initial case estimate set by claim staff | Claims |
| Incurred | Numeric | Target: final cost of the claim (all payments made by the insurer) | Claims |
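
Some of these features are derived from raw dates. As an illustration, the sketch below computes ReportingDelay and ReportedDay from an accident date and a report date. The function name is hypothetical, and the day-of-week encoding (Monday = 0, following Python's `date.weekday()`) is an assumption; the sample dataset may encode ReportedDay differently.

```python
from datetime import date


def reporting_features(accident_date: date, report_date: date) -> dict:
    """Derive date-based features for a single claim."""
    return {
        # Number of days between the accident date and the report date.
        "ReportingDelay": (report_date - accident_date).days,
        # Day of the week the claim was reported (assumed encoding: Monday = 0).
        "ReportedDay": report_date.weekday(),
    }


features = reporting_features(date(2022, 3, 1), date(2022, 3, 15))
print(features)
```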


See the notebook outlining this use case here.

Updated November 3, 2022