
Fraudulent claim detection

This page outlines a use case for improving the accuracy of predicting which insurance claims are fraudulent. It is presented below as a UI-based walkthrough and is also available as a Jupyter notebook that you can download and execute.

Business problem

Because, on average, it takes roughly 20 days to process an auto insurance claim (which often frustrates policyholders), insurance companies look for ways to increase the efficiency of their claims workflows. Increasing the number of claim handlers is expensive, so companies have increasingly relied on automation to accelerate the process of paying or denying claims. Automation can increase Straight-Through Processing (STP) by more than 20%, resulting in faster claims processing and improved customer satisfaction.

However, as insurance companies increase the speed by which they process claims, they also increase their risk of exposure to fraudulent claims. Unfortunately, most of the systems widely used to prevent fraudulent claims from being processed either require high amounts of manual labor or rely on static rules.

Solution value

While Business Rule Management Systems (BRMS) will always be required—they implement mandatory rules related to compliance—you can supplement these systems by improving the accuracy of predicting which incoming claims are fraudulent.

Using historical cases of fraud and their associated features, AI can apply learnings to new claims to assess whether they share characteristics of the learned fraudulent patterns. Unlike BRMS, which are static and have hard-coded rules, AI generates a probabilistic prediction and provides transparency on the unique drivers of fraud for each suspicious claim. This allows investigators to not only route and triage claims by their likelihood of fraud, but also enables them to accelerate the review process as they know which vectors of a claim they should evaluate. The probabilistic predictions also allow investigators to set thresholds that automatically approve or reject claims.

Problem framing

Work with stakeholders to identify and prioritize the decisions for which automation will offer the greatest business value. In this example, stakeholders agreed that achieving over 20% STP in claims payment was a critical success factor and that minimizing fraud was a top priority. Working with subject matter experts, the team developed a shared understanding of STP in claims payment and built decision logic for claims processing:

Step | Best practice
Determine which decisions to automate. | Automate simple claims and send the more complex claims to a human claims processor.
Determine which decisions will be based on business rules and which will be based on machine learning. | Manage decisions that rely on compliance and business strategy with rules. Use machine learning for decisions that rely on experience, including whether a claim is fraudulent and how much the payment will be.

Once the decision logic is in good shape, it is time to build business rules and machine learning models. Clarifying the decision logic reveals the true data needs, which helps decision owners see exactly what data and analytics drive decisions.

ROI estimation

One way to frame the problem is to determine how to measure ROI. Consider that an STP use case typically involves multiple AI models. For example, fraud detection, claims severity prediction, and litigation likelihood prediction are common use cases for models that can augment business rules and human judgment. Insurers implementing fraud detection models have reduced fraudulent payments by 15% to 25% annually, saving $1 million to $3 million.

To measure:

  1. Identify the number of fraudulent claims that the models detected but manual processing failed to identify (the false negatives of the manual process).
  2. Calculate the monetary amount that would have been paid on these fraudulent claims if machine learning had not flagged them as fraud.

    100 fraudulent claims * $20,000 each on average = $2 million per year

  3. Identify fraudulent claims that manual investigation detected but machine learning failed to detect (the model's false negatives).

  4. Calculate the monetary amount that would have been paid without manual investigation.

    40 fraudulent claims * $5,000 each on average = $0.2 million per year

The difference between these two numbers is the estimated annual ROI:

$2 million – $0.2 million = $1.8 million per year
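
This arithmetic is simple enough to script. A minimal sketch in Python, using the illustrative counts and averages above:

    # Estimated annual ROI of the fraud model, using the figures above.
    model_only_catches = 100       # fraud the model caught but manual review missed
    avg_model_catch_value = 20_000
    manual_only_catches = 40       # fraud manual review caught but the model missed
    avg_manual_catch_value = 5_000

    model_savings = model_only_catches * avg_model_catch_value      # $2.0M
    manual_savings = manual_only_catches * avg_manual_catch_value   # $0.2M

    roi = model_savings - manual_savings
    print(f"Estimated annual ROI: ${roi:,}")  # Estimated annual ROI: $1,800,000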

Working with data

For illustrative purposes, this guide uses a simulated dataset that resembles insurance company data. The dataset consists of 10,746 rows and 45 columns.

Features and sample data

The target variable for this use case is whether or not a submitted claim is fraudulent, making this a binary classification problem. In this dataset, 1,746 of 10,746 claims (16%) are fraudulent.

The target variable:

  • FRAUD

Data preparation

Below are examples of 44 features that can be used to train a model to identify fraud. They consist of historical data on customer policy details, claims data (including free-text descriptions), and rule matches from internal and national databases. These features help DataRobot extract relevant patterns to detect fraudulent claims.

Beyond the features listed below, it might help to incorporate any additional data your organization collects that could be relevant to detecting fraudulent claims. For example, DataRobot is able to process image data as a feature together with numeric, categorical, and text features. Images of vehicles after an accident may be useful to detect fraud and help predict severity.

Data from the claim, policy, customer, and vehicle tables are merged using customer ID as the key. Only data known at or before the time of claim creation is used, except for the target variable. Each record in the dataset is a claim.
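
If you prepare the data in Python, the merge might look like the following sketch. The file and key column names here are hypothetical; substitute your own schema:

    import pandas as pd

    claims = pd.read_csv("claim.csv")        # one row per claim, includes FRAUD
    policies = pd.read_csv("policy.csv")
    customers = pd.read_csv("customer.csv")
    vehicles = pd.read_csv("vehicle.csv")

    # Merge the four tables on customer ID so that each record is a claim.
    df = (
        claims
        .merge(policies, on="CUSTOMER_ID", how="left")
        .merge(customers, on="CUSTOMER_ID", how="left")
        .merge(vehicles, on="CUSTOMER_ID", how="left")
    )

    # Drop any columns not known at or before claim creation (except the
    # target) to avoid leaking post-claim outcomes into training.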

Sample feature list

Feature name | Data type | Description | Data source | Example
ID | Numeric | Claim ID | Claim | 156843
FRAUD | Numeric | Target | Claim | 0
DATE | Date | Date of Policy | Policy | 31/01/2013
POLICY_LENGTH | Categorical | Length of Policy | Policy | 12 month
LOCALITY | Categorical | Customer’s locality | Customer | OX29
REGION | Categorical | Customer’s region | Customer | OX
GENDER | Numeric | Customer’s gender | Customer | 1
CLAIM_POLICY_DIFF_A | Numeric | Internal | Policy | 0
CLAIM_POLICY_DIFF_B | Numeric | Internal | Policy |
CLAIM_POLICY_DIFF_C | Numeric | Internal | Policy |
CLAIM_POLICY_DIFF_D | Numeric | Internal | Policy |
CLAIM_POLICY_DIFF_E | Numeric | Internal | Policy |
POLICY_CLAIM_DAY_DIFF | Numeric | Number of days since policy taken | Policy, Claim | 94
DISTINCT_PARTIES_ON_CLAIM | Numeric | Number of people on claim | Claim | 4
CLM_AFTER_RNWL | Numeric | Renewal History | Policy |
NOTIF_AFT_RENEWAL | Numeric | Renewal History | Policy |
CLM_DURING_CAX | Numeric | Cancellation claim | Policy | 0
COMPLAINT | Numeric | Customer complaint | Policy | 0
CLM_before_PAYMENT | Numeric | Claim before premium paid | Policy, Claim | 0
PROP_before_CLM | Numeric | Claim History | Claim | 0
NCD_REC_before_CLM | Numeric | Claim History | Claim | 1
NOTIF_DELAY | Numeric | Delay in notification | Claim | 0
ACCIDENT_NIGHT | Numeric | Night time accident | Claim | 0
NUM_PI_CLAIM | Numeric | Number of personal injury claims | Claim | 0
NEW_VEHICLE_BEFORE_CLAIM | Numeric | Vehicle History | Vehicle, Claim | 0
PERSONAL_INJURY_INDICATOR | Numeric | Personal Injury flag | Claim | 0
CLAIM_TYPE_ACCIDENT | Numeric | Claim details | Claim | 1
CLAIM_TYPE_FIRE | Numeric | Claim details | Claim | 0
CLAIM_TYPE_MOTOR_THEFT | Numeric | Claim details | Claim | 0
CLAIM_TYPE_OTHER | Numeric | Claim details | Claim | 0
CLAIM_TYPE_WINDSCREEN | Numeric | Claim details | Claim | 0
LOCAL_TEL_MATCH | Numeric | Internal Rule Matching | Claim | 0
LOCAL_M_CLM_ADD_MATCH | Numeric | Internal Rule Matching | Claim | 0
LOCAL_M_CLM_PERS_MATCH | Numeric | Internal Rule Matching | Claim | 0
LOCAL_NON_CLM_ADD_MATCH | Numeric | Internal Rule Matching | Claim | 0
LOCAL_NON_CLM_PERS_MATCH | Numeric | Internal Rule Matching | Claim | 0
federal_TEL_MATCH | Numeric | Internal Rule Matching | Claim | 0
federal_CLM_ADD_MATCH | Numeric | Internal Rule Matching | Claim | 0
federal_CLM_PERS_MATCH | Numeric | Internal Rule Matching | Claim | 0
federal_NON_CLM_ADD_MATCH | Numeric | Internal Rule Matching | Claim | 0
federal_NON_CLM_PERS_MATCH | Numeric | Internal Rule Matching | Claim | 0
SCR_LOCAL_RULE_COUNT | Numeric | Internal Rule Matching | Claim | 0
SCR_NAT_RULE_COUNT | Numeric | Internal Rule Matching | Claim | 0
RULE_MATCHES | Numeric | Internal Rule Matching | Claim | 0
CLAIM_DESCRIPTION | Text | Customer Claim Text | Claim | this via others themselves inc become within ours slow parking lot fast vehicle roundabout mall not indicating car caravan neck emergency

Modeling and insights

DataRobot automates many parts of the modeling pipeline, including processing and partitioning the dataset. Those steps are not described here; instead, the following sections focus on model interpretation. Reference the DataRobot documentation to see how to use DataRobot from start to finish and to understand the data science methodologies embedded in its automation.

Feature Impact

Feature Impact reveals that the number of past personal injury claims (NUM_PI_CLAIM) and internal rule matches (LOCAL_M_CLM_PERS_MATCH, RULE_MATCHES, SCR_LOCAL_RULE_COUNT) are among the most influential features in detecting fraudulent claims.
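
If you prefer to work programmatically, the same insight is available through the DataRobot Python client. A minimal sketch, assuming an existing project and your own endpoint, API token, and project ID:

    import datarobot as dr

    dr.Client(endpoint="https://app.datarobot.com/api/v2", token="YOUR_API_TOKEN")

    project = dr.Project.get("YOUR_PROJECT_ID")
    model = project.get_models()[0]  # top model on the Leaderboard

    # Compute (or fetch previously computed) Feature Impact for the model.
    impact = model.get_or_request_feature_impact()
    top10 = sorted(impact, key=lambda r: r["impactNormalized"], reverse=True)[:10]
    for row in top10:
        print(f"{row['featureName']}: {row['impactNormalized']:.3f}")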

Feature Effects/partial dependence

The partial dependence plot in Feature Effects shows that the larger the number of personal injury claims (NUM_PI_CLAIM), the higher the likelihood of fraud. As expected, when a claim matches internal red flag rules, its likelihood of being fraud increases greatly. Interestingly, GENDER and CLAIM_TYPE_MOTOR_THEFT (car theft) are also strong features.

Word Cloud

The current data includes CLAIM_DESCRIPTION as text. A Word Cloud reveals that customers who use the term "roundabout," for example, are more likely to be committing fraud than those who use the term "emergency." (The size of a word indicates how many rows include it; deeper red indicates a stronger association with claims scored as fraudulent. Blue words are associated with claims scored as non-fraudulent.)

Prediction Explanations

Prediction Explanations provide up to 10 reasons for each prediction score. Explanations can be sent directly to Special Investigation Unit (SIU) agents and claim handlers, giving them useful information to check during investigation. For example, DataRobot not only predicts that Claim ID 8296 has a 98.5% chance of being fraudulent, but it also explains that this high score is due to specific internal rule matches (LOCAL_M_CLM_PERS_MATCH, RULE_MATCHES) and the policyholder’s six previous personal injury claims (NUM_PI_CLAIM). When claim advisors need to deny a claim, they can provide the reasons why by consulting Prediction Explanations.
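
A routing script can surface these explanations alongside each score. The sketch below assumes a batch-scoring output CSV whose column names (FRAUD_1_PREDICTION, EXPLANATION_1_FEATURE_NAME, EXPLANATION_1_STRENGTH) are illustrative; check your deployment's actual output schema:

    import pandas as pd

    preds = pd.read_csv("scored_claims.csv")  # hypothetical batch-scoring output

    # Queue claims scoring above 90% for the SIU, with their top fraud driver.
    siu_queue = preds[preds["FRAUD_1_PREDICTION"] > 0.90]
    for _, row in siu_queue.iterrows():
        print(
            f"Claim {row['ID']}: score={row['FRAUD_1_PREDICTION']:.3f}, "
            f"top driver={row['EXPLANATION_1_FEATURE_NAME']} "
            f"(strength={row['EXPLANATION_1_STRENGTH']:+.2f})"
        )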

Evaluate accuracy

There are several visualizations that help to evaluate accuracy.

Leaderboard

Modeling results show that the ENET Blender, an ensemble of eight single models, is the most accurate model, with 0.93 AUC on cross-validation. The high accuracy indicates that the model has learned signals that distinguish fraudulent from non-fraudulent claims. Keep in mind, however, that blenders take longer to score than single models and so may not be ideal for real-time scoring.

The Leaderboard shows that the modeling accuracy is stable across Validation, Cross Validation, and Holdout. Thus, you can expect to see similar results when you deploy the selected model.

Lift Chart

The steep increase in the average target value on the right side of the Lift Chart reveals that when the model predicts a claim has a high probability of being fraudulent (blue line), the claim tends to actually be fraudulent (orange line).

Confusion matrix

The confusion matrix shows:

  • Of 2,149 claims in the holdout partition, the model predicted 372 claims as fraudulent and 1,777 claims as legitimate.
  • Of the 372 claims predicted as fraud, 275 were actually fraudulent (true positives), and 97 were not (false positives).
  • Of 1,777 claims predicted as non-fraud, 1,703 were actually not fraudulent (true negatives) and 74 were fraudulent (false negatives).

Analysts can examine this table to determine if the model is accurate enough for business implementation.
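
The headline metrics follow directly from these counts:

    tp, fp = 275, 97    # claims predicted as fraud
    tn, fn = 1703, 74   # claims predicted as non-fraud

    precision = tp / (tp + fp)                  # flagged claims that were fraud
    recall = tp / (tp + fn)                     # fraudulent claims that were caught
    accuracy = (tp + tn) / (tp + fp + tn + fn)

    print(f"precision={precision:.3f}, recall={recall:.3f}, accuracy={accuracy:.3f}")
    # precision=0.739, recall=0.788, accuracy=0.920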

Post-processing

To convert model predictions into decisions, determine the best threshold for classifying whether a claim is fraudulent.

ROC Curve

Set the ROC Curve threshold depending on how you want to use model predictions and business constraints. Some examples:

If... | Then...
...the main use of the fraud detection model is to automate payment | ...minimize false negatives (the number of fraudulent claims mistakenly predicted as not fraudulent) by adjusting the threshold used to classify prediction scores as fraud or not.
...the main use is to automate the transfer of suspicious claims to the SIU | ...minimize false positives (the number of non-fraudulent claims mistakenly predicted as fraudulent).
...you want to minimize false negatives, but limited SIU resources mean false positives cannot exceed 100 claims | ...lower the threshold just to the point where the number of false positives reaches 100 (see the sketch below).
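
The third scenario can be expressed directly in code. A sketch of the idea, assuming hypothetical arrays of validation scores and true labels:

    import numpy as np

    def lowest_threshold_within_fp_budget(scores, labels, max_fp=100):
        """Lowest threshold whose false-positive count stays within budget.

        Lowering the threshold flags more claims (fewer false negatives);
        stop just before false positives exceed the SIU's capacity.
        """
        best = None
        for t in np.unique(scores)[::-1]:  # walk thresholds from high to low
            fp = np.sum((scores >= t) & (labels == 0))
            if fp > max_fp:
                break
            best = t
        return best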

Payoff matrix

From the Profit Curve tab, use the Payoff Matrix to set thresholds based on simulated profit. For example:

Payoff value | Description
True positive = $20,000 | Average payment avoided on a correctly flagged fraudulent claim.
False positive = -$20,000 | Assumes a false positive diverts a human investigator who could otherwise spend the time detecting a real fraudulent claim.
True negative = $100 | Leads to auto-pay of the claim and saves money by eliminating manual claim processing.
False negative = -$20,000 | Cost of missing a fraudulent claim.

DataRobot then automatically calculates the threshold that maximizes profit. You can also measure DataRobot ROI by creating the same payoff matrix for your existing business process and subtracting the max profit of the existing process from that calculated by DataRobot.
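
To see what this calculation looks like outside the platform, the profit-maximizing threshold can be computed by hand. A sketch using the payoff values above and hypothetical score/label arrays:

    import numpy as np

    PAYOFF = {"tp": 20_000, "fp": -20_000, "tn": 100, "fn": -20_000}

    def profit_at(threshold, scores, labels):
        flagged = scores >= threshold
        tp = np.sum(flagged & (labels == 1))
        fp = np.sum(flagged & (labels == 0))
        fn = np.sum(~flagged & (labels == 1))
        tn = np.sum(~flagged & (labels == 0))
        return (tp * PAYOFF["tp"] + fp * PAYOFF["fp"]
                + fn * PAYOFF["fn"] + tn * PAYOFF["tn"])

    def best_threshold(scores, labels):
        # Evaluate every distinct score as a candidate threshold.
        return max(np.unique(scores),
                   key=lambda t: profit_at(t, scores, labels))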

Once the threshold is set, model predictions are converted into fraud or non-fraud according to the threshold. These classification results are integrated into BRMS and become one of the many factors that determine the final decision.

Predict and deploy

After selecting the model that best learns patterns to predict fraud, you can deploy it into your desired decision environment. Decision environments are the ways in which the predictions generated by the model will be consumed by the appropriate organizational stakeholders, and how these stakeholders will make decisions using the predictions to impact the overall process.

Decision stakeholders

The following table lists potential decision stakeholders:

Stakeholder | Description
Decision executors | The decision logic assigns claims that require manual investigation to claim handlers (executors) and SIU agents based on claim complexity. They investigate the claims, referring to insights provided by DataRobot, and decide whether to pay or deny. Each week, they report a summary of claims received and their decisions to decision authors.
Decision managers | Managers monitor the KPI dashboard, which visualizes the results of following the decision logic. For example, they track the number of fraudulent claims identified and missed. Each week, they can discuss with decision authors how to improve the decision logic.
Decision authors | Senior managers in the claims department examine the performance of the decision logic using input from decision executors and decision managers. For example, decision executors report whether the fraudulent claims they receive are reasonable, and decision managers report whether the rate of fraud is as expected. Based on these inputs, decision authors update the decision logic each week.

Decision process

This use case blends augmentation and automation for decisions. Instead of claim handlers manually investigating every claim, business rules and machine learning identify simple claims that should be automatically paid and problematic claims that should be automatically denied. Fraud likelihood scores are sent to the BRMS through the API and post-processed into high, medium, and low risk based on set thresholds, arriving at one of the following final decisions:

Action | Degree of risk
SIU | High
Assign to claim handlers | Medium
Auto pay | Low
Auto deny | Low

Routing to claims handlers includes an intelligent triage, in which claims handlers receive fewer claims, and only those best suited to their skills and experience. For example, more complex claims can be identified and sent to more experienced claims handlers. SIU agents and claim handlers decide whether to pay or deny the claims after investigation.
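
A minimal sketch of this score-to-action post-processing, with illustrative cut-offs (the actual thresholds come from the ROC and payoff analysis above, and auto pay versus auto deny is decided by the business rules):

    def route_claim(fraud_score, high=0.80, low=0.20):
        """Map a fraud likelihood score to a decision-logic action."""
        if fraud_score >= high:
            return "send to SIU"               # high risk
        if fraud_score > low:
            return "assign to claim handler"   # medium risk
        return "auto pay / auto deny"          # low risk; BRMS picks the action

    route_claim(0.985)  # -> 'send to SIU'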

Model deployment

Predictions are deployed through the API and sent to the BRMS.
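
The exact request format comes from your deployment's Predictions tab; the sketch below uses a placeholder URL, key, and record values:

    import requests

    URL = ("https://example.datarobot.com/predApi/v1.0/"
           "deployments/YOUR_DEPLOYMENT_ID/predictions")
    HEADERS = {
        "Content-Type": "application/json",
        "Authorization": "Bearer YOUR_API_TOKEN",
        "DataRobot-Key": "YOUR_DATAROBOT_KEY",  # managed AI Platform only
    }

    claims = [{"NUM_PI_CLAIM": 6, "POLICY_LENGTH": "12 month"}]  # one dict per claim
    response = requests.post(URL, headers=HEADERS, json=claims)
    response.raise_for_status()
    for pred in response.json()["data"]:
        print(pred["prediction"], pred["predictionValues"])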

Model monitoring

Using DataRobot MLOps, you can monitor, maintain, and update models within a single platform.

Each week, decision authors monitor the fraud detection model and retrain it if data drift reaches a certain threshold. In addition, decision authors, along with investigators, can regularly review the model's decisions to ensure that data are available for future retraining of the fraud detection model. Based on that review, the decision authors can also update the decision logic; for example, they might add a repair shop to the red-flags list or refine the thresholds that convert fraud scores into high, medium, or low risk.

DataRobot provides tools for managing and monitoring the deployments, including accuracy and data drift.

Implementation considerations

Business goals should determine decision logic, not data. The project begins with business users building decision logic to improve business processes. Once decision logic is ready, true data needs will become clear.

Integrating business rules and machine learning to production systems can be problematic. Business rules and machine learning models need to be updated frequently. Externalizing the rules engine and machine learning allows decision authors to make frequent improvements to decision logic. When the rules engine and machine learning are integrated into production systems, updating decision logic becomes difficult because it will require changes to production systems.

Trying to automate all decisions will not work. It is important to decide which decisions to automate and which decisions to assign to humans. For example, business rules and machine learning cannot identify fraud 100% of the time; human involvement is still necessary for more complex claim cases.

No-Code AI Apps

Consider building a custom application where stakeholders can interact with the predictions and record the outcomes of their investigations. Once the model is deployed, predictions can be consumed for use in the decision process. For example, a No-Code AI App provides an easily shareable, AI-powered application built with a no-code interface.

Notebook demo

See the notebook version of this accelerator here.


Updated February 1, 2024