Claim triage notebook¶
Claim payments and claim adjustments are typically an insurance company’s largest expenses. For long-tail lines of business, such as workers’ compensation (which covers medical expenses and lost wages for injured workers), claims can take years to be paid in full, so the true cost of a claim may not be known for many years. Claim adjustment activities, however, start as soon as the insurer is made aware of a claim.
This notebook aims to outline a workflow for evaluating the severity of an insurance claim in order to triage it effectively.
Download the sample training data here.
Setup¶
Import libraries¶
import datetime
import os
import datarobot as dr
import pandas as pd
Connect to DataRobot¶
Read more about different options for connecting to DataRobot from the client.
# If the config file is not in the default location described in the API Quickstart guide, '~/.config/datarobot/drconfig.yaml', then you will need to call
# dr.Client(config_path='path-to-drconfig.yaml')
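As an alternative to a config file, the connection settings can be read from environment variables and passed to dr.Client explicitly. The sketch below is only an illustration: the helper names client_settings and connect are hypothetical, and the variable names DATAROBOT_API_TOKEN and DATAROBOT_ENDPOINT as well as the default endpoint are assumptions that may need adjusting for your installation.

```python
import os


def client_settings(env=None):
    """Build keyword arguments for dr.Client from environment variables.

    The variable names and the default endpoint are assumptions.
    """
    env = os.environ if env is None else env
    token = env.get("DATAROBOT_API_TOKEN")
    if not token:
        raise KeyError("DATAROBOT_API_TOKEN is not set")
    endpoint = env.get("DATAROBOT_ENDPOINT", "https://app.datarobot.com/api/v2")
    return {"token": token, "endpoint": endpoint}


def connect(env=None):
    """Open a DataRobot client session using the settings above."""
    import datarobot as dr  # imported here so client_settings has no dependency

    return dr.Client(**client_settings(env))
```

Calling connect() once at the top of the notebook would then stand in for the config-file approach.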
Import data¶
df = pd.read_csv(
"https://s3.amazonaws.com/datarobot_public_datasets/DR_Demo_Statistical_Case_Estimates.csv"
)
df.info()
df.head()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21691 entries, 0 to 21690
Data columns (total 16 columns):
 #   Column               Non-Null Count  Dtype
---  ------               --------------  -----
 0   ReportingDelay       21691 non-null  int64
 1   AccidentHour         21691 non-null  int64
 2   Age                  21691 non-null  int64
 3   WeeklyRate           21691 non-null  float64
 4   Gender               21691 non-null  object
 5   MaritalStatus        21669 non-null  object
 6   HoursWorkedPerWeek   21691 non-null  float64
 7   DependentChildren    21691 non-null  int64
 8   DependentsOther      21691 non-null  int64
 9   PartTimeFullTime     21691 non-null  object
 10  DaysWorkedPerWeek    21691 non-null  int64
 11  DateOfAccident       21691 non-null  object
 12  ClaimDescription     21691 non-null  object
 13  ReportedDay          21691 non-null  object
 14  InitialCaseEstimate  21691 non-null  int64
 15  Incurred             21691 non-null  float64
dtypes: float64(3), int64(7), object(6)
memory usage: 2.6+ MB
| | ReportingDelay | AccidentHour | Age | WeeklyRate | Gender | MaritalStatus | HoursWorkedPerWeek | DependentChildren | DependentsOther | PartTimeFullTime | DaysWorkedPerWeek | DateOfAccident | ClaimDescription | ReportedDay | InitialCaseEstimate | Incurred |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 15 | 11 | 28 | 500.00 | M | S | 44.0 | 0 | 0 | F | 5 | 5/10/2005 | STRUCK SCAFFOLDING STRAIN LOWER BACK | 3Thurs | 9500 | 151.254501 |
1 | 22 | 5 | 29 | 500.00 | M | S | 38.0 | 0 | 0 | F | 5 | 28/10/2003 | STRUCK KNIFE LACERATED LEFT THUMB | 2Wed | 11500 | 442.125024 |
2 | 22 | 7 | 28 | 197.37 | M | M | 16.0 | 0 | 0 | P | 3 | 25/05/2004 | SLIPPED AND HIT STRAINED LEFT SHOULDER INJURY ... | 2Wed | 8000 | 1494.490505 |
3 | 15 | 12 | 40 | 0.00 | M | M | 0.0 | 1 | 0 | F | 5 | 21/07/1994 | HIT FALLING DOOR LACERATION LEFT SHOULDER INJU... | 4Fri | 500 | 138.900000 |
4 | 38 | 12 | 22 | 435.70 | M | M | 38.0 | 0 | 0 | F | 5 | 9/06/1992 | STRUCK FALLING OBJECT LACERATION RIGHT RING FI... | 4Fri | 320 | 296.160000 |
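The summary above shows that MaritalStatus has some missing values and that DateOfAccident is stored as day-first text. A quick local sanity check might look like the sketch below. It runs on a three-row toy frame (the missing MaritalStatus value is made up for illustration); the same two calls could be applied to df.

```python
import pandas as pd

# Toy rows imitating two columns of the dataset (the missing value is made up).
sample = pd.DataFrame(
    {
        "MaritalStatus": ["S", None, "M"],
        "DateOfAccident": ["5/10/2005", "28/10/2003", "9/06/1992"],
    }
)

# Count missing values per column.
missing_counts = sample.isna().sum()

# Parse the day-first date strings with an explicit format instead of
# relying on automatic format inference.
accident_dates = pd.to_datetime(sample["DateOfAccident"], format="%d/%m/%Y")

print(missing_counts)
print(accident_dates.dt.year.tolist())
```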
Check the distribution of the target feature¶
The histogram below shows the distribution of the target, Incurred. Most claims are closed with a small payment: the median is less than $500, and the 75th percentile is about $1,827. This right-skewed shape is typical of insurance claims and is often modeled with a Gamma distribution. Autopilot automatically detects the target's distribution and recommends an appropriate optimization metric.
In this example, DataRobot recommends Gamma Deviance, and that metric is used for modeling in this use case. DataRobot also offers other metrics that you can select based on your own needs.
df["Incurred"].describe()
count    2.169100e+04
mean     9.647973e+03
std      4.713051e+04
min      2.220000e-15
25%      1.802750e+02
50%      4.959266e+02
75%      1.827121e+03
max      1.563323e+06
Name: Incurred, dtype: float64
df["Incurred"].hist(bins=10)
<AxesSubplot:>
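To make the Gamma Deviance metric mentioned above more concrete, here is a minimal sketch of the textbook mean Gamma deviance, 2 * mean((y - mu) / mu - ln(y / mu)). The function name is hypothetical, and DataRobot's exact implementation (for example, any weighting) may differ.

```python
import math


def mean_gamma_deviance(actual, predicted):
    """Mean Gamma deviance: 2 * mean((y - mu) / mu - ln(y / mu)).

    Both sequences must have the same length and be strictly positive.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, mu in zip(actual, predicted):
        if y <= 0 or mu <= 0:
            raise ValueError("Gamma deviance requires strictly positive values")
        total += 2.0 * ((y - mu) / mu - math.log(y / mu))
    return total / len(actual)


# The deviance depends on the ratio y / mu, so over-predicting a 100 claim
# by a factor of two costs the same as over-predicting a 10,000 claim by two.
small = mean_gamma_deviance([100.0], [200.0])
large = mean_gamma_deviance([10000.0], [20000.0])
```

Because the penalty is relative rather than absolute, a handful of very large claims does not dominate the fit the way it would under squared error, which is one reason this metric suits right-skewed severity data.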
Feature engineering¶
Feature engineering has proven to improve model performance in many use cases, although it is an optional step in this workflow. It provides an opportunity to incorporate domain knowledge into the model. Feature engineering can be as simple as extracting the month from a date so that a seasonal trend can be captured, or as complicated as aggregating claim counts over the past x years. Depending on the use case, it can therefore be very time-consuming. For illustration purposes, feature engineering has already been performed on the example dataset.
For example, ReportingDelay = ReportDate - AccidentDate. In Python, the following code can be used to achieve this:
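# Parse the day-first date strings, then store the difference in whole days
df["ReportingDelay"] = (pd.to_datetime(df["ReportDate"], dayfirst=True) - pd.to_datetime(df["AccidentDate"], dayfirst=True)).dt.days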
The result above assumes that ReportDate and AccidentDate are both recorded in the original dataset.
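The two simple features described in this section (a reporting delay and a month extracted from a date) can be sketched on a small, self-contained frame. The ReportDate values below are hypothetical, as is the AccidentMonth column name; the example dataset ships with ReportingDelay already computed.

```python
import pandas as pd

# Hypothetical raw claims with both dates present as day-first strings.
raw = pd.DataFrame(
    {
        "AccidentDate": ["5/10/2005", "28/10/2003"],
        "ReportDate": ["20/10/2005", "19/11/2003"],
    }
)

accident = pd.to_datetime(raw["AccidentDate"], format="%d/%m/%Y")
report = pd.to_datetime(raw["ReportDate"], format="%d/%m/%Y")

# Reporting delay in whole days between the accident and the report.
raw["ReportingDelay"] = (report - accident).dt.days

# Month of the accident, which lets a model pick up seasonal effects.
raw["AccidentMonth"] = accident.dt.month

print(raw)
```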
Modeling¶
Create a project¶
Upload the training data to DataRobot to create a project. In this example, the training data is the dataframe "df" created in the previous steps.
project_name = f"Workers Comp Claim Severity {datetime.datetime.now()}"
project = dr.Project.create(df, project_name=project_name)
Initiate Autopilot¶
project.analyze_and_model(target="Incurred", worker_count=-1)
Project(Workers Comp Claim Severity 2022-10-18 13:01:08.535262)
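In the call above, worker_count=-1 requests the maximum number of modeling workers available to the account. If you prefer to name the optimization metric explicitly and block until Autopilot finishes, a variant along the following lines could be used in place of that call. The helper name run_autopilot is hypothetical, and the sketch assumes a client version in which Project.analyze_and_model accepts a metric argument and Project.wait_for_autopilot is available.

```python
def run_autopilot(project, target="Incurred", metric="Gamma Deviance"):
    """Start Autopilot with an explicit metric and wait until it completes.

    `project` is expected to be a datarobot.Project whose target has not
    been set yet.
    """
    # worker_count=-1 asks for the maximum number of available workers.
    project.analyze_and_model(target=target, metric=metric, worker_count=-1)
    # Blocks until Autopilot has finished building models.
    project.wait_for_autopilot()
    return project
```

Waiting for completion is useful when later cells read the leaderboard, since models are added to it only as Autopilot progresses.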
The output for the following snippet displays the features included in the "Informative Features" feature list.
flists = project.get_featurelists()
flist = next(x for x in flists if x.name == "Informative Features")
flist.features
['ReportingDelay', 'AccidentHour', 'Age', 'WeeklyRate', 'Gender', 'MaritalStatus', 'HoursWorkedPerWeek', 'DependentChildren', 'DependentsOther', 'PartTimeFullTime', 'DaysWorkedPerWeek', 'DateOfAccident (Year)', 'ClaimDescription', 'ReportedDay', 'InitialCaseEstimate', 'Incurred', 'DateOfAccident (Day of Month)', 'DateOfAccident (Day of Week)', 'DateOfAccident (Month)']
Interpret results¶
Once the data is uploaded to DataRobot and the project has started, you can explore it in the GUI. The call below opens the project in your default web browser.
project.open_in_browser()
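The leaderboard can also be inspected from the notebook rather than the GUI. The sketch below is an illustration under assumptions: the helper name top_models is hypothetical, and it relies on project.get_models() returning model objects that expose model_type and a metrics dictionary keyed by metric name, as in recent versions of the Python client.

```python
def top_models(project, metric="Gamma Deviance", n=5):
    """Return (model_type, validation score) pairs for the first n models.

    Models are taken in the order returned by project.get_models(); a score
    is None when the metric or the validation partition is unavailable.
    """
    rows = []
    for model in project.get_models()[:n]:
        score = model.metrics.get(metric, {}).get("validation")
        rows.append((model.model_type, score))
    return rows
```

Calling top_models(project) after Autopilot has finished would give a quick text summary to compare against what the Leaderboard shows in the browser.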
Exploratory Data Analysis¶
Navigate to the Data tab to learn more about your data.
Click each feature to see information such as the summary statistics (min, max, mean, std) of numeric features or a histogram that represents the relationship of a feature with the target.
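A rough local analogue of the feature-versus-target view can be produced with pandas by binning a numeric feature and averaging the target within each bin. This is only a sketch: the helper name target_by_bin is hypothetical, the four rows are made-up toy values, and it does not reproduce how DataRobot builds its own histograms.

```python
import pandas as pd


def target_by_bin(df, feature, target, bins=4):
    """Count rows and average the target within equal-width bins of a feature."""
    binned = pd.cut(df[feature], bins=bins)
    return df.groupby(binned, observed=True)[target].agg(["count", "mean"])


# Made-up toy data, only to show the shape of the result.
toy = pd.DataFrame(
    {
        "Age": [20, 25, 40, 45],
        "Incurred": [100.0, 300.0, 1000.0, 3000.0],
    }
)

summary = target_by_bin(toy, "Age", "Incurred", bins=2)
print(summary)
```

On the real data, the same call with df, a numeric column such as Age, and Incurred as the target would give one row per bin.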