Insurance settings

The following sections describe the weighting features available in advanced experiment setup. These settings are typically used by the insurance industry.

Experiments built using the offset, exposure, and/or count of events parameters produce the same DataRobot insights as experiments that do not. However, DataRobot excludes the offset, exposure, and count of events columns from the predictive feature set. That is, the selected columns do not appear in the Coefficients, Individual Prediction Explanations, or Feature Impact visualizations; they are treated as special columns throughout the experiment. Although the exposure, offset, and count of events columns do not appear in these displays as features, their values are used in training.

Exposure

In regression problems, Exposure can be used to weight features in order to handle observations that are not of equal duration. It's commonly used in insurance use cases to introduce a measure of period duration. For example, in a use case where each row represents a policy-year, a policy that was applicable for half of the year will have an Exposure parameter of 0.5. DataRobot handles a feature selected for exposure as a special column, adding it to raw predictions when building or scoring a model. The selected column(s) must be present in any dataset later uploaded for predictions.

Only optimization metrics that use the log link function (Poisson, Gamma, or Tweedie deviance) can make use of exposure values in modeling. For these optimization metrics, DataRobot log transforms the value of the field you specify as an exposure (you do not need to do it yourself). If you select any other metric, DataRobot returns an informative message. See below for more training and prediction application details.
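As a rough illustration of how a log-link model consumes exposure, the sketch below shows the relationship the text describes: the log of exposure enters the linear predictor, so predicted counts scale with exposure. The coefficients and tiny dataset are illustrative assumptions, not DataRobot output.

```python
import numpy as np

# Hypothetical fitted coefficients for a log-link (Poisson-style) model.
# beta, X, and exposure below are made-up illustrative values.
beta = np.array([0.1, 0.5])          # intercept, one predictor
X = np.array([[1.0, 2.0],            # design matrix with intercept column
              [1.0, 3.0]])
exposure = np.array([0.5, 1.0])      # policy in force for half / full year

# Log link: ln(E[Y]) = X @ beta + ln(exposure). DataRobot applies the
# log transform of the exposure field for you.
expected_counts = np.exp(X @ beta + np.log(exposure))

# Equivalently: a rate per unit of exposure, scaled by the exposure.
rate = np.exp(X @ beta)
```

A half-year policy (exposure 0.5) therefore gets half the predicted count of an otherwise identical full-year policy.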

Count of events

The Count of events parameter improves modeling of a zero-inflated target by adding information on the frequency of non-zero events. Frequency x Severity (two-stage) models handle it as a special column. The frequency stage uses the column to model the frequency of non-zero events. The severity stage normalizes the severity of non-zero events in the column and uses that value as the target. This improves interpretability of frequency and severity coefficients. The column is not used for making predictions on new data.

The Count of events parameter is used in two-stage models—that is, Frequency-Severity and Frequency-Cost blueprints. Stages for each are described below.

Frequency-Severity models

  1. Model the frequency of events using Count of events as the target.

  2. Model the severity of non-zero events, where the target is the normalized target column (target divided by Count of events), and the Count of events column is used as the weight.

Frequency-Cost

  1. Model the frequency of events using Count of events as the target.

  2. Model the severity of events using the original target and predictions from stage 1 as an offset.

The first stage of both of these two-stage models, Frequency, is always a Poisson regression model. If you supply a count of events feature, that value is the stage-one target. Otherwise, DataRobot creates a 0/1 target.

Offset

In regression and binary classification problems, the Offset parameter sets feature(s) that should be treated as a fixed component for modeling (coefficient of 1 in generalized linear models or gradient boosting machine models). Offsets are often used to incorporate pricing constraints or to boost existing models. DataRobot handles a feature selected for offset as a special column, adding it to raw predictions when building or scoring a model; the selected column(s) must be present in any dataset later uploaded for predictions.

  • For regression problems, if the optimization metric is Poisson, Gamma, or Tweedie deviance, DataRobot uses the log link function, in which case offsets should be log transformed in advance. Otherwise, DataRobot uses the identity link function and no transformation is needed for offsets.

  • For binary classification problems, DataRobot uses the logit link function, in which case offsets should be logit transformed in advance.

See below for more training and prediction application details.
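Since the bullets above say offsets must be log or logit transformed in advance (unlike exposure, which DataRobot transforms for you), a minimal preparation sketch may help. The raw values are illustrative assumptions; the transform rules come from the text.

```python
import numpy as np

# Regression with Poisson/Gamma/Tweedie deviance (log link):
# log transform the offset column before upload.
raw_offset = np.array([100.0, 250.0, 80.0])    # e.g. a fixed premium component
log_offset = np.log(raw_offset)

# Binary classification (logit link): an offset expressed as a
# probability must be logit transformed before upload.
p = np.array([0.2, 0.5, 0.9])
logit_offset = np.log(p / (1 - p))
```

Note that a probability of 0.5 maps to a logit offset of 0, i.e. no shift to the model's raw prediction.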

Offset explained

Applying an offset is helpful when working with projects that rely on data that has a fixed component and a variable component. Offsets let you limit a model to predicting only the variable component, which is especially important when the fixed component varies. When you set the offset parameter, DataRobot marks the feature as an offset, models only the variable component, and adds the fixed value back into predictions.

Two examples:

  1. Residual modeling is commonly used when important risk factors (for example, underwriting cycle, year, age, or loss maturity) contribute so strongly to the outcome that they mask all other effects, potentially leading to a highly biased result. Setting offsets addresses this bias: using a feature as an offset is equivalent to running the model against the residuals of that feature. By modeling on residuals, you tell the model to focus on new information rather than on what you already know. With offsets, DataRobot focuses on the "other" factors during model building, while still incorporating the main risk factors in the final predictions.

  2. The constraint issue in insurance can arise due to market competition or regulation. Some examples are: discounts on multicar or home-auto package policies being limited to a 20% maximum, suppressing rates for youthful drivers, or suppressing rates for certain disadvantaged territories. In these types of cases, some of the variables can be set to a specific value and added to the model predictions as offsets.

Offset and exposure in modeling

During training, offset and exposure are incorporated into modeling using the following logic:

Project metric                 Modeling logic
RMSE                           Y - offset ~ X
Poisson/Tweedie/Gamma/RMSLE    ln(Y/Exposure) - offset ~ X

When making predictions, the following logic is applied:

Project metric                 Prediction calculation logic
RMSE                           model(X) + offset
Poisson/Tweedie/Gamma/RMSLE    exp(model(X) + offset) * exposure
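The prediction logic in the table above can be sketched directly. The raw model outputs, offsets, and exposures below are made-up stand-ins for `model(X)` and the special columns in an uploaded prediction dataset.

```python
import numpy as np

# Illustrative raw model outputs and special-column values (assumptions).
model_out = np.array([0.4, -0.1])
offset = np.array([0.2, 0.2])
exposure = np.array([0.5, 1.0])

# RMSE (identity link): prediction = model(X) + offset
pred_rmse = model_out + offset

# Poisson/Tweedie/Gamma/RMSLE (log link):
# prediction = exp(model(X) + offset) * exposure
pred_log = np.exp(model_out + offset) * exposure
```

This is why the offset and exposure columns must be present in any dataset uploaded for predictions: they are applied to the raw model output at scoring time.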