Predictions with feature transformations

When making predictions with feature transformations, DataRobot assigns rows by forecast distance, builds separate models for each distance, and then makes row-by-row predictions. This method uses time series wrangling for transparent and flexible feature engineering.

When to use

Use this method when:

  • The dataset is larger than 10 GB.
  • Forecasting is not needed, but you want predictions based on forecast distance.
  • You want full transparency of the transformation process.
  • You want access to the repository of time series blueprints.

Note

Predictions with feature transformations are only supported for regression experiments.

How to use

To use this method:

  1. Use time series wrangling to prepare your dataset.
  2. Complete the partitioning part of the configuration described in the basic time-aware modeling setup.
  3. Enable time series modeling and configure the parameters for feature derivation without the full time series feature derivation process.

Workflow diagram (Mermaid source):

graph TB
  A[Upload data] --> B["Configure wrangling (feature engineering)"]
  B --> C[Create experiment]
  C --> D[Enable date/time partitioning]
  D --> E[Set ordering feature]
  E -. optional .-> F[Set backtest partitions]
  F -. optional .-> G[Set sampling]
  G --> H[Enable time series modeling]
  H --> I[Disable automated feature engineering]
  I --> J[Set model selection criteria]
  J --> K[Start modeling]

Enable time-aware predictions

Use any of the following options to access the toggle that enables time-aware predictions or time series modeling for an experiment:

  • Select Go to time series modeling settings from the Date/time partitioning setup page for time-relevant data.

  • Select Time series modeling in the Experiment summary panel.

  • Select Time series modeling in the top tabs.

All options open the settings to the Time series modeling tab. From there, toggle on Enable time series modeling.

Disable feature engineering

Making predictions with time-aware data allows you to model on much larger datasets than those supported by traditional time series modeling. This is achieved by creating time series wrangling recipes and applying them to the data. The recipes let you customize the feature transformations to your needs. Because the recipe has already applied feature derivation, you must disable DataRobot's automated feature engineering. To do so, toggle the automated feature engineering setting off.

Set modeling parameters

Once feature engineering is disabled, additional settings become available. You must set at least one of the following to complete the configuration and begin modeling. Both forecasting distance and series ID define the criteria DataRobot uses to group rows, build models from those rows, and make predictions. Forecasting offsets set a value for DataRobot to add to the baseline model when building or scoring models.

Setting one or more of these parameters allows you to build time series blueprints. If none of these "special" columns is identified, DataRobot cannot use the time series blueprints, and only the blueprints available to basic date/time partitioned predictions are offered.
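The "at least one parameter" rule can be sketched as a small pre-flight check. This is a minimal illustration, not DataRobot's API: the `ModelingParameters` class, its field names, and `check_columns` are hypothetical names introduced here only to make the rule concrete.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ModelingParameters:
    """Hypothetical container for the three settings; not a DataRobot object."""
    forecast_distance: Optional[str] = None  # name of a numeric column
    forecast_offsets: List[str] = field(default_factory=list)  # numeric columns
    series_id: Optional[str] = None  # column that identifies each series

    def is_complete(self) -> bool:
        # The configuration is complete once at least one parameter is set.
        return bool(self.forecast_distance or self.forecast_offsets or self.series_id)


def check_columns(params: ModelingParameters, dataset_columns: List[str]) -> List[str]:
    """Return the selected columns that are missing from the dataset."""
    selected = [params.forecast_distance, params.series_id, *params.forecast_offsets]
    return [c for c in selected if c is not None and c not in dataset_columns]
```

Running `check_columns` against a prediction dataset's column list is one way to catch a missing offset column before scoring, since the selected offset columns must also be present in datasets uploaded later for predictions.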

Note

When selecting feature values for these parameters, DataRobot makes all dataset features available. If you are unsure whether a specific feature is appropriate, visit the dataset preview.

Parameter | Description
Forecasting distances | Select a numeric feature that specifies the number of rows into the future each row should predict. The values in the feature you select will provide the distance offset for each row. If not set, DataRobot defaults to a uniform forecast distance of 1. Use time series wrangling to create a Forecast Distance column, as needed.
Forecasting offsets | Select one or more features that should be treated as a fixed component for modeling. Applying an offset to the baseline model during model training adds those values to raw predictions. The values must be pure numeric (currency, date, length, etc. are not acceptable). One model will be trained for each offset selected, and the selected column(s) must be present in any dataset later uploaded for predictions.
Series ID | Select a series identifier that groups rows by the series they belong to.

Offsets explained

The Offset parameter adjusts the model intercept (linear model) or margin (tree-based model) for each sample; it accepts multiple features. Applying an offset is helpful when working with projects that rely on data that has a fixed component and a variable component. Offsets let you limit a model to predicting on only the variable component. This is especially important when the fixed component varies. When you set the Offset parameter, DataRobot marks the feature as such and makes predictions without considering the fixed value.

Offsets are often used to incorporate pricing constraints or to boost existing models. Two examples:

  1. Residual modeling is a commonly used method when important risk factors (for example, underwriting cycle, year, age, or loss maturity) contribute strongly to the outcome and mask all other effects, potentially leading to a highly biased result. Setting offsets addresses this bias. Using a feature set as an offset is equivalent to running the model against the residuals of the selected feature set. By modeling on residuals, you direct the model to surface new information rather than what you already know. With offsets, DataRobot focuses on the "other" factors when building models, while still incorporating the main risk factors in the final predictions.

  2. The constraint issue in insurance can arise due to market competition or regulation. Some examples are: discounts on multicar or home-auto package policies being limited to a 20% maximum, suppressing rates for youthful drivers, or suppressing rates for certain disadvantaged territories. In these types of cases, some of the variables can be set to a specific value and added to the model predictions as offsets.
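The residual-modeling idea above can be illustrated with a small sketch. It assumes a one-feature linear model with an identity link and uses made-up data. The helper names (`fit_line`, `fit_with_offset`, `predict_with_offset`) are hypothetical, and the code does not represent how DataRobot applies offsets internally.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (intercept, slope)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def fit_with_offset(xs, ys, offsets):
    """Fit only the variable component by modeling the residuals y - offset."""
    residuals = [y - o for y, o in zip(ys, offsets)]
    return fit_line(xs, residuals)


def predict_with_offset(model, x, offset):
    """Final prediction = fixed component (offset) + modeled variable component."""
    intercept, slope = model
    return offset + intercept + slope * x


# Illustrative data: the fixed component varies by row; the variable part is 2 * x.
xs = [1.0, 2.0, 3.0, 4.0]
offsets = [100.0, 50.0, 80.0, 20.0]
ys = [o + 2.0 * x for o, x in zip(offsets, xs)]

model = fit_with_offset(xs, ys, offsets)
prediction = predict_with_offset(model, x=5.0, offset=60.0)
```

Because the fit only sees `y - offset`, the varying fixed component cannot distort the learned slope, yet it is still added back into every final prediction.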

With three forecast offsets, [base1, base2, base3], DataRobot creates three models: model[base1], model[base2], and model[base3]. In modeling without feature transformations, each model would require a separate experiment. With transformations, the three models can be created in a single experiment.

Forecasting distances example

If the values for the feature Return_Pct are [1,2,3,1,2,3], all rows with a value of 1 in that column are grouped and a model is trained on that data. The same applies to rows with values of 2 and 3. The value is used as an offset for the forecast point and is also used as a feature for modeling.
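The grouping in this example can be sketched in a few lines of Python. This is an illustration only: the `target` column and its values are invented, and the "model" trained for each group is a stand-in (the mean target), not a DataRobot blueprint.

```python
from collections import defaultdict
from statistics import mean

# Illustrative rows; only the Return_Pct values come from the example above.
rows = [
    {"Return_Pct": 1, "target": 10.0},
    {"Return_Pct": 2, "target": 20.0},
    {"Return_Pct": 3, "target": 30.0},
    {"Return_Pct": 1, "target": 12.0},
    {"Return_Pct": 2, "target": 22.0},
    {"Return_Pct": 3, "target": 32.0},
]


def group_by_distance(rows, distance_column):
    """Collect rows that share the same forecast distance value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[distance_column]].append(row)
    return dict(groups)


def train_per_distance(rows, distance_column, target_column):
    """Train one stand-in 'model' (the mean target) per forecast distance."""
    return {
        distance: mean(r[target_column] for r in group)
        for distance, group in group_by_distance(rows, distance_column).items()
    }


def predict_row(models, row, distance_column):
    """Route a row to the model trained for its forecast distance."""
    return models[row[distance_column]]


models = train_per_distance(rows, "Return_Pct", "target")
```

Each new row is then scored by the model that matches its own forecast distance, which mirrors the row-by-row prediction flow described at the top of this page.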


Updated January 30, 2025