Time series forecasting

Time-series modeling is a recommended practice for data science problems where conditions may change over time. With this method, the validation set is made up of observations from a time window outside of (and more recent than) the time window used for model training. Time-aware modeling can make predictions on a single row, or, with its core time series functionality, can extract patterns from recent history and forecast multiple events into the future.
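The out-of-time validation idea can be illustrated with a minimal sketch. It is not DataRobot's implementation; the function name `out_of_time_split` and the sample rows are hypothetical, and the only point is that every validation row is more recent than every training row.

```python
from datetime import date


def out_of_time_split(rows, date_key, validation_start):
    """Split rows so validation holds only observations on or after validation_start."""
    ordered = sorted(rows, key=lambda r: r[date_key])
    train = [r for r in ordered if r[date_key] < validation_start]
    validation = [r for r in ordered if r[date_key] >= validation_start]
    return train, validation


# Hypothetical sample data, deliberately out of order.
rows = [
    {"date": date(2024, 1, 3), "sales": 12},
    {"date": date(2024, 1, 1), "sales": 10},
    {"date": date(2024, 1, 4), "sales": 15},
    {"date": date(2024, 1, 2), "sales": 11},
]
train, validation = out_of_time_split(rows, "date", date(2024, 1, 3))
```

A random split would mix old and new observations in both sets, which can hide the effect of conditions changing over time.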

When to use

Use this method when:

  • The dataset is larger than 10GB.
  • Forecasting is not needed but you want predictions based on forecast distance.
  • Full transparency of the transformation process is desired.

How to use

To use this method:

  1. Complete the partitioning part of the configuration described in the basic time-aware modeling setup.
  2. Enable time series modeling.
  3. Configure optional features, as illustrated below.

    graph TB
      A[Upload data/create experiment] --> |Automated feature derivation|B[Enable date/time partitioning];
      B --> C[Set ordering feature];
      C -. optional .-> D[Set backtest partitions];
      D -. optional .-> E[Set sampling];
      E --> F[Enable time series modeling]
      F -. optional .-> G[Set series ID]
      G -. optional .-> H[Customize windows]
      H -. optional .-> I[Set optional features]
      I --> J[Start modeling]
    

Enable time series modeling

Use any of the options below to access the toggle that allows you to create experiments that launch time-aware predictions or time series modeling:

  • Select Go to time series modeling settings from the Date/time partitioning setup page for time-relevant data.

  • Select Time series modeling in the Experiment summary panel.

  • Select Time series modeling in the top tabs.

All options open the settings to the Time series modeling tab. From there, toggle on Enable time series modeling.

Settings are inherited from the Data partitioning tab (ordering feature and backtests). Review and complete them as needed.

Time series modeling provides an additional set of options for configuring your time series experiment:

Set series ID

If duplicate timestamps are detected in the data, DataRobot provides options for configuring multiseries modeling. Multiseries modeling allows you to model datasets that contain duplicate timestamps by handling them as multiple, individual time series datasets. Select a series identifier to indicate which series each row belongs to.
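As a rough illustration of why a series identifier resolves duplicate timestamps, the sketch below checks for repeated dates and then groups rows by a hypothetical `store` column. The helper names and the data are assumptions made for the example, not part of the product.

```python
from collections import Counter, defaultdict


def has_duplicate_timestamps(rows, date_key):
    """Return True if any timestamp appears on more than one row."""
    counts = Counter(r[date_key] for r in rows)
    return any(n > 1 for n in counts.values())


def split_by_series(rows, series_key):
    """Group rows into one list per series identifier value."""
    series = defaultdict(list)
    for r in rows:
        series[r[series_key]].append(r)
    return dict(series)


# Hypothetical data: two stores reporting on the same dates.
rows = [
    {"date": "2024-01-01", "store": "A", "sales": 10},
    {"date": "2024-01-01", "store": "B", "sales": 7},
    {"date": "2024-01-02", "store": "A", "sales": 11},
    {"date": "2024-01-02", "store": "B", "sales": 9},
]
by_store = split_by_series(rows, "store")
```

The combined data has duplicate dates, but each per-store group has unique timestamps, so each group can be treated as its own time series.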

Customize window settings

DataRobot provides default window settings, the Feature Derivation Window (FDW) and Forecast Window (FW), based on the characteristics of the dataset. These settings determine how DataRobot derives features for the modeling dataset by defining the basic framework used for the feature derivation process. They can generally be left as-is.

The table below briefly describes the elements of the window setting section of the screen:

Important

If you do decide to modify these values, see the detailed guidance for the meaning and implication of each window.

|   | Option | Description |
|---|--------|-------------|
| 1 | Feature Derivation Window (FDW) | Configures the periods of data that DataRobot uses to derive features for the modeling dataset. |
| 2 | Exclude listed features from derivation | Excludes specified features from automated time-based feature engineering (for example, if you have extracted your own time-oriented features and do not want further derivation performed on them). Toggle the option on and select features from the dropdown. |
| 3 | Forecast Window (FW) | Sets the time range of forecasts that the model outputs after the forecast point. |
| 4 | Windows summary | Provides a graphical representation of the window settings. Any changes to window values are immediately reflected in the visual. |
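To make the relationship between the two windows concrete, here is a minimal sketch that converts window offsets into calendar dates around a forecast point. It assumes offsets expressed in days, with the FDW looking backward (zero or negative offsets) and the FW looking forward (positive offsets). The function `window_dates` and the example values are hypothetical, and the sketch does not model how DataRobot treats window boundaries.

```python
from datetime import date, timedelta


def window_dates(forecast_point, fdw_start, fdw_end, fw_start, fw_end):
    """Translate day offsets relative to the forecast point into date ranges."""
    fdw = (
        forecast_point + timedelta(days=fdw_start),
        forecast_point + timedelta(days=fdw_end),
    )
    fw = (
        forecast_point + timedelta(days=fw_start),
        forecast_point + timedelta(days=fw_end),
    )
    return fdw, fw


# Hypothetical settings: derive features from the prior 28 days,
# forecast days 1 through 7 after the forecast point.
fdw, fw = window_dates(date(2024, 1, 29), -28, 0, 1, 7)
```

With these values the FDW spans 2024-01-01 through 2024-01-29, and the FW spans 2024-01-30 through 2024-02-05.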

Set additional optional features

Three additional optional experiment settings are available:

Use Set features that are known in advance to flag features whose future values you already know at modeling time, which excludes them from lag derivation. When a feature is identified with this option, DataRobot does not create lags for it when deriving modeling data. Informing DataRobot that some variables are known in advance, and providing them at prediction time, can improve forecast accuracy. If a feature is flagged as known in advance, however, you must provide its future values at prediction time or predictions will fail. To use this option, toggle it on and select features from the dropdown.
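The difference between lagged and known-in-advance features can be sketched as follows. This is a simplified illustration, not DataRobot's derivation logic: the function `derive_row_features`, the lag naming scheme, and the `sales` and `promo` columns are all hypothetical.

```python
def derive_row_features(history, future_row, lags, known_in_advance):
    """Build features for one forecast row.

    history: rows ordered oldest to newest, ending at the forecast point.
    future_row: values supplied at prediction time for known-in-advance features.
    """
    features = {}
    for name in history[-1]:
        if name in known_in_advance:
            # Known in advance: use the supplied future value, no lags.
            features[name] = future_row[name]
        else:
            # Not known in advance: only past values (lags) may be used.
            for lag in lags:
                features[f"{name}_lag{lag}"] = history[-lag][name]
    return features


history = [
    {"sales": 10, "promo": 0},
    {"sales": 12, "promo": 1},
    {"sales": 9, "promo": 0},
]
future_row = {"promo": 1}  # planned promotion on the forecast date
features = derive_row_features(
    history, future_row, lags=[1, 2], known_in_advance={"promo"}
)
```

In this sketch, omitting `promo` from `future_row` raises a `KeyError`, which mirrors the requirement that future values of known-in-advance features be supplied at prediction time.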

Use Include events calendar to upload or generate an event file that specifies dates or events that require additional attention. DataRobot will use the file to automatically create features based on the listed events. You can choose a local file or one stored in the data registry. Or, click Generate calendar to let DataRobot generate a file of events based on a selected region.
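If you prepare an event file yourself, a small script can generate it. The sketch below assumes a simple two-column CSV layout of date and event name. The column names `date` and `event_name` and the helper `build_calendar_csv` are hypothetical, so check the calendar file requirements for your DataRobot version before uploading.

```python
import csv
import io


def build_calendar_csv(events):
    """Render (iso_date, event_name) pairs as CSV text, sorted by date."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["date", "event_name"])  # hypothetical column names
    for iso_date, name in sorted(events):
        writer.writerow([iso_date, name])
    return buffer.getvalue()


calendar_csv = build_calendar_csv(
    [("2024-12-25", "Christmas Day"), ("2024-01-01", "New Year's Day")]
)
```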

Use Train models that support partial history when series history is only partially known (historical rows are partially available within the feature derivation window) or when a series has not been seen in the training data. When checked, Autopilot will run blueprints optimized for incomplete historical data, eliminating models with less accurate results for partial history support.
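To decide whether this option is relevant, you can check how much of the feature derivation window each series actually covers. The sketch below is one possible way to do that, assuming daily data and an FDW counted as the days ending at the forecast point. The function `history_status` and its labels are hypothetical and do not reflect how DataRobot classifies series internally.

```python
from datetime import date, timedelta


def history_status(observed_dates, forecast_point, fdw_days):
    """Classify a series as 'full', 'partial', or 'new' within the FDW."""
    expected = {forecast_point - timedelta(days=d) for d in range(fdw_days)}
    present = expected & set(observed_dates)
    if not present:
        return "new"  # no rows inside the window
    if present == expected:
        return "full"  # every day in the window is observed
    return "partial"  # some, but not all, days are observed


forecast_point = date(2024, 1, 10)
full = history_status(
    [date(2024, 1, 8), date(2024, 1, 9), date(2024, 1, 10)], forecast_point, 3
)
partial = history_status([date(2024, 1, 10)], forecast_point, 3)
new = history_status([], forecast_point, 3)
```

Series that come back as "partial" or "new" are the cases the partial history option is meant to address.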

Start forecast modeling

After you are satisfied with the modeling settings (which are summarized in the Experiment summary), click Start modeling. When the process begins, DataRobot analyzes the target and creates time-based features to use for modeling. You can control the number of workers applied to the experiment from the queue in the right panel. Increase or decrease workers for your experiment as needed.

From there, you can also view the jobs that are running, queued, and failed. Expand the queue, if necessary, to see the running jobs and assigned workers.

See the troubleshooting tips for assistance, if needed.

What's next?

After you start modeling, DataRobot populates the Leaderboard with models as they complete.

See the documentation on derived modeling data for more information.


Updated January 30, 2025