Time-aware predictions
Time-aware predictions use supervised learning to make forecasts and predictions from the other features of the dataset. The method assigns rows to backtests chronologically and then makes row-by-row predictions.
When to use
Note
The configuration described below can be used on its own for time-aware predictions. It is also the starting point for time-aware predictions with feature transformations and time series forecasts.
Use this method on its own when:
- A time-relevant feature is present in the dataset.
- Forecasting is not needed; you are predicting the target value on each individual row.
- No feature engineering is needed.
How to use
To use this method:
- Begin the basic experiment setup.
- Configure date/time partitioning.
- Optionally, modify time-aware parameters.
```mermaid
graph TB
A[Upload data/create experiment] --> B[Enable date/time partitioning];
B --> C[Set ordering feature];
C -. optional .-> D[Set backtest partitions];
D -. optional .-> E[Set sampling];
E --> F[Start modeling]
```
Enable time-aware modeling
Follow the steps below to enable time-aware modeling.
1. Enable date/time partitioning
To enable time-aware modeling, first set the partitioning method to Date/time. To do this, do one of the following:
- Open the Data partitioning tab.
- Click Partitioning in the experiment summary panel, which opens the Data partitioning tab.
From the Data partitioning tab, select Date/time as the partitioning method.
2. Set ordering feature
After setting the partitioning method to date/time, set the ordering feature, the primary date/time feature that DataRobot uses for modeling.
Note
All other settings can be changed or left at the default. The Experiment summary, in the right-hand panel, updates as setup continues.
Select the ordering feature. If only one qualifying feature is detected, the field is autofilled. If multiple features are available, click in the box to display a list of all qualifying features. If a feature is not listed, it was not detected as type date and cannot be used.
Once the ordering feature is set, DataRobot:
- Detects and reports the date and/or time format (standard GLIBC strings) for the selected feature.
- Computes and then loads a feature-over-time histogram of the ordering feature. Note that if your dataset qualifies for multiseries modeling, this histogram represents the average of the time feature values across all series plotted against the target feature.
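As a rough illustration of what format detection involves, the sketch below tries a short, hypothetical list of candidate formats written with `%`-directives (the same directive style as GLIBC's `strftime`) and returns the first one that parses every value. The helper name and candidate list are assumptions for illustration, not DataRobot's detection logic. Note that ambiguous values such as `01/05/2024` resolve to whichever candidate appears first.

```python
from datetime import datetime

# Hypothetical candidate formats, in strftime/strptime %-directive notation.
CANDIDATE_FORMATS = ["%Y-%m-%d", "%Y-%m-%d %H:%M:%S", "%m/%d/%Y", "%d.%m.%Y %H:%M"]

def detect_format(values):
    """Return the first candidate format that parses every value, else None."""
    if not values:
        return None
    for fmt in CANDIDATE_FORMATS:
        try:
            for value in values:
                datetime.strptime(value, fmt)
        except ValueError:
            continue  # at least one value does not match; try the next format
        return fmt
    return None
```

For example, `detect_format(["2024-01-05", "2024-02-29"])` returns `"%Y-%m-%d"`, while a column containing `"not a date"` returns `None` and would not qualify as an ordering feature under this sketch.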
3. Set backtest partitions
Once the ordering feature is set, backtest configuration becomes available. Backtests are the time-aware equivalent of cross-validation, but they are based on time periods or durations instead of random rows. DataRobot sets defaults based on the characteristics of the dataset; these can generally be left as-is and typically produce robust models.
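To make the contrast with cross-validation concrete, the following sketch compares random folds with time-ordered, backtest-style folds. It assumes the rows are already sorted by the ordering feature, so a row's index stands in for its timestamp; both function names are hypothetical. In every time-ordered fold, the validation rows come after all of the training rows, which random folds do not guarantee.

```python
import random

def random_folds(n_rows, k, seed=0):
    """Classic k-fold: shuffle row indices and deal them into k folds."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    return [sorted(idx[i::k]) for i in range(k)]

def time_ordered_folds(n_rows, k, validation_rows):
    """Backtest-style folds over rows sorted by time.

    Fold 0 validates on the most recent block of rows; each later fold
    moves one block further back. Training uses only earlier rows.
    """
    if k * validation_rows >= n_rows:
        raise ValueError("not enough rows for this many folds")
    folds = []
    for i in range(k):
        val_end = n_rows - i * validation_rows
        val_start = val_end - validation_rows
        training = list(range(val_start))
        validation = list(range(val_start, val_end))
        folds.append((training, validation))
    return folds

for training, validation in time_ordered_folds(100, 3, 10):
    print(len(training), validation[0], validation[-1])
# 90 90 99
# 80 80 89
# 70 70 79
```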
Use the links in the table below to change the default settings:
# | Field | Description |
---|---|---|
1 | Backtests | Sets the number and duration of backtesting partitions. |
2 | Row assignment | Sets how rows are assigned to backtests and the sampling method. |
3 | Partitioning log / Reset | Provides a downloadable log that reports on partition creation and provides a reset link. |
Change number of backtests
First, change the number of backtests if desired.
The default number of backtests depends on the project parameters, but you can configure up to 20. Before setting the number of backtests, use the histogram to confirm that the training and validation sets of each fold have sufficient data to train a model.
Backtest requirements
For date/time partitioning:
- Without time series modeling: Backtests require at least 20 rows in each validation and holdout fold and at least 100 rows in each training fold. If you set a number of backtests that results in any of the partitions not meeting those criteria, DataRobot only runs the number of backtests that do meet the minimums (and marks the display with an asterisk).
- With time series modeling: Backtests require at least 4 rows in validation and holdout and at least 20 rows in the training fold. If you set a number of backtests that results in any of the partitions not meeting those criteria, the project could fail. See the time series partitioning reference for more information.
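The minimums above can be checked with a small helper before you commit to a backtest count. This is a minimal sketch under stated assumptions: `BacktestRows` and `backtests_meeting_minimums` are hypothetical names, not part of any DataRobot API, and the row counts per partition are assumed to be known already (for example, read off the histogram).

```python
from dataclasses import dataclass

# Minimum row counts taken from the backtest requirements above.
MINIMUMS = {
    "date_time": {"training": 100, "validation": 20, "holdout": 20},
    "time_series": {"training": 20, "validation": 4, "holdout": 4},
}

@dataclass
class BacktestRows:
    training: int
    validation: int

def backtests_meeting_minimums(backtests, holdout_rows, mode="date_time"):
    """Return the indices of backtests whose partitions meet the minimums.

    Hypothetical helper for illustration. Pass holdout_rows=None when
    holdout is disabled.
    """
    limits = MINIMUMS[mode]
    if holdout_rows is not None and holdout_rows < limits["holdout"]:
        raise ValueError(f"holdout needs at least {limits['holdout']} rows")
    return [
        i
        for i, bt in enumerate(backtests)
        if bt.training >= limits["training"] and bt.validation >= limits["validation"]
    ]

backtests = [BacktestRows(500, 40), BacktestRows(300, 40), BacktestRows(80, 40)]
print(backtests_meeting_minimums(backtests, holdout_rows=40))  # [0, 1]
print(backtests_meeting_minimums(backtests, holdout_rows=40, mode="time_series"))  # [0, 1, 2]
```

In this example, the third backtest has only 80 training rows, so it falls below the non-time-series minimum of 100 but still clears the time series minimum of 20.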
Change partition durations
Next, configure the backtesting partitions. If you don't modify any settings, DataRobot distributes rows to backtests equally. However, you can customize the backtest gap, training, validation, and holdout data in either of two ways:
- To apply globally to all backtests in the experiment, use the dropdowns.
- To apply changes to individual backtests, click the bars below the visualization. Individual settings override global settings. Once you modify settings for an individual backtest, any changes to the global settings are not applied to the edited backtest.
As you add backtests to the experiment configuration, the training period of each backtest shortens. The validation and gap durations remain set to the values chosen in the dropdowns (unless changed individually per backtest).
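The sketch below shows why adding backtests shortens training, under a simplified model that is an assumption, not DataRobot's exact partitioning: validation windows sit back-to-back and end at the holdout start, and every backtest shares one training duration, namely whatever history remains before the oldest validation window once the gap is removed. `layout_backtests` is a hypothetical helper.

```python
from datetime import date, timedelta

def layout_backtests(data_start, holdout_start, n_backtests, validation, gap=timedelta(0)):
    """Return (training, validation) date windows for each backtest.

    Simplified, hypothetical model: validation windows are stacked
    backward from the holdout start, and all backtests share the
    training duration available before the oldest validation window.
    """
    oldest_validation_start = holdout_start - n_backtests * validation
    training = oldest_validation_start - gap - data_start
    if training <= timedelta(0):
        raise ValueError("not enough history for this many backtests")
    backtests = []
    for i in range(n_backtests):  # index 0 is the most recent backtest
        val_end = holdout_start - i * validation
        val_start = val_end - validation
        train_end = val_start - gap  # the gap separates training from validation
        backtests.append({
            "training": (train_end - training, train_end),
            "validation": (val_start, val_end),
        })
    return backtests

for n in (2, 3):
    first = layout_backtests(date(2020, 1, 1), date(2021, 1, 1), n, timedelta(days=30))[0]
    start, end = first["training"]
    print(n, (end - start).days)
# 2 306
# 3 276
```

With 30-day validation windows, moving from two to three backtests pushes the oldest validation window back by 30 days, so the shared training duration drops from 306 to 276 days.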
Review the default partition settings and click to make changes if needed.
The following provides an overview of the application of each partition, but review the linked material for more detail.
Partition | Description |
---|---|
Default validation duration | Sets the size of the partition used for testing—data that is not part of the training set that is used to evaluate a model’s performance. |
Default gap duration | Sets spaces in time that represent gaps between model training and model deployment. The gap is initially set to zero, in which case DataRobot does not apply a gap in testing. When set, DataRobot excludes the data that falls in the gap from use in training or evaluation of the model. |
Note how the changes are reflected in the testing representation.
Set backtest partitions individually
Regardless of which partition you are setting (training, validation, or gap), the elements of the editing screens function the same way (holdout is slightly different). To change an individual backtest's duration, first hover over the color band to see the details of its current duration settings.
Then, to modify the duration for an individual backtest, click in the backtest to open the inputs.
Backtests are based on either start and end dates or a start date and duration. Gaps, enabled by turning on the Add gap between partitions toggle, are derived from the date or duration settings. That is, a gap is created by leaving time steps between the training end and the validation start; without a gap, these are the same point in time.
To customize based on start and end dates:
- With a gap, set the training and validation start and end.
- Without a gap, set the training start, training end, validation start, and validation end.
To customize based on start date and duration:
- With a gap, set the start and duration for both training and validation.
- Without a gap, set the training start, training duration, and validation duration.
In all cases, DataRobot verifies entries and reports any required changes. It then reports the valid time windows under the input boxes.
Click Save changes when configuration is complete.
Once you have made changes to a data element, DataRobot adds an EDITED label to the backtest. Use the Reset to defaults link to manually reset durations or number of backtests to the default settings.
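The relationship between dates, durations, and gaps described in this section can be sketched with plain `datetime` arithmetic. The helper names are hypothetical and the sketch covers only the arithmetic, not DataRobot's validation rules: a window can come from a start plus a duration or from explicit start and end dates, and the gap is whatever time separates the training end from the validation start.

```python
from datetime import date, timedelta

def window_from_duration(start, duration):
    """Start date plus duration -> (start, end)."""
    if duration <= timedelta(0):
        raise ValueError("duration must be positive")
    return (start, start + duration)

def window_from_dates(start, end):
    """Explicit start and end dates -> (start, end)."""
    if end <= start:
        raise ValueError("end must be after start")
    return (start, end)

def gap_between(training, validation):
    """The gap is the time left between training end and validation start."""
    gap = validation[0] - training[1]
    if gap < timedelta(0):
        raise ValueError("validation must not start before training ends")
    return gap

training = window_from_duration(date(2020, 1, 1), timedelta(days=180))  # ends 2020-06-29
validation = window_from_dates(date(2020, 7, 13), date(2020, 8, 12))
print(gap_between(training, validation).days)  # 14
```

Starting validation on the training end date (2020-06-29 here) yields a zero gap, which matches the no-gap case where training end and validation start coincide.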
Change holdout
By default, DataRobot creates a holdout fold, a final partition withheld from training and validation, for the models in your project.
Notes on changing holdout
- Although it is highly experiment-dependent, holdout is generally about 10% of the total duration, possibly rounded to a "natural" time frame such as n weeks or n months. (The exact calculation depends on whether the experiment is simple time-aware or time series.) View the partitioning log for a description of DataRobot's calculations.
- You can set holdout only in the holdout backtest, and you cannot change the training data size in that backtest. DataRobot automatically configures the training partition of the holdout backtest.
- If, during the initial partitioning detection, the backtest configuration of the ordering (date/time) feature, series ID, or target results in insufficient rows to cover both validation and holdout, DataRobot automatically disables holdout. If other partitioning settings are changed later (validation or gap duration, start/end dates, and so on), holdout is not affected unless you disable it manually.
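To make the "roughly 10%, rounded to a natural time frame" rule of thumb concrete, here is a hypothetical heuristic that takes 10% of the total duration and rounds it to whole weeks. It is an illustration only; DataRobot's actual calculation is the one described in the partitioning log, and rounding to months is not shown because `timedelta` has no month unit.

```python
from datetime import timedelta

def approximate_holdout(total, fraction=0.10, unit=timedelta(weeks=1)):
    """Rough holdout duration: a fraction of the total, rounded to whole units.

    Hypothetical heuristic for illustration; always returns at least one unit.
    """
    raw = total * fraction      # e.g. 10% of the dataset's time span
    units = max(1, round(raw / unit))
    return units * unit

print(approximate_holdout(timedelta(days=730)).days)  # 70 (10 weeks)
```

Two years of data gives a raw holdout of 73 days, which this heuristic rounds to 10 whole weeks.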
To modify the holdout length, click the holdout backtest to open the inputs and enter new values:
- To customize based on start and end dates, enter holdout start and end dates.
- To customize based on start date and duration, enter the holdout start and duration.
Note that the training time span and gap settings are configured automatically and cannot be changed on the holdout backtest.
Note
In some cases, you may want to create a project without a holdout set. To do so, clear the Add holdout box. Insights that let you switch between validation and holdout will then not show the holdout option.
4. Set sampling
After completing backtests, set the row assignment and sampling method.
Row assignment
By default, DataRobot ensures that each backtest has the same duration, either the default or the values set from the dropdown(s) or via the bars in the visualization. If you want each backtest to use the same number of rows instead of the same length of time, use the Equal rows per backtest toggle.
Note
Time series projects also have an option to set row assignment (number of rows or duration) for the training data that is used during feature engineering. Configure this setting in the training window format section.
- When Equal rows per backtest is checked (which sets the partitions to row-based assignment), only the training end date is applicable.
- For time series experiments, when Equal rows per backtest is checked, the dates displayed are informative only (that is, they are approximate) and they include padding that is set by the feature derivation and forecast point windows.
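The difference between equal-duration and equal-row assignment shows up as soon as data is unevenly spread over time. This sketch uses hypothetical helpers and made-up timestamps (one row per day for ten days, then three rows per day for ten days) and splits them into two consecutive windows each way.

```python
from datetime import datetime, timedelta

def split_equal_duration(timestamps, k):
    """Split sorted timestamps into k consecutive windows of equal length in time."""
    start, end = timestamps[0], timestamps[-1]
    step = (end - start) / k  # assumes at least two distinct timestamps
    windows = [[] for _ in range(k)]
    for t in timestamps:
        i = min(int((t - start) / step), k - 1)  # the last timestamp goes in the final window
        windows[i].append(t)
    return windows

def split_equal_rows(timestamps, k):
    """Split sorted timestamps into k consecutive windows with (nearly) equal row counts."""
    n = len(timestamps)
    return [timestamps[i * n // k:(i + 1) * n // k] for i in range(k)]

base = datetime(2024, 1, 1)
ts = sorted(
    base + timedelta(days=d, hours=h)
    for d in range(20)
    for h in range(1 if d < 10 else 3)
)
print([len(w) for w in split_equal_duration(ts, 2)])  # [10, 30]
print([len(w) for w in split_equal_rows(ts, 2)])      # [20, 20]
```

Equal durations leave the sparse early window with a third as many rows as the dense later one; equal rows balance the counts, at the cost of windows that cover different lengths of time.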
Sampling method
Once you have selected how rows are assigned to backtests, select the sampling method, either Random or Latest, to set how rows are selected from the dataset.
Setting the sampling method is particularly useful if a dataset is not distributed equally over time. For example, if data is skewed to the most recent date, the results of using 50% of random rows versus 50% of the latest will be quite different. By selecting the data more precisely, you have more control over the data that DataRobot trains on.
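The 50% example above can be sketched as follows. `sample_rows` is a hypothetical helper that assumes the rows are already sorted by the ordering feature; it mirrors the two options by name but is not DataRobot's sampler. "Latest" keeps only the most recent slice of history, while "random" draws from the whole time span and keeps the picks in chronological order.

```python
import random

def sample_rows(rows, fraction, method, seed=0):
    """Select a fraction of time-ordered rows, either at random or the latest ones.

    Hypothetical helper; `rows` must already be sorted by the ordering feature.
    """
    n = round(len(rows) * fraction)
    if method == "latest":
        return rows[len(rows) - n:]
    if method == "random":
        picked = sorted(random.Random(seed).sample(range(len(rows)), n))
        return [rows[i] for i in picked]
    raise ValueError("method must be 'random' or 'latest'")

days = list(range(100))  # stand-in for 100 time-ordered rows
latest = sample_rows(days, 0.5, "latest")
randomized = sample_rows(days, 0.5, "random")
print(latest[0], latest[-1])  # 50 99
```

With the latest method, the earliest day in the sample is day 50; a random sample of the same size typically includes rows from the first half of the history as well.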
Reset defaults
When you make and save changes to any of the backtest settings, the backtest is marked with a badge (EDITED). Use the Reset to defaults link to manually reset durations or number of backtests to the default settings. Note that for time series experiments, this action does not reset any window settings.
What's next?
For time-aware predictive modeling, when you're satisfied with the modeling settings, click Start modeling to begin the Quick mode Autopilot modeling process. Alternatively, continue the configuration to make time-aware predictions with feature transformations or time series forecasts.