There are two AI experimentation "types" available in Workbench:
Time-aware modeling, described on this page, uses time-relevant data to make row-by-row predictions, time series forecasts, or current-value predictions ("nowcasts").
Predictive modeling, described here, makes row-by-row predictions based on your data.
There is extensive material available about the fundamentals of time-aware modeling. While the instructions largely represent the workflow as applied in DataRobot Classic, the reference material describing the framework, feature derivation process, and more is still applicable.
Experiments are the individual "projects" within a Use Case. They allow you to vary data, targets, and modeling settings to find the optimal models to solve your business problem. Within each experiment, you have access to its Leaderboard and model insights, as well as experiment summary information.
Date/time partitioning, the basis for building time-aware projects, and time series modeling are on by default.
Feature flag: Enable Date/Time Partitioning (OTV) in Workbench, Enable Workbench for Time Series Projects
Follow the steps below to create a new experiment from within a Use Case.
You can also start modeling directly from a dataset by clicking the Start modeling button. The Set up new experiment page opens. From there, the instructions follow the flow described below.
Create a feature list¶
Support for feature lists in Workbench is on by default.
Feature flag: Enable Feature Lists in Workbench Preview
Before modeling, you can create a custom feature list from the Datasets tab. If you select that list during modeling setup, DataRobot creates the modeling data using only the features in that list.
To create a new list:
- From the Use Case, select the dataset you plan to model with to open the data preview.
- Click the dropdown at the top of the page and select + New feature list to open the Features view.
- Select the checkbox next to each feature you want to include in your custom list. Then, click Create feature list, enter a name and description (optional), and click Save changes.
DataRobot automatically creates new feature lists after the feature derivation process. Once modeling completes, you can train new models using the time-aware lists. Learn more about feature lists and the data tab here.
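Conceptually, a feature list is a named subset of a dataset's columns. The sketch below illustrates that idea in plain Python. The sample rows, the feature names, and the `apply_feature_list` helper are hypothetical and are not part of any DataRobot API.

```python
# Hypothetical rows standing in for a dataset preview.
rows = [
    {"Date": "2024-01-01", "Store": "A", "Promo": 1, "Sales": 120.0},
    {"Date": "2024-01-02", "Store": "A", "Promo": 0, "Sales": 95.0},
]


def apply_feature_list(rows, feature_list):
    """Keep only the columns named in feature_list, preserving row order."""
    missing = [f for f in feature_list if rows and f not in rows[0]]
    if missing:
        raise KeyError(f"features not in dataset: {missing}")
    return [{f: row[f] for f in feature_list} for row in rows]


# A custom list that leaves out the "Store" column.
custom_list = ["Date", "Promo", "Sales"]
modeling_data = apply_feature_list(rows, custom_list)
```

Here `modeling_data` contains only the three selected columns, which mirrors how modeling data is built from the features in the chosen list.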
From within a Use Case, click Add new and select Add experiment. The Set up new experiment page opens, which lists all data previously loaded to the Use Case.
Add data to the experiment, either by adding new data (1) or selecting a dataset that has already been loaded to the Use Case (2).
Once the data is loaded to the Use Case (option 2 above), click to select the dataset you want to use in the experiment. Workbench opens a preview of the data.
From here, you can:
|Click to return to the data listing and choose a different dataset.|
|Click the icon to proceed and set the target.|
|Click Next to proceed and set the target.|
Once you have proceeded to target selection, Workbench prepares the dataset for modeling (EDA 1). When the process finishes, to set the target, either:
Scroll through the list of features to find your target. If it is not showing, expand the list from the bottom of the display:
Once located, click the entry in the table to use the feature as the target.
Type the name of the target feature you would like to predict in the entry box. DataRobot lists matching features as you type:
After the target is defined, Workbench displays a histogram providing information about the target feature's distribution and, in the right panel, a summary of the experiment settings. From here, you can build models with the default settings for predictive modeling.
Customize basic settings¶
Prior to enabling time series modeling, you can change several of the basic modeling settings. These options are common to both predictive and time-aware modeling.
Changing experiment parameters is a good way to iterate on a Use Case. Before starting to model, you can change a variety of settings:
|Positive class||For binary classification projects only. The class to use when a prediction scores higher than the classification threshold.|
|Modeling mode||The modeling mode, which influences the blueprints DataRobot chooses to train.|
|Optimization metric||The metric used to score models, if you want one different from DataRobot's recommendation.|
|Training feature list||The subset of features that DataRobot uses to build models.|
After changing any or all of the settings described, click Start modeling to begin the Quick mode modeling process or customize more advanced settings.
Change modeling mode¶
By default, DataRobot builds experiments using Quick Autopilot. However, you can change the modeling mode to train specific blueprints or all applicable repository blueprints.
The following table describes each of the modeling modes:
|Quick (default)||Using a sample size of 64%, Quick Autopilot runs a subset of models, based on the specified target feature and performance metric, to provide a base set of models that build and provide insights quickly.|
|Manual||Manual mode gives you full control over which blueprints to execute. After EDA2 completes, DataRobot redirects you to the blueprint repository where you can select one or more blueprints for training.|
Change optimization metric¶
The optimization metric defines how DataRobot scores your models. After you choose a target feature, DataRobot selects an optimization metric based on the modeling task. Typically, the metric DataRobot chooses for scoring models is the best selection for your experiment. To build models using a different metric, overriding the recommended metric, use the Optimization metric dropdown:
See the reference material for a complete list and descriptions of available metrics.
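To see why the choice of metric matters, the following minimal sketch scores two hypothetical models with RMSE and MAE. The values are invented for illustration, and the helper functions are plain-Python stand-ins rather than DataRobot's implementations.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: penalizes large misses heavily."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error: treats all misses proportionally."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


actual = [10.0, 10.0, 10.0, 10.0]
model_a = [12.0, 12.0, 12.0, 12.0]  # consistently off by 2
model_b = [10.0, 10.0, 10.0, 15.0]  # exact except for one large miss

scores = {
    "model_a": {"RMSE": rmse(actual, model_a), "MAE": mae(actual, model_a)},
    "model_b": {"RMSE": rmse(actual, model_b), "MAE": mae(actual, model_b)},
}

# The "best" model depends on which metric ranks the candidates.
best_by_rmse = min(scores, key=lambda m: scores[m]["RMSE"])
best_by_mae = min(scores, key=lambda m: scores[m]["MAE"])
```

In this example the two metrics disagree: RMSE prefers `model_a`, while MAE prefers `model_b`. Changing the optimization metric can therefore change which models rank highest.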
Change feature list (pre-modeling)¶
Feature lists control the subset of features that DataRobot uses to build models. Workbench defaults to the Informative Features list, but you can modify that before modeling. To change the feature list, click the Feature list dropdown and select a different list:
You can also change the selected list on a per-model basis once the experiment finishes building.
Configure additional settings¶
Choose the Additional settings tab to set more advanced modeling capabilities. Note that the Time series modeling tab will be available or greyed out depending on whether DataRobot found any date/time features in the dataset.
Configure the following, as required by your business use case.
You can complete the time-aware configuration or the additional settings in either order.
Monotonic feature constraints¶
Monotonic constraints control the influence, both up and down, between variables and the target. In some use cases (typically insurance and banking), you may want to force the directional relationship between a feature and the target (for example, higher home values should always lead to higher home insurance rates). By training with monotonic constraints, you force certain XGBoost models to learn only monotonic (always increasing or always decreasing) relationships between specific features and the target.
Using the monotonic constraints feature requires creating special feature lists, which are then selected here. Note also that when using Manual mode, available blueprints are marked with a MONO badge to identify supporting models.
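As a rough illustration of what a monotonic relationship means in practice, the sketch below counts how often a set of predictions breaks an "always increasing" rule when rows are ordered by a feature. The `monotonic_violations` helper and the sample values are hypothetical. The sketch only checks predictions after the fact; it does not show how a model is trained under constraints.

```python
def monotonic_violations(feature_values, predictions, direction=1):
    """Count adjacent pairs (ordered by feature value) that break the constraint.

    direction=+1 means predictions must never decrease as the feature grows;
    direction=-1 means they must never increase.
    """
    ordered = [p for _, p in sorted(zip(feature_values, predictions))]
    return sum(
        1
        for prev, cur in zip(ordered, ordered[1:])
        if direction * (cur - prev) < 0
    )


# Hypothetical home values and predicted insurance rates.
home_values = [150_000, 300_000, 220_000, 410_000]
constrained_rates = [800.0, 1150.0, 990.0, 1400.0]
unconstrained_rates = [800.0, 1150.0, 1200.0, 1400.0]

constrained_count = monotonic_violations(home_values, constrained_rates)
unconstrained_count = monotonic_violations(home_values, unconstrained_rates)
```

The first set of rates rises with home value everywhere. The second contains one pair where a more valuable home receives a lower rate, which is the kind of behavior a monotonic constraint is meant to rule out.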
Weight sets a single feature to use as a differential weight, indicating the relative importance of each row. It is used when building or scoring a model—for computing metrics on the Leaderboard—but not for making predictions on new data. All values for the selected feature must be greater than 0. DataRobot runs validation and ensures the selected feature contains only supported values.
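The following sketch shows one common way a row weight can enter a metric, using a weighted RMSE, together with a check that all weights are greater than 0. The formula and the helper names are assumptions for illustration; the exact weighting DataRobot applies to each metric is not described here.

```python
import math


def validate_weights(weights):
    """Reject weights that are not strictly positive, mirroring the rule above."""
    bad = [w for w in weights if not w > 0]
    if bad:
        raise ValueError(f"weights must be greater than 0, found: {bad}")


def weighted_rmse(actual, predicted, weights):
    """RMSE in which each row's squared error is scaled by its weight."""
    validate_weights(weights)
    total = sum(w * (a - p) ** 2 for a, p, w in zip(actual, predicted, weights))
    return math.sqrt(total / sum(weights))


actual = [100.0, 200.0, 300.0]
predicted = [110.0, 200.0, 270.0]

unweighted = weighted_rmse(actual, predicted, [1.0, 1.0, 1.0])
weighted = weighted_rmse(actual, predicted, [1.0, 1.0, 4.0])  # third row counts 4x
```

Because the third row has the largest error, giving it four times the weight raises the score, which shows how the weight feature shifts the metric toward the rows marked as more important.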
Enable time-aware modeling¶
There are two types of time-aware modeling:
Simple Date/Time partitioning, which assigns rows to backtests chronologically.
Time series modeling for forecasting multiple future values of the target.
To create out-of-time validation (OTV) predictions, click Partitioning in the experiment summary panel, which opens the Data partitioning tab.
To model time-relevant data without forecasting (that is, to predict the target value on each individual row), select Date/time as the partitioning method.
When you select Date/time, you are prompted to enter an ordering feature, the primary date/time feature DataRobot uses for modeling. If only one qualifying feature is detected, the field is autofilled. If multiple features are available, click in the box to display a list of all qualifying features. If a feature is not listed, it was not detected as type date and cannot be used.
Select the ordering feature. Once selected, DataRobot detects and reports the date and/or time format (standard GLIBC strings) for the selected feature:
The Experiment summary updates as your setup continues.
Backtest configuration becomes available. DataRobot sets defaults based on the characteristics of the dataset; these can generally be left as-is and will result in robust models. Alternatively, you can change the default settings.
Change default settings¶
You can change one or more of the default date/time partitioning settings. Controls are described in the table below.
|Backtests||Sets the backtesting partitions. Any changes to these values are represented in the graphics below the entry boxes.|
|Use equal rows per backtest||Sets whether each backtest uses the same number of rows (enabled) or the same duration (disabled).|
|Partition sampling method||Sets how to assign rows from the dataset, which is useful if a dataset is not distributed equally over time.|
|Partitioning log||Provides a downloadable log that reports on partition creation.|
|Go to time series modeling settings||Open the Time series modeling tab to enable forecasting and access additional options.|
You can also use the graphics below the entry boxes to edit individual backtests.
When you're satisfied with the modeling settings, click Start modeling to begin the Quick mode Autopilot modeling process.
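To make the idea of chronological backtests concrete, here is a simplified sketch that splits rows into training and validation partitions by date, using an equal number of validation rows per backtest. The `make_backtests` helper and its parameters are hypothetical. The sketch omits holdout, gaps, and sampling, so it should not be read as DataRobot's partitioning logic.

```python
from datetime import date, timedelta


def make_backtests(rows, date_key, n_backtests, validation_rows):
    """Split rows chronologically into (training, validation) pairs.

    Backtest 1 validates on the most recent window; each later backtest
    steps one window further back. Training data always precedes validation.
    """
    ordered = sorted(rows, key=lambda r: r[date_key])
    backtests = []
    end = len(ordered)
    for _ in range(n_backtests):
        start = end - validation_rows
        if start <= 0:
            raise ValueError("not enough rows for the requested backtests")
        backtests.append((ordered[:start], ordered[start:end]))
        end = start
    return backtests


# Ten hypothetical daily rows.
rows = [
    {"Date": date(2024, 1, 1) + timedelta(days=i), "Sales": 100 + i}
    for i in range(10)
]
backtests = make_backtests(rows, "Date", n_backtests=2, validation_rows=3)
```

Each validation partition contains only rows that are later than every row in its training partition, which is the property that distinguishes date/time partitioning from random partitioning.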
Time series modeling¶
Time series modeling provides an additional set of options, such as identifying series IDs (as applicable), initiating the creation of a derived feature set for modeling, and other particulars of forecasting.
To create experiments that use time series modeling, select either of the following:
Go to time series modeling settings from the Date/time partitioning setup page for time-relevant data.
Time series modeling in the Experiment summary panel.
Either option opens the settings to the Time series modeling tab. From there, set the toggle to on to enable time series modeling. Note that the Experiment summary updates as your setup continues.
The following settings are available to configure your project:
- Select an ordering feature
- Set the series ID (for multiseries projects)
- Customize window settings
- Set additional optional features
Select an ordering feature¶
The ordering feature is the primary date/time feature that DataRobot will use for modeling. If only one qualifying feature is detected, the field is autofilled. If multiple features are available, click in the box to display a list of all qualifying features and select one. If a feature is not listed, it was not detected as type date and cannot be used.
After selecting a feature, DataRobot:
Detects and reports the date and/or time format (standard GLIBC strings) for the selected feature:
Computes and then loads a feature-over-time histogram of the ordering feature ("Date" in this example) plotted against the target feature ("Sales" in this example). Note that if your dataset qualifies for multiseries modeling, this histogram represents the average of the time feature values across all series plotted against the target feature.
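The reported date format uses strftime-style directives such as `%Y-%m-%d`. As a rough sketch of how a format might be inferred, the example below tries a short list of candidate formats with Python's `datetime.strptime` and returns the first one that parses every value. The candidate list and the `detect_format` helper are assumptions for illustration, not DataRobot's detection logic.

```python
from datetime import datetime

# Candidate strptime-style formats to try, in order (an illustrative subset).
CANDIDATE_FORMATS = ["%Y-%m-%d %H:%M:%S", "%Y-%m-%d", "%m/%d/%Y"]


def detect_format(values, candidates=CANDIDATE_FORMATS):
    """Return the first format that parses every value, or None if none fits."""
    for fmt in candidates:
        try:
            for value in values:
                datetime.strptime(value, fmt)
        except ValueError:
            continue  # at least one value did not match; try the next format
        return fmt
    return None


daily_format = detect_format(["2024-01-31", "2024-02-01"])
us_format = detect_format(["01/31/2024", "02/01/2024"])
unknown_format = detect_format(["not a date"])
```

A column whose values match none of the candidates returns `None`, which loosely corresponds to a feature that is not detected as type date.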
Set series ID¶
If duplicate timestamps are detected in the data, DataRobot provides options for configuring multiseries modeling. Multiseries modeling allows you to model datasets that contain duplicate timestamps by handling them as multiple, individual time series datasets. Select a series identifier to indicate which series each row belongs to.
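The sketch below illustrates the multiseries idea with hypothetical data: the combined dataset has duplicate timestamps, but once rows are grouped by a series identifier (here, a made-up `Store` column), each series has unique timestamps. The helper names are illustrative only.

```python
from collections import Counter, defaultdict

# Hypothetical sales rows for two stores that share the same dates.
rows = [
    {"Date": "2024-01-01", "Store": "A", "Sales": 120},
    {"Date": "2024-01-01", "Store": "B", "Sales": 80},
    {"Date": "2024-01-02", "Store": "A", "Sales": 95},
    {"Date": "2024-01-02", "Store": "B", "Sales": 70},
]


def has_duplicate_timestamps(rows, date_key):
    """Return True if any timestamp appears on more than one row."""
    counts = Counter(r[date_key] for r in rows)
    return any(n > 1 for n in counts.values())


def split_by_series(rows, series_key):
    """Group rows into one list per series identifier."""
    series = defaultdict(list)
    for r in rows:
        series[r[series_key]].append(r)
    return dict(series)


duplicates_in_full_data = has_duplicate_timestamps(rows, "Date")
series = split_by_series(rows, "Store")
duplicates_per_series = {
    name: has_duplicate_timestamps(members, "Date")
    for name, members in series.items()
}
```

If a chosen identifier still leaves duplicate timestamps within a series, it does not uniquely separate the series and a different identifier is needed.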
Customize window settings¶
DataRobot provides default window settings, the Feature Derivation Window (FDW) and Forecast Window (FW), based on the characteristics of the dataset. These settings determine how DataRobot derives features for the modeling dataset by defining the basic framework used for the feature derivation process. They can generally be left as-is.
If you do decide to modify these values, see the detailed guidance for the meaning and implication of each window.
The table below briefly describes the elements of the window setting section of the screen:
|Feature Derivation Window (FDW)||Configures the periods of data that DataRobot uses to derive features for the modeling dataset.|
|Exclude listed features from derivation||Excludes specified features from automated time-based feature engineering (for example, if you have extracted your own time-oriented features and do not want further derivation performed on them). Toggle the option on and select features from the dropdown.|
|Forecast Window||Sets the time range of forecasts that the model outputs after the forecast point.|
|Windows summary||Provides a graphical representation of the window settings. Any changes to window values are immediately reflected in the visual.|
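The relationship between the two windows can be sketched with row offsets relative to a forecast point. In the example below, the FDW covers the past rows used to derive a feature (here, a rolling mean), and the FW covers the future rows to be forecast. The `window_indices` helper, the inclusive boundaries, and the sample values are assumptions made for illustration.

```python
def window_indices(forecast_point, fdw_start, fdw_end, fw_start, fw_end):
    """Return (derivation_rows, forecast_rows) as lists of row indices.

    Offsets are relative to the forecast point, in time steps: FDW offsets
    are zero or negative (the past), FW offsets are positive (the future).
    Both windows are treated as inclusive in this sketch.
    """
    if not (fdw_start < fdw_end <= 0 < fw_start <= fw_end):
        raise ValueError("expected fdw_start < fdw_end <= 0 < fw_start <= fw_end")
    derivation = list(range(forecast_point + fdw_start, forecast_point + fdw_end + 1))
    forecast = list(range(forecast_point + fw_start, forecast_point + fw_end + 1))
    return derivation, forecast


# Hypothetical daily sales; row index 5 is the forecast point.
sales = [100, 104, 98, 110, 115, 120, 118, 125, 130, 128]
derivation, forecast = window_indices(
    forecast_point=5, fdw_start=-3, fdw_end=0, fw_start=1, fw_end=2
)

# One derived feature (a rolling mean over the FDW) and the targets to forecast.
fdw_mean = sum(sales[i] for i in derivation) / len(derivation)
targets = [sales[i] for i in forecast]
```

Widening the FDW gives the derived feature more history to summarize, while widening the FW increases the number of future values the model must predict from each forecast point.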
Set additional optional features¶
Two additional optional experiment settings are available:
Use Set features that are known in advance to identify features whose future values you already know at modeling time (for example, planned promotions or holidays). When a feature is identified with this option, DataRobot does not create lags for it when deriving modeling data. Telling DataRobot that some variables are known in advance, and providing them at prediction time, can improve forecast accuracy. If a feature is flagged as known, however, you must provide its future value at prediction time or predictions will fail. To use this option, toggle it on and select features from the dropdown.
Use Include events calendar to upload or generate an event file that specifies dates or events that require additional attention. DataRobot will use the file to automatically create features based on the listed events. You can choose a local file or one stored in the data registry. Or, click Generate calendar to let DataRobot generate a file of events based on a selected region.
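The effect of flagging a feature as known in advance can be sketched as follows: a known feature contributes its actual value for the row being forecast, while every other feature contributes only its most recent observed (lagged) value. The `derive_features` helper, the column names, and the naming of the lagged features are hypothetical and greatly simplified compared to a real feature derivation process.

```python
# Hypothetical rows ordered by time; "Promo" is planned ahead of time.
rows = [
    {"Promo": 0, "Footfall": 40, "Sales": 100},
    {"Promo": 1, "Footfall": 55, "Sales": 140},
    {"Promo": 0, "Footfall": 42, "Sales": 105},
    {"Promo": 1, "Footfall": 60, "Sales": 150},
]


def derive_features(rows, forecast_point, distance, known_in_advance):
    """Build model inputs for predicting the row at forecast_point + distance."""
    target_row = rows[forecast_point + distance]
    latest_row = rows[forecast_point]
    features = {}
    for name in latest_row:
        if name in known_in_advance:
            # Known in advance: use the actual future value, no lag.
            features[name] = target_row[name]
        else:
            # Not known in advance: only the latest observed value is available.
            features[f"{name} (lag {distance})"] = latest_row[name]
    return features


features = derive_features(
    rows, forecast_point=2, distance=1, known_in_advance={"Promo"}
)
```

Because `Promo` is read from the future row, that value must be supplied at prediction time, which is why predictions fail when a known-in-advance feature is missing.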
After you are satisfied with the modeling settings (which are summarized in the Experiment summary), simply click Start modeling. When the process begins, DataRobot analyzes the target and creates time-based features to use for modeling.
After you start modeling, DataRobot populates the Leaderboard with models as they complete. You can:
- Begin model evaluation on any available model.
- Use the View experiment info option to view a variety of information about the model.
- Display derived modeling data, which is the data that was used to build the model.
See the following sections for more information on derived modeling data: