
Time-aware basic modeling

This page describes the basic setup of supervised time-aware experiments, with or without feature engineering applied. Once this setup is complete, you can use the resulting models to make predictions and forecasts.

Date and date range representation

DataRobot uses date points to represent dates and date ranges within the data, applying the following principles:

  • All date points adhere to ISO 8601, UTC (e.g., 2016-05-12T12:15:02+00:00), an internationally accepted way to represent dates and times, with some small variation in the duration format. Specifically, there is no support for ISO weeks (e.g., P5W).
  • Models are trained on data between two ISO dates. DataRobot displays these dates as a date range, but inclusion decisions and all key boundaries are expressed as date points. When you specify a date, DataRobot includes start dates and excludes end dates.
  • Once the format of the date partitioning column is determined, DataRobot converts all charts, selectors, etc., to that format for the project.
  • When changing partition year/month/day settings, the month and year values rebalance to fit the larger class (for example, 24 months becomes two years) when possible. However, because DataRobot cannot account for leap years or days in a month as it relates to your data, it cannot convert days into the larger container.
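The boundary and rebalancing conventions above can be sketched in plain Python. This is an illustrative restatement of the rules, not DataRobot code: training windows are start-inclusive and end-exclusive, months roll up into years, but days are never rolled up.

```python
from datetime import datetime, timezone

def in_training_window(point: datetime, start: datetime, end: datetime) -> bool:
    """Start dates are included and end dates are excluded, matching the
    date-point boundary convention described above."""
    return start <= point < end

def rebalance_months(months: int) -> tuple[int, int]:
    """Rebalance a month count into (years, months): 24 months -> (2, 0).
    Days are never rolled up, since month and leap-year lengths vary."""
    return divmod(months, 12)

# ISO 8601, UTC date points (e.g., 2016-05-12T12:15:02+00:00)
start = datetime(2016, 5, 12, 12, 15, 2, tzinfo=timezone.utc)
end = datetime(2017, 5, 12, 12, 15, 2, tzinfo=timezone.utc)

print(in_training_window(start, start, end))  # True  (start is included)
print(in_training_window(end, start, end))    # False (end is excluded)
print(rebalance_months(24))                   # (2, 0)
```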

Basic experiment setup

Follow the steps below to create a new experiment from within a Use Case.

Note

You can also start modeling directly from a dataset by clicking the Start modeling button. The Set up new experiment page opens. From there, the instructions follow the flow described below.

Create a feature list

Before modeling, you can create a custom feature list from the data explore page. You can then select that list during modeling setup to create the modeling data using only the features in that list.

DataRobot automatically creates new feature lists after the feature derivation process. Once modeling completes, you can train new models using the time-aware lists. Learn more about feature lists post-modeling here.

Add experiment

From within a Use Case, click Add and select Experiment. The Set up new experiment page opens, which lists all data previously loaded to the Use Case.

Add data

Add data to the experiment, either by adding new data (1) or selecting a dataset that has already been loaded to the Use Case (2).

Once the data is loaded to the Use Case (option 2 above), click to select the dataset you want to use in the experiment. Workbench opens a preview of the data.

From here, you can:

  Option Action
1 Click to return to the data listing and choose a different dataset.
2 Click the icon to proceed and set the learning type and target.
3 Click Next to proceed and set the learning type and target.

Start modeling setup

When you proceed, Workbench prepares the dataset for modeling (EDA 1).

Note

From this point forward in experiment creation, you can either continue setting up your experiment (Next) or you can exit. If you click Exit, you are prompted to discard changes or to save all progress as a draft. In either case, on exit you are returned to the point where you began experiment setup and EDA 1 processing is lost. If you choose Exit and save draft, the draft is available in the Use Case directory.

If you open a Workbench draft in DataRobot Classic and make changes that introduce features not supported in Workbench, the draft will be listed in your Use Case but will not be accessible except through the classic interface.

Set learning type

When EDA1 finishes, Workbench progresses to the modeling setup. First, set the learning type.

Learning type Description
Supervised Builds models using the other features of your dataset to make forecasts and predictions; this is the default learning type.
Clustering (unsupervised) Using no target and unlabeled data, builds models that group similar data and identify segments.
Anomaly detection (unsupervised) Using no target and unlabeled data, builds models that detect abnormalities in the dataset.

Set target

Availability information

Availability of multilabel (multicategorical) modeling is dependent on your DataRobot package. If it is not enabled for your organization, contact your DataRobot representative for more information.

If using supervised mode, set the target in either of the following ways:

  • Scroll through the list of features to find your target. If it is not showing, expand the list from the bottom of the display. Once located, click the entry in the table to use the feature as the target.
  • Type the name of the target feature you would like to predict in the entry box. DataRobot lists matching features as you type.

Depending on the number of values for a given target feature, DataRobot automatically determines the experiment type—either regression or classification. Classification experiments can be binary (binary classification), more than two classes (multiclass), or multilabel. The following table describes how DataRobot assigns a default problem type for numeric and non-numeric target data types:

Target data type Number of unique target values Default problem type Use multiclass/multilabel classification?
Numeric 2 Classification No
Numeric 3+ Regression Yes, optional
Non-numeric 2 Binary classification No
Non-numeric 3-100 Classification Yes, automatic
Non-numeric, numeric 100+ Aggregated classification Yes, automatic
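The default assignment in the table above can be expressed as a small function. This is an illustrative restatement of the table only; DataRobot applies this logic internally during target selection.

```python
def default_problem_type(is_numeric: bool, n_unique: int) -> str:
    """Map a target's data type and unique-value count to the default
    problem type, mirroring the table above (illustrative only)."""
    if n_unique > 100:
        return "Aggregated classification"
    if is_numeric:
        return "Classification" if n_unique == 2 else "Regression"
    # Non-numeric targets
    return "Binary classification" if n_unique == 2 else "Classification"

print(default_problem_type(is_numeric=True, n_unique=2))     # Classification
print(default_problem_type(is_numeric=True, n_unique=50))    # Regression
print(default_problem_type(is_numeric=False, n_unique=2))    # Binary classification
print(default_problem_type(is_numeric=False, n_unique=500))  # Aggregated classification
```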

With a target selected, Workbench displays a histogram providing information about the target feature's distribution and, in the right pane, a summary of the experiment settings.

From here, you can build models with the default settings for predictive modeling: click Start modeling to begin the Quick mode Autopilot modeling process.

If DataRobot detected a column with time features (variable type "Date") in the dataset, as reported in the Experiment summary, you can build time-aware models.

Customize basic settings

Prior to enabling time-aware modeling, you can change several of the basic modeling settings. These options are common to both predictive and time-aware modeling.

Changing experiment parameters is a good way to iterate on a Use Case. Before starting to model, you can change a variety of settings:

  Setting Description
1 Positive class For binary classification experiments only; the class to use when a prediction scores higher than the classification threshold.
2 Modeling mode The modeling mode, which influences the blueprints DataRobot chooses to train.
3 Optimization metric The metric used to score models; change it to override DataRobot's recommendation.
4 Training feature list The subset of features that DataRobot uses to build models.

After changing any or all of the settings described, click Next to customize more advanced settings and enable time-aware modeling.

Change modeling mode

By default, DataRobot builds experiments using Quick Autopilot; however, you can change the modeling mode to train specific blueprints or all applicable repository blueprints.

The following table describes each of the modeling modes:

Modeling mode Description
Quick Autopilot (default) Using a sample size of 64%, Quick Autopilot runs a subset of models, based on the specified target feature and performance metric, to provide a base set of models that build and provide insights quickly.
Manual Manual mode gives you full control over which blueprints to execute. After EDA2 completes, DataRobot redirects you to the blueprint repository where you can select one or more blueprints for training.
Comprehensive Autopilot Runs all repository blueprints on the maximum Autopilot sample size to provide greater model accuracy.

Change optimization metric

The optimization metric defines how DataRobot scores your models. After you choose a target feature, DataRobot selects an optimization metric based on the modeling task. Typically, the metric DataRobot chooses for scoring models is the best selection for your experiment. To build models using a different metric, overriding the recommended metric, use the Optimization metric dropdown:

See the reference material for a complete list and descriptions of available metrics.

Change feature list (pre-modeling)

Feature lists control the subset of features that DataRobot uses to build models. Workbench defaults to the Informative Features list, but you can modify that before modeling. To change the feature list, click the Feature list dropdown and select a different list:

You can also change the selected list on a per-model basis once the experiment finishes building.

Set additional automation

Before moving to advanced settings or beginning modeling, you can configure other automation settings.

After the target is set and the basic settings display, expand Show additional automation settings to see additional options.

Train on GPUs

Premium

GPU workers are a premium feature. Contact your DataRobot representative for information on enabling the feature.

For datasets that include text and/or images and require deep learning models, you can opt to train on GPUs to speed up training time. While some of these models can be run on CPUs, others require GPUs to achieve reasonable response time. When Allow training on GPUs is selected, DataRobot detects blueprints that contain certain tasks and includes GPU-supported blueprints in the Autopilot run. Both GPU and CPU variants are available in the repository, allowing a choice of which worker type to train on; GPU variant blueprints are optimized to train faster on GPU workers. Notes about working with GPUs:

  • Once the Leaderboard populates, you can easily identify GPU-based models using filtering.
  • When retraining models, the resulting model is also trained using GPUs.
  • When using Manual mode, you can identify GPU-supported blueprints by filtering in the blueprint repository.
  • If you did not initially select to train with GPUs, you can add GPU-supported blueprints via the repository or by rerunning modeling.
  • Models trained on GPUs are marked with a badge on the Leaderboard:

GPU task support

For some blueprints, there are two versions available in the repository, allowing DataRobot to train on either CPU or GPU workers. Each version is optimized to train on a particular worker type and is marked with an identifying badge—CPU or GPU. Blueprints with the GPU badge will always be trained on a GPU worker. All other blueprints will be trained on a CPU worker.

Consider the following when working with GPU blueprints:

  • GPU blueprints will only be present in the repository when image or text features are available in the training data.
  • In some cases, DataRobot trains GPU blueprints as part of Quick or full Autopilot. To train additional blueprints on GPU workers, you can run them manually from the repository or retrain using Comprehensive mode. (Learn about modeling modes here.)

Feature considerations

  • Due to the inherent differences in the implementation of floating point arithmetic on CPUs and GPUs, using a GPU-trained model in environments without a GPU may lead to inconsistencies. Inconsistencies will vary depending on model and dataset, but will likely be insignificant.

  • Training on GPUs can be non-deterministic. It is possible that training the same model on the same partition results in a slightly different model, scoring differently on the test set.

  • GPUs are only used for training; they are not used for prediction or insights computation.

  • There is no GPU support for custom tasks or custom models.


Configure additional settings

Choose the Additional settings tab to set more advanced modeling capabilities. Note that the Time series modeling tab will be available or grayed out depending on whether DataRobot found any date/time features in the dataset.

Configure the following, as required by your business use case.

Note

You can complete the time-aware configuration or the additional settings in either order.

Monotonic feature constraints

Monotonic constraints control the influence, both up and down, between variables and the target. In some use cases (typically insurance and banking), you may want to force the directional relationship between a feature and the target (for example, higher home values should always lead to higher home insurance rates). By training with monotonic constraints, you force certain XGBoost models to learn only monotonic (always increasing or always decreasing) relationships between specific features and the target.

Using the monotonic constraints feature requires creating special feature lists, which are then selected here. Note also that when using Manual mode, available blueprints are marked with a MONO badge to identify supporting models.
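The property that a monotonic constraint enforces can be checked with a small sketch. This is illustrative only (not DataRobot or XGBoost code): it tests whether predictions move in one direction as the constrained feature increases, using the home-insurance example above.

```python
def is_monotonic(values, increasing=True):
    """Check whether a sequence of model predictions moves in one direction
    as the constrained feature increases -- the relationship a monotonic
    constraint forces certain XGBoost models to learn."""
    pairs = zip(values, values[1:])
    if increasing:
        return all(a <= b for a, b in pairs)
    return all(a >= b for a, b in pairs)

# Hypothetical predicted insurance rates as home value increases:
constrained = [910, 980, 980, 1040, 1100]    # never decreases
unconstrained = [910, 1020, 960, 1040, 1100] # dips at the third value

print(is_monotonic(constrained))    # True
print(is_monotonic(unconstrained))  # False
```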

Weight

Weight sets a single feature to use as a differential weight, indicating the relative importance of each row. It is used when building or scoring a model—for computing metrics on the Leaderboard—but not for making predictions on new data. All values for the selected feature must be greater than 0. DataRobot runs validation and ensures the selected feature contains only supported values.
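The two behaviors described above—validating that weights are strictly positive, and scaling each row's contribution to a scoring metric—can be sketched as follows. This is an illustrative example (the function names and the choice of MAE are assumptions, not DataRobot internals):

```python
def validate_weights(weights):
    """Mirror the validation described above: every weight must be > 0.
    Returns the indices of offending rows (illustrative sketch only)."""
    return [i for i, w in enumerate(weights) if not w > 0]

def weighted_mean_absolute_error(y_true, y_pred, weights):
    """Show how a row weight scales that row's contribution to a
    Leaderboard metric (here, a weighted MAE as an example)."""
    num = sum(w * abs(t - p) for t, p, w in zip(y_true, y_pred, weights))
    return num / sum(weights)

print(validate_weights([1.0, 2.5, 0.0, 3.0]))  # [2] -- row 2 fails validation
print(weighted_mean_absolute_error([10, 20], [12, 19], [1.0, 3.0]))  # 1.25
```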

What's next?

Once basic configuration is complete, you can continue on to make predictions and forecasts.


Updated January 30, 2025