
Create experiments

Experiments are the individual "projects" within a Use Case. They allow you to vary data, targets, and modeling settings to find the optimal models to solve your business problem. Within each experiment, you have access to its Leaderboard and model insights, as well as experiment summary information. After selecting a model, you can continue working with it from within the experiment.

See the associated FAQ for important additional information.

Create a basic experiment

Follow the steps below to create a new experiment from within a Use Case.


You can also start modeling directly from a dataset by clicking the Start modeling button. The Set up new experiment page opens. From there, the instructions follow the flow described below.

Add experiment

From within a Use Case, click Add new and select Add experiment. The Set up new experiment page opens, which lists all data previously loaded to the Use Case.

Add data

Add data to the experiment, either by adding new data (1) or selecting a dataset that has already been loaded to the Use Case (2).

Once the data is loaded to the Use Case (as in option 2 above), click to select the dataset you want to use in the experiment. Workbench opens a preview of the data:

From here you can:

  • Click to return to the data listing and choose a different dataset.
  • Click the icon to proceed and set the target.
  • Click Next to proceed and set the target.

Select target

Once you have proceeded to target selection, Workbench prepares the dataset for modeling (EDA1). When the process finishes, set the target in one of two ways:

  • Browse: Scroll through the list of features to find your target. If it is not shown, expand the list from the bottom of the display. Once you locate it, click its entry in the table to use that feature as the target.

  • Search: Type the name of the target feature you want to predict in the entry box. DataRobot lists matching features as you type.

Once a target is entered, Workbench displays a histogram providing information about the target feature's distribution and, in the right pane, a summary of the experiment settings.

From here, you are ready to build models with the default settings. Or, you can modify the default settings and then begin. If using the default settings, click Start modeling to begin the Quick mode Autopilot modeling process.
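At its core, the target summary Workbench displays is a distribution over the target's values. As a rough stand-in (plain Python on hypothetical data, not the DataRobot API), the equivalent of that histogram for a categorical target is a simple value count:

```python
from collections import Counter

# Toy target column for a binary classification problem (hypothetical data).
target = ["churn", "stay", "stay", "churn", "stay", "stay", "stay"]

histogram = Counter(target)
print(histogram)                        # counts per target value
print(histogram.most_common(1)[0][0])   # majority class
```

For a numeric target, the same idea applies with binned value ranges instead of discrete classes.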

Customize settings

Changing experiment parameters is a good way to iterate on a Use Case. Before starting to model, you can modify the partitioning method, change the target, or choose a different feature list, as described in the sections below.

Once you have reset any or all of the above, click Start modeling to begin the Quick mode modeling process.

Modify partitioning

Partitioning describes the method DataRobot uses to “clump” observations (or rows) together for evaluation and model building. Workbench defaults to five-fold, stratified sampling with a 20% holdout fold.
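The default scheme can be sketched in plain Python (an illustrative toy, not the DataRobot implementation): within each target class, first reserve 20% of rows as holdout, then deal the remaining rows across five folds so every partition keeps roughly the original class ratio.

```python
import random
from collections import defaultdict

def stratified_partition(target, n_folds=5, holdout_pct=0.20, seed=42):
    """Assign each row index to 'holdout' or one of n_folds folds,
    keeping the target-class ratio roughly equal in every partition."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(target):
        by_class[y].append(i)

    assignment = {}
    for indices in by_class.values():
        rng.shuffle(indices)
        n_holdout = round(len(indices) * holdout_pct)
        for i in indices[:n_holdout]:
            assignment[i] = "holdout"
        # Deal the rest of this class round-robin across the folds.
        for pos, i in enumerate(indices[n_holdout:]):
            assignment[i] = f"fold_{pos % n_folds}"
    return assignment

# Toy target: 100 rows, 30% positive class.
target = [1] * 30 + [0] * 70
parts = stratified_partition(target)
print(sum(v == "holdout" for v in parts.values()))  # 20 rows held out
```

Because the split is done per class, each fold and the holdout contain close to the dataset-wide 30/70 class ratio.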

Availability information

Date/time partitioning for building time-aware projects is off by default. Contact your DataRobot representative or administrator for information on enabling the feature.

Feature flag: Enable Date/Time Partitioning (OTV) in Workbench

To change the partitioning method or validation type:

  1. Click the icon for Additional settings, Next, or the Partitioning field in the summary:

  2. If there is a date feature available, your experiment is eligible for out-of-time validation partitioning, which allows DataRobot to build time-aware models. In that case, additional information becomes available in the summary.

  3. Set the fields that you want to change. The fields available depend on the selected partitioning method.

    • Random assigns observations (rows) randomly to the training, validation, and holdout sets.
    • Stratified randomly assigns rows to training, validation, and holdout sets, preserving, as closely as possible, the same ratio of values for the prediction target as in the original data.
    • Date/time assigns rows to backtests chronologically instead of, for example, randomly. This is the only valid partitioning method for time-aware projects.

  Field Description
Validation type Sets the method used on data to validate models.
  • Cross-validation. Separates the data into two or more “folds” and creates one model per fold, with the data assigned to that fold used for validation and the rest of the data used for training.
  • Training-validation-holdout. For larger datasets, partitions data into three distinct sections—training, validation, and holdout— with predictions based on a single pass over the data.
Cross-validation folds Sets the number of folds used with the cross-validation method. A higher number increases the amount of training data available in each fold, which in turn increases the total training time.
Holdout percentage Sets the percentage of data that Workbench “hides” when training. The Leaderboard shows a Holdout value, which is calculated using the trained model's predictions on the holdout partition.
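The fold-count trade-off above is simple arithmetic: with holdout fraction h and k folds, each fold's model trains on (1 - h) × (k - 1) / k of the dataset. A quick check (generic arithmetic, not DataRobot output):

```python
def training_fraction(n_folds, holdout_pct):
    """Fraction of the full dataset used to train each fold's model."""
    return (1 - holdout_pct) * (n_folds - 1) / n_folds

# With the default 20% holdout:
print(round(training_fraction(5, 0.20), 2))   # 5 folds  -> 0.64
print(round(training_fraction(10, 0.20), 2))  # 10 folds -> 0.72
```

More folds mean more training data per model, but also more models to train, hence the longer total training time.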

When you select Date/time, you are prompted to enter an ordering feature—the feature used to order rows in the dataset. Click in the box to view the date/time features that DataRobot detected during EDA1. They are also listed below the box, allowing you to select the ordering feature there. If a feature is not listed, it was not detected as type date and cannot be used.

Once an ordering feature is selected, DataRobot detects and reports the date and/or time format (standard GLIBC strings) for the selected feature:
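Format detection of this kind can be approximated with the standard library: try candidate GLIBC-style format strings until one parses every sample value. A simplified sketch (the candidate list is illustrative; DataRobot's actual detection is more thorough):

```python
from datetime import datetime

# Candidate GLIBC-style format strings (a small illustrative subset).
CANDIDATES = ["%Y-%m-%d", "%m/%d/%Y", "%Y-%m-%d %H:%M:%S", "%d.%m.%Y"]

def detect_format(samples):
    """Return the first candidate format that parses every sample, or None."""
    for fmt in CANDIDATES:
        try:
            for s in samples:
                datetime.strptime(s, fmt)
        except ValueError:
            continue
        return fmt
    return None

print(detect_format(["2023-06-26", "2022-01-01"]))  # %Y-%m-%d
```

A feature whose values match none of the known formats would not be detected as type date, which is why such features cannot be used for ordering.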

Backtest configuration becomes available. DataRobot sets defaults based on the characteristics of the dataset and can generally be left as-is—they will result in robust models.

  Field Description
Backtests Sets the backtesting partitions. Any changes to these values are represented in the graphics below the entry boxes.
  • Number of backtests. Sets the number of backtests for the project, which is the time-aware equivalent of cross-validation (but based on time periods or durations instead of random rows).
  • Validation length. Sets the size of the partition used for testing, that is, the data outside the training set used to evaluate a model's performance.
  • Gap length. Sets a span of time between the end of the training data and the start of the validation data, representing the gap between model training and model deployment.
Use equal rows per backtest Sets whether each backtest uses the same number of rows (enabled) or the same duration (disabled).
Partition sampling method Sets how to assign rows from the dataset, which is useful if a dataset is not distributed equally over time.
Partitioning log Provides a downloadable log that reports on partition creation.

You can also use the graphics below the entry boxes to edit individual backtests.
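The duration-based layout can be sketched as follows (a hypothetical helper, not the DataRobot API): walking backward from the end of the ordered data, each backtest reserves a validation window of fixed length, optionally preceded by a gap, with everything earlier available for training.

```python
from datetime import date, timedelta

def backtest_windows(end, n_backtests=3, validation_days=30, gap_days=0):
    """Return (train_end, validation_start, validation_end) per backtest,
    most recent first, walking backward in fixed-duration steps."""
    windows = []
    cursor = end
    for _ in range(n_backtests):
        val_end = cursor
        val_start = val_end - timedelta(days=validation_days)
        train_end = val_start - timedelta(days=gap_days)
        windows.append((train_end, val_start, val_end))
        cursor = val_start
    return windows

for train_end, val_start, val_end in backtest_windows(date(2023, 6, 26)):
    print(f"train up to {train_end}, validate {val_start} .. {val_end}")
```

An equal-rows configuration would instead size each window by row count rather than by a fixed number of days.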

Change the configuration

You can make changes to the project's target or feature list before you begin modeling by returning to the Target page. To return, click the target icon, the Back button, or the Target field in the summary:

Change feature list

Feature lists control the subset of features that DataRobot uses to build models. Workbench defaults to the Informative Features list, but you can modify that prior to model building. To change it, click the Feature list dropdown and select a different list:

You can also change the selected list on a per-model basis once the experiment finishes building.

Updated June 26, 2023