Evaluate experiments

Once you start modeling, Workbench begins to construct your model Leaderboard, a list of models ranked by performance that supports quick model evaluation. The Leaderboard provides a summary of information, including scoring information, for each model built in an experiment. From the Leaderboard, you can click a model to access visualizations for further exploration. These tools can help you decide what to try in your next experiment.

After Workbench completes Quick mode on the 64% sample size phase, the most accurate model is selected and trained on 100% of the data. That model is marked with the Prepared for Deployment badge.
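The selection step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not DataRobot's implementation: the `LeaderboardModel` record and `prepare_for_deployment` function are hypothetical, scores are assumed to be lower-is-better (as with LogLoss) unless `higher_is_better=True`, and the retraining itself is represented only by the updated sample size.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class LeaderboardModel:
    # Hypothetical record for illustration; not a DataRobot API type.
    name: str
    sample_pct: float
    # Validation score; None when no out-of-sample validation score applies.
    validation_score: Optional[float]
    prepared_for_deployment: bool = False


def prepare_for_deployment(models: List[LeaderboardModel],
                           higher_is_better: bool = False) -> LeaderboardModel:
    """Pick the most accurate 64% model and return a record for its 100% retrain."""
    candidates = [m for m in models
                  if m.sample_pct == 64.0 and m.validation_score is not None]
    if not candidates:
        raise ValueError("no scored models at the 64% sample size")
    pick = max if higher_is_better else min
    best = pick(candidates, key=lambda m: m.validation_score)
    # Training on 100% uses the validation rows too, so this sketch drops the
    # score rather than carrying over the 64% result.
    return replace(best, sample_pct=100.0, validation_score=None,
                   prepared_for_deployment=True)


models = [
    LeaderboardModel("Model A", 64.0, 0.41),
    LeaderboardModel("Model B", 64.0, 0.37),
    LeaderboardModel("Model C", 32.0, 0.35),  # better score, but not a 64% model
]
print(prepare_for_deployment(models))
```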

Manage the Leaderboard

Several controls, described in the following sections, help you navigate the Leaderboard.

View experiment info

Click View experiment info to see a summary of the experiment, including the parameters used to build the models on this Leaderboard.

Field Reports...
Created A time stamp indicating the creation of the experiment's Leaderboard as well as the user who initiated the model run.
Dataset The name, number of features, and number of rows in the modeling dataset. This is the same information available from the data preview page.
Target The feature selected as the basis for predictions, the resulting project type, and the optimization metric used to define how to score the experiment's models. You can change the metric the Leaderboard is sorted by, but the metric displayed in the summary is the one used for the build.
Partitioning Details of the partitioning used for the experiment, whether the default or a modified configuration.

Filter models

Filtering makes it easier to view and focus on relevant models. Click Filter models to set the criteria for the models that Workbench displays on the Leaderboard. The choices available for each filter depend on the experiment and model type: a value appears only if at least one Leaderboard model uses it, so the choices may change as models are added to the experiment. For example:

Filter Displays models that...
Labeled models Have been assigned the listed tag, either starred models or models recommended for deployment.
Feature list Were built with the selected feature list.
Sample size Were trained on the selected sample size.
Model family Are part of the selected model family:
  • GBM (Gradient Boosting Machine), such as Light Gradient Boosting on ElasticNet Predictions, eXtreme Gradient Boosted Trees Classifier
  • GLMNET (Lasso and ElasticNet regularized generalized linear models), such as Elastic-Net Classifier, Generalized Additive2
  • RI (Rule induction), such as RuleFit Classifier
  • RF (Random Forest), such as RandomForest Classifier or Regressor
  • NN (Neural Network), such as Keras
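As a rough sketch of how these filters combine, the following Python keeps only models that match every selected criterion and builds each filter's choices from values that at least one model actually uses. The `ModelSummary` record, the helper functions, and the label string `"starred"` are hypothetical illustrations, not part of any DataRobot API.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class ModelSummary:
    # Hypothetical record for illustration.
    name: str
    feature_list: str
    sample_pct: float
    family: str  # e.g. "GBM", "GLMNET", "RI", "RF", "NN"
    labels: FrozenSet[str] = frozenset()  # e.g. {"starred"}


def filter_models(models: List[ModelSummary], *, feature_list: Optional[str] = None,
                  sample_pct: Optional[float] = None, family: Optional[str] = None,
                  label: Optional[str] = None) -> List[ModelSummary]:
    """Keep models that match every filter that is set; None means 'any'."""
    def keep(m: ModelSummary) -> bool:
        return ((feature_list is None or m.feature_list == feature_list)
                and (sample_pct is None or m.sample_pct == sample_pct)
                and (family is None or m.family == family)
                and (label is None or label in m.labels))
    return [m for m in models if keep(m)]


def filter_choices(models: List[ModelSummary]) -> Dict[str, list]:
    """Offer only values used by at least one model, so choices grow with the Leaderboard."""
    return {
        "feature_list": sorted({m.feature_list for m in models}),
        "sample_pct": sorted({m.sample_pct for m in models}),
        "family": sorted({m.family for m in models}),
        "label": sorted(set().union(*(m.labels for m in models))),
    }


board = [
    ModelSummary("eXtreme Gradient Boosted Trees Classifier", "Informative Features",
                 64.0, "GBM", frozenset({"starred"})),
    ModelSummary("Elastic-Net Classifier", "Informative Features", 32.0, "GLMNET"),
    ModelSummary("RandomForest Classifier", "Raw Features", 64.0, "RF"),
]
print([m.name for m in filter_models(board, sample_pct=64.0)])
print(filter_choices(board))
```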

Sort models by

By default, the Leaderboard sorts models by their validation partition score, using the selected optimization metric. You can, however, use the Sort models by control to sort by a different metric when evaluating models.

Although Workbench builds the experiment using the most appropriate metric for your data, it computes many applicable metrics for each model. After the build completes, you can re-sort the Leaderboard by a different metric. Re-sorting does not change any values within the models; it only reorders the list based on each model's performance on the alternate metric.

See the page on optimization metrics for detailed information on each.
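The following Python sketch illustrates that re-sorting changes only the order, not the scores. It assumes each model carries a dictionary of precomputed metric scores; the `HIGHER_IS_BETTER` table is an assumption that covers only a few metrics, since whether a larger value is better differs by metric.

```python
from typing import Dict, List

# Whether a higher value is better for each metric. This small table is an
# assumption for illustration and covers only a few common metrics.
HIGHER_IS_BETTER: Dict[str, bool] = {
    "AUC": True,
    "Accuracy": True,
    "LogLoss": False,
    "RMSE": False,
}


def sort_leaderboard(models: List[dict], metric: str) -> List[dict]:
    """Return a re-sorted copy of the list; the stored scores are not modified."""
    higher = HIGHER_IS_BETTER[metric]
    return sorted(models, key=lambda m: m["scores"][metric], reverse=higher)


board = [
    {"name": "Model A", "scores": {"LogLoss": 0.42, "AUC": 0.81}},
    {"name": "Model B", "scores": {"LogLoss": 0.45, "AUC": 0.84}},
    {"name": "Model C", "scores": {"LogLoss": 0.40, "AUC": 0.79}},
]
print([m["name"] for m in sort_leaderboard(board, "LogLoss")])  # ['Model C', 'Model A', 'Model B']
print([m["name"] for m in sort_leaderboard(board, "AUC")])      # ['Model B', 'Model A', 'Model C']
```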


Workbench also provides the following shorthand controls:

  • Rerun Quick mode with a different feature list. If you select a feature list that has already been run, Workbench either replaces any deleted models or makes no changes.
  • Duplicate the experiment, with an option to reuse just the dataset or both the dataset and settings.
  • Delete the experiment and its models. You cannot delete an experiment that is being used by an application.
  • Slide the Leaderboard panel closed to make more room for other content, such as insights.

Model insights

Model insights help you interpret, explain, and validate what drives a model's predictions. The available insights depend on the experiment type but may include those described in the following sections.

To see a model's insights, click the model in the left-pane Leaderboard.

Feature Impact

Feature Impact provides a high-level visualization that identifies which features most strongly drive model decisions. It is available for all model types and is computed on demand: except for the model prepared for deployment, you must initiate the calculation to see the results.
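Permutation importance is one common way to measure this kind of impact: shuffle one feature's values, rescore the model, and treat the loss in accuracy as that feature's impact. The sketch below assumes a model exposed as a plain Python function and uses mean absolute error as the score; it illustrates the technique rather than reproducing DataRobot's calculation, and normalizing so the top feature equals 1.0 is a presentational choice.

```python
import random
from typing import Callable, Dict, List, Sequence


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def permutation_impact(predict: Callable[[dict], float], rows: List[dict],
                       targets: List[float], features: List[str],
                       seed: int = 0) -> Dict[str, float]:
    """Score increase when each feature is shuffled, normalized so the top feature is 1.0."""
    rng = random.Random(seed)
    baseline = mean_absolute_error(targets, [predict(r) for r in rows])
    raw = {}
    for f in features:
        values = [r[f] for r in rows]
        rng.shuffle(values)  # break the link between this feature and the target
        permuted = [{**r, f: v} for r, v in zip(rows, values)]
        raw[f] = mean_absolute_error(targets, [predict(r) for r in permuted]) - baseline
    top = max(raw.values())
    return {f: (v / top if top > 0 else 0.0) for f, v in raw.items()}


# Toy model that uses "x" and ignores "noise".
rows = [{"x": float(i), "noise": float(i % 3)} for i in range(20)]
targets = [3.0 * r["x"] for r in rows]
impact = permutation_impact(lambda r: 3.0 * r["x"], rows, targets, ["x", "noise"])
print(impact)
```

Because the toy model never reads `noise`, shuffling it leaves every prediction unchanged, so its impact is zero.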

ROC Curve

For classification experiments, the ROC Curve tab provides tools for exploring classification performance and related statistics for a selected model at any point on the probability scale.
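Each point on an ROC curve corresponds to one probability threshold. The following minimal Python sketch, with hypothetical helper names, counts true and false positives at a threshold and converts them into the (false positive rate, true positive rate) pairs that the curve plots.

```python
from typing import List, Sequence, Tuple


def confusion_counts(probs: Sequence[float], labels: Sequence[int],
                     threshold: float) -> Tuple[int, int, int, int]:
    """Counts (TP, FP, TN, FN) when rows with probability >= threshold are called positive."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        if p >= threshold:
            if y == 1:
                tp += 1
            else:
                fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def roc_points(probs, labels, thresholds) -> List[Tuple[float, float]]:
    """(false positive rate, true positive rate) at each threshold; needs both classes present."""
    points = []
    for t in thresholds:
        tp, fp, tn, fn = confusion_counts(probs, labels, t)
        points.append((fp / (fp + tn), tp / (tp + fn)))
    return points


probs = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
print(roc_points(probs, labels, [0.0, 0.5, 1.0]))  # [(1.0, 1.0), (0.5, 0.5), (0.0, 0.0)]
```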

Lift Chart

To help visualize model effectiveness, the Lift Chart depicts how well a model segments the target population and how well the model performs for different ranges of values of the target variable.
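A common way to build a lift chart is to sort rows by predicted value, split them into equal-sized bins, and compare the average prediction with the average actual value in each bin. The sketch below follows that approach; the default of ten bins and the `lift_chart` name are assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def lift_chart(predicted: Sequence[float], actual: Sequence[float],
               n_bins: int = 10) -> List[Tuple[float, float]]:
    """(mean predicted, mean actual) per bin, from lowest to highest predictions."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("predicted and actual must be non-empty and the same length")
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    n = len(order)
    bins = []
    for b in range(n_bins):
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        if idx:  # with fewer rows than bins, some slices are empty
            bins.append((sum(predicted[i] for i in idx) / len(idx),
                         sum(actual[i] for i in idx) / len(idx)))
    return bins


print(lift_chart([1, 9, 2, 8], [0, 10, 2, 6], n_bins=2))  # [(1.5, 1.0), (8.5, 8.0)]
```

A well-performing model shows mean actual values that rise along with mean predictions from the first bin to the last.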

Residuals

For regression experiments, the Residuals tab helps you understand a model's predictive performance and validity. It lets you gauge how linearly the model's predictions scale relative to the actual values in the dataset. It provides multiple scatter plots and a histogram to support residual analysis:

  • Predicted vs. Actual
  • Residual vs. Actual
  • Residual vs. Predicted
  • Residuals histogram
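The data behind these four views can be sketched as follows. This Python example assumes residuals are defined as actual minus predicted, puts the actual value on the x-axis of the Predicted vs. Actual plot, and uses equal-width histogram bins; all of these are illustrative choices, and `residual_views` is a hypothetical helper.

```python
from typing import Dict, Sequence


def residual_views(predicted: Sequence[float], actual: Sequence[float],
                   n_bins: int = 4) -> Dict[str, list]:
    """Data behind the three scatter plots and the residuals histogram."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("predicted and actual must be non-empty and the same length")
    residuals = [a - p for p, a in zip(predicted, actual)]  # actual minus predicted
    lo, hi = min(residuals), max(residuals)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width when all residuals are equal
    counts = [0] * n_bins
    for r in residuals:
        counts[min(int((r - lo) / width), n_bins - 1)] += 1
    return {
        "predicted_vs_actual": list(zip(actual, predicted)),  # (x, y) pairs
        "residual_vs_actual": list(zip(actual, residuals)),
        "residual_vs_predicted": list(zip(predicted, residuals)),
        "histogram": counts,  # equal-width bins from min to max residual
    }


views = residual_views([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.0, 5.0])
print(views["histogram"])  # [1, 0, 1, 2]
```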

Accuracy Over Time

For time-aware experiments, Accuracy Over Time helps you visualize how predictions change over time. By default, the view plots predicted and actual values against time for the training and validation data of the most recent (first) backtest. This is the backtest model DataRobot uses to deploy and make predictions; in other words, it is the model used to generate the error metric for the validation set.

The visualization also has a time-aware Residuals tab that plots the difference between actual and predicted values. It helps to visualize whether there is an unexplained trend in your data that the model did not account for and how the model errors change over time.
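To make the idea concrete, the following Python sketch averages predictions, actuals, and residuals per calendar month, which is roughly what a time-based accuracy plot displays. Monthly bucketing, the actual-minus-predicted residual convention, and the `accuracy_over_time` name are assumptions, not DataRobot's settings.

```python
from collections import defaultdict
from datetime import date
from typing import List, Sequence, Tuple


def accuracy_over_time(dates: Sequence[date], predicted: Sequence[float],
                       actual: Sequence[float]) -> List[Tuple[Tuple[int, int], float, float, float]]:
    """Per calendar month: (period, mean predicted, mean actual, mean residual)."""
    buckets = defaultdict(list)
    for d, p, a in zip(dates, predicted, actual):
        buckets[(d.year, d.month)].append((p, a))
    out = []
    for period in sorted(buckets):
        pairs = buckets[period]
        mean_p = sum(p for p, _ in pairs) / len(pairs)
        mean_a = sum(a for _, a in pairs) / len(pairs)
        out.append((period, mean_p, mean_a, mean_a - mean_p))  # residual = actual - predicted
    return out


dates = [date(2023, 1, 5), date(2023, 1, 20), date(2023, 2, 3)]
for row in accuracy_over_time(dates, [10.0, 12.0, 20.0], [11.0, 13.0, 17.0]):
    print(row)
```

A residual that drifts steadily in one direction across periods would suggest a trend the model did not capture.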

Stability

The Stability tab provides an at-a-glance summary of how well a model performs on different backtests. It helps you measure performance and indicates how long a model can remain in production (how long it is "stable") before it needs retraining. The values in the chart represent the validation scores for each backtest and the holdout.
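As a sketch of how such scores might be summarized, the following Python computes the mean, spread, and population standard deviation of per-backtest validation scores and reports the worst partition. The partition names and the `stability_summary` helper are hypothetical, and lower scores are assumed to be better unless `higher_is_better=True`.

```python
from statistics import mean, pstdev
from typing import Dict


def stability_summary(scores: Dict[str, float], higher_is_better: bool = False) -> dict:
    """Summarize validation scores across backtests and the holdout."""
    if not scores:
        raise ValueError("no scores to summarize")
    values = list(scores.values())
    pick = min if higher_is_better else max  # selects the worst-scoring partition
    return {
        "mean": mean(values),
        "spread": max(values) - min(values),  # a small spread suggests a stable model
        "std": pstdev(values),
        "worst": pick(scores, key=scores.get),
    }


scores = {"Backtest 1": 0.20, "Backtest 2": 0.25, "Backtest 3": 0.30, "Holdout": 0.25}
print(stability_summary(scores))
```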

Train on new settings

Once the Leaderboard is populated, you can retrain an existing model to create a new Leaderboard model. To retrain a model:

  1. Select a model from the Leaderboard by clicking on it.
  2. Click the pencil icon to change a model characteristic. You can change either the feature list or the training data:

    a. Select a new feature list. You cannot change the feature list for the model prepared for deployment because it is "frozen". (See the FAQ for a workaround to add feature lists.)

    b. Change the sample size (for non-time-aware experiments) or the training period (if date/time partitioning was used). The window that opens depends on the partitioning method.

For non-time-aware experiments, click the pencil icon to change the sample size and optionally enforce a frozen run.

For date/time-partitioned experiments, click the pencil icon to change the training period size and optionally enforce a frozen run. While you can change the training range and sampling rate, you cannot change the duration of the validation partition once models have been built.
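Conceptually, a frozen run reuses the parameter settings tuned on the smaller run rather than searching for them again. The following Python sketch models that difference with a hypothetical `ModelRun` record; the hyperparameter names are placeholders, and no actual training happens here.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ModelRun:
    # Hypothetical record for illustration.
    blueprint: str
    sample_pct: float
    # Tuned parameter settings; None means they will be searched during training.
    hyperparameters: Optional[dict]


def retrain(model: ModelRun, new_sample_pct: float, frozen: bool) -> ModelRun:
    """Describe a new Leaderboard model trained on a different sample size."""
    if not frozen:
        # Unfrozen: parameters are tuned again on the new sample.
        return replace(model, sample_pct=new_sample_pct, hyperparameters=None)
    if model.hyperparameters is None:
        raise ValueError("a frozen run needs the settings of an already-tuned model")
    # Frozen: copy the settings from the earlier, smaller run instead of re-tuning.
    return replace(model, sample_pct=new_sample_pct,
                   hyperparameters=dict(model.hyperparameters))


small = ModelRun("eXtreme Gradient Boosted Trees Classifier", 64.0,
                 {"learning_rate": 0.05, "n_estimators": 300})
print(retrain(small, 100.0, frozen=True))
```

Skipping the search is what makes frozen runs cheaper on large samples; the trade-off is that the settings were chosen for the smaller sample.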


Consider retraining your model on the most recent data before final deployment.

The New Training Period box has multiple selectors, described in the table below:

Selection Description
Frozen run toggle Freeze the run so that it reuses ("freezes") the parameter settings from the model's earlier, smaller-sample run.
Training mode Rerun the model using a different training period. Before setting this value, see the details of row count vs. duration and how they apply to different folds.
Snap to "Snap to" predefined points to simplify entering values and avoid manual scrolling or calculation.
Enable time window sampling Train on a subset of data within a time window for a duration or start/end training mode. Check to enable and specify a percentage.
Sampling method Select the sampling method used to assign rows from the dataset.
Summary graphic View a summary of the observations and testing partitions used to build the model.
Final Model View an image that changes as you adjust the dates, reflecting the data to be used in the model you will make predictions with (see the note about final models).
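The combination of a time window, a sampling percentage, and a sampling method can be sketched as follows. The `time_window_sample` function is hypothetical, the window is assumed to include its start date and exclude its end date, and the two methods shown (`"random"`, and `"latest"` for keeping the most recent rows) are illustrative assumptions rather than a list of the options DataRobot offers.

```python
import random
from datetime import date
from typing import List


def time_window_sample(rows: List[dict], start: date, end: date, pct: float,
                       method: str = "random", seed: int = 0) -> List[dict]:
    """Keep rows with start <= row["date"] < end, then keep pct% of them."""
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    in_window = sorted((r for r in rows if start <= r["date"] < end),
                       key=lambda r: r["date"])
    k = round(len(in_window) * pct / 100)
    if method == "latest":
        chosen = in_window[len(in_window) - k:]  # the k most recent rows
    elif method == "random":
        chosen = random.Random(seed).sample(in_window, k)
    else:
        raise ValueError(f"unknown sampling method: {method}")
    return sorted(chosen, key=lambda r: r["date"])  # keep chronological order


rows = [{"date": date(2023, 1, d), "y": d} for d in range(1, 11)]
latest = time_window_sample(rows, date(2023, 1, 3), date(2023, 1, 9), 50, method="latest")
print([r["y"] for r in latest])  # [6, 7, 8]
```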

Once you have set a new value, click Train new models. DataRobot builds the new model and displays it on the Leaderboard.

Compliance documentation

DataRobot automates many critical compliance tasks associated with developing a model and, by doing so, decreases time-to-deployment in highly regulated industries. For each model, you can generate individualized documentation that provides comprehensive guidance on what constitutes effective model risk management, and then download the report as an editable Microsoft Word document (.docx). The generated report includes the level of information and transparency that regulatory compliance demands.

The model compliance report is not prescriptive in format and content, but rather serves as a guide in creating sufficiently rigorous model development, implementation, and use documentation. The documentation provides evidence to show that the components of the model work as intended, the model is appropriate for its intended business purpose, and it is conceptually sound. As such, the report can help with completing the Federal Reserve System's SR 11-7: Guidance on Model Risk Management.

To generate a compliance report:

  1. Select a model from the Leaderboard.
  2. From the Model actions dropdown, select Generate compliance report.

  3. Workbench prompts for a download location and, once you select one, generates the report in the background while you continue experimenting.

Manage experiments

At any point after models have been built, you can manage an individual experiment from within its Use Case. To delete the experiment, click the three dots to the right of its name. To share the experiment and other associated assets, use the Use Case's Manage members tool.

Updated May 26, 2023