Manage the Leaderboard

Once you start modeling, Workbench begins to construct your model Leaderboard, a list of models ranked by performance, to help with quick model evaluation. The Leaderboard summarizes information, including scores, for each model built in an experiment. From the Leaderboard, you can click a model to access visualizations for further exploration. Using these tools can help you assess what to do in your next experiment.

DataRobot populates the Leaderboard as it builds, initially displaying up to 50 models. Click Load more models to load 50 more models with each click.

After Workbench completes Quick mode on the 64% sample size phase, the most accurate model is selected and trained on 100% of the data. That model is marked with the Prepared for Deployment badge.

Why isn't the prepared for deployment model at the top of the Leaderboard?

When Workbench prepares a model for deployment, it trains that model on 100% of the data. Although the most accurate model is the one selected for preparation, that selection is based on its 64% sample size score. As part of preparing the model for deployment, Workbench unlocks Holdout, so the prepared model is trained on different data than the original. Unless you change the Leaderboard to sort by Holdout, the validation score in the left bar can make it appear that the prepared model is not the most accurate.

This page describes the summary information available for models and experiments, as well as the controls available for working with the Leaderboard model listing:

  • Model information
  • View experiment info
  • Filter models
  • Sort models by
  • Controls
  • Manage experiments

Models are also available for comparison and further investigation. Click any model to access its insights; two "flavors" of the Leaderboard are available, described on the following pages:

Tab Description
Experiment Provides insights to understand and evaluate models from a single experiment.
Comparison Allows you to compare up to three models of the same type (for example, binary, regression) from any number of experiments within a single Use Case. Access the comparison tool from the tab or from the dropdown on the experiment name in the breadcrumbs.

Model information

As soon as a model completes, you can select it from the Leaderboard listing to open the Model Overview where you can:

  • See specific details about training scores and settings.
  • Retrain models on new feature lists or sample sizes. Note that you cannot change the feature list on the model prepared for deployment as it is "frozen".
  • Access model insights.

Model build failure

If a model fails to build, you will see the failure in the job queue as Autopilot runs. Once Autopilot completes, failed models are still listed on the Leaderboard, but their entries indicate the failure. Click a failed model to display a log of the issues that caused the failure.

Use the Delete failed model button to remove the model from the Leaderboard.

View experiment info

Click View experiment info to open the Experiment information panel, where you can access the following tabs:

Tab Description
Setup Provides summary information about the experiment setup and configuration.
Data Displays the data available for model training, filterable by the contents of a selected feature list.
Feature lists Provides tools to view and create feature lists.
Blueprint repository Provides access to additional blueprints for training.

Setup tab

The Setup tab reports the parameters used to build the models on the Leaderboard.

Field Reports...
Created A time stamp indicating the creation date of the experiment as well as the user who initiated the model run.
Dataset The name, number of features, and number of rows in the modeling dataset. This is the same information available from the data preview page.
Target The feature selected as the basis for predictions, the resulting project type, and the optimization metric used to define how to score the experiment's models. You can change the metric the Leaderboard is sorted by, but the metric displayed in the summary is the one used for the build.
Partitioning Partitioning details for the experiment, either the default or modified.
Additional settings Advanced settings that were configured from the Additional settings tab.

Data tab

The Data tab provides a variety of information about the data that was used to build models, including the importance scores and target leakage indicators described below.

By default, the display includes all features in the dataset. To view analytics only for the features in a specific feature list, toggle Filter by feature list and then select a list.

Identify target leakage

When EDA2 is calculated, DataRobot checks for target leakage, which refers to a feature whose value cannot be known at the time of prediction, leading to overly optimistic models. A badge is displayed next to these features so that you can easily identify and exclude them from any new feature lists.
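To make the concept concrete, the following is a minimal, hypothetical pandas sketch of target leakage; the column names and values are invented for illustration, and this is not how DataRobot performs its leakage detection.

```python
# Hypothetical example of target leakage: "days_late_on_final_payment" is only
# known after the loan outcome, so it cannot be available at prediction time.
import pandas as pd

df = pd.DataFrame({
    "loan_amount": [5000, 12000, 7500, 3000, 20000, 9000],
    "credit_score": [680, 540, 720, 610, 590, 700],
    "days_late_on_final_payment": [0, 92, 0, 0, 88, 0],  # leaky: recorded after the outcome
    "defaulted": [0, 1, 0, 0, 1, 0],                      # target
})

# A quick correlation check flags the leaky column: its correlation with the
# target is near 1.0, far higher than any legitimate predictor.
print(df.corr(numeric_only=True)["defaulted"].drop("defaulted").sort_values(ascending=False))

# Excluding the leaky column from a feature list avoids an overly optimistic model.
safe_features = [c for c in df.columns if c not in ("defaulted", "days_late_on_final_payment")]
print(safe_features)
```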

Click on the arrow or Actions menu next to a column name to change the sort order.

Importance scores

The green bars displayed in the Importance column are a measure of how much a feature, by itself, is correlated with the target variable. Hover on the bar to see the exact value.

What is importance?

The Importance bars show the degree to which a feature is correlated with the target. These bars are based on "Alternating Conditional Expectations" (ACE) scores. ACE scores are capable of detecting non-linear relationships with the target, but as they are univariate, they are unable to detect interaction effects between features. Importance is calculated using an algorithm that measures the information content of the variable; this calculation is done independently for each feature in the dataset. The importance score has two components—Value and Normalized Value:

  • Value: This shows the metric score you should expect (more or less) if you build a model using only that variable. For Multiclass, Value is calculated as the weighted average from the binary univariate models for each class. For binary classification and regression, Value is calculated from a univariate model evaluated on the validation set using the selected project metric.
  • Normalized Value: The Value normalized so that scores range up to 1 (higher is better). A score of 0 means accuracy is the same as predicting the training target mean; a score below 0 means the ACE model's predictions are worse than those of the target mean model (overfitting).

These scores represent a measure of predictive power for a simple model using only that variable to predict the target. (The score is adjusted by exposure if you set the Exposure parameter.) Scores are measured using the project's accuracy metric.

Features are ranked from most important to least important. The length of the green bar next to each feature indicates its relative importance: the filled (green) portion of the bar, relative to the bar's total length (the maximum potential feature importance), is proportional to the Normalized Value, so the more green in the bar, the more important the feature. Hovering on the green bar shows both scores. These numbers represent the score, in terms of the project metric selected when the experiment was run, for a model that uses only that feature. Changing the metric on the Leaderboard has no effect on the tooltip scores.
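The following is a minimal sketch of the relationship between Value and Normalized Value for an error metric such as RMSE, using a simple stand-in for the univariate model. DataRobot's actual ACE calculation differs; the data and model here are invented purely for illustration.

```python
# Minimal sketch: a univariate "Value" normalized against a target-mean baseline.
# 1.0 is a perfect score, 0.0 matches the baseline, and negative values mean
# the univariate model is worse than predicting the training target mean.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 3 * x + rng.normal(scale=0.5, size=1000)          # target driven mostly by x

x_tr, x_va, y_tr, y_va = train_test_split(x.reshape(-1, 1), y, random_state=0)

# "Value": the metric score of a model that uses only this one feature.
univariate = DecisionTreeRegressor(max_depth=3).fit(x_tr, y_tr)
value = mean_squared_error(y_va, univariate.predict(x_va)) ** 0.5

# Baseline: always predict the training target mean.
baseline = mean_squared_error(y_va, np.full_like(y_va, y_tr.mean())) ** 0.5

normalized_value = 1 - value / baseline
print(f"Value (RMSE): {value:.3f}, Normalized Value: {normalized_value:.3f}")
```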

Feature lists tab

Feature lists control the subset of features that DataRobot uses to build models and make predictions. They allow you to, for example, exclude features that are causing target leakage or make predictions faster by removing unimportant features.

When you select the Feature lists tab, the display shows both DataRobot's automatically created lists and any custom feature lists ("Top4" in this example).


The following actions are available for feature lists from the actions menu to the right of the Created by column:

Action Description
View features Explore insights for a feature list. This selection opens the Data tab with the filter set to the selected list.
Edit name and description (Custom lists only) Opens a dialog to change the list name and change or add a description.
Download Downloads the features contained in that list as a CSV file.
Rerun modeling Opens the Rerun modeling modal to allow selecting a new feature list, training with GPU workers, and restarting Autopilot.
Delete (Custom lists only) Permanently deletes the selected list from the experiment.

Custom feature lists can be created prior to modeling from the data explorer or after modeling from the Data or Feature lists tabs. See the custom feature list reference for information on creating new lists. Note that lists created from an experiment are:

  • Used, within an experiment, for retraining models or training new models from the blueprint repository.
  • Available only within that experiment, not across all experiments in the Use Case.
  • Not available in the data explorer.
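If you script your workflow, a custom feature list can also be created with the DataRobot Python client. This is a minimal sketch that assumes the experiment is addressable as a Project; the project ID and feature names are placeholders, and you should verify the calls against your client version.

```python
# Hedged sketch: creating a custom feature list with the DataRobot Python client.
# The project ID and feature names are placeholders.
import datarobot as dr

dr.Client()  # reads credentials from the DataRobot config file or environment
project = dr.Project.get("5f1a0000000000000000abcd")  # placeholder ID

# Keep only the features you want DataRobot to train on (for example, a "Top4" list).
top_features = ["credit_score", "loan_amount", "annual_income", "term"]
flist = project.create_featurelist(name="Top4", features=top_features)

print(flist.id, flist.name, flist.features)
```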

Blueprint repository tab

The blueprint repository is a library of modeling blueprints available for a selected experiment. Blueprints illustrate the tasks used to build a model, not the model itself. Blueprints listed in the repository have not necessarily been trained yet, but they can be, because they are compatible with the experiment's data and settings.

There are two ways to access the blueprint repository.
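For scripted workflows, the sketch below shows how compatible blueprints might be listed and one of them queued for training with the DataRobot Python client. It assumes the experiment maps to a Project; the project ID and the "Keras" filter are placeholders.

```python
# Hedged sketch: browsing the blueprint repository and queuing one blueprint
# for training. The project ID and the "Keras" filter are placeholders.
import datarobot as dr

dr.Client()
project = dr.Project.get("5f1a0000000000000000abcd")  # placeholder ID

# Blueprints in the repository are compatible with the experiment's data and
# settings but have not necessarily been trained yet.
blueprints = project.get_blueprints()
for bp in blueprints[:10]:
    print(bp.id, bp.model_type)

# Queue one blueprint for training; the resulting model is added to the Leaderboard.
keras_blueprints = [bp for bp in blueprints if "Keras" in bp.model_type]
if keras_blueprints:
    job_id = project.train(keras_blueprints[0])
    print("Queued model job:", job_id)
```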

Filter models

Filtering makes it easier to view and focus on relevant models. Click Filter models to set the criteria for the models that Workbench displays on the Leaderboard. The choices available for each filter depend on the experiment and/or model type (only values used in at least one Leaderboard model appear) and may change as models are added to the experiment. For example:

Filter Displays models that...
Labeled models Have been assigned the listed tag, either starred models or models recommended for deployment.
Feature list Were built with the selected feature list.
Sample size (random or stratified partitioning) Were trained on the selected sample size.
Training period (date/time partitioning) Were trained on backtests defined by the selected duration mechanism.
Model family Are part of the selected model family:
  • GBM (Gradient Boosting Machine), such as Light Gradient Boosting on ElasticNet Predictions, eXtreme Gradient Boosted Trees Classifier
  • GLMNET (Lasso and ElasticNet regularized generalized linear models), such as Elastic-Net Classifier, Generalized Additive2
  • RI (Rule induction), such as RuleFit Classifier
  • RF (Random Forest), such as RandomForest Classifier or Regressor
  • NN (Neural Network), such as Keras
Properties Were built using GPUs.

Sort models by

By default, the Leaderboard sorts models based on the score of the validation partition, using the selected optimization metric. You can, however, use the Sort models by control to change the basis on which models are ordered when evaluating them.

Note that although Workbench built the experiment using the most appropriate metric for your data, it computes many applicable metrics for each model. After the build completes, you can redisplay the Leaderboard listing based on a different metric. Doing so does not change any values within the models; it simply reorders the listing according to their performance on the alternate metric.
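For scripted review, the sketch below retrieves an experiment's models with the DataRobot Python client, filters them by feature list, and reorders them by an alternate metric. It assumes the experiment is addressable as a Project; the project ID, metric name ("AUC"), and feature list name ("Top4") are placeholders, and attribute names should be checked against your client version.

```python
# Hedged sketch: filtering and re-sorting Leaderboard models programmatically.
# The project ID, "Top4" feature list, and "AUC" metric are placeholders.
import datarobot as dr

dr.Client()  # reads credentials from the DataRobot config file or environment
project = dr.Project.get("5f1a0000000000000000abcd")  # placeholder ID

models = project.get_models()

# Keep only models built with a particular feature list (similar to Filter models).
top4_models = [m for m in models if m.featurelist_name == "Top4"]

# Reorder by the validation score of an alternate metric (similar to Sort models by).
by_auc = sorted(
    (m for m in top4_models if m.metrics["AUC"]["validation"] is not None),
    key=lambda m: m.metrics["AUC"]["validation"],
    reverse=True,  # higher AUC is better
)
for m in by_auc[:5]:
    print(m.model_type, m.metrics["AUC"]["validation"])
```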

See the page on optimization metrics for detailed information on each.

Controls

Workbench provides simple, quick shorthand controls:

Icon Action
Reruns Autopilot with a different feature list, a different modeling mode, or additional automation settings (for example, GPU support) applied. If you select a feature list that has already been run, Workbench will replace any deleted models or make no changes.
Duplicates the experiment, with an option to reuse just the dataset, or the dataset and settings.
Deletes the experiment and its models. If the experiment is being used by an application, you cannot delete it.
Slides the Leaderboard panel closed to make additional room for, for example, viewing insights.

Manage experiments

At any point after models have been built, you can manage an individual experiment from within its Use Case. Click the Actions menu to the right of the experiment name to delete the experiment. To share the experiment and other associated assets, use the Use Case's Manage members tool.

What's next?

After reviewing experiment information, you can compare models from the Comparison tab, retrain models on new feature lists or sample sizes, or rerun Autopilot with different settings.


Updated August 26, 2024