Leaderboard reference

The Leaderboard provides summary information for each model built in a project. When models complete, DataRobot lists them on the Leaderboard with scoring and build information. The text below a model briefly describes the model type and version, or notes whether it uses unaltered open-source code. Badges, tags, and columns, described below, provide quick identifying and scoring information for each model.

Tags and indicators

The following table describes the tags and indicators:

Display/name Description

BASELINE
Applicable to time series projects only. Indicates a baseline model built using the MASE metric.

Beta
Indicates a model from which you can export the coefficients and transformation parameters necessary to verify steps and make predictions outside of DataRobot. Blueprints that require complex preprocessing will not have the Beta tag because you can't export their preprocessing in a simple form (ridit transform for numerics, for example). Also note that when a blueprint has coefficients but is not marked with the Beta tag, it indicates that the coefficients are not exact (e.g., they may be rounded).

BIAS MITIGATION
Indicates that the model had bias mitigation techniques applied. The badge is added to the top three Autopilot Leaderboard models for which DataRobot automatically attempted bias mitigation, and to any models to which mitigation techniques were manually applied.

BPxx
Blueprint ID
Displays a blueprint ID that represents an instance of a single model type (including version) and feature list. Models that share these characteristics have the same blueprint ID regardless of the sample size used to build them. Blender models indicate the blueprints used to create them (for example, BP6+17+20).

FAST & ACCURATE
Deprecated, applicable to projects created prior to v6.1. Indicates that this is the most accurate individual model on the Leaderboard that passes a set prediction speed guideline. If no models meet the guideline, the badge is not applied. The badge is available for OTV but not time series projects.

Frozen Run
Indicates that the model was produced using the frozen run feature. The badge also indicates the sample percent of the original model.

Insights
Indicates that the model appears on the Insights page.

Mxx
Model ID
Displays a unique ID for each model on the Leaderboard. The model ID represents a single instance of a model type, feature list, and sample size. Use the model ID to differentiate models when the blueprint ID is the same.

MONO
Indicates that the model either was built with, or supports but was not built with, monotonic constraints.

MOST ACCURATE
Deprecated, applicable to projects created prior to v6.1. Indicates that, based on the validation or cross-validation results, this model is the most accurate model overall on the Leaderboard (in most cases, a blender).

NEW SERIES OPTIMIZED
Indicates a model that supports unseen series modeling (new series support).

PREPARED FOR DEPLOYMENT
Indicates that the model has been through the Autopilot recommendation stages and is ready for deployment.

Rating tables
Indicates that the model has rating tables available for download.

RECOMMENDED FOR DEPLOYMENT
Indicates that this is the model DataRobot recommends for deployment, based on model accuracy and complexity.

REF
Indicates that the model is a reference model. A reference model uses no special preprocessing; it is a basic model that you can use to measure performance increase provided by an advanced model.

SCORING CODE
Indicates that the model has Scoring Code available for download.

SEGMENT CHAMPION
Indicates that the model is the chosen segment champion in a multiseries segmented modeling project.

SHAP
Indicates that the model was built with SHAP-based Prediction Explanations. If the badge is absent, the model provides XEMP-based explanations.

TUNED
Indicates that the model has been tuned.

Upper Bound Running Time
Indicates that the model exceeded the Upper Bound Running Time.

See also information on the model recommendation calculations.

Model icons

In addition to the tags, DataRobot displays a badge (icon) to the left of the model name indicating the type:

  • Specially tuned DataRobot implementation of a model
  • Blender model
  • Eureqa model
  • Keras model
  • Light Gradient Boosting Machine model
  • Python model
  • R model
  • Spark model
  • TensorFlow model
  • Vowpal Wabbit (VW) model
  • XGBoost model
  • Custom model, built with Jupyter Notebooks (deprecated)

Text below the model provides a brief description of the model type and version, or whether it uses unaltered open source code.

Model type and performance

Some models sacrifice prediction speed to improve prediction accuracy. These models are best suited to batch predictions (one-time or recurring), where prediction time and reliability aren't critical factors.

Some use cases require a model to make low-latency (or real-time) predictions. For these performance-sensitive use cases, it is best to avoid deploying the following model types as they prioritize accuracy over prediction speed and prediction memory usage:

Columns and tools

Leaderboard columns give you at-a-glance information about a model's "specs":

The following table describes the Leaderboard columns and tools:

Column Description

Model Name and Description
Provides the model name (type) as well as identifiers and description.

Feature List
Lists the name of the Feature List used to create the model. Click the Feature List label to get a count of the number of features in the list.

Sample Size
Displays the sample size used to create the model. Click the Sample Size label to see the number of rows the sample size represents, or set the display to only selected sample sizes. By default, DataRobot displays all sample sizes run for a project. When a project includes an External predictions model, sample size displays N/A.

Validation
Displays the Validation score of the model. This is the score derived from the first cross-validation fold. Some scores may be marked with an asterisk, indicating in-sample predictions.

Cross-Validation
Displays the Cross-Validation score, if run. If the dataset is greater than 50,000 rows, DataRobot does not automatically start a cross-validation run. You can click the Run link to run cross-validation manually. Some scores may be marked with an asterisk, indicating in-sample predictions. If the dataset is larger than 800MB, cross-validation is not allowed.

Holdout
Displays a lock icon that indicates whether holdout is unlocked for the model. When unlocked, some scores may be marked with an asterisk, indicating use of in-sample predictions to derive the score.

Metric
Sets (and displays the selection of) an accuracy metric for the Leaderboard. Models display in order of their scoring (best to worst) for the metric chosen before the model building process. Click the orange arrow to access a dropdown that allows you to change the optimization metric.

Menu
Provides quick access to comparing models, adding and deleting models, and creating blender models.

Search
Searches for a model, as described below.

Add New Model
Adds a model based on specific criteria that you set from the dialog.

Filter
Filters by a variety of selection criteria. Alternatively, click a Leaderboard tag to filter by the selected tag.

Export
Allows you to download the Leaderboard's contents as a CSV file, as described below.

Tag and filter models

The Leaderboard offers filtering capabilities to make viewing and focusing on relevant models easier.

  • Tag or "star" one or more models on the Leaderboard to make them easier to find again as you navigate the application. To star a model, hold the pointer over it and click the star that appears. To unselect the model, click the star again.

  • Use the Filters option to only display models meeting the criteria you select.

  • Combine any of the filters with search filtering. First, search for a model type or blueprint number, for example, and then select Filters to find only those models of that type meeting the additional criteria.

Use Leaderboard filters

Use the Filters selection box to limit the Leaderboard display to only those models that match the selected criteria. The available fields, and the settings for each field, depend on the project and/or model type. For example, non-date/time models offer sample size filtering while time-aware models offer training period filtering.

Note

Filters are inclusive. That is, results show models that match any of the filters, not all filters. Also, options available for selection only include those in which at least one model matching the criteria is on the Leaderboard.
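The inclusive behavior described in the note can be illustrated with a minimal Python sketch. This is not DataRobot code: the `models` rows, their field names, and the `apply_filters` helper are all hypothetical and exist only to show "match any" (OR) semantics as opposed to "match all" (AND).

```python
from typing import Callable, Dict, List

Model = Dict[str, object]

# Hypothetical Leaderboard rows; the field names are illustrative only.
models: List[Model] = [
    {"id": "M1", "starred": True, "feature_list": "List A", "sample_pct": 64},
    {"id": "M2", "starred": False, "feature_list": "List B", "sample_pct": 64},
    {"id": "M3", "starred": False, "feature_list": "List B", "sample_pct": 32},
]


def apply_filters(rows: List[Model], filters: List[Callable[[Model], bool]]) -> List[Model]:
    """Keep every row that matches ANY of the filters (inclusive semantics)."""
    if not filters:
        return list(rows)  # no filters selected: show everything
    return [row for row in rows if any(f(row) for f in filters)]


# Two filters selected at once: "starred" and "sample size is 32%".
selected_filters = [
    lambda m: m["starred"],
    lambda m: m["sample_pct"] == 32,
]
matched_ids = [m["id"] for m in apply_filters(models, selected_filters)]
print(matched_ids)
```

In this sketch, M1 is kept because it is starred and M3 is kept because of its sample size, even though neither satisfies both filters.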

The following table describes all available Leaderboard filters.

Tag Filters on...

Model importance
Models that are manually marked with a star on the Leaderboard.

Sample size
Selected sample size or N/A for External predictions models. Non time-aware only.

Training period
Time periods, either duration or start/end date. Time-aware only.

Feature list
Any feature list, manually or automatically created, that was used in at least one of the project's models.

Model family
Models grouped by tasks, an extended functionality of the model icon badge.

Model characteristics
Displayed model badges.

Blueprint ID
All models that have the same ID, representing an instance of a single model type (including version).

Model ID
A single, unique ID for a model on the Leaderboard.

Build method
The method that added models to the Leaderboard.
  • Autopilot: Models created using full, Quick, or Comprehensive Autopilot.
  • Repository: Models added manually from the Repository.
  • Composable ML: Custom models built using the blueprint editor.
  • Advanced Tuning: Manually tuned models.
  • Eureqa child: Manually added to the Leaderboard via Eureqa solutions.

Model characteristics options

The following list includes the model characteristics available to search on. See the table above for brief descriptions or the linked pages for complete details.

Use Repository filters

The Filters option is also available from the model Repository page:

The following table describes all available Repository filters.

Tag Filters on...

Blueprint characteristics
Blueprints based on the functionality they support. Options are Reference, Monotonic, Baseline, External Predictions, and SHAP.

Blueprint family
The mathematical technique or algorithm the blueprint uses.

Blueprint type
Blueprint origin, either DataRobot, Eureqa, or Composable ML.

Blueprint ID
Models that have the same ID, representing an instance of a single model type (including version) and feature list.

Search the Leaderboard

In addition to the Filter method, the Leaderboard provides a method to further limit the display to only those models matching your search criteria.

Export the Leaderboard

The Leaderboard allows you to download its contents as a CSV file. To do so, click the Export button on the action bar:

Doing so prompts a preview screen:

This screen displays the Leaderboard contents (1), which you can copy, and lets you rename the .csv file (2). Note that:

  • .csv is the only available file type for exporting the Leaderboard.
  • Holdout scores are only included in the report if holdout has been unlocked.

Click Download to export the contents.
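Once downloaded, the exported file can be processed with any CSV tool. The following Python sketch shows one way to load and rank the rows. The column names (`Model ID`, `Validation`, `Holdout`) and the sample values are assumptions made for illustration; check the header row of your own export, which may differ.

```python
import csv
import io

# Hypothetical export contents; real column names and values may differ.
sample_export = """Model ID,Model Name,Sample Size,Validation
M12,Example Model A,64,0.412
M7,Example Model B,64,0.398
M3,Example Model C,32,0.455
"""


def load_leaderboard(text: str) -> list:
    """Parse exported Leaderboard CSV text into a list of row dictionaries."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["Validation"] = float(row["Validation"])
        # Holdout scores appear only if holdout was unlocked, so the
        # column is treated as optional here.
        holdout = row.get("Holdout")
        row["Holdout"] = float(holdout) if holdout else None
    return rows


rows = load_leaderboard(sample_export)
# Assumes a lower-is-better metric (for example, LogLoss or RMSE).
best_first = sorted(rows, key=lambda r: r["Validation"])
print([r["Model ID"] for r in best_first])
```

To read a real file instead of the inline sample, pass the file's text to `load_leaderboard`, for example via `open(path, newline="").read()`.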

Blender models

A blender (or ensemble) model increases accuracy by combining the predictions of two or more models. DataRobot can add several regular and advanced blenders to the Leaderboard automatically.

Blender category Description

Regular blenders
Blends the top three Leaderboard models. DataRobot can add the following regular blenders to the Leaderboard automatically:

  • Average (AVG) Blend
  • Generalized Linear Model (GLM) Blend
  • Elastic Net (ENET) Blend

Advanced blenders
Blends the top eight Leaderboard models, using backwards stagewise selection to eliminate models when doing so benefits the blend's cross-validation score. DataRobot can add the following advanced blenders to the Leaderboard automatically:

  • Advanced Average (AVG) Blend
  • Advanced Generalized Linear Model (GLM) Blend
  • Advanced Elastic Net (ENET) Blend

Note

Depending on the project's dataset, DataRobot may only run a subset of the blenders listed above. For example, DataRobot doesn't include Advanced GLM blenders for multiclass projects, and if the data doesn't call for more sophisticated blends, DataRobot excludes both GLM and ENET blenders.
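To give a feel for how an average blend combined with backwards stagewise selection might work, here is a simplified Python sketch. It is an illustration under stated assumptions, not DataRobot's implementation: the function names are hypothetical, mean squared error stands in for the project metric, and the supplied `actual` values stand in for cross-validation targets.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def mse(pred: Sequence[float], actual: Sequence[float]) -> float:
    """Mean squared error, used here as a stand-in for the project metric."""
    return mean((p - a) ** 2 for p, a in zip(pred, actual))


def avg_blend(preds: List[Sequence[float]]) -> List[float]:
    """Average the predictions of several models row by row."""
    return [mean(row) for row in zip(*preds)]


def backward_stagewise(
    candidates: Dict[str, Sequence[float]],
    actual: Sequence[float],
    min_models: int = 2,
) -> Tuple[List[str], float]:
    """Drop one model at a time while doing so improves the blend's score."""
    selected = dict(candidates)
    best = mse(avg_blend(list(selected.values())), actual)
    while len(selected) > min_models:
        # Score the blend with each remaining model removed in turn.
        trials = {
            name: mse(avg_blend([p for n, p in selected.items() if n != name]), actual)
            for name in selected
        }
        drop = min(trials, key=trials.get)
        if trials[drop] >= best:
            break  # no removal helps, so keep the current blend
        best = trials[drop]
        del selected[drop]
    return sorted(selected), best


actual = [1.0, 2.0, 3.0]
candidates = {
    "a": [1.1, 2.1, 2.9],
    "b": [0.9, 1.9, 3.1],
    "c": [3.0, 0.0, 5.0],  # a deliberately poor model
}
kept, score = backward_stagewise(candidates, actual)
print(kept, score)
```

In this toy example the poor model `c` is eliminated because the blend of `a` and `b` scores better without it.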

If you want to run specific blender models that DataRobot doesn't generate automatically (e.g., TensorFlow or LightGBM), you can manually create blender models.

To improve response times for blender models, DataRobot stores predictions for all models trained at the highest sample size used by Autopilot (typically 64%) and creates blenders from those results. Storing only the largest sample size (and therefore predictions from the best performing models) limits the disk space required.

DataRobot has special logic in place for NLP and image fine-tuner models. For example, fine-tuners do not support stacked predictions. As a result, when blending stacked and non-stack-enabled models, the available blender methods are: AVG, MED, MIN, or MAX. DataRobot does not support other methods in this case because they may introduce target leakage.

Asterisked scores

Availability information

Asterisked partitions do not apply to time series or multiseries projects.

Sometimes, the Leaderboard's Validation, Cross-Validation, or Holdout score displays an asterisk. Hover over the score for a tooltip explaining the reason for the asterisk:

Note

The following training set percentage values are examples based on the default data partitioning settings recommended by DataRobot (without downsampling). The default data partitions are 5-fold CV with 20% holdout or, for larger datasets, TVH 16% validation and 20% holdout. If you customize the data partitioning settings, the thresholds for training into validation change. For example, if you select a 10-fold CV with 20% holdout, your maximum training set sample size will be 72%, not 64%.

By default, DataRobot uses up to 64% of the data for the training set. This is the largest sample size that does not include any data from the validation or holdout sets (16% of the data is reserved for the validation set and 20% for the holdout set). When model building finishes, you can manually train at larger sample sizes (for example, 80% or 100%). If you train above 64%, but under 80%, the model trains on data from the validation set. If you train above 80%, the model trains on data from the holdout set.

As a result, if you train above 64%, DataRobot marks the Validation score with an asterisk to indicate that some in-sample predictions were used for that score. If you train above 80%, the Holdout score is also asterisked to indicate the use of in-sample predictions to derive the score.
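The thresholds above follow from simple arithmetic on the partition sizes. The Python sketch below reproduces that arithmetic for cross-validation partitioning; the function names are hypothetical, and the logic only restates the rules described in this section rather than any DataRobot API.

```python
from typing import List


def max_training_pct(holdout_pct: float, cv_folds: int) -> float:
    """Largest training percentage that uses no validation or holdout data."""
    non_holdout = 100.0 - holdout_pct
    # One fold of the non-holdout data is reserved for validation.
    return non_holdout - non_holdout / cv_folds


def asterisked_scores(train_pct: float, holdout_pct: float = 20.0, cv_folds: int = 5) -> List[str]:
    """Return the scores that would carry an asterisk for a given training size."""
    flagged = []
    if train_pct > max_training_pct(holdout_pct, cv_folds):
        flagged.append("Validation")  # training reached into the validation set
    if train_pct > 100.0 - holdout_pct:
        flagged.append("Holdout")  # training reached into the holdout set
    return flagged


print(max_training_pct(20.0, 5))   # default 5-fold CV with 20% holdout
print(max_training_pct(20.0, 10))  # 10-fold CV with 20% holdout
print(asterisked_scores(70.0))
print(asterisked_scores(100.0))
```

With the defaults, the first call yields 64 and the second 72, matching the figures in the note above. The default TVH split (16% validation, 20% holdout) gives the same 64% limit.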

N/A scores

Sometimes, the Leaderboard's Validation, Cross-Validation, or Holdout score displays N/A instead of a score. "Not available" scores occur if your project trains models into the validation or holdout sets and meets any of the following criteria:

  • The dataset exceeds 800MB, resulting in a slim run project containing models that do not have stacked predictions.
  • The project is date/time partitioned (both OTV and time series), and none of its models have stacked predictions.
  • The project is multiclass with more than ten classes.
  • The project uses Eureqa modeling, as Eureqa models do not have stacked predictions.
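The conditions above can be summarized as a single check. The following Python sketch is only a restatement of the list; the `ProjectInfo` fields and the `may_show_na` function are hypothetical names, not part of any DataRobot API.

```python
from dataclasses import dataclass


@dataclass
class ProjectInfo:
    """Hypothetical summary of the project properties named in the list above."""
    trains_into_validation_or_holdout: bool
    dataset_mb: float = 0.0
    date_time_partitioned: bool = False
    num_classes: int = 2
    uses_eureqa: bool = False


def may_show_na(project: ProjectInfo) -> bool:
    """Return True when the conditions for an N/A score are met."""
    if not project.trains_into_validation_or_holdout:
        return False
    return (
        project.dataset_mb > 800          # slim run, no stacked predictions
        or project.date_time_partitioned  # OTV or time series
        or project.num_classes > 10       # multiclass with more than ten classes
        or project.uses_eureqa            # Eureqa models lack stacked predictions
    )


example = ProjectInfo(trains_into_validation_or_holdout=True, num_classes=12)
print(may_show_na(example))
```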

Updated July 20, 2022