The Leaderboard provides a wealth of summary information for each model built in a project. When models complete, DataRobot lists them on the Leaderboard with scoring and build information. The text below a model briefly describes the model type and version, or indicates whether it uses unaltered open-source code. Badges, tags, and columns, described below, provide at-a-glance identifying and scoring information.
Tags and indicators¶
The following table describes the tags and indicators:
|BASELINE||Applicable to time series projects only. Indicates a baseline model built using the MASE metric.|
|BETA||Indicates a model from which you can export the coefficients and transformation parameters necessary to verify steps and make predictions outside of DataRobot. Blueprints that require complex preprocessing do not carry the Beta tag because their preprocessing cannot be exported in a simple form (a ridit transform for numerics, for example). Note that when a blueprint has coefficients but is not marked with the Beta tag, the coefficients are not exact (for example, they may be rounded).|
|BIAS MITIGATION||Indicates that bias mitigation techniques were applied to the model. The badge is added to the top three Autopilot Leaderboard models for which DataRobot automatically attempted to mitigate bias, as well as any models to which mitigation techniques were manually applied.|
|Blueprint ID||Displays a blueprint ID that represents an instance of a single model type (including version) and feature list. Models that share these characteristics within the same project have the same blueprint ID regardless of the sample size used to build them. Use the model ID to differentiate models when the blueprint ID is the same. Blender models indicate the blueprints used to create them (for example, BP6+17+20).|
|FAST & ACCURATE||Only projects created prior to v6.1. Indicates the most accurate individual model on the Leaderboard that passes a set prediction speed guideline. If no models meet the guideline, the badge is not applied. The badge is available for OTV projects but not time series projects.|
|FROZEN||Indicates that the model was produced using the frozen run feature. The badge also shows the sample percent of the original model.|
|INSIGHTS||Indicates that the model appears on the Insights page.|
|Model ID||Displays a unique ID for each model on the Leaderboard. The model ID represents a single instance of a model type, feature list, and sample size within a single project. Use the model ID to differentiate models when the blueprint ID is the same.|
|MONO||Indicates that the model was built with monotonic constraints, or supports them but was not built with them.|
|MOST ACCURATE||Only projects created prior to v6.1. Indicates that, based on the validation or cross-validation results, this model is the most accurate model overall on the Leaderboard (in most cases, a blender).|
|NEW SERIES OPTIMIZED||Indicates a model that supports unseen series modeling (new series support).|
|PREPARED FOR DEPLOYMENT||Indicates that the model has been through the Autopilot recommendation stages and is ready for deployment.|
|RATING TABLE||Indicates that the model has rating tables available for download.|
|RECOMMENDED FOR DEPLOYMENT||Indicates that this is the model DataRobot recommends for deployment, based on model accuracy and complexity.|
|REFERENCE MODEL||Indicates that the model is a reference model. A reference model uses no special preprocessing; it is a basic model you can use to measure the performance increase provided by an advanced model.|
|SCORING CODE||Indicates that the model has Scoring Code available for download.|
|SEGMENT CHAMPION||Indicates that the model is the chosen segment champion in a multiseries segmented modeling project.|
|SHAP||Indicates that the model was built with SHAP-based Prediction Explanations. If there is no badge, the model provides XEMP-based explanations.|
|TUNED||Indicates that the model has been tuned.|
|Upper Bound Running Time||Indicates that the model exceeded the Upper Bound Running Time.|
* You cannot rely on blueprint or model IDs to be the same across projects. Model IDs reflect the order in which models were added to the queue when built; because different projects can build different models, or the same models in a different order, these numbers can differ across projects. Blueprint IDs can likewise differ because different projects generate different blueprints. To check whether blueprints match across projects, compare the blueprint diagrams: if the diagrams match, the blueprints are the same.
See also information on the model recommendation calculations.
In addition to the tags, DataRobot displays a badge (icon) to the left of the model name indicating the type:
- Specially tuned DataRobot implementation of a model
- Blender model
- Eureqa model
- Keras model
- Light Gradient Boosting Machine (LightGBM) model
- Python model
- R model
- Spark model
- TensorFlow model
- XGBoost model
Model type and performance¶
Some models sacrifice prediction speed to improve prediction accuracy. These models are best suited to batch predictions (one-time or recurring), where prediction time and reliability aren't critical factors.
Some use cases require a model to make low-latency (or real-time) predictions. For these performance-sensitive use cases, it is best to avoid deploying the following model types as they prioritize accuracy over prediction speed and prediction memory usage:
- Keras models
- Blender models (or ensemble models)
- Advanced tuned models
- Models generated using Comprehensive Autopilot mode
Columns and tools¶
Leaderboard columns give you at-a-glance information about a model's key characteristics. The following table describes the Leaderboard columns and tools:
|Model Name and Description||Provides the model name (type) as well as identifiers and description.|
|Feature List||Lists the name of the Feature List used to create the model. Click the Feature List label to get a count of the number of features in the list.|
|Sample Size||Displays the sample size used to create the model. Click the Sample Size label to see the number of rows the sample size represents, or set the display to only selected sample sizes. By default, DataRobot displays all sample sizes run for a project. When a project includes an External predictions model, sample size displays N/A.|
|Validation||Displays the Validation score of the model. This is the score derived from the first cross-validation fold. Some scores may be marked with an asterisk, indicating in-sample predictions.|
|Cross-Validation||Displays the Cross-Validation score, if run. If the dataset is greater than 50,000 rows, DataRobot does not automatically start a cross-validation run. You can click the Run link to run cross-validation manually. Some scores may be marked with an asterisk, indicating in-sample predictions. If the dataset is larger than 800MB, cross-validation is not allowed.|
|Holdout||Displays a lock icon that indicates whether holdout is unlocked for the model. When unlocked, some scores may be marked with an asterisk, indicating use of in-sample predictions to derive the score.|
|Metric||Sets (and displays the selection of) an accuracy metric for the Leaderboard. Models display in order of their scoring (best to worst) for the metric chosen before the model building process. Click the orange arrow to access a dropdown that allows you to change the optimization metric.|
|Menu||Provides quick access to comparing models, adding and deleting models, and creating blender models.|
|Search||Searches for a model, as described below.|
|Add New Model||Adds a model based on specific criteria that you set from the dialog.|
|Filter||Filters by a variety of selection criteria. Alternatively, click a Leaderboard tag to filter by the selected tag.|
|Export||Allows you to download the Leaderboard's contents as a CSV file, as described below.|
Tag and filter models¶
The Leaderboard offers filtering capabilities to make viewing and focusing on relevant models easier.
Tag, or "star," one or more models on the Leaderboard to make them easier to return to as you navigate the application. To star a model, hover over it until a star appears, then click the star to select it:
To unstar a model, click the star again.
Use the Filters option to only display models meeting the criteria you select.
Combine any of the filters with search filtering. First, search for a model type or blueprint number, for example, and then select Filters to find only those models of that type meeting the additional criteria.
Use Leaderboard filters¶
Use the Filters selection box to limit the Leaderboard display to only those models matching the selected criteria. The available fields, and the settings for each field, depend on the project and/or model type. For example, non-date/time models offer sample size filtering, while time-aware models offer training period:
Filters are inclusive. That is, results show models that match any of the filters, not all filters. Also, options available for selection only include those in which at least one model matching the criteria is on the Leaderboard.
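The inclusive behavior can be pictured as a simple "match any" check. The sketch below is illustrative only; the model records and field names are invented for the example, not DataRobot's internal representation.

```python
# Illustrative sketch of the Leaderboard's inclusive ("match any") filtering.
# The model records and field names here are invented for the example.
models = [
    {"name": "eXtreme Gradient Boosted Trees", "feature_list": "Informative Features", "starred": True},
    {"name": "Elastic-Net Classifier", "feature_list": "Raw Features", "starred": False},
    {"name": "Light Gradient Boosting", "feature_list": "Informative Features", "starred": False},
]

def matches_any(model, filters):
    """Return True if the model satisfies at least one active filter."""
    return any(model.get(field) == value for field, value in filters.items())

# Show starred models OR models built on the "Raw Features" list.
active = {"starred": True, "feature_list": "Raw Features"}
selected = [m["name"] for m in models if matches_any(m, active)]
```

Because the filters are inclusive, the first two models match here (one by star, one by feature list) even though neither satisfies both criteria.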
The following table describes all available Leaderboard filters.
|Model importance||Models that are manually marked with a star on the Leaderboard.|
|Sample size||Selected sample size or N/A for External predictions models. Non time-aware only.|
|Training period||Time periods, either duration or start/end date. Time-aware only.|
|Feature list||Any feature list, manually or automatically created, that was used in at least one of the project's models.|
|Model family||Models grouped by tasks, an extended functionality of the model icon badge.|
|Model characteristics||Displayed model badges.|
|Blueprint ID||All models that have the same ID, representing an instance of a single model type (including version).|
|Model ID||A single, unique ID for a model on the Leaderboard.|
|Build method||The method that added models to the Leaderboard.|
Model characteristics options¶
The following list includes the model characteristics available to search on. See the table above for brief descriptions or the linked pages for complete details.
- Additional insights
- Bias mitigation
- Exportable coefficients
- External predictions
- Monotonic constraints
- New series optimized
- Rating table
- Reference model
- Scoring code
Use Repository filters¶
The Filters option is also available from the model Repository page:
The following table describes all available Repository filters.
|Blueprint characteristics||Blueprints based on the functionality they support. Options are Reference, Monotonic, Baseline, External Predictions, and SHAP.|
|Blueprint family||The mathematical technique or algorithm the blueprint uses.|
|Blueprint type||Blueprint origin, either DataRobot, Eureqa, or Composable ML.|
|Blueprint ID||Models that have the same ID—representing an instance of a single model type (including version) and feature list.|
Search the Leaderboard¶
In addition to filtering, the Leaderboard provides a search tool to further limit the display to only those models matching your search criteria.
Export the Leaderboard¶
The Leaderboard allows you to download its contents as a CSV file. To do so, click the Export button on the action bar:
Doing so prompts a preview screen:
This screen displays the Leaderboard contents (1), which you can copy, and lets you rename the .csv file (2). Note that:
- .csv is the only available file type for exporting the Leaderboard.
- Holdout scores are only included in the report if holdout has been unlocked.
Click Download to export the contents.
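As a rough illustration of the exported format, the snippet below writes Leaderboard-style rows as CSV text using Python's standard library. The column names and scores are invented examples, not DataRobot's actual export schema.

```python
import csv
import io

# Illustrative only: writing Leaderboard-style rows as .csv text.
# The column names and scores are invented, not DataRobot's export schema.
rows = [
    {"Model": "Light Gradient Boosting on ElasticNet Predictions",
     "Feature List": "Informative Features", "Sample Size": "64%", "Validation": 0.7213},
    {"Model": "eXtreme Gradient Boosted Trees Classifier",
     "Feature List": "Informative Features", "Sample Size": "64%", "Validation": 0.7189},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["Model", "Feature List", "Sample Size", "Validation"])
writer.writeheader()   # first line of the file is the column header
writer.writerows(rows)
csv_text = buffer.getvalue()
```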
A blender (or ensemble) model can increase accuracy by combining the predictions of between two and eight models. Use the Create blenders from top models advanced option to enable DataRobot to add the following blenders to the Leaderboard automatically.
- Average (AVG) Blend
- Generalized Linear Model (GLM) Blend
- Elastic Net (ENET) Blend
Depending on the project's dataset, DataRobot may only run a subset of the blenders listed above.
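Conceptually, the simplest of these is the AVG blend, which averages its component models' predictions row by row. The following sketch illustrates only that arithmetic; the function name is hypothetical, and real DataRobot blenders are built from stacked predictions rather than raw lists.

```python
# Conceptual sketch of an AVG blend: the blender's prediction is the
# row-wise mean of its component models' predictions. The function name
# is hypothetical; real blenders are built from stacked predictions.
def avg_blend(*model_predictions):
    """Average the predictions of two to eight component models, row by row."""
    if not 2 <= len(model_predictions) <= 8:
        raise ValueError("Blenders combine between two and eight models.")
    return [sum(row) / len(row) for row in zip(*model_predictions)]

model_a = [0.10, 0.80, 0.40]
model_b = [0.20, 0.60, 0.40]
blended = avg_blend(model_a, model_b)
```

A GLM or ENET blend replaces the plain average with a learned weighted combination, which is why those blenders require an additional model-fitting step.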
If you did not select the Create blenders from top models option prior to model building, you can manually create blender models when Autopilot has finished.
To improve response times for blender models, DataRobot stores predictions for all models trained at the highest sample size used by Autopilot (typically 64%) and creates blenders from those results. Storing only the largest sample size (and therefore predictions from the best performing models) limits the disk space required.
DataRobot has special logic in place for natural language processing (NLP) and image fine-tuner models. For example, fine-tuners do not support stacked predictions. As a result, when blending models with and without stacked predictions, the available blender methods are AVG, MED, MIN, and MAX. DataRobot does not support other methods in this case because they may introduce target leakage.
Asterisked partitions do not apply to time series or multiseries projects.
Sometimes, the Leaderboard's Validation, Cross-Validation, or Holdout score displays an asterisk. Hover over the score for a tooltip explaining the reason for the asterisk:
The following training set percentage values are examples based on the default data partitioning settings recommended by DataRobot (without downsampling). The default data partitions are 5-fold CV with 20% holdout or, for larger datasets, TVH 16% validation and 20% holdout. If you customize the data partitioning settings, the thresholds for training into validation change. For example, if you select a 10-fold CV with 20% holdout, your maximum training set sample size will be 72%, not 64%.
By default, DataRobot uses up to 64% of the data for the training set. This is the largest sample size that does not include any data from the validation or holdout sets (16% of the data is reserved for the validation set and 20% for the holdout set). When model building finishes, you can manually train at larger sample sizes (for example, 80% or 100%). If you train above 64%, but under 80%, the model trains on data from the validation set. If you train above 80%, the model trains on data from the holdout set.
As a result, if you train above 64%, DataRobot marks the Validation score with an asterisk to indicate that some in-sample predictions were used for that score. If you train above 80%, the Holdout score is also asterisked to indicate the use of in-sample predictions to derive the score.
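The arithmetic behind these thresholds can be sketched as follows, assuming the default 5-fold CV with 20% holdout; the names and function here are illustrative, not a DataRobot API.

```python
# Sketch of the default-partition arithmetic described above. The names are
# illustrative, not a DataRobot API; thresholds assume 5-fold CV, 20% holdout.
HOLDOUT_PCT = 20
CV_FOLDS = 5

# With 20% held out, each CV fold covers (100 - 20) / 5 = 16% of the data,
# so the largest fully out-of-sample training size is 100 - 20 - 16 = 64%.
fold_pct = (100 - HOLDOUT_PCT) / CV_FOLDS
max_clean_training = 100 - HOLDOUT_PCT - fold_pct

def score_flags(sample_pct):
    """Which Leaderboard scores get an asterisk at this training sample size?"""
    return {
        "validation_in_sample": sample_pct > max_clean_training,  # trains into validation
        "holdout_in_sample": sample_pct > 100 - HOLDOUT_PCT,      # trains into holdout
    }
```

With 10-fold CV and 20% holdout instead, each fold covers 8% of the data, so the maximum clean training size rises to 72%, matching the example above.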
Sometimes, the Leaderboard's Validation, Cross-Validation, or Holdout score displays N/A instead of a score. "Not available" scores occur if your project trains models into the validation or holdout sets and meets any of the following criteria:
- The dataset exceeds 800MB, resulting in a slim run project whose models do not have stacked predictions.
- The project is date/time partitioned (either OTV or time series); these models do not have stacked predictions.
- The project is multiclass with greater than ten classes.
- The project uses Eureqa modeling, as Eureqa models do not have stacked predictions.