Evaluate experiments (Experiments tab)¶
Once you start modeling, Workbench begins to construct your model Leaderboard, a list of models ranked by performance, to help with quick model evaluation. The Leaderboard provides a summary of information, including scoring information, for each model built in an experiment. From the Leaderboard, you can click a model to access visualizations for further exploration. Using these tools can help to assess what to do in your next experiment.
There are two "flavors" of Leaderboard available.
- This page describes the Experiment tab, which helps to understand and evaluate models from a single experiment.
- See also the Comparison tab page, which allows you to compare up to three models of the same type (for example, binary, regression) from any number of experiments within a single Use Case. Access the comparison tool from the tab or from the dropdown on the experiment name in the breadcrumbs.
Manage the Leaderboard¶
There are several controls available, described in the next sections, for navigating the Leaderboard.
View experiment info¶
The Data and Feature lists tabs are Preview options that are on by default.
Feature flag: Enable Data and Feature Lists tabs in Workbench
Click View experiment info to access tabs that:
- Provide a summary of information about the experiment's setup.
- Display the data used to build models for the experiment.
- Show Feature lists built for the experiment and available for model training.
- Open the blueprint repository, which provides access to additional blueprints for training.
The Setup tab reports the parameters used to build the models on this Leaderboard.
|Created||A time stamp indicating the creation of the experiment as well as the user who initiated the model run.|
|Dataset||The name, number of features, and number of rows in the modeling dataset. This is the same information available from the data preview page.|
|Target||The feature selected as the basis for predictions, the resulting project type, and the optimization metric used to define how to score the experiment's models. You can change the metric the Leaderboard is sorted by, but the metric displayed in the summary is the one used for the build.|
|Partitioning||Details of the partitioning done for the experiment, either the default or modified.|
|Additional settings||Advanced settings that were configured from the Additional settings tab.|
By default, the display includes all features in the dataset. You can view analytics only for features specific to a feature list by toggling Filter by feature list and then selecting a list.
Click on the arrow or three dots next to a column name to change the sort order.
Feature lists tab¶
Click the Feature lists tab to view all feature lists associated with the experiment. The display shows both DataRobot's automatically created lists and any custom feature lists that were created prior to model training.
The following actions are available for feature lists:
|View features||Explore insights for a feature list. This selection opens the Data tab with the filter set to the selected list.|
|Edit name and description||Opens a dialog to change the list name and description. You cannot change these values for a DataRobot-created list.|
|Download||Downloads the features contained in that list as a CSV file.|
|Rerun modeling||Opens the Rerun modeling modal to allow selecting a new feature list and restarting Autopilot.|
Blueprints repository tab¶
The blueprint repository is a library of modeling blueprints available for a selected experiment. Blueprints illustrate the tasks used to build a model, not the model itself. Model blueprints listed in the repository have not necessarily been built yet, but could be as they are of a type that is compatible with the experiment's data and settings.
There are two ways to access the blueprint repository:
- From a Leaderboard model's Blueprint tab.
- Click the View experiment info link and select the Blueprint repository tab.
Filtering makes viewing and focusing on relevant models easier. Click Filter models to set the criteria for the models that Workbench displays on the Leaderboard. The choices available for each filter depend on the experiment and/or model type (a choice is listed only if it was used in at least one Leaderboard model) and may change as models are added to the experiment. For example:
|Filter||Displays models that...|
|Labeled models||Have been assigned the listed tag, either starred models or models recommended for deployment.|
|Feature list||Were built with the selected feature list.|
|Sample size (random or stratified partitioning)||Were trained on the selected sample size.|
|Training period (date/time partitioning)||Were trained on backtests defined by the selected duration mechanism.|
|Model family||Are part of the selected model family:
Sort models by¶
By default, the Leaderboard sorts models based on the score of the validation partition, using the selected optimization metric. You can, however, use the Sort models by control to change the basis for the ordering when evaluating models.
Note that although Workbench built the project using the most appropriate metric for your data, it computes many applicable metrics on each of the models. After the build completes, you can redisplay the Leaderboard listing based on a different metric. Doing so does not change any values within the models; it simply reorders the model listing based on performance on the alternate metric.
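The reordering described above can be pictured with a minimal sketch. The model names, scores, and the `ModelScores` and `sort_by` helpers are hypothetical and are not part of any DataRobot API; the point is only that re-sorting changes the order of the listing, not the stored scores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelScores:
    name: str
    scores: dict  # metric name -> validation score


# Hypothetical models and scores, invented for illustration.
leaderboard = [
    ModelScores("Model A", {"LogLoss": 0.41, "AUC": 0.83}),
    ModelScores("Model B", {"LogLoss": 0.39, "AUC": 0.81}),
    ModelScores("Model C", {"LogLoss": 0.45, "AUC": 0.86}),
]

# Whether a higher value is better depends on the metric.
HIGHER_IS_BETTER = {"LogLoss": False, "AUC": True}


def sort_by(models, metric):
    """Return a new list ordered best-first on `metric`; scores are untouched."""
    return sorted(
        models,
        key=lambda m: m.scores[metric],
        reverse=HIGHER_IS_BETTER[metric],
    )


by_logloss = [m.name for m in sort_by(leaderboard, "LogLoss")]
by_auc = [m.name for m in sort_by(leaderboard, "AUC")]
```

In this toy data the best model under LogLoss is not the best under AUC, which is why switching the sort metric can reshuffle the listing.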
See the page on optimization metrics for detailed information on each.
Workbench provides simple, quick shorthand controls:
|Reruns Quick mode with a different feature list. If you select a feature list that has already been run, Workbench will replace any deleted models or make no changes.|
|Duplicates the experiment, with an option to reuse just the dataset, or the dataset and settings.|
|Deletes the experiment and its models. If the experiment is being used by an application, you cannot delete it.|
|Slides the Leaderboard panel closed to make more room for other content, such as insights.|
Model insights help to interpret, explain, and validate what drives a model’s predictions. Available insights are dependent on experiment type, but may include the insights listed in the predictive modeling insights table below. Availability of sliced insights is also model-dependent.
The following insights are Public Preview in Workbench.
- Sliced insights are off by default. Contact your DataRobot representative or administrator for information on enabling them. Feature flag: Slices in Workbench
- Feature Effects is off by default. Contact your DataRobot representative or administrator for information on enabling it. Feature flag: Slices in Workbench
- SHAP Prediction Explanations are on by default. Feature flag: SHAP in Workbench
|Insight||Description||Problem type||Sliced insights?|
|Blueprint||Provides a graphical representation of data preprocessing and parameter settings.||All|
|Feature Effects||Conveys how changes to the value of each feature change model predictions.||All||✔|
|Feature Impact||Shows which features are driving model decisions.||All||✔|
|Lift Chart||Depicts how well a model segments the target population and how capable it is of predicting the target.||All||✔|
|Residuals||Provides scatter plots and a histogram for understanding model predictive performance and validity.||Regression||✔|
|ROC Curve||Provides tools for exploring classification, performance, and statistics related to a model.||Classification||✔|
|SHAP Prediction Explanations||Estimates how much each feature contributes to a given prediction, with values based on difference from the average.||Classification, regression|
To see a model's insights, click on the model in the left-pane Leaderboard. Note that different insights are available for time-aware experiments.
Accuracy Over Time¶
For time-aware projects, Accuracy Over Time helps to visualize how predictions change over time. By default, the view shows predicted and actual vs. time values for the training and validation data of the most recent (first) backtest. This is the backtest model DataRobot uses to deploy and make predictions. (In other words, the model used to generate the error metric for the validation set.)
The visualization also has a time-aware Residuals tab that plots the difference between actual and predicted values. It helps to visualize whether there is an unexplained trend in your data that the model did not account for and how the model errors change over time.
Blueprints are ML pipelines containing preprocessing steps (tasks), modeling algorithms, and post-processing steps that go into building a model. The Blueprint tab provides a graphical representation of the blueprint, showing each step. Click on any task in the blueprint to see more detail, including more complete model documentation (by clicking DataRobot Model Docs from inside the blueprint’s task).
Additionally, you can access the blueprint repository from the Blueprint tab.
The Feature Effects insight shows how changes in the value of each feature affect model predictions; that is, how a model "understands" the relationship between each feature and the target. It is an on-demand feature that depends on the Feature Impact calculation, which Workbench prompts you to run when you first open the visualization. The insight is expressed in terms of partial dependence, an illustration of how changing a feature's value, while keeping all other features as they were, impacts a model's predictions.
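As a rough illustration of partial dependence, the sketch below varies one feature over a grid of values, leaves every other feature as it was, and averages the predictions. The `toy_predict` function, feature names, and rows are hypothetical stand-ins for a trained model and its data; this is not how Workbench is implemented.

```python
def partial_dependence(predict, rows, feature, grid):
    """For each grid value, set `feature` to that value in every row
    (other features unchanged) and average the resulting predictions."""
    curve = []
    for value in grid:
        preds = [predict({**row, feature: value}) for row in rows]
        curve.append((value, sum(preds) / len(preds)))
    return curve


# Hypothetical model: a hand-written function standing in for a trained model.
def toy_predict(row):
    return 2.0 * row["age"] + 10.0 * row["is_member"]


# Hypothetical data rows.
rows = [
    {"age": 20, "is_member": 0},
    {"age": 40, "is_member": 1},
]

pd_age = partial_dependence(toy_predict, rows, "age", grid=[20, 30, 40])
```

Each point in `pd_age` pairs a candidate value of `age` with the average prediction when every row is given that value.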
Feature Impact provides a high-level visualization that identifies which features are most strongly driving model decisions. It is available for all model types and is an on-demand feature, meaning that, except for models prepared for deployment, you must initiate a calculation to see the results.
- Hover on feature names and bars for additional information.
- Use Sort by to change the display to sort by impact or feature name.
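One common way to estimate how strongly a feature drives a model is permutation importance: scramble one feature's values and measure how much the error grows. The sketch below assumes that approach purely for illustration; the text above does not state how Feature Impact is computed, and the model, data, and helper names are hypothetical. A fixed rotation replaces a random shuffle so the result is reproducible.

```python
def mean_abs_error(predict, rows, targets):
    return sum(abs(predict(r) - t) for r, t in zip(rows, targets)) / len(rows)


def permutation_impact(predict, rows, targets, feature):
    """Increase in error after breaking the link between `feature` and the target."""
    baseline = mean_abs_error(predict, rows, targets)
    values = [r[feature] for r in rows]
    rotated = values[1:] + values[:1]  # deterministic stand-in for a random shuffle
    permuted = [{**r, feature: v} for r, v in zip(rows, rotated)]
    return mean_abs_error(predict, permuted, targets) - baseline


# Hypothetical model that uses "income" and ignores "noise".
def toy_predict(row):
    return 3.0 * row["income"]


rows = [
    {"income": 1.0, "noise": 5.0},
    {"income": 2.0, "noise": 1.0},
    {"income": 4.0, "noise": 9.0},
]
targets = [3.0, 6.0, 12.0]

impact = {f: permutation_impact(toy_predict, rows, targets, f) for f in ("income", "noise")}
```

A feature the model ignores shows zero impact, while scrambling a feature the model relies on increases the error.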
To help visualize model effectiveness, the Lift Chart depicts how well a model segments the target population and how well the model performs for different ranges of values of the target variable.
- Hover on any point to display the predicted and actual scores for rows in that bin.
- Use the controls to change the criteria for the display.
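The binning behind a chart of this kind can be sketched as follows. This assumes the conventional lift-chart construction (sort rows by predicted value, split them into equal-sized bins, then average predicted and actual values per bin) and that there are at least as many rows as bins; the function name and data are hypothetical.

```python
def lift_chart_bins(predicted, actual, n_bins):
    """Sort rows by predicted value, split into equal-sized bins,
    and return (mean predicted, mean actual) for each bin."""
    pairs = sorted(zip(predicted, actual))
    size = len(pairs) // n_bins
    bins = []
    for i in range(n_bins):
        # The last bin absorbs any leftover rows.
        chunk = pairs[i * size:] if i == n_bins - 1 else pairs[i * size:(i + 1) * size]
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        mean_act = sum(a for _, a in chunk) / len(chunk)
        bins.append((mean_pred, mean_act))
    return bins


# Hypothetical predictions and binary outcomes.
predicted = [0.125, 0.875, 0.375, 0.625]
actual = [0, 1, 0, 1]

bins = lift_chart_bins(predicted, actual, n_bins=2)
```

When the per-bin averages of predicted and actual values track each other closely, the model is segmenting the population well.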
Period Accuracy lets you define periods within your dataset and then compare their metric scores against the metric score of the model as a whole. In other words, you can specify which periods within your training dataset are more important, and DataRobot then provides aggregate accuracy metrics for those periods and surfaces the results on the Leaderboard. Periods are defined in a separate CSV file that identifies which rows to group based on the project’s date/time feature. Once the file is uploaded and the insight is calculated, DataRobot provides a table of period-based results and an “over time” histogram for each period.
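The idea of grouping rows into named periods and scoring each period separately can be sketched as below. The CSV layout, column names, data, and the use of mean absolute error are all assumptions made for illustration; the actual file format expected by DataRobot is not described here.

```python
import csv
import io
from collections import defaultdict

# Hypothetical period file: the layout and column names are assumptions.
PERIODS_CSV = """date,period
2024-01-01,holiday
2024-01-02,regular
2024-01-03,regular
"""

# Hypothetical rows keyed by date: (actual, predicted).
rows = {
    "2024-01-01": (10.0, 14.0),
    "2024-01-02": (8.0, 9.0),
    "2024-01-03": (6.0, 5.0),
}


def mae(pairs):
    """Mean absolute error over (actual, predicted) pairs."""
    return sum(abs(a - p) for a, p in pairs) / len(pairs)


# Group rows by the period assigned to their date.
by_period = defaultdict(list)
for rec in csv.DictReader(io.StringIO(PERIODS_CSV)):
    by_period[rec["period"]].append(rows[rec["date"]])

period_scores = {name: mae(pairs) for name, pairs in by_period.items()}
overall_score = mae(list(rows.values()))
```

Comparing `period_scores` with `overall_score` shows whether the model does noticeably better or worse in a period you care about than it does on the data as a whole.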
For regression experiments, the Residuals tab helps to clearly understand a model's predictive performance and validity. It allows you to gauge how linearly your models scale relative to the actual values of the dataset used. It provides multiple scatter plots and a histogram to assist your residual analysis:
- Predicted vs. Actual
- Residual vs. Actual
- Residual vs. Predicted
- Residuals histogram
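The data behind the four views listed above can be derived from just two columns, as in this minimal sketch. It assumes the convention residual = actual minus predicted, and the values and bin width are hypothetical.

```python
import math
from collections import Counter

# Hypothetical actual and predicted values.
actual = [3.0, 5.0, 8.0, 10.0]
predicted = [2.5, 5.5, 7.0, 11.0]

# Assumed convention: residual = actual - predicted.
residuals = [a - p for a, p in zip(actual, predicted)]

# Point sets for the three scatter plots.
predicted_vs_actual = list(zip(actual, predicted))
residual_vs_actual = list(zip(actual, residuals))
residual_vs_predicted = list(zip(predicted, residuals))

# Histogram: count residuals per bin, keyed by each bin's lower edge.
BIN_WIDTH = 1.0
histogram = Counter(math.floor(r / BIN_WIDTH) * BIN_WIDTH for r in residuals)
```

Residuals scattered evenly around zero, with no trend against actual or predicted values, suggest the model is not systematically over- or under-predicting.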
For classification experiments, the ROC Curve tab provides the following tools for exploring classification, performance, and statistics related to a selected model at any point on the probability scale:
SHAP Prediction Explanations¶
For non-time series projects, SHAP Prediction Explanations illustrate what drives predictions on a row-by-row basis. They provide a quantitative indicator of the effect variables have on the predictions, answering why a model made a certain prediction.
SHAP Prediction Explanations estimate how much each feature contributes to a given prediction differing from the average. They are intuitive, unbounded (computed for all features), fast, and, due to the open source nature of SHAP, transparent. Not only does SHAP provide the benefit of helping you better understand model behavior—and quickly—it also allows you to easily validate if a model adheres to business rules.
Use SHAP to understand, for each model decision, which features are key. What drives a particular customer's decision to buy: age, gender, buying habits? And how large is each factor's effect on the decision?
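The phrase "contributes to a given prediction differing from the average" can be made concrete with a linear model, where, treating features as independent, each SHAP value reduces to weight × (feature value − feature mean). The sketch below uses that special case only; the weights, features, and data are hypothetical, and it does not represent how Workbench computes explanations for other model types.

```python
# Hypothetical linear model: prediction = bias + sum(weight * value).
weights = {"age": 0.5, "visits": 2.0}
bias = 1.0

# Hypothetical background data used to define "the average".
background = [
    {"age": 20, "visits": 1},
    {"age": 40, "visits": 3},
]


def predict(row):
    return bias + sum(weights[f] * row[f] for f in weights)


means = {f: sum(r[f] for r in background) / len(background) for f in weights}
average_prediction = sum(predict(r) for r in background) / len(background)


def linear_shap(row):
    """Per-feature contribution for a linear model: weight * (value - mean)."""
    return {f: weights[f] * (row[f] - means[f]) for f in weights}


row = {"age": 50, "visits": 2}
contributions = linear_shap(row)
```

The contributions sum to the gap between this row's prediction and the average prediction, so each value reads as that feature's share of the difference.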
The Stability tab provides an at-a-glance summary of how well a model performs on different backtests. It helps to measure performance and gives an indication of how long a model can be in production (how long it is "stable") before needing retraining. The values in the chart represent the validation scores for each backtest and the holdout.
DataRobot automates many critical compliance tasks associated with developing a model and, by doing so, decreases time-to-deployment in highly regulated industries. You can generate, for each model, individualized documentation to provide comprehensive guidance on what constitutes effective model risk management. Then, you can download the report as an editable Microsoft Word document (.docx). The generated report includes the appropriate level of information and transparency necessitated by regulatory compliance demands.
The model compliance report is not prescriptive in format and content, but rather serves as a template for creating sufficiently rigorous model development, implementation, and use documentation. The documentation provides evidence to show that the components of the model work as intended, the model is appropriate for its intended business purpose, and it is conceptually sound. As such, the report can help with completing the Federal Reserve System's SR 11-7: Guidance on Model Risk Management.
To generate a compliance report:
- Select a model from the Leaderboard.
- From the Model actions dropdown, select Generate compliance report.
Workbench prompts for a download location and, once selected, generates the report in the background as you continue experimenting.
At any point after models have been built, you can manage an individual experiment from within its Use Case. Click on the three dots to the right of the experiment name to delete it. To share the experiment and other associated assets, use the Use Case Manage members tool.
After selecting a model, you can, from within the experiment: