Evaluate experiments¶
Once you start modeling, Workbench begins to construct your model Leaderboard, a list of models ranked by performance, to help with quick model evaluation. The Leaderboard provides a summary of information, including scoring information, for each model built in an experiment. From the Leaderboard, you can click a model to access visualizations for further exploration. Using these tools can help you decide what to do in your next experiment.
After Workbench completes the 64% sample size phase of Quick mode, it selects the most accurate model and trains it on 100% of the data. That model is marked with the Prepared for Deployment badge.
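Although the Leaderboard is a Workbench UI feature, you can retrieve an analogous ranked view programmatically. The following is a minimal sketch using the DataRobot Python client, assuming the experiment is backed by a project and that the endpoint, token, and project ID placeholders are replaced with your own values.

```python
import datarobot as dr

# Authenticate; the endpoint and token shown here are placeholders.
dr.Client(endpoint="https://app.datarobot.com/api/v2", token="YOUR_API_TOKEN")

# The project backing the experiment (placeholder ID).
project = dr.Project.get("YOUR_PROJECT_ID")

# Models are returned in Leaderboard order; each carries its metric scores.
for model in project.get_models():
    validation_score = model.metrics[project.metric]["validation"]
    print(model.model_type, model.sample_pct, validation_score)
```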
Manage the Leaderboard¶
The following sections describe the controls available for navigating the Leaderboard.
View experiment info¶
Availability information
The Data and Feature lists tabs are Preview options that are on by default.
Feature flag: Enable Data and Feature Lists tabs in Workbench
Click View experiment info to view:
- A summary of information about the experiment's setup.
- The data used to build models for the experiment.
- Feature lists built for the experiment and available for model training.
- The blueprint repository, which provides access to additional blueprints for training.
Setup tab¶
The Setup tab reports the parameters used to build the models on this Leaderboard.
Field | Reports... |
---|---|
Created | A timestamp indicating when the experiment's Leaderboard was created, as well as the user who initiated the model run. |
Dataset | The name, number of features, and number of rows in the modeling dataset. This is the same information available from the data preview page. |
Target | The feature selected as the basis for predictions, the resulting project type, and the optimization metric used to define how to score the experiment's models. You can change the metric the Leaderboard is sorted by, but the metric displayed in the summary is the one used for the build. |
Partitioning | Details of the partitioning done for the experiment, either the default or modified. |
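The same setup details can also be read programmatically. Below is a minimal sketch using the DataRobot Python client; it assumes credentials are already configured (for example, via `dr.Client` or a DataRobot config file), and attribute availability can vary by project type.

```python
import datarobot as dr

project = dr.Project.get("YOUR_PROJECT_ID")  # placeholder ID

print("Created:", project.created)        # creation timestamp
print("Target:", project.target)          # feature selected for prediction
print("Metric:", project.metric)          # optimization metric used for the build
print("Partitioning:", project.partition) # dict describing the partitioning scheme
```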
Data tab¶
The Data tab provides summary analytics of the data used in the project. To view exploratory data insights, click the dataset preview link.
By default, the display includes all features in the dataset. You can view analytics only for features specific to a feature list by toggling Filter by feature list and then selecting a list:
Click on the arrow or three dots next to a column name to change the sort order.
Feature lists tab¶
Click the Feature lists tab to view all feature lists associated with the experiment. The display shows both DataRobot's automatically created lists and any custom feature lists that were created prior to model training.
The following actions are available for feature lists:
Action | Description |
---|---|
View features | Explore insights for a feature list. This selection opens the Data tab with the filter set to the selected list. |
Edit name and description | Opens a dialog to change the list name and description. You cannot change these values for a DataRobot-created list. |
Download | Downloads the features contained in that list as a CSV file. |
Rerun modeling | Opens the Rerun modeling modal to allow selecting a new feature list and restarting Autopilot. |
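Several of these actions have programmatic equivalents. The sketch below uses the DataRobot Python client, assuming credentials are configured; the list name, feature names, and the use of `start_autopilot` to mirror the Rerun modeling action are illustrative assumptions.

```python
import datarobot as dr

project = dr.Project.get("YOUR_PROJECT_ID")  # placeholder ID

# View all feature lists associated with the experiment.
for flist in project.get_featurelists():
    print(flist.name, len(flist.features))

# Create a custom feature list (placeholder feature names).
custom = project.create_featurelist(
    name="custom-list", features=["feature_a", "feature_b", "feature_c"]
)

# Rerun modeling against the new list.
project.start_autopilot(featurelist_id=custom.id)
```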
Blueprints repository tab¶
The blueprint repository is a library of modeling blueprints available for a selected experiment. Blueprints illustrate the tasks used to build a model, not the model itself. Blueprints listed in the repository have not necessarily been built yet, but they can be, because they are of a type compatible with the experiment's data and settings.
There are two ways to access the blueprint repository:
- From a Leaderboard model's Blueprint tab.
- By clicking the View experiment info link and selecting the Blueprint repository tab.
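The repository can also be browsed with the DataRobot Python client. The following is a minimal sketch, assuming configured credentials; the choice of which blueprint to train is arbitrary.

```python
import datarobot as dr

project = dr.Project.get("YOUR_PROJECT_ID")  # placeholder ID

# Blueprints compatible with the experiment's data and settings.
blueprints = project.get_blueprints()
for bp in blueprints:
    print(bp.model_type)

# Train one of the repository blueprints; this queues a model job.
job_id = project.train(blueprints[0])
```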
Filter models¶
Filtering makes it easier to view and focus on relevant models. Click Filter models to set the criteria for the models that Workbench displays on the Leaderboard. The choices available for each filter depend on the experiment and/or model type (only values used in at least one Leaderboard model appear) and may change as models are added to the experiment. For example:
Filter | Displays models that... |
---|---|
Labeled models | Have been assigned the selected label, either starred or recommended for deployment. |
Feature list | Were built with the selected feature list. |
Sample size | Were trained on the selected sample size. |
Model family | Are part of the selected model family. |
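Similar filtering can be done client-side with the DataRobot Python client. This is a minimal sketch, assuming configured credentials; the feature list name and sample size are placeholders, and the `is_starred` attribute is assumed to be populated for the models retrieved.

```python
import datarobot as dr

project = dr.Project.get("YOUR_PROJECT_ID")  # placeholder ID
models = project.get_models()

# Mirror the Leaderboard filters with simple list comprehensions.
starred = [m for m in models if m.is_starred]
by_featurelist = [m for m in models if m.featurelist_name == "Informative Features"]
by_sample_size = [m for m in models if m.sample_pct == 64.0]
```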
Sort models by¶
By default, the Leaderboard sorts models based on the score of the validation partition, using the selected optimization metric. You can, however, use the Sort models by control to change the metric used to order the Leaderboard when evaluating models.
Note that although Workbench built the project using the most appropriate metric for your data, it computes many applicable metrics for each of the models. After the build completes, you can redisplay the Leaderboard based on a different metric. Doing so does not change any values within the models; it simply reorders the model listing based on their performance against the alternate metric.
See the page on optimization metrics for detailed information on each.
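The same reordering can be reproduced in code. Below is a minimal sketch using the DataRobot Python client, assuming configured credentials; AUC is used only as an example, and metric availability depends on the project type.

```python
import datarobot as dr

project = dr.Project.get("YOUR_PROJECT_ID")  # placeholder ID
models = project.get_models()

# Reorder by validation AUC (descending) instead of the build metric;
# model scores are unchanged, only the ordering differs.
by_auc = sorted(
    (m for m in models if m.metrics.get("AUC", {}).get("validation") is not None),
    key=lambda m: m.metrics["AUC"]["validation"],
    reverse=True,
)
```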
Controls¶
Workbench provides simple, quick shorthand controls:
Icon | Action |
---|---|
![]() | Reruns Quick mode with a different feature list. If you select a feature list that has already been run, Workbench replaces any deleted models or makes no changes. |
![]() | Duplicates the experiment, with an option to reuse just the dataset, or the dataset and settings. |
![]() | Deletes the experiment and its models. If the experiment is being used by an application, you cannot delete it. |
![]() | Collapses the Leaderboard panel to make additional room, for example, for viewing insights. |
Insights¶
Model insights help to interpret, explain, and validate what drives a model’s predictions. Available insights are dependent on experiment type, but may include the insights listed in the table below. Availability of sliced insights is also model-dependent.
Availability information
Sliced insights in Workbench are off by default. Contact your DataRobot representative or administrator for information on enabling the feature.
Feature flag: Slices in Workbench
Insight | Description | Problem type | Sliced insights? |
---|---|---|---|
Accuracy Over Time | Visualizes how predictions change over time. | Time-aware | |
Blueprint | Provides a graphical representation of data preprocessing and parameter settings. | All | |
Feature Effects | Conveys how changes to the value of each feature change model predictions. | All | ✔ |
Feature Impact | Shows which features are driving model decisions. | All | ✔ |
Lift Chart | Depicts how well a model segments the target population and how capable it is of predicting the target. | All | ✔ |
Period Accuracy | Shows model performance over periods within the training dataset. | Time-aware | |
Residuals | Provides scatter plots and a histogram for understanding model predictive performance and validity. | Regression | ✔ |
ROC Curve | Provides tools for exploring classification, performance, and statistics related to a model. | Classification | ✔ |
Stability | Provides a summary of how well a model performs on different backtests. | Time-aware |
To see a model's insights, click on the model in the left-pane Leaderboard.
Accuracy Over Time¶
For time-aware projects, Accuracy Over Time helps to visualize how predictions change over time. By default, the view shows predicted and actual values over time for the training and validation data of the most recent (first) backtest. This is the backtest whose model DataRobot uses to deploy and make predictions (in other words, the model used to generate the error metric for the validation set).
The visualization also has a time-aware Residuals tab that plots the difference between actual and predicted values. It helps to visualize whether there is an unexplained trend in your data that the model did not account for and how the model errors change over time.
Blueprint¶
Blueprints are ML pipelines containing preprocessing steps (tasks), modeling algorithms, and post-processing steps that go into building a model. The Blueprint tab provides a graphical representation of the blueprint, showing each step. Click on any task in the blueprint to see more detail, including more complete model documentation (by clicking DataRobot Model Docs from inside the blueprint’s task).
Additionally, you can access the blueprint repository from the Blueprint tab:
Feature Effects¶
The Feature Effects insight shows how changes in the value of each feature affect model predictions, that is, how the model "understands" the relationship between each feature and the target. It is an on-demand feature that depends on the Feature Impact calculation, which you are prompted to run when first opening the visualization. The insight is communicated in terms of partial dependence, an illustration of how changing a feature's value, while keeping all other features as they were, impacts a model's predictions.
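Feature Effects data can also be requested with the DataRobot Python client. This is a minimal sketch, assuming configured credentials and placeholder IDs; as in the UI, it assumes Feature Impact is computed first.

```python
import datarobot as dr

model = dr.Model.get("YOUR_PROJECT_ID", "YOUR_MODEL_ID")  # placeholder IDs

model.get_or_request_feature_impact()  # prerequisite calculation

# Partial dependence results for the validation data.
feature_effects = model.get_or_request_feature_effect(source="validation")
for fe in feature_effects.feature_effects:
    print(fe)  # one entry per feature
```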
Feature Impact¶
Feature Impact provides a high-level visualization that identifies which features are most strongly driving model decisions. It is available for all model types and is an on-demand feature, meaning that for all but models prepared for deployment, you must initiate a calculation to see the results.
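Because the calculation is on demand, you can also trigger it with the DataRobot Python client. Below is a minimal sketch, assuming configured credentials and placeholder IDs.

```python
import datarobot as dr

model = dr.Model.get("YOUR_PROJECT_ID", "YOUR_MODEL_ID")  # placeholder IDs

# Triggers the calculation if it has not been run; otherwise returns the results.
impact = model.get_or_request_feature_impact()
for row in impact:
    print(row["featureName"], row["impactNormalized"])
```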
Lift Chart¶
To help visualize model effectiveness, the Lift Chart depicts how well a model segments the target population and how well the model performs for different ranges of values of the target variable.
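The underlying Lift Chart data is also retrievable with the DataRobot Python client. This is a minimal sketch, assuming configured credentials and placeholder IDs; each bin holds averaged predicted and actual values for one segment of rows.

```python
import datarobot as dr

model = dr.Model.get("YOUR_PROJECT_ID", "YOUR_MODEL_ID")  # placeholder IDs

lift = model.get_lift_chart("validation")
for b in lift.bins:
    print(b["predicted"], b["actual"], b["bin_weight"])
```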
Period Accuracy¶
Period Accuracy lets you define periods within your dataset and then compare their metric scores against the metric score of the model as a whole. In other words, you can specify which periods within your training dataset are most important, and DataRobot then provides aggregate accuracy metrics for those periods and surfaces the results on the Leaderboard. Periods are defined in a separate CSV file that identifies which rows to group based on the project's date/time feature. Once the file is uploaded and the insight is calculated, DataRobot provides a table of period-based results and an "over time" histogram for each period.
Residuals¶
For regression experiments, the Residuals tab helps to clearly understand a model's predictive performance and validity. It allows you to gauge how linearly your models scale relative to the actual values of the dataset used. It provides multiple scatter plots and a histogram to assist your residual analysis:
- Predicted vs. Actual
- Residual vs. Actual
- Residual vs. Predicted
- Residuals histogram
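The plots above are built from simple differences between actual and predicted values. The following is a minimal, generic sketch of those quantities; the arrays are illustrative and not pulled from the DataRobot API.

```python
import numpy as np

actual = np.array([10.0, 12.5, 7.8, 15.2])
predicted = np.array([9.6, 13.1, 8.4, 14.7])

residuals = actual - predicted          # basis of the Residual vs. Actual/Predicted plots
print(residuals)
print(np.histogram(residuals, bins=3))  # basis of the residuals histogram
```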
ROC Curve¶
For classification experiments, the ROC Curve tab provides the following tools for exploring classification, performance, and statistics related to a selected model at any point on the probability scale:
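The ROC data behind this tab can also be fetched with the DataRobot Python client. This is a minimal sketch, assuming configured credentials and placeholder IDs, for a classification model.

```python
import datarobot as dr

model = dr.Model.get("YOUR_PROJECT_ID", "YOUR_MODEL_ID")  # placeholder IDs

roc = model.get_roc_curve("validation")
# Each point pairs a probability threshold with the resulting rates.
for point in roc.roc_points:
    print(point["threshold"], point["true_positive_rate"], point["false_positive_rate"])
```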
Stability¶
The Stability tab provides an at-a-glance summary of how well a model performs on different backtests. It helps to measure performance and gives an indication of how long a model can be in production (how long it is "stable") before needing retraining. The values in the chart represent the validation scores for each backtest and the holdout.
Compliance documentation¶
DataRobot automates many critical compliance tasks associated with developing a model and, by doing so, decreases time-to-deployment in highly regulated industries. You can generate, for each model, individualized documentation that provides comprehensive guidance on what constitutes effective model risk management. You can then download the report as an editable Microsoft Word document (.docx). The generated report includes the appropriate level of information and transparency necessitated by regulatory compliance demands.
The model compliance report is not prescriptive in format and content, but rather serves as a guide in creating sufficiently rigorous model development, implementation, and use documentation. The documentation provides evidence to show that the components of the model work as intended, the model is appropriate for its intended business purpose, and it is conceptually sound. As such, the report can help with completing the Federal Reserve System's SR 11-7: Guidance on Model Risk Management.
To generate a compliance report:
- Select a model from the Leaderboard.
- From the Model actions dropdown, select Generate compliance report.
- Workbench prompts for a download location and, once selected, generates the report in the background as you continue experimenting.
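Report generation can also be scripted. The following is a minimal sketch using the DataRobot Python client's ComplianceDocumentation interface, assuming configured credentials and placeholder IDs; newer client versions may expose this functionality differently (for example, through automated documentation classes), so treat it as illustrative.

```python
import datarobot as dr

doc = dr.ComplianceDocumentation("YOUR_PROJECT_ID", "YOUR_MODEL_ID")  # placeholder IDs
job = doc.generate()                     # runs in the background, like the Workbench action
job.wait_for_completion()
doc.download("compliance_report.docx")   # editable Word document
```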
Manage experiments¶
At any point after models have been built, you can manage an individual experiment from within its Use Case. Click the three dots to the right of the experiment name to delete it. To share the experiment and other associated assets, use the Use Case's Manage members tool.