Evaluate experiments¶
Once you start modeling, Workbench begins to construct your model Leaderboard, a list of models ranked by performance, to help with quick model evaluation. The Leaderboard provides a summary, including scoring information, for each model built in an experiment. From the Leaderboard, you can click a model to access visualizations for further exploration. These tools can help you determine what to do in your next experiment.
After Workbench completes Quick mode, the most accurate model is selected and trained on 100% of the data. That model is marked with the Prepared for Deployment badge.
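If you prefer to inspect the Leaderboard programmatically, the DataRobot Python client exposes similar information. The snippet below is a minimal sketch, not part of the Workbench workflow itself; it assumes the experiment is reachable as a project through the client and that the endpoint, token, and `PROJECT_ID` placeholders are replaced with your own values.

```python
# Sketch: list Leaderboard models with the DataRobot Python client.
# Assumes the experiment is accessible as a project; the token and
# PROJECT_ID below are placeholders.
import datarobot as dr

dr.Client(endpoint="https://app.datarobot.com/api/v2", token="YOUR_API_TOKEN")
project = dr.Project.get("PROJECT_ID")

for model in project.get_models():
    score = model.metrics.get(project.metric, {}).get("validation")
    print(f"{model.model_type}: {project.metric} (validation) = {score}")
```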
Manage the Leaderboard¶
There are several controls available, described in the following sections, for navigating the Leaderboard and exploring models:
View experiment info¶
Availability information
The Data and Feature lists tabs are Preview options that are on by default.
Feature flag: Enable Data and Feature Lists tabs in Workbench
Click View experiment info to access the following tabs:
Tab | Description |
---|---|
Setup | Provides a summary of information about the experiment. |
Original data | Displays the data provided for model training, filterable by the contents of a selected feature list. |
Derived modeling data | Displays the data used for model training, after the feature derivation process was applied, filterable by the contents of any applicable feature list. |
Feature lists | Displays feature lists associated with the dataset used in the experiment; all lists consist exclusively of derived modeling data. |
Blueprint repository | Provides access to additional blueprints for training. |
Setup tab¶
The Setup tab reports the parameters used to build the models on this Leaderboard.
Field | Reports... |
---|---|
Created | A timestamp indicating when the experiment was created, as well as the user who initiated the model run. |
Dataset | The name, number of features, and number of rows in the modeling dataset. This is the same information available from the data preview page. |
Target | The feature selected as the basis for predictions, the resulting experiment type, and the optimization metric used to define how to score the experiment's models. You can change the metric the Leaderboard is sorted by, but the metric displayed in the summary is the one used for the build. |
Partitioning | Details of the date/time partitioning done for the experiment, including the ordering feature, backtest setting, and sampling method. It also provides a backtest summary and partitioning log, described in more detail below. |
Time series modeling | Details of the time series modeling setup including ordering, series, excluded, and known in advance features, as well as window settings and events calendar information. |
Additional settings | Advanced settings that were configured from the Additional settings tab. |
Partitioning details¶
The partitioning section's backtest summary and partitioning log provide insight into how DataRobot handled the data in preparation for modeling.
- Click View backtests for a visual summary of the observations and testing partitions used to build the model. You cannot change the partitions from this window, but you can make modifications to retrain the model from the Model Overview.
- Click View partitioning log to see (and optionally export) a record of how DataRobot created backtests, assigned data based on window settings, and applied other setup criteria.
Original data tab¶
The Original data tab provides summary analytics of the data used in the experiment setup. To view exploratory data insights, click the dataset preview link.
By default, the display includes all features in the dataset prior to feature derivation. You can view analytics for features specific to a single feature list by enabling Filter by feature list and selecting a list.
When viewing the list, you can click on the arrow or three dots next to a column name to change the sort order.
Derived modeling data tab¶
The Derived modeling data tab is populated after the feature derivation process runs. It shows basic data analytics as well as information that is not available in the Original data tab. Use the Filter by feature list toggle, described in the Original data section, to update the display. Values update to the selected list, as applicable.
Element | Description |
---|---|
1. Window setting summaries | Summarizes the feature derivation window (FDW) and forecast window (FW) used in model training. |
2. Basic counts | Reports the number of features and rows in the selected list. |
3. View more | Shows additional time series-specific experiment setup settings. |
4. Preview derivation log | Shows the first 100 lines of a log that records the feature derivation and reduction process and provides an option to download the full log. |
5. Importance score | Indicates the degree to which a feature is correlated with the target. |
Feature lists tab¶
DataRobot automatically constructs time-aware features based on the characteristics of the data. Multiple periodicities can result in several possibilities when constructing the features and, in some cases, it is better to not transform the target by differencing. The choice that yields the optimal accuracy often depends on the data.
After constructing time series features for the data, DataRobot automatically creates multiple feature lists. (Feature lists control the subset of features that DataRobot uses to build models.) Then, at project start, DataRobot automatically runs blueprints using several feature lists, selecting the list that best suits the model type.
Click the Feature lists tab to view all feature lists associated with the experiment. The display shows both DataRobot's automatically created time series feature lists and any custom feature lists that were created prior to model training.
The following actions are available for feature lists:
Action | Description |
---|---|
View features | Displays insights for a feature list. This selection opens the Data tab with the filter set to the selected list. |
Edit name and description | Provides a dialog to change the list's name and description. You cannot change these values for a DataRobot-created list. |
Download | Downloads the features contained in that list as a CSV file. |
Rerun modeling | Opens the Rerun modeling modal to allow selecting a new feature list and restarting Autopilot. |
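For reference, a rough equivalent of these actions with the DataRobot Python client is sketched below. It assumes the experiment is backed by a project, uses a placeholder project ID, and the feature list name in the example is hypothetical.

```python
import datarobot as dr

project = dr.Project.get("PROJECT_ID")  # placeholder ID

# List the feature lists associated with the project.
featurelists = project.get_featurelists()
for flist in featurelists:
    print(flist.name, "-", len(flist.features), "features")

# Rerun Autopilot on a chosen list ("Reduced Features" is just an example name).
chosen = next(fl for fl in featurelists if fl.name == "Reduced Features")
project.start_autopilot(featurelist_id=chosen.id)
```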
Blueprint repository tab¶
The blueprint repository is a library of modeling blueprints available for a selected experiment. Blueprints illustrate the tasks used to build a model, not the model itself. Blueprints listed in the repository have not necessarily been trained yet, but they can be, because they are compatible with the experiment's data and settings.
There are two ways to access the blueprint repository:
- From a Leaderboard model, open the Blueprint tab.
- From the experiment, click View experiment info and select the Blueprint repository tab.
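A hedged sketch of browsing the repository with the Python client follows; it assumes a project backing the experiment and uses a placeholder ID.

```python
import datarobot as dr

project = dr.Project.get("PROJECT_ID")  # placeholder ID

# Blueprints compatible with this project's data and settings.
blueprints = project.get_blueprints()
for bp in blueprints[:10]:
    print(bp.model_type)

# Queue one repository blueprint for training. For date/time-partitioned
# projects, use project.train_datetime(bp.id) instead.
job_id = project.train(blueprints[0])
print("Queued model job:", job_id)
```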
Filter models¶
Filtering makes viewing and focusing on relevant models easier. Click Filter models to set the criteria for the models that Workbench displays on the Leaderboard. The choices available for each filter depend on the experiment and/or model type (only values used by at least one Leaderboard model appear) and can change as models are added to the experiment. For example:
Filter | Displays models that... |
---|---|
Labeled models | Have been assigned the listed tag, either starred models or models recommended for deployment. |
Feature list | Were built with the selected feature list. |
Training period | Were trained on the selected training period, either a specific duration or start/end dates. |
Model family | Are part of the selected model family. |
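The same kind of filtering can be reproduced client-side with the Python client. This is a sketch with a placeholder project ID; the feature list and model family strings are only examples.

```python
import datarobot as dr

project = dr.Project.get("PROJECT_ID")  # placeholder ID
models = project.get_models()

# Models built with a specific feature list (example list name).
on_list = [m for m in models if m.featurelist_name == "Informative Features"]

# Models from a given family, approximated with a substring match on the type.
boosting = [m for m in models if "Gradient Boost" in m.model_type]

print(len(on_list), "models on the list;", len(boosting), "boosting models")
```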
Sort models by¶
By default, the Leaderboard sorts models based on the score of the validation partition, using the selected optimization metric. You can, however, change the metric used to rank the models when evaluating them.
Note that although Workbench built the experiment using the most appropriate metric for your data, it computes many applicable metrics on each of the models. After the build completes, you can redisplay the Leaderboard listing based on a different metric. It doesn't change any values within the models; it simply reorders the model listing based on model performance in this alternate metric.
See the page on optimization metrics for detailed information on each.
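As a sketch of the same idea in the Python client: every model carries scores for all computed metrics, so re-ranking by an alternate metric is just a client-side sort. The project ID is a placeholder and "AUC" is only an example; it must be a metric valid for the experiment.

```python
import datarobot as dr

project = dr.Project.get("PROJECT_ID")  # placeholder ID
models = project.get_models()

metric = "AUC"  # example alternate metric
scored = [m for m in models if m.metrics.get(metric, {}).get("validation") is not None]

# Higher is better for AUC; use reverse=False for error metrics such as RMSE.
for m in sorted(scored, key=lambda m: m.metrics[metric]["validation"], reverse=True)[:5]:
    print(m.model_type, m.metrics[metric]["validation"])
```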
Controls¶
Workbench provides simple, quick shorthand controls:
Control | Action |
---|---|
Rerun modeling | Reruns Quick mode with a different feature list. If you select a feature list that has already been run, Workbench replaces any deleted models or makes no changes. |
Duplicate | Duplicates the experiment, with an option to reuse just the dataset or the dataset and settings. |
Delete | Deletes the experiment and its models. If the experiment is being used by an application, you cannot delete it. |
Collapse Leaderboard | Slides the Leaderboard panel closed to make additional room, for example, for viewing insights. |
Model insights¶
Model insights help to interpret, explain, and validate what drives a model’s predictions. Available insights are dependent on experiment type, but may include the insights listed in the table below. Availability of sliced insights is also model-dependent.
Availability information
- Sliced insights in Workbench are off by default. Contact your DataRobot representative or administrator for information on enabling the feature. Feature flag: Slices in Workbench
- Feature Effects is off by default. Contact your DataRobot representative or administrator for information on enabling it. Feature flag: Slices in Workbench
Insight | Description | Problem type | Sliced insights? |
---|---|---|---|
Accuracy Over Time | Visualizes how predictions change over time. | Time-aware | |
Blueprint | Provides a graphical representation of data preprocessing and parameter settings. | All | |
Feature Effects | Conveys how changes to the value of each feature change model predictions. | All | ✔ |
Feature Impact | Shows which features are driving model decisions. | All | ✔ |
Forecasting Accuracy | Depicts how well a model predicts at each forecast distance in the experiment's forecast window. | Time series | |
Forecast vs Actual | Predicts multiple values for each point in time (forecast distances). | Time series | |
Lift Chart | Depicts how well a model segments the target population and how capable it is of predicting the target. | All | ✔ |
Period Accuracy | Shows model performance over periods within the training dataset. | Time-aware | |
ROC Curve | Provides tools for exploring classification, performance, and statistics related to a model. | Classification | ✔ |
Series Insights | Provides series-specific information for multiseries experiments. | Time series | |
Stability | Provides a summary of how well a model performs on different backtests. | Time-aware |
To see a model's insights, click on the model in the left-pane Leaderboard.
Accuracy Over Time¶
For time-aware experiments, Accuracy Over Time helps to visualize how predictions change over time. By default, the view shows predicted and actual values over time for the training and validation data of the most recent (first) backtest. This is the backtest model DataRobot uses to deploy and make predictions (in other words, the model used to generate the error metric for the validation set). For time series experiments, you can control the series (if applicable) and forecast distance used in the display. Note that series-based experiments are sometimes computed on demand, depending on projected space and memory requirements.
The visualization also has a time-aware Residuals tab that plots the difference between actual and predicted values. It helps to visualize whether there is an unexplained trend in your data that the model did not account for and how the model errors change over time. For time series experiments, you can additionally set the forecast distance used in the display.
Blueprint¶
Blueprints are ML pipelines containing preprocessing steps (tasks), modeling algorithms, and post-processing steps that go into building a model. The Blueprint tab provides a graphical representation of the blueprint, showing each step. Click on any task in the blueprint to see more detail, including more complete model documentation (by clicking DataRobot Model Docs from inside the blueprint’s task).
Additionally, you can access the blueprint repository from the Blueprint tab.
Feature Effects¶
The Feature Effects insight shows the effect of changes in the value of each feature on model predictions—how does a model "understand" the relationship between each feature and the target? It is an on-demand insight that depends on the Feature Impact calculation; you are prompted to run that calculation when you first open the visualization. The insight is communicated in terms of partial dependence, an illustration of how changing a feature's value, while keeping all other features as they were, impacts a model's predictions.
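For programmatic access, the Python client exposes a similar computation. The sketch below uses placeholder IDs and assumes the standard job-based flow in the client.

```python
import datarobot as dr

project = dr.Project.get("PROJECT_ID")          # placeholder IDs
model = dr.Model.get(project.id, "MODEL_ID")

# Feature Effects builds on Feature Impact, so ensure impact exists first.
model.get_or_request_feature_impact()

# Request the partial dependence computation and wait for the result.
job = model.request_feature_effect()
effects = job.get_result_when_complete()
print(len(effects.feature_effects), "features with partial dependence computed")
```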
Feature Impact¶
Feature Impact provides a high-level visualization that identifies which features are most strongly driving model decisions. It is available for all model types and is an on-demand feature, meaning that except for models prepared for deployment, you must initiate the calculation to see results.
- Hover on feature names and bars for additional information.
- Use Sort by to change the display to sort by impact or feature name.
You can view impact for both the original features and the derived modeling data.
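A minimal sketch of retrieving the same scores with the Python client follows; the IDs are placeholders and the key names reflect the client's typical camelCase result format.

```python
import datarobot as dr

project = dr.Project.get("PROJECT_ID")          # placeholder IDs
model = dr.Model.get(project.id, "MODEL_ID")

# Computes Feature Impact if it hasn't been run yet, otherwise returns it.
impact = model.get_or_request_feature_impact()

for row in sorted(impact, key=lambda r: r["impactNormalized"], reverse=True)[:10]:
    print(f'{row["featureName"]}: {row["impactNormalized"]:.3f}')
```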
Forecasting Accuracy¶
The Forecasting Accuracy tab provides a visual indicator of how well a model predicts at each forecast distance in the experiment's forecast window. It is available for all time series experiments (both single series and multiseries). Use it to help determine, for example, how much harder it is to accurately forecast four days out as opposed to two days out. The chart depicts how accuracy changes as you move further into the future.
Forecast vs Actual¶
Forecast vs. Actual allows you to compare how different predictions behave from different forecast points to different times in the future. Use the chart to help answer what, for your needs, is the best distance to predict. Forecasting out only one day may provide the best results, but it may not be the most actionable for your business. Forecasting the next three days out, however, may provide relatively good accuracy and give your business time to react to the information provided. If the experiment included calendar data, those events are displayed on this chart, providing insight into the effects of those events. Note that series-based experiments are sometimes compute-on-demand, depending on projected space and memory requirements.
Lift Chart¶
To help visualize model effectiveness, the Lift Chart depicts how well a model segments the target population and how well the model performs for different ranges of values of the target variable.
- Hover on any point to display the predicted and actual scores for rows in that bin.
- Use the controls to change the criteria for the display.
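The underlying bins are also available through the Python client. The sketch below uses placeholder IDs, assumes the chart is computed for the validation partition, and the key names reflect the client's usual result format.

```python
import datarobot as dr

project = dr.Project.get("PROJECT_ID")          # placeholder IDs
model = dr.Model.get(project.id, "MODEL_ID")

# Each bin averages actual and predicted values for a slice of rows
# ordered by prediction, which is what the chart plots.
lift = model.get_lift_chart("validation")
for i, b in enumerate(lift.bins[:5], start=1):
    print(f'bin {i}: actual={b["actual"]:.3f} '
          f'predicted={b["predicted"]:.3f} rows={b["bin_weight"]}')
```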
Period Accuracy¶
Period Accuracy lets you define periods within your training dataset and compare their metric scores against the metric score of the model as a whole. In other words, you can specify which periods within the training data are most important, and DataRobot then provides aggregate accuracy metrics for those periods and surfaces the results on the Leaderboard. Periods are defined in a separate CSV file that identifies which rows to group, based on the experiment's date/time feature. Once the file is uploaded and the insight is calculated, DataRobot provides a table of period-based results and an "over time" histogram for each period.
ROC Curve¶
For classification experiments, the ROC Curve tab provides tools for exploring classification, performance, and statistics related to a selected model at any point on the probability scale.
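The per-threshold statistics behind the chart can also be pulled with the Python client. This is a sketch with placeholder IDs; the point fields shown reflect the client's usual snake_case format.

```python
import datarobot as dr

project = dr.Project.get("PROJECT_ID")          # placeholder IDs
model = dr.Model.get(project.id, "MODEL_ID")

# Each ROC point carries the statistics for one probability threshold.
roc = model.get_roc_curve("validation")
for point in roc.roc_points[::10]:
    print(point["threshold"],
          point["true_positive_rate"],
          point["false_positive_rate"])
```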
Series Insights¶
The Series Insights tab for multiseries experiments provides series-specific information in both charted and tabular format.
To speed processing, Series Insights visualizations are initially computed for the first 1000 series (sorted by ID). You can, however, Compute accuracy scores for the remaining series data. Use the Plot distribution controls and binning to change the display.
Stability¶
The Stability tab provides an at-a-glance summary of how well a model performs on different backtests. It helps to measure performance and gives an indication of how long a model can be in production (how long it is "stable") before needing retraining. The values in the chart represent the validation scores for each backtest and the holdout.
Compliance documentation¶
DataRobot automates many critical compliance tasks associated with developing a model and, by doing so, decreases time-to-deployment in highly regulated industries. You can generate, for each model, individualized documentation to provide comprehensive guidance on what constitutes effective model risk management. Then, you can download the report as an editable Microsoft Word document (`.docx`). The generated report includes the appropriate level of information and transparency necessitated by regulatory compliance demands.
The model compliance report is not prescriptive in format and content, but rather serves as a template in creating sufficiently rigorous model development, implementation, and use documentation. The documentation provides evidence to show that the components of the model work as intended, the model is appropriate for its intended business purpose, and it is conceptually sound. As such, the report can help with completing the Federal Reserve System's SR 11-7: Guidance on Model Risk Management.
To generate a compliance report:
- Select a model from the Leaderboard.
- From the Model actions dropdown, select Generate compliance report.
- Workbench prompts for a download location and, once selected, generates the report in the background while you continue experimenting.
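Compliance documentation can also be generated through the Python client. The sketch below uses the client's `ComplianceDocumentation` helper with placeholder IDs and an arbitrary output filename; your client version may offer other helpers for the same task.

```python
import datarobot as dr

project = dr.Project.get("PROJECT_ID")          # placeholder IDs
model = dr.Model.get(project.id, "MODEL_ID")

# Generate the report, wait for it, and download it as an editable .docx.
doc = dr.ComplianceDocumentation(project.id, model.id)
job = doc.generate()
job.wait_for_completion()
doc.download("model_compliance_report.docx")
```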
Manage experiments¶
At any point after models have been built, you can manage an individual experiment from within its Use Case. Click the three dots to the right of the experiment name to delete it. To share the experiment and other associated assets, use the Use Case's Manage members tool.
What's next?¶
After selecting a model, you can, from within the experiment: