
Evaluate experiments

From the Leaderboard, you can click a model to access insights for further exploration. These tools can help you determine what to do in your next experiment.

Model insights

Model insights help to interpret, explain, and validate what drives a model’s predictions. Available insights are dependent on experiment type, but may include the insights listed in the table below. Availability of sliced insights is also model-dependent.

Availability information

  • Sliced insights in Workbench are off by default. Contact your DataRobot representative or administrator for information on enabling the feature. Feature flag: Slices in Workbench
  • Feature Effects is off by default. Contact your DataRobot representative or administrator for information on enabling it.
| Insight | Description | Problem type | Sliced insights? |
| --- | --- | --- | --- |
| Accuracy Over Time | Visualizes how predictions change over time. | Time-aware | |
| Blueprint | Provides a graphical representation of data preprocessing and parameter settings. | All | |
| Feature Effects | Conveys how changes to the value of each feature change model predictions. | All | ✔ |
| Feature Impact | Shows which features are driving model decisions. | All | ✔ |
| Forecasting Accuracy | Depicts how well a model predicts at each forecast distance in the experiment's forecast window. | Time series | |
| Forecast vs Actual | Predicts multiple values for each point in time (forecast distances). | Time series | |
| Lift Chart | Depicts how well a model segments the target population and how capable it is of predicting the target. | All | ✔ |
| Period Accuracy | Shows model performance over periods within the training dataset. | Time-aware | |
| ROC Curve | Provides tools for exploring classification, performance, and statistics related to a model. | Classification | ✔ |
| Series Insights | Provides series-specific information for multiseries experiments. | Time series | |
| Stability | Provides a summary of how well a model performs on different backtests. | Time-aware | |

To see a model's insights, click on the model in the left-pane Leaderboard.

Accuracy Over Time

For time-aware experiments, Accuracy Over Time helps to visualize how predictions change over time. By default, the view plots predicted and actual values against time for the training and validation data of the most recent (first) backtest. This is the backtest whose model DataRobot uses for deployment and predictions (in other words, the model that generates the error metric for the validation set). For time series experiments, you can control the series (if applicable) and forecast distance used in the display. Note that series-based experiments are sometimes compute-on-demand, depending on projected space and memory requirements.

The visualization also has a time-aware Residuals tab that plots the difference between actual and predicted values. It helps you see whether there is an unexplained trend in your data that the model did not account for, and how model errors change over time. For time series experiments, you can additionally set the forecast distance used in the display.
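Conceptually, the residual at each timestamp is just the actual value minus the predicted value. A minimal pandas sketch of the same idea (file and column names are hypothetical):

    import pandas as pd

    # Hypothetical export of predictions over time.
    df = pd.read_csv("predictions_over_time.csv", parse_dates=["timestamp"])

    # Residual: the difference between actual and predicted values.
    df["residual"] = df["actual"] - df["predicted"]

    # A mean residual that stays persistently above (or below) zero
    # points to a trend the model has not accounted for.
    monthly_bias = df.set_index("timestamp")["residual"].resample("MS").mean()
    print(monthly_bias)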

Blueprint

Blueprints are ML pipelines containing the preprocessing steps (tasks), modeling algorithms, and post-processing steps that go into building a model. The Blueprint tab provides a graphical representation of the blueprint, showing each step. Click any task in the blueprint to see more detail, including more complete model documentation (click DataRobot Model Docs from inside the task).

Additionally, you can access the blueprint repository from the Blueprint tab.

Coefficients

For supported models (linear and logistic regression), the Coefficients tab shows the relative effects of the 30 most important features, sorted (by default) in descending order of impact on the final prediction. Variables with a positive effect are displayed in red; variables with a negative effect are shown in blue. The Coefficients chart helps answer the following questions when assessing model results:

  • Which features were chosen to form the prediction in the particular model?
  • How important is each of these features?
  • Which features have positive impact and which have negative impact?

Note that the Coefficients tab is only available for a limited number of models because it is not always possible to derive the coefficients for complex models in short analytical form.

From the insight, you can export the parameters and coefficients that DataRobot uses to generate predictions with the selected model.
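For intuition, a linear model's prediction is its intercept plus the sum of each coefficient times its feature value; logistic regression then passes that sum through a sigmoid to produce a probability. A toy sketch with invented coefficients (real values come from the export above):

    import math

    # Invented values for illustration only.
    intercept = -1.2
    coefficients = {"age": 0.03, "balance": 0.0005, "n_defaults": -0.8}
    row = {"age": 42, "balance": 1500.0, "n_defaults": 1}

    # Linear score: intercept plus coefficient * value for each feature.
    score = intercept + sum(coefficients[f] * row[f] for f in coefficients)

    # For logistic regression, the sigmoid maps the score to a probability.
    probability = 1.0 / (1.0 + math.exp(-score))
    print(score, probability)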

Feature Effects

The Feature Effects insight shows the effect of changes in the value of each feature on model predictions: how does the model "understand" the relationship between each feature and the target? It is an on-demand insight that depends on the Feature Impact calculation; you are prompted to run the calculation when you first open the visualization. The insight is communicated in terms of partial dependence, an illustration of how changing a feature's value, while keeping all other features as they were, impacts the model's predictions.
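Partial dependence itself is model-agnostic: sweep one feature across a grid of values, hold every other feature at its observed values, and average the model's predictions at each grid point. A minimal sketch for any model with a scikit-learn-style predict method (all names hypothetical):

    import numpy as np

    def partial_dependence(model, X, feature_idx, grid):
        """Mean prediction at each grid value of one feature, holding
        all other features at their observed values."""
        averages = []
        for value in grid:
            X_mod = X.copy()
            X_mod[:, feature_idx] = value  # overwrite the one feature everywhere
            averages.append(model.predict(X_mod).mean())
        return np.array(averages)

    # Usage (hypothetical model and data):
    # grid = np.linspace(X[:, 0].min(), X[:, 0].max(), num=20)
    # curve = partial_dependence(model, X, feature_idx=0, grid=grid)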

Feature Impact

Feature Impact provides a high-level visualization that identifies which features most strongly drive model decisions. It is available for all model types. Because it is an on-demand insight, you must initiate the calculation to see results for all models except those prepared for deployment.

  • Hover on feature names and bars for additional information.
  • Use Sort by to change the display to sort by impact or feature name.

You can view impact both for the original features and for the derived modeling data.
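Outside the UI, the same calculation can be triggered and fetched with the DataRobot Python client. A minimal sketch, assuming a configured client and known project and model IDs:

    import datarobot as dr

    # Assumes the endpoint and API token are available from the environment
    # or a config file; otherwise pass them to dr.Client() explicitly.
    dr.Client()

    model = dr.Model.get(project="<project-id>", model_id="<model-id>")

    # Starts the calculation if it has not run yet, then returns the results.
    impact = model.get_or_request_feature_impact()
    for item in sorted(impact, key=lambda i: i["impactNormalized"], reverse=True):
        print(item["featureName"], round(item["impactNormalized"], 3))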

Forecasting Accuracy

The Forecasting Accuracy tab provides a visual indicator of how well a model predicts at each forecast distance in the experiment's forecast window. It is available for all time series experiments (both single series and multiseries). Use it to help determine, for example, how much harder it is to accurately forecast four days out as opposed to two days out. The chart depicts how accuracy changes as you move further into the future.
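The underlying computation is an error metric grouped by forecast distance. A pandas sketch of the idea (file and column names are hypothetical):

    import pandas as pd

    # Hypothetical export: one row per prediction, with its forecast distance.
    df = pd.read_csv("forecasts.csv")
    df["abs_error"] = (df["actual"] - df["predicted"]).abs()

    # Mean absolute error at each step into the future; accuracy typically
    # degrades as the forecast distance grows.
    mae_by_distance = df.groupby("forecast_distance")["abs_error"].mean()
    print(mae_by_distance)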

Forecast vs Actual

Forecast vs. Actual lets you compare how predictions made from different forecast points behave at different times in the future. Use the chart to help determine which forecast distance best fits your needs. Forecasting only one day out may provide the best accuracy, but it may not be actionable for your business. Forecasting three days out, however, may provide relatively good accuracy while giving your business time to react to the information. If the experiment included calendar data, those events are displayed on the chart, providing insight into their effects. Note that series-based experiments are sometimes compute-on-demand, depending on projected space and memory requirements.

Lift Chart

To help visualize model effectiveness, the Lift Chart depicts how well a model segments the target population and how well the model performs for different ranges of values of the target variable.

  • Hover on any point to display the predicted and actual scores for rows in that bin.
  • Use the controls to change the criteria for the display.
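A lift chart is built by sorting rows by predicted value, splitting them into equal-sized bins, and averaging the predicted and actual values within each bin. A minimal sketch:

    import pandas as pd

    def lift_chart(actual, predicted, bins=10):
        """Average actual vs. predicted per bin of rows sorted by prediction."""
        df = pd.DataFrame({"actual": actual, "predicted": predicted})
        df = df.sort_values("predicted").reset_index(drop=True)
        df["bin"] = pd.qcut(df.index, bins, labels=False)
        return df.groupby("bin")[["predicted", "actual"]].mean()

    # For a well-calibrated model, the two columns track each other closely,
    # with the highest actual values concentrated in the rightmost bins.
    # print(lift_chart(y_true, y_pred))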

Period Accuracy

Period Accuracy lets you define periods within your training dataset and then compare their metric scores against the metric score of the model as a whole. You specify which periods matter most, and DataRobot provides aggregate accuracy metrics for each and surfaces the results on the Leaderboard. Periods are defined in a separate CSV file that identifies which rows to group, based on the experiment's date/time feature. Once the file is uploaded and the insight is calculated, DataRobot provides a table of period-based results and an "over time" histogram for each period.

ROC Curve

For classification experiments, the ROC Curve tab provides tools for exploring classification, performance, and statistics related to a selected model at any point on the probability scale.
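For intuition, the same statistics can be reproduced from a vector of labels and predicted probabilities; a minimal scikit-learn sketch with invented data:

    import numpy as np
    from sklearn.metrics import roc_curve, roc_auc_score

    # Invented labels and predicted probabilities.
    y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
    y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.3])

    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    print("AUC:", roc_auc_score(y_true, y_score))

    # Fix a point on the probability scale, then read off the
    # confusion-matrix statistics at that threshold.
    predicted = (y_score >= 0.5).astype(int)
    tp = int(((predicted == 1) & (y_true == 1)).sum())
    fp = int(((predicted == 1) & (y_true == 0)).sum())
    print("true positives:", tp, "false positives:", fp)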

Series Insights

The Series Insights tab for multiseries experiments provides series-specific information in both charted and tabular format.

To speed processing, Series Insights visualizations are initially computed for the first 1000 series (sorted by ID). You can, however, compute accuracy scores for the remaining series. Use the Plot distribution and binning controls to change the display.

Stability

The Stability tab provides an at-a-glance summary of how well a model performs on different backtests. It helps to measure performance and gives an indication of how long a model can be in production (how long it is "stable") before needing retraining. The values in the chart represent the validation scores for each backtest and the holdout.

Compliance documentation

DataRobot automates many critical compliance tasks associated with developing a model and, by doing so, decreases time-to-deployment in highly regulated industries. You can generate, for each model, individualized documentation to provide comprehensive guidance on what constitutes effective model risk management. Then, you can download the report as an editable Microsoft Word document (.docx). The generated report includes the appropriate level of information and transparency necessitated by regulatory compliance demands.

The model compliance report is not prescriptive in format and content, but rather serves as a template in creating sufficiently rigorous model development, implementation, and use documentation. The documentation provides evidence to show that the components of the model work as intended, the model is appropriate for its intended business purpose, and it is conceptually sound. As such, the report can help with completing the Federal Reserve System's SR 11-7: Guidance on Model Risk Management.

To generate a compliance report:

  1. Select a model from the Leaderboard.
  2. From the Model actions dropdown, select Generate compliance report.

  3. Workbench prompts for a download location and, once selected, generates the report in the background as you continue experimenting.
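The report can also be generated programmatically. A minimal sketch, assuming the DataRobot Python client's ComplianceDocumentation API (newer client versions expose a similar AutomatedDocument API; check your version):

    import datarobot as dr

    dr.Client()  # endpoint and token from the environment or a config file

    # Hypothetical IDs; substitute your own project and model.
    doc = dr.ComplianceDocumentation("<project-id>", "<model-id>")

    # Generation runs as a background job, mirroring the UI behavior.
    job = doc.generate()
    job.wait_for_completion()

    doc.download("compliance_report.docx")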

What's next?

After selecting a model, you can continue working with it from within the experiment.


Updated April 6, 2024