

Evaluate

The Evaluate tabs provide key plots and statistics needed to judge and interpret a model’s effectiveness:

| Leaderboard tab | Description | Source |
|---|---|---|
| Accuracy Over Space | Provides a spatial residual mapping within an individual model. | Validation, Cross-Validation, Holdout (selectable) |
| Accuracy over Time | Visualizes how predictions change over time. | Computed separately for each backtest and the Holdout fold and can be viewed in the UI. Plots can be computed on both Validation and Training data. |
| Advanced Tuning | Lets you manually set model parameters, overriding the values DataRobot selected. | Internal grid search set |
| Anomaly Assessment | Plots data for the selected backtest and provides SHAP explanations for up to 500 anomalous points. | Computed separately for each backtest and the Holdout fold and can be viewed in the UI. Plots can be computed on both Validation and Training data. |
| Anomaly over Time | Plots how anomalies occur across the timeline of your data. | Computed separately for each backtest and the Holdout fold and can be viewed in the UI. Plots can be computed on both Validation and Training data. |
| Confusion Matrix (for multiclass projects) | Compares actual data values with predicted data values in multiclass projects. | Validation, Cross-Validation, or Holdout (selectable). For binary classification projects, use the confusion matrix on the ROC Curve tab. |
| Feature Fit | Removed. See Feature Effects. | |
| Forecasting Accuracy | Provides a visual indicator of how well a model predicts at each forecast distance in the project’s forecast window. | Computed separately for each backtest and the Holdout fold; only the validation subset of each fold is scored. Validation predictions are filtered by forecast distance, and the metrics are computed on the filtered predictions. The UI and API do not provide access to individual backtests; they expose validation (backtest 0, the most recent backtest), backtesting (averaged across all backtests), and Holdout. |
| Forecast vs Actual | Compares how predictions made from different forecast points behave at different times in the future. | Computed separately for each backtest and the Holdout fold and can be viewed in the UI. Plots can be computed on both Validation and Training data. |
| Lift Chart | Depicts how well a model segments the target population and how capable it is of predicting the target. | Validation, Cross-Validation, Holdout (selectable) |
| Period Accuracy | Shows model performance over periods within the training dataset. | Validation, Holdout (selectable). Computed separately for each backtest and the Holdout fold. |
| Residuals | Visualizes the predictive performance and validity of a regression model. | Validation, Cross-Validation, Holdout (selectable) |
| ROC Curve | Explores classification, performance, and statistics related to a selected model at any point on the probability scale. | Validation data |
| Series Insights (clustering) | Provides information on the cluster to which each series belongs, along with series information, including rows and dates. Histograms for each cluster show the number of series, the total number of rows, and the percentage of the dataset that belongs to that cluster. | Computed for each series in the clustering backtest. |
| Series Insights (multiseries) | Provides series-specific information. | Computed separately for each backtest and the Holdout fold; only the validation subset of each fold is scored. Validation predictions are filtered by forecast distance, and the metrics are computed on the filtered predictions. The UI and API do not provide access to individual backtests; they expose validation (backtest 0, the most recent backtest), backtesting (averaged across all backtests), and Holdout. |
| Stability | Provides an at-a-glance summary of how well a model performs on different backtests. | Computed separately for each backtest and the Holdout fold; only the validation subset of each fold is scored. |
| Training Dashboard | Provides an understanding of training activity, per iteration, for Keras-based models. | Training, but validated on an internal holdout of the training data. |
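
The Forecasting Accuracy and Series Insights (multiseries) entries describe filtering validation predictions by forecast distance and computing metrics on each filtered subset. The following is a minimal sketch of that idea in plain Python, not DataRobot's implementation or API. The function name, the row field names (`forecast_distance`, `actual`, `prediction`), the choice of RMSE as the metric, and the sample data are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def rmse_by_forecast_distance(rows):
    """Group rows by forecast distance and compute RMSE within each group.

    `rows` is an iterable of dicts with the hypothetical keys
    "forecast_distance", "actual", and "prediction".
    """
    squared_errors = defaultdict(list)
    for row in rows:
        error = row["prediction"] - row["actual"]
        squared_errors[row["forecast_distance"]].append(error ** 2)

    # One score per forecast distance, ordered by distance.
    return {
        distance: math.sqrt(sum(errs) / len(errs))
        for distance, errs in sorted(squared_errors.items())
    }


# Hypothetical validation predictions for a single backtest.
validation_rows = [
    {"forecast_distance": 1, "actual": 10.0, "prediction": 11.0},
    {"forecast_distance": 1, "actual": 12.0, "prediction": 11.0},
    {"forecast_distance": 2, "actual": 10.0, "prediction": 13.0},
    {"forecast_distance": 2, "actual": 12.0, "prediction": 8.0},
]

scores = rmse_by_forecast_distance(validation_rows)
for distance, score in scores.items():
    print(f"forecast distance {distance}: RMSE = {score:.3f}")
```

Under the same assumptions, a "backtesting" figure could be approximated by running this function once per backtest and averaging the per-distance scores across backtests.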

Updated August 31, 2023