

Champion and challenger comparisons

Availability information

The MLOps challenger model comparisons feature is off by default. Contact your DataRobot representative or administrator for information on enabling the feature for DataRobot MLOps.

Feature flag: Enable MLOps challenger model comparisons

MLOps allows you to compare challenger models against your deployed model (the "champion") and other challenger models to ensure that your deployment uses the superior model for your needs. Depending on the results, you can replace the champion model with a better performing challenger.

The challenger model comparisons feature lets you inspect the composition, reliability, and behavior of the models. DataRobot provides comparison visualizations to help you determine which models are best suited as champion and challenger. You can train these models on different datasets. For example, for a challenger model, you can use an updated snapshot of the same data source as used for the champion. Although you can train on different datasets, DataRobot renders visualizations based on a comparison dataset you specify, so that predictions are scored on the same dataset and partition. This ensures that the models are reasonably comparable.

Generate comparisons

After you enable challengers and add one or more challengers to a deployment, you can generate in-depth comparison data and visualizations.

  1. On the Deployments page, locate and expand the deployment with the champion and challenger models you want to compare.

  2. Select the Challengers tab.

  3. On the Challengers Summary tab, add a challenger model if you have not done so yet and replay the predictions for challengers, if necessary.

  4. Select the Model Comparison tab.

    The following table describes the elements of the Model Comparison tab.

    | Element | Description |
    |---------|-------------|
    | Model 1 | Defaults to the champion model (the currently deployed model). Click to select a different model to compare. |
    | Model 2 | Defaults to the first challenger model in the list. Click to select a different model to compare. If the list doesn't contain a model you want to compare to Model 1, click the Challengers Summary tab to add a new challenger. |
    | Open model package | Click to view the model's details. The details display in the Model Packages tab in the Model Registry. |
    | Promote to champion | If the model being compared is the best model, click to replace the deployed model with this model. |
    | Add comparison dataset | Select a dataset for generating insights on both models. Be sure to select a dataset that is out-of-sample for both models (see stacked predictions). Holdout and validation partitions for Model 1 and Model 2 are available as options if these partitions exist for the original model. By default, the holdout partition for Model 1 is selected. You can specify a different dataset by clicking + Add comparison dataset and then choosing a local file or a snapshotted dataset from the AI Catalog. |
    | Prediction environment | Select a prediction environment for scoring both models. |
    | Model Insights | Compare model predictions, metrics, and more. |

  5. Scroll to the Model Insights section of the Challengers page and click Compute insights.

    Once generated, Model Insights displays the following tabs: Accuracy, Dual lift, Lift, ROC (classification projects only), and Predictions Difference.

You can generate new insights using a different dataset by clicking + Add comparison dataset, then selecting Compute insights again.

Compare accuracy

After DataRobot has computed insights for the models, you can compare their accuracy.

Under Model Insights, click the Accuracy tab to compare metrics:

The two columns show the metrics for each model. Highlighted numbers represent the better values. In this example, the champion, Model 1, is better than Model 2 for most of the metrics shown.
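As a rough illustration of what a side-by-side metric comparison involves, the sketch below scores two sets of predictions against the same actuals and marks the better value for each metric. The function names, the choice of RMSE and MAE, and the sample numbers are assumptions made for illustration; this is not DataRobot's implementation.

```python
import math

def rmse(actuals, preds):
    """Root mean squared error; lower is better."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actuals, preds)) / len(actuals))

def mae(actuals, preds):
    """Mean absolute error; lower is better."""
    return sum(abs(a - p) for a, p in zip(actuals, preds)) / len(actuals)

def compare_accuracy(actuals, preds_1, preds_2):
    """Return {metric: (model_1_value, model_2_value, better)}.

    Both metrics used here are lower-is-better, so the smaller value wins.
    """
    results = {}
    for name, metric in (("RMSE", rmse), ("MAE", mae)):
        v1, v2 = metric(actuals, preds_1), metric(actuals, preds_2)
        better = "tie" if v1 == v2 else ("Model 1" if v1 < v2 else "Model 2")
        results[name] = (v1, v2, better)
    return results

# Hypothetical sample data: both models scored on the same comparison rows.
actuals = [10.0, 12.0, 9.0, 15.0]
champion = [11.0, 12.0, 9.0, 14.0]
challenger = [13.0, 10.0, 9.0, 18.0]

for metric, (v1, v2, better) in compare_accuracy(actuals, champion, challenger).items():
    print(f"{metric}: Model 1={v1:.3f}  Model 2={v2:.3f}  better={better}")
```

Scoring both models on the same rows is what makes the two columns comparable, which is why the comparison dataset is shared.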

Compare dual lift

A dual lift chart visualizes how two competing models perform against each other by showing how each over- or under-predicts along the distribution of the predictions. This can help you decide whether to promote the challenger to champion, for example, if you care more about one end of the spectrum.

To view the dual lift chart for the models being compared, under Model Insights, click the Dual lift tab:

The colors of the curves on the graph match the colors assigned to the models when they were added to the deployment as champions or challengers.

Click the + icon next to the model names to hide or show the curves being compared. The o icon represents the actual values. Click the o icon to hide or show the curve representing the actual values.
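One common way to build a dual lift chart is to sort rows by the difference between the two models' predictions, group them into bins, and plot each model's average prediction alongside the average actual value per bin. The sketch below follows that construction. The function name, the bin count, and the sample data are assumptions, and DataRobot's exact binning may differ.

```python
def dual_lift(actuals, preds_1, preds_2, n_bins=3):
    """Sort rows by (preds_1 - preds_2), split them into n_bins groups of
    roughly equal size, and return per-bin means for both models and the actuals."""
    rows = sorted(zip(preds_1, preds_2, actuals), key=lambda r: r[0] - r[1])
    n = len(rows)
    bins = []
    for b in range(n_bins):
        chunk = rows[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue  # fewer rows than bins
        size = len(chunk)
        bins.append({
            "model_1": sum(r[0] for r in chunk) / size,
            "model_2": sum(r[1] for r in chunk) / size,
            "actual": sum(r[2] for r in chunk) / size,
        })
    return bins

# Hypothetical sample data scored on the same comparison rows.
actuals = [0, 1, 1, 1, 1, 0]
preds_1 = [0.2, 0.8, 0.5, 0.6, 0.9, 0.1]
preds_2 = [0.5, 0.7, 0.1, 0.65, 0.6, 0.3]

for i, b in enumerate(dual_lift(actuals, preds_1, preds_2), start=1):
    print(f"bin {i}: model 1={b['model_1']:.3f}  model 2={b['model_2']:.3f}  actual={b['actual']:.3f}")
```

In each bin, the model whose average sits closer to the actual average is the one that tracks the outcomes better in that region of disagreement.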

Compare lift

A lift chart depicts how well a model segments the target population and how capable it is of predicting the target, letting you visualize the model's effectiveness.

To view the lift chart for the models being compared, under Model Insights, click the Lift tab:

The curves display in the colors assigned to the models when added.
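The data behind a lift chart can be approximated by sorting rows by predicted value, binning them, and comparing the mean prediction with the mean actual in each bin. The following is a minimal sketch of that idea, run once per model. The function name, the bin count, and the sample data are illustrative assumptions rather than DataRobot's implementation.

```python
def lift_chart(actuals, preds, n_bins=2):
    """Sort rows by prediction (ascending), split them into n_bins groups, and
    return a (mean_predicted, mean_actual) pair for each non-empty bin."""
    rows = sorted(zip(preds, actuals), key=lambda r: r[0])
    n = len(rows)
    points = []
    for b in range(n_bins):
        chunk = rows[b * n // n_bins:(b + 1) * n // n_bins]
        if chunk:
            points.append((
                sum(p for p, _ in chunk) / len(chunk),
                sum(a for _, a in chunk) / len(chunk),
            ))
    return points

# Hypothetical sample data: the same actuals scored by two models.
actuals = [0, 0, 1, 1]
for name, preds in (("Model 1", [0.1, 0.4, 0.6, 0.9]), ("Model 2", [0.5, 0.2, 0.4, 0.7])):
    print(name, lift_chart(actuals, preds))
```

A model that segments the target well shows mean actuals rising steadily from the lowest bin to the highest, with the predicted means staying close to them.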

Compare using an ROC curve


The ROC tab is only available for classification projects.

An ROC curve plots the true positive rate against the false positive rate for a given data source. Use the ROC curve to explore classification, performance, and statistics for the models you're comparing.

To view the ROC curves for the models being compared, under Model Insights, click the ROC tab:

The curves display in the colors assigned to the models when added. You can update the prediction thresholds for the models by clicking the pencil icons.
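To make the relationship between a prediction threshold and a point on the ROC curve concrete, the sketch below computes the false positive rate and true positive rate at a few thresholds for a binary target. The function names, the thresholds, and the sample data are assumptions chosen for illustration.

```python
def roc_point(actuals, preds, threshold):
    """Return (false_positive_rate, true_positive_rate) when rows with
    prediction >= threshold are classified as positive."""
    tp = sum(1 for a, p in zip(actuals, preds) if a == 1 and p >= threshold)
    fp = sum(1 for a, p in zip(actuals, preds) if a == 0 and p >= threshold)
    positives = sum(1 for a in actuals if a == 1)
    negatives = len(actuals) - positives
    tpr = tp / positives if positives else 0.0
    fpr = fp / negatives if negatives else 0.0
    return fpr, tpr

def roc_curve(actuals, preds, thresholds):
    """Evaluate roc_point at each threshold, in the order given."""
    return [roc_point(actuals, preds, t) for t in thresholds]

# Hypothetical sample data for one model on the comparison dataset.
actuals = [0, 0, 1, 1]
preds = [0.1, 0.4, 0.35, 0.8]
for t, (fpr, tpr) in zip([0.5, 0.3, 0.0], roc_curve(actuals, preds, [0.5, 0.3, 0.0])):
    print(f"threshold={t}: FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Lowering the threshold moves the point up and to the right along the curve: more true positives are captured at the cost of more false positives.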

Compare predictions

Click the Predictions Difference tab to compare the predictions of the two models on a row-by-row basis.

The top section of the Predictions Difference tab displays a histogram that shows the percentages (and numbers of rows) of predictions that fall within a range you specify using the Prediction Match Threshold field.

The histogram displays:

  • % predictions within the +/- prediction match threshold limits (shown in green)
  • % predictions above the upper prediction match threshold limit (shown in red)
  • % predictions below the lower prediction match threshold limit (shown in red)

The number of bins depends on the Prediction Match Threshold, which defaults to 0.0025. In this example, each bin is 0.005 wide (0.0025 plus the absolute value of -0.0025), which results in 6 bins.

If most predictions match, you can toggle Scale y-axis to ignore perfect matches to focus on the mismatches.

The bottom section of the Predictions Difference tab shows the 1000 most divergent predictions (in terms of absolute value).

The Difference column shows how far apart the predictions are.
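The row-by-row comparison can be sketched as follows: compute the difference between the two models' predictions for each row, count how many rows fall within, above, and below the match threshold, and list the rows with the largest absolute difference. The function name and the sign convention (Model 1 minus Model 2) are assumptions; only the 0.0025 default threshold and the 1000-row limit come from the description above.

```python
def prediction_differences(preds_1, preds_2, threshold=0.0025, top_n=1000):
    """Summarize row-level differences (preds_1 - preds_2) against a match threshold."""
    diffs = [p1 - p2 for p1, p2 in zip(preds_1, preds_2)]
    n = len(diffs)
    if n == 0:
        return {"pct_within": 0.0, "pct_above": 0.0, "pct_below": 0.0, "most_divergent_rows": []}
    within = sum(1 for d in diffs if abs(d) <= threshold)
    above = sum(1 for d in diffs if d > threshold)
    below = sum(1 for d in diffs if d < -threshold)
    # Row indices ordered by absolute difference, largest first.
    ranked = sorted(range(n), key=lambda i: abs(diffs[i]), reverse=True)[:top_n]
    return {
        "pct_within": 100.0 * within / n,
        "pct_above": 100.0 * above / n,
        "pct_below": 100.0 * below / n,
        "most_divergent_rows": [(i, diffs[i]) for i in ranked],
    }

# Hypothetical sample predictions from two models on the same rows.
summary = prediction_differences([0.5, 0.6, 0.7, 0.2], [0.5, 0.5, 0.9, 0.201], top_n=2)
print(summary["pct_within"], summary["pct_above"], summary["pct_below"])
print(summary["most_divergent_rows"])
```

Ranking by absolute difference treats over- and under-prediction relative to the other model symmetrically, which matches the "in terms of absolute value" ordering of the divergent rows.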

Replace champion with challenger

After comparing models, if you find a model that outperforms the deployed model, you can set it as the new champion.

  1. Evaluate the comparison model insights to determine the best performing model.

  2. If a challenger model outperforms the deployed model, click Promote to champion.

  3. Select a Replacement Reason and click Accept and Replace.

The challenger model is now the champion (deployed) model.

Updated December 11, 2021