Multiclass Prediction Explanations¶
Multiclass Prediction Explanations are off by default. Contact your DataRobot representative or administrator for information on enabling the feature.
Feature flag: Enable Multiclass Prediction Explanations
Prediction Explanations provide a quantitative indicator of the effect variables have on predictions, answering the question of why a given model made a certain prediction. In DataRobot, there are two methodologies for computing Prediction Explanations: SHAP (based on Shapley values) and XEMP (eXemplar-based Explanations of Model Predictions). Each has its own benefits.
One significant difference between the methodologies is that XEMP-based projects can generate Prediction Explanations for multiclass classification projects. Explanations are available both from a Leaderboard model and from a deployment.
Explanations from the Leaderboard¶
In multiclass projects, DataRobot returns a prediction value for each class. Multiclass Prediction Explanations describe why DataRobot determined that prediction value for any class you request explanations for. So if a row has prediction values for several classes, such as A and C (with C at 0.5), you can request the explanations for why DataRobot assigned class A its prediction value.
View explanations preview¶
Access XEMP-based Prediction Explanations from a Leaderboard model's Understand > Prediction Explanations tab.
Use the Class dropdown to view training data-based explanations for the class. Each class has its own distribution chart (1) and its own set of samples (2).
Deep dive: Multiclass preview
Preview data is available for a subset of the most frequent model classes. The selection is derived from the Lift Chart distribution and typically represents the top 20 classes. Although multiclass supports an unlimited number of classes, the display supports just the 20 available in the Lift Chart.
Some models don't have a Lift Chart calculated. Most often this happens for slim run projects (for example, GB+ dataset sizes or multiclass projects with >10 classes) trained into validation (>64% for default parameters). In those cases, although the chart isn't available, DataRobot can still calculate explanations. This is not unique to multiclass projects; multiclass just has additional corner cases in which some classes have no distribution chart, for example when a class is rare and wasn't present in the training data.
When calculating the multiclass preview, DataRobot selects a limited number of classes to display (there can be up to 1000) in support of better UX and faster calculation times. As a result, the available display is a selection of those classes that do have Lift Chart calculations (DataRobot calculates 20 classes for a multiclass model). If the model doesn't have any Lift Chart data, DataRobot selects the first 20 classes alphabetically.
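The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not DataRobot's implementation: the function name `select_preview_classes`, its arguments, and the idea that classes with Lift Chart data arrive already ordered from most to least frequent are all hypothetical.

```python
PREVIEW_CLASS_LIMIT = 20  # display limit stated in the text


def select_preview_classes(all_classes, lift_chart_classes, limit=PREVIEW_CLASS_LIMIT):
    """Pick the classes shown in the preview (hypothetical helper).

    all_classes: every class label known to the model.
    lift_chart_classes: labels that have Lift Chart data, assumed to be
        ordered from most to least frequent.
    """
    if lift_chart_classes:
        # Prefer classes that have Lift Chart calculations.
        return list(lift_chart_classes)[:limit]
    # Fallback: no Lift Chart data at all, so take the first classes alphabetically.
    return sorted(all_classes)[:limit]
```

With no Lift Chart data, a model with 50 classes would show only the 20 classes that sort first alphabetically.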
You can calculate explanations either for the full training data set or for new data. The process is generally the same as for classification and regression projects, with a few multiclass-specific differences. This is because DataRobot calculates explanations separately for each class. Clicking the calculator opens a modal that controls which classes explanations are generated for:
The Classes setting controls the method for selecting which classes are used in explanation computation. The Number of classes setting configures how many classes DataRobot computes explanations for in each row. For example, consider a dataset with 6 classes. Choosing Predicted data and 3 classes generates explanations for the three classes (of the 6) with the highest prediction values. To maximize response and readability, the maximum number of classes to compute explanations for is 10. (This is a different value than what is supported in the prediction preview chart.)
The Classes options include:
| Option | Description |
|---|---|
| Predicted | Selects classes based on prediction value. For each row in the prediction dataset, computes explanations for the number of classes set by the Number of classes value. |
| Actual | Computes explanations from classes that are known values. For each row, explains the class that is the "ground truth." This option is only available when using the training dataset. |
| List of classes | Selects specific classes from a list of classes. For each row, explains only the classes identified in the list. |
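To make the three options concrete, here is a minimal Python sketch of how the classes to explain might be chosen for a single row. It is an illustration only, not DataRobot's code: the helper `classes_to_explain`, the mode strings `"predicted"`, `"actual"`, and `"list"`, and the choice to silently skip listed classes that the row has no prediction for are all assumptions made for this example.

```python
MAX_EXPLAINED_CLASSES = 10  # upper bound stated in the text


def classes_to_explain(mode, predictions, num_classes=1, actual=None, class_list=None):
    """Return the class labels to explain for one row (hypothetical helper).

    mode: "predicted", "actual", or "list" (assumed names for the three options).
    predictions: dict mapping class label -> prediction value for the row.
    """
    if mode == "predicted":
        if not 1 <= num_classes <= MAX_EXPLAINED_CLASSES:
            raise ValueError("num_classes must be between 1 and 10")
        # Rank classes by prediction value, highest first, and keep the top N.
        ranked = sorted(predictions, key=predictions.get, reverse=True)
        return ranked[:num_classes]
    if mode == "actual":
        if actual is None:
            raise ValueError("actual class is required (training data only)")
        # Explain only the known ground-truth class.
        return [actual]
    if mode == "list":
        # Explain only the requested classes that this row has predictions for.
        return [c for c in (class_list or []) if c in predictions]
    raise ValueError(f"unknown mode: {mode!r}")
```

For a row with six classes, `classes_to_explain("predicted", row, num_classes=3)` returns the three labels with the highest prediction values, mirroring the 6-class example in the text.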
Once explanations are computed, hover on the info icon to see a summary of the computed explanations:
Click the download icon to export all of a dataset's predictions and corresponding explanations in CSV format. Explanations for multiclass projects contain additional fields for each explained class: a class label and a list of explanations (based on your computation settings) for each.
Consider this sample output:
- Each row has each predicted class explained (1).
- The first class column is the top predicted class.
- If you've used the List of classes option, the output shows just those classes. This is useful if you want a specific class explained and are less interested in which classes have the highest predicted values.
When a dataset shows prediction percentages that are close in value, the explanations become very important for understanding why DataRobot predicted a given class, because they help you compare the predicted class with the challenger class(es).
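One way to find the rows where this matters is to compare the top two prediction values in each row. The sketch below is a hypothetical helper, not part of DataRobot: the name `close_calls`, the input shape (one dict of class label to prediction value per row), and the default `threshold` of 0.05 for "close in value" are all assumptions chosen for illustration.

```python
def close_calls(rows, threshold=0.05):
    """Yield (row_index, top_class, challenger_class, margin) for near-ties.

    rows: iterable of dicts mapping class label -> prediction value.
    threshold: assumed cutoff for "close in value"; not a DataRobot setting.
    """
    for i, preds in enumerate(rows):
        # Sort (label, value) pairs by value, highest first.
        ranked = sorted(preds.items(), key=lambda kv: kv[1], reverse=True)
        if len(ranked) < 2:
            continue  # a challenger needs at least two classes
        (top, p1), (challenger, p2) = ranked[0], ranked[1]
        margin = p1 - p2
        if margin <= threshold:
            yield i, top, challenger, margin
```

Rows flagged this way are natural candidates for reading the explanations of both the predicted class and its closest challenger side by side.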
Explanations from a deployment¶
When you calculate predictions from a deployment (Deployments > Predictions > Make Predictions), DataRobot adds the Classes and Number of classes fields to the options available for non-multiclass projects: