
Confusion Matrix

Availability information

Support for the Confusion Matrix in Workbench is on by default.

Feature flag: Unlimited multiclass

The multiclass Confusion Matrix helps evaluate model performance in multiclass experiments. "Confusion matrix" refers to how a model can confuse two or more classes by consistently mislabeling (confusing) one class as another. The matrix compares actual with predicted values, making it easy to see if any mislabeling has occurred and with which values. (There is also a confusion matrix available for binary classification experiments, which can be accessed from the ROC Curve tab.)

See the multiclass considerations.

The Confusion Matrix provides two visualizations:

  • The multiclass Confusion Matrix, which provides an overview of every class found for the selected target.
  • The Selected Class Confusion Matrix, which analyzes a specific class.

Both matrices compare predicted and actual class values, based on the training data used to build the experiment, and help to illustrate mislabeling of classes. From these comparisons, you can determine how well DataRobot models are performing. Wikipedia provides thorough details to help in understanding confusion matrices.
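To make the actual-vs-predicted comparison concrete, the following is a minimal sketch, using scikit-learn and pandas rather than anything DataRobot-specific, of how a labeled multiclass confusion matrix with a per-class Total column can be assembled. The class labels here are hypothetical.

```python
# Illustrative only: not DataRobot's implementation.
import pandas as pd
from sklearn.metrics import confusion_matrix

# Hypothetical actual and predicted labels for a multiclass target
actual    = ["InternalMedicine", "Urology", "InternalMedicine", "Emergency/Trauma"]
predicted = ["InternalMedicine", "InternalMedicine", "Cardiology", "Emergency/Trauma"]

labels = sorted(set(actual) | set(predicted))
cm = confusion_matrix(actual, predicted, labels=labels)

# Rows = actual classes, columns = predicted classes (one possible orientation)
matrix = pd.DataFrame(cm, index=labels, columns=labels)
matrix["Total"] = matrix.sum(axis=1)  # per-class Total, as in the insight
print(matrix)
```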

Multiclass matrix overview

The multiclass matrix, a heat map, provides a 10-cell by 10-cell overview, color-coded by frequency of occurrences, of every class (value) that DataRobot recognized for the selected target. Tools for working with the matrix include a multi-option toolbar (1), page scrolling (2), and two Total columns (3) that help put the selected page in the context of the entire training set and show the prevalence of a class across the dataset.

Cell ordering is dependent on the settings selected in the toolbar at the top of the insight.

  • Data source: Sets the partition from the training data that is used to build the matrix, with options dependent on the experiment type (validation or holdout for non-time-aware experiments; backtest selection for OTV).
  • Sort classes by: Sets the method used to sort and orient the matrix (name, frequency, scores), as well as the sort order (ascending or descending).
  • Settings: Controls the representation basis (count or percentage) as well as the axis orientation.
  • Export: Exports the full confusion matrix as a CSV of the data, a PNG of the image, or a ZIP of both. The selected class matrix is not included.

Sort classes options

The following describes the sort options (a computation sketch follows the list):

  • Name: Sorts alphanumerically by the name of the class found in the training data, either ascending or descending based on the Order sort option. Each name is presented on both axes; position (vertical or horizontal) is determined by the orientation chosen in Settings.
  • Frequency (class was actual): Sorts by the number of times the given class appeared as the actual class across the training data. Occurrences for each class are recorded in the corresponding Total row or column.
  • Frequency (class was predicted): Sorts by the number of times the given class was the predicted class. Occurrences for each class are recorded in the corresponding Total row or column.
  • F1 score: Provides a measure of the model's accuracy, computed based on precision and recall.
  • Precision: Provides, for all the positive predictions, the percentage of cases in which the model was correct. Also known as Positive Predictive Value (PPV).
  • Recall: Reports the ratio of true positives (correctly predicted as positive) to all actual positives. Also known as True Positive Rate (TPR) or sensitivity.
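To illustrate how the score-based sort keys derive from the matrix itself, here is a rough sketch that computes per-class precision, recall, and F1 from a small matrix of counts. The function name and the counts are invented for illustration; this is not DataRobot's sorting code.

```python
import numpy as np

def per_class_scores(cm: np.ndarray):
    """Per-class scores from a matrix with rows = actual, columns = predicted."""
    tp = np.diag(cm).astype(float)
    precision = tp / np.maximum(cm.sum(axis=0), 1)  # correct / all predicted as the class
    recall = tp / np.maximum(cm.sum(axis=1), 1)     # correct / all actually in the class
    f1 = np.where(precision + recall > 0,
                  2 * precision * recall / (precision + recall), 0.0)
    return precision, recall, f1

cm = np.array([[14, 4, 2],   # hypothetical counts
               [ 6, 5, 1],
               [ 2, 1, 3]])
precision, recall, f1 = per_class_scores(cm)
print(np.argsort(-f1))  # class indices in descending F1 order, as the sort would use
```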

Settings options

The settings options set how to report the instances of actual vs. predicted "confusion" in each cell (a normalization sketch follows the list):

  • Count: Reports the raw number of occurrences for the combination of actual and predicted classes.
  • Percentage of actuals: Reports the percentage of rows in which the actual class appeared in a given cell, relative to the Total count (also known as "Recall").
  • Percentage of predicted: Reports the percentage of rows in which the predicted class appeared in a given cell, relative to the Total count (also known as "Precision").
  • Orientation of actuals: Sets the axis that displays the actual values for each class.
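The relationship between the three display settings can be sketched as simple normalizations of the raw counts; the matrix values below are invented for illustration:

```python
import numpy as np

counts = np.array([[14, 4, 2],   # rows = actual, columns = predicted
                   [ 6, 5, 1],
                   [ 2, 1, 3]])

# "Percentage of actuals" row-normalizes (recall-like); "Percentage of
# predicted" column-normalizes (precision-like).
pct_of_actuals = 100 * counts / counts.sum(axis=1, keepdims=True)
pct_of_predicted = 100 * counts / counts.sum(axis=0, keepdims=True)

# Each actual class sums to 100% in one view, and each predicted class
# sums to 100% in the other.
assert np.allclose(pct_of_actuals.sum(axis=1), 100)
assert np.allclose(pct_of_predicted.sum(axis=0), 100)
```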

Understand the multiclass matrix

A perfect model would result in the matrix showing a diagonal line through the middle, with those cells reporting either 100% (if set to percentages) or the total number of occurrences of each class (if set to count). All other cells would be empty. Because this is an unlikely outcome, use the following examples for help interpreting the matrix based on different sorting and settings. Note that:

  • When using a percentage setting, the cells for a given class, across all pages, total 100% along the dimension you display by. (If set to Percentage of actuals, the cells for each actual class sum to 100%.)
  • When using Count, the cells for a given class, across all pages, total to the value in the Total column.

Consider the matrix below, where actuals are on the left axis, representing the actual class, and predicted classes are across the top. Reading left to right tells you, "for all the rows where Actual = X, how often did DataRobot predict each class?" This matrix sets the display to Count.

For this example, the model found 27 classes, as reported on the axis labels (for example, "Predicted (1-10 of 27)").

Focusing on Emergency/Trauma = Actual, looking across the row:

  • The Total column reports that there are 4 rows with this actual class.
  • The interior cells indicate that, for rows where Actual = Emergency/Trauma, DataRobot predicted:

    • Emergency/Trauma 1 time (correct prediction)
    • Family/General Practice 1 time
    • Internal Medicine 2 times

Now view the matrix set to Percentage of actuals, which shows the values as the raw count divided by the total:

The percentages for Emergency/Trauma = Predicted do not sum to 100%. This is because the percentages are taken from actuals, not predicted. If you change the setting to Percentage of predicted, the percentages in that column will sum to 100%.
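As a quick arithmetic check of the example above (class names taken from the screenshots), the Percentage of actuals values for the Emergency/Trauma row are simply each count divided by the row's Total:

```python
# Counts for rows where Actual = Emergency/Trauma, from the example above
row = {"Emergency/Trauma": 1, "Family/General Practice": 1, "InternalMedicine": 2}
total = sum(row.values())  # the Total column: 4

for cls, count in row.items():
    print(cls, f"{100 * count / total:.0f}%")  # 25%, 25%, 50%; sums to 100%
```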

Now consider the story that coloring tells by viewing the three settings side-by-side:

When viewing by Count, the coloring is based on the maximum value in the visible cells, which means that the most common classes dominate over rarer classes. In the first screenshot, InternalMedicine is the most common class in both actuals and predicted, so the Predicted InternalMedicine vs Actual InternalMedicine cell is the brightest, with 14 occurrences.

To understand how well the model performs per class, set the display to Percentage of actuals; the coloring now reflects an absolute 0-100% scale. This effectively normalizes the data, and because of that, a different story unfolds. Now, Predicted InternalMedicine vs Actual InternalMedicine is much less bright, because those 14 occurrences represent only 70% (14 correct predictions divided by 20 total occurrences) of all the rows where Actual = InternalMedicine.

Now consider Actual Urology vs Predicted InternalMedicine. By Count, the cell is colored very dark because there were only two occurrences, versus the maximum of 14 in this view; there were only two rows in total where Actual = Urology. But when set to Percentage of actuals, the matrix (brightly) reports that in 100% of the rows where Actual = Urology, DataRobot predicted InternalMedicine.

Switching the setting (coloring) to Percentage of predicted tells a similar story, but for the predicted classes. The third screenshot shows that the Predicted InternalMedicine vs Actual InternalMedicine cell that was so bright when colored by Count is now darker still, because those 14 occurrences are just 41.2% of all the rows where the model predicted InternalMedicine.
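The 70% and 41.2% figures follow directly from the same cell count viewed against different totals. The predicted total below is inferred from the 41.2% figure and is an assumption, not a value stated elsewhere:

```python
correct = 14          # Predicted InternalMedicine vs Actual InternalMedicine
actual_total = 20     # rows where Actual = InternalMedicine
predicted_total = 34  # rows where Predicted = InternalMedicine (inferred: 14 / 0.412)

print(f"Percentage of actuals:   {100 * correct / actual_total:.1f}%")    # 70.0%
print(f"Percentage of predicted: {100 * correct / predicted_total:.1f}%") # 41.2%
```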

Work with the matrix

To work with the matrix:

  1. Use the arrows in the axes Predicted and Actual legends to scroll through all classes in the dataset.
  2. Click in a row or column to highlight (with white border lines) all occurrences of that class. The cell bordered on all four sides is where the actual class and predicted class are the same. Notice that the cell you click sets the selected class matrix to the right.

  3. Hover on a cell to view statistics. The values report the cell's class combination as well as the values for each option selectable from the Settings dropdown.

Selected class matrix

Use the selected class matrix to analyze a specific class. To select a class, either click in the full matrix or choose from the dropdown. Choosing from the dropdown updates the highlighting in the multiclass matrix to focus on the current individual selection. Changing axes in the multiclass matrix changes the layout of the selected class confusion matrix.

The selected class matrix shows:

  • Individual and aggregate statistics for a class—per-class performance (1).

    Metric descriptions

    The following provides a brief description of each metric.

    • F1 Score: A measure of the model's accuracy, computed based on precision and recall.
    • Recall: Also known as Sensitivity or True Positive Rate (TPR). The ratio of true positives (correctly predicted as positive) to all actual positives.
    • Precision: Also known as Positive Predictive Value (PPV). For all the positive predictions, the percentage of cases in which the model was correct.
  • Percentage of actual and predicted misclassifications for a selected class (2).

  • An individual class confusion matrix, in the same format as the matrix available in the ROC Curve for binary projects (3).

    Quadrant descriptions

    The selected class Confusion Matrix is divided into four quadrants, summarized in the following list (a computation sketch follows):

    • True positive (TP): For all rows in the dataset that were actually ClassX, what percent did DataRobot correctly predict as ClassX?
    • True negative (TN): For all rows in the dataset that were not ClassX, what percent did DataRobot correctly predict as not ClassX?
    • False positive (FP): For all rows in the dataset that DataRobot predicted as ClassX, what percent were not ClassX? This is the sum of all incorrect predictions for the class in the full matrix.
    • False negative (FN): For all rows in the dataset that were ClassX, what percent did DataRobot incorrectly predict as something other than ClassX?
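For readers who want the quadrants in computational terms, the sketch below derives TP, TN, FP, and FN counts for a selected class by treating it one-vs-rest against a multiclass count matrix. The helper name and the counts are invented for illustration; this is not DataRobot's implementation.

```python
import numpy as np

def quadrants(cm: np.ndarray, class_idx: int):
    """Quadrant counts for one class; cm rows = actual, columns = predicted."""
    tp = cm[class_idx, class_idx]          # actually ClassX, predicted ClassX
    fn = cm[class_idx, :].sum() - tp       # actually ClassX, predicted otherwise
    fp = cm[:, class_idx].sum() - tp       # predicted ClassX, actually otherwise
    tn = cm.sum() - tp - fn - fp           # neither actual nor predicted ClassX
    return tp, tn, fp, fn

cm = np.array([[14, 4, 2],   # hypothetical counts
               [ 6, 5, 1],
               [ 2, 1, 3]])
tp, tn, fp, fn = quadrants(cm, 0)
# FP here is the sum of all incorrect predictions for the class in the full
# matrix, matching the description in the list above.
print(tp, tn, fp, fn)  # 14 10 8 6
```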

Updated April 23, 2024