Confusion Matrix¶
Preview
Support for the Confusion Matrix in Workbench is on by default.
Feature flag: Unlimited multiclass
The multiclass Confusion Matrix helps evaluate model performance in multiclass experiments. "Confusion matrix" refers to how a model can confuse two or more classes by consistently mislabeling (confusing) one class as another. The matrix compares actual with predicted values, making it easy to see if any mislabeling has occurred and with which values. (There is also a confusion matrix available for binary classification experiments, which can be accessed from the ROC Curve tab.)
See the multiclass considerations.
The Confusion Matrix provides two visualizations:
 The multiclass matrix (1), which provides an overview of every class found for the selected target.
 The selected class matrix (2), which analyzes a specific class.
Both matrices compare predicted and actual class values, based on the results of the training data used to build the experiment, and help illustrate where classes are mislabeled. From these comparisons, you can determine how well DataRobot models are performing. Wikipedia provides thorough details to help you understand confusion matrices.
Multiclass matrix overview¶
The multiclass matrix, a heat map, provides a 10-cell by 10-cell overview, color-coded by frequency of occurrence, of every class (value) that DataRobot recognized for the selected target. Tools for working with the matrix display include a multi-option toolbar (1), page scrolling (2), and two Total columns (3), which help you understand the selected page in the context of the entire training set and the prevalence of a class across the dataset.
Cell ordering is dependent on the settings selected in the toolbar at the top of the insight.
Selector  Description 

Data source  Sets the partition from the training data that is used to build the matrix, with options dependent on the experiment type: validation or holdout for non-time-aware experiments, and backtest selection for OTV. 
Sort classes by  Sets the method used to sort and orient the matrix (name, frequency, scores), as well as the sort order (ascending or descending). 
Settings  Controls the representation basis (count or percentage) as well as the axis orientation. 
Export  Exports the full confusion matrix as a CSV of the data, a PNG of the image, or a ZIP of both. The selected class matrix is not included. 
Sort classes options¶
The following describes the sort options:
Option  Description 

Name  Sorts alphanumerically by the name of the class found in the training data, either ascending or descending based on the Order sort option. Each name is presented on both axes. Position—vertical or horizontal—is determined by the orientation chosen in Settings. 
Frequency (class was actual)  Sorts by the number of times the given class appeared as the actual class across the training data. Occurrences for each class are recorded in the corresponding Total row or column. 
Frequency (class was predicted)  Sorts by the number of times the given class was the predicted class. Occurrences for each class are recorded in the corresponding Total row or column. 
F1 score  Provides a measure of the model's accuracy, computed based on precision and recall. 
Precision  Provides, for all the positive predictions, the percentage of cases in which the model was correct. Also known as Positive Predictive Value (PPV). 
Recall  Reports the ratio of true positives (correctly predicted as positive) to all actual positives. Also known as True Positive Rate (TPR) or sensitivity. 
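The three score-based sort options can be illustrated with a minimal sketch. It assumes a small, hypothetical 3-class count matrix (rows are actual classes, columns are predicted classes) and uses the standard definitions of precision, recall, and F1; the function name `per_class_scores` and the data are illustrative only and do not represent DataRobot's implementation.

```python
def per_class_scores(matrix, labels):
    """Compute precision, recall, and F1 per class.

    matrix[i][j] is the number of rows whose actual class is labels[i]
    and whose predicted class is labels[j].
    """
    n = len(labels)
    scores = {}
    for k, label in enumerate(labels):
        true_positives = matrix[k][k]
        actual_total = sum(matrix[k])                          # row total
        predicted_total = sum(matrix[i][k] for i in range(n))  # column total
        recall = true_positives / actual_total if actual_total else 0.0
        precision = true_positives / predicted_total if predicted_total else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


# Hypothetical counts, not taken from any real experiment.
labels = ["A", "B", "C"]
counts = [
    [8, 1, 1],  # actual A
    [2, 6, 2],  # actual B
    [0, 2, 8],  # actual C
]

scores = per_class_scores(counts, labels)

# Sorting classes by F1 score, descending, as one of the sort options would.
ranked = sorted(labels, key=lambda c: scores[c]["f1"], reverse=True)
for label in ranked:
    s = scores[label]
    print(f"{label}: precision={s['precision']:.3f} "
          f"recall={s['recall']:.3f} f1={s['f1']:.3f}")
```

Swapping the sort key for `"precision"` or `"recall"` gives the other two score-based orderings.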
Settings options¶
The settings options set how to report the instances of actual vs predicted "confusion" in each cell:
Option  Description 

Count  Reports the raw number of occurrences for the combination of actual and predicted classes. 
Percentage of actuals  Reports each cell's count as a percentage of the Total count for its actual class. For the cell where the actual and predicted class match, this value is also known as "Recall". 
Percentage of predicted  Reports each cell's count as a percentage of the Total count for its predicted class. For the cell where the actual and predicted class match, this value is also known as "Precision". 
Orientation of actuals  Sets the axis that displays the actual values for each class. 
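As a rough illustration of the difference between the two percentage settings, the sketch below normalizes a hypothetical count matrix either by actual-class totals (rows) or by predicted-class totals (columns). The function name `as_percentages`, the `basis` values, and the data are assumptions made for this example, not part of the product.

```python
def as_percentages(matrix, basis):
    """Convert a count matrix to percentages.

    matrix[i][j] counts rows with actual class i and predicted class j.
    basis="actuals" divides each cell by its row total;
    basis="predicted" divides each cell by its column total.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    if basis == "actuals":
        totals = [sum(row) for row in matrix]
        return [
            [100 * matrix[i][j] / totals[i] if totals[i] else 0.0
             for j in range(n_cols)]
            for i in range(n_rows)
        ]
    if basis == "predicted":
        totals = [sum(matrix[i][j] for i in range(n_rows)) for j in range(n_cols)]
        return [
            [100 * matrix[i][j] / totals[j] if totals[j] else 0.0
             for j in range(n_cols)]
            for i in range(n_rows)
        ]
    raise ValueError(f"unknown basis: {basis!r}")


# Hypothetical counts: rows are actual classes, columns are predicted classes.
counts = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 2, 8],
]

pct_of_actuals = as_percentages(counts, "actuals")
pct_of_predicted = as_percentages(counts, "predicted")

# Each row sums to 100% under "actuals"; each column under "predicted".
row_sums = [sum(row) for row in pct_of_actuals]
col_sums = [sum(pct_of_predicted[i][j] for i in range(3)) for j in range(3)]
print(row_sums)
print(col_sums)
```

The same cell can therefore show two different percentages depending on the setting, because the denominator changes from the actual-class total to the predicted-class total.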
Understand the multiclass matrix¶
A perfect model would result in the matrix showing a diagonal line through the middle, with those cells showing either 100% (if set to percentages) or the total number of rows for that class (if set to count). All other cells would be empty. Because this is an unlikely outcome, use the following examples for help interpreting the matrix based on different sorting and settings. Note that:
 When using a percentage setting, the cells for each class, across all pages, total 100% along the basis you are displaying by. (For example, if set to Percentage of actuals, the cells for each actual class sum to 100%.)
 When using count, the cells for each class, across all pages, total the value in the Total column.
Consider the matrix below, where the actual classes are on the left axis and the predicted classes are across the top. Reading left to right tells you, "for all the rows where Actual = X, how often did DataRobot predict each class?" This matrix sets the display by Count.
For this example, the model found 27 classes, reported on the axis labels (for example, "Predicted (1-10 of 27)").
Focusing on Actual = Emergency/Trauma, looking across the row:
 The Total column reports that there are 4 rows with this actual class.
 The interior cells indicate that, for rows where Actual = Emergency/Trauma, DataRobot predicted:
  Emergency/Trauma 1 time (correct prediction)
  Family/General Practice 1 time
  Internal Medicine 2 times
Now view the matrix set to Percentage of actuals, which shows the values as the raw count divided by the total:
The percentages for Predicted = Emergency/Trauma do not sum to 100%. This is because the percentages are taken from actuals, not predicted. If you change the setting to Percentage of predicted, the percentages in that column will sum to 100%.
Now consider the story that coloring tells by viewing the three settings side by side:
When viewing by Count, the coloring is based on the maximum value in the visible cells. This means that the most common classes will dominate over rarer classes. In the first screenshot, InternalMedicine is the most common class in both actuals and predicted, so it is assigned the brightest cell. Predicted InternalMedicine vs Actual InternalMedicine is the brightest, with 14 occurrences.
To understand how well the model performs per class, set the display to Percentage of actuals; the coloring now reflects an absolute 0-100% scale. This effectively normalizes the data, and a different story unfolds. Now, Predicted InternalMedicine vs Actual InternalMedicine is a lot less bright because those 14 occurrences represent only 70% (14 correct predictions divided by 20 total occurrences) of all the rows where Actual = InternalMedicine.
Now consider Actual Urology vs Predicted InternalMedicine. By Count it is colored very dark because there were only two occurrences, versus the maximum of 14 occurrences in this view; there were only two rows in total where Actual = Urology. But looking at Percentage of actuals, the matrix (brightly) reports that in 100% of those rows, DataRobot predicted InternalMedicine.
Switching the setting (coloring) to Percentage of predicted tells a similar story, but for the predicted classes. The third screenshot shows that Predicted InternalMedicine vs Actual InternalMedicine, which was so bright when colored by count, is darker still, because those 14 occurrences are just 41.2% of all the rows where DataRobot predicted InternalMedicine.
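The arithmetic behind the three colorings can be sketched as follows. The counts and totals (14 of 20 actual, 14 of 34 predicted, 2 of 2 for Urology) come from the example above; the linear mapping from value to brightness, and the names `count_intensity` and `share_intensity`, are assumptions made for illustration and may not match how the product actually assigns colors.

```python
def count_intensity(count, max_visible_count):
    """Brightness relative to the largest count among the visible cells."""
    return count / max_visible_count if max_visible_count else 0.0


def share_intensity(count, total):
    """Brightness on an absolute 0-100% scale (count as a share of a total)."""
    return count / total if total else 0.0


# Keys are (actual class, predicted class); values come from the example above.
cells = {
    ("InternalMedicine", "InternalMedicine"): {
        "count": 14, "actual_total": 20, "predicted_total": 34,
    },
    ("Urology", "InternalMedicine"): {
        "count": 2, "actual_total": 2, "predicted_total": 34,
    },
}
max_visible_count = 14  # the largest count in the visible page

intensities = {}
for key, cell in cells.items():
    intensities[key] = {
        "count": count_intensity(cell["count"], max_visible_count),
        "pct_of_actuals": share_intensity(cell["count"], cell["actual_total"]),
        "pct_of_predicted": share_intensity(cell["count"], cell["predicted_total"]),
    }

for (actual, predicted), values in intensities.items():
    print(f"Actual {actual} vs Predicted {predicted}: "
          + ", ".join(f"{name}={value:.1%}" for name, value in values.items()))
```

Under these assumptions, the InternalMedicine diagonal cell is at full brightness by count but drops to 70% and about 41.2% under the two percentage settings, while the Urology cell goes from dim by count to 100% under Percentage of actuals.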
Work with the matrix¶
To work with the matrix:
 Use the arrows in the Predicted and Actual axis legends to scroll through all classes in the dataset.

Click in a row or column to highlight (with white border lines) all occurrences of that class in the matrix. The four-sided cell indicates the number of times the actual class and predicted class are the same. Notice that the cell you click sets the selected class matrix to the right.

Hover on a cell to view statistics. The values report the cell's class combination as well as the values for each option selectable from the Settings dropdown.
Selected class matrix¶
Use the selected class matrix to analyze a specific class. To select a class, either click in the full matrix or choose from the dropdown. Choosing from the dropdown updates the highlighting in the multiclass matrix to focus on the current individual selection. Changing axes in the multiclass matrix changes the layout of the selected class confusion matrix.
The selected class matrix shows:

Individual and aggregate statistics for a class, that is, per-class performance (1).
Metric descriptions
The following provides a brief description of each metric.
Metric  Description 

F1 Score  A measure of the model's accuracy, computed based on precision and recall. 
Recall  Also known as Sensitivity or True Positive Rate (TPR). The ratio of true positives (correctly predicted as positive) to all actual positives. 
Precision  Also known as Positive Predictive Value (PPV). For all the positive predictions, the percentage of cases in which the model was correct. 
Percentage of actual and predicted misclassifications for a selected class (2).

An individual class confusion matrix, in the same format as the matrix available in the ROC Curve for binary projects (3).
Quadrant descriptions
The selected class Confusion Matrix is divided into four quadrants, summarized in the following table:
Quadrant  Description 

True positive (TP)  For all rows in the dataset that were actually ClassX, what percent did DataRobot correctly predict as ClassX? 
True negative (TN)  For all rows in the dataset that were not ClassX, what percent did DataRobot correctly predict as not ClassX? 
False positive (FP)  For all rows in the dataset that DataRobot predicted as ClassX, what percent were not ClassX? This is the sum of all incorrect predictions for the class in the full matrix. 
False negative (FN)  For all rows in the dataset that were ClassX, what percent did DataRobot incorrectly predict as something other than ClassX? 
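The four quadrants can be derived from the full multiclass matrix by treating the selected class as "positive" and every other class as "negative". The sketch below shows one way to do that on a hypothetical 3-class count matrix (rows are actual, columns are predicted); the function name `one_vs_rest` and the data are illustrative assumptions, and the percentages follow the denominators described in the table above.

```python
def one_vs_rest(matrix, k):
    """Collapse a multiclass count matrix into quadrants for class index k."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]                                # actual k, predicted k
    fn = sum(matrix[k]) - tp                         # actual k, predicted other
    fp = sum(matrix[i][k] for i in range(n)) - tp    # actual other, predicted k
    tn = total - tp - fn - fp                        # actual other, predicted other
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}


def quadrant_percentages(q):
    """Express each quadrant as a percentage, per the table's definitions."""
    actual_pos = q["TP"] + q["FN"]       # rows that were ClassX
    actual_neg = q["TN"] + q["FP"]       # rows that were not ClassX
    predicted_pos = q["TP"] + q["FP"]    # rows predicted as ClassX
    return {
        "TP": 100 * q["TP"] / actual_pos if actual_pos else 0.0,
        "FN": 100 * q["FN"] / actual_pos if actual_pos else 0.0,
        "TN": 100 * q["TN"] / actual_neg if actual_neg else 0.0,
        "FP": 100 * q["FP"] / predicted_pos if predicted_pos else 0.0,
    }


# Hypothetical counts, not taken from any real experiment.
counts = [
    [8, 1, 1],  # actual A
    [2, 6, 2],  # actual B
    [0, 2, 8],  # actual C
]

quadrants = one_vs_rest(counts, k=1)   # select class "B"
percentages = quadrant_percentages(quadrants)
print(quadrants)
print({name: round(value, 1) for name, value in percentages.items()})
```

Note that the false positive percentage uses the predicted-class total as its denominator, while the other three use actual-class totals, which is why the four percentages do not sum to 100%.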