Confusion Matrix (for multiclass models)

Availability information

Availability of unlimited classes in multiclass projects depends on your DataRobot package. If it is not enabled for your organization, the class limit is set to 100. Contact your DataRobot representative to increase this limit.

For multiclass models, DataRobot provides a multiclass confusion matrix to help evaluate model performance. The confusion matrix compares actual data values with predicted data values, making it easy to see if any mislabeling has occurred and with which values.

See considerations for working with multiclass models.

Background

In general, there are two types of prediction problems: regression and classification. Regression problems predict continuous values (1.7, 6, 9.8...). Classification problems, by contrast, assign values to discrete outputs, or classes (buy, sell, hold...).

Classification can be broken down into binary and multiclass problems.

In a binary classification problem, there are only two possible classes. Some examples include predicting whether or not a customer will pay their bill on time (yes or no) or whether a patient will be readmitted to the hospital (true or false). The model generates a predicted probability that a given observation falls into the "positive" class (readmitted=yes in the last example). By default, if the predicted probability is 50% or greater, the predicted class is "positive."

Multiclass classification problems, on the other hand, answer questions that have more than two possible outcomes (classes). For example, which of five competitors will a customer turn to (instead of simply whether or not they are likely to make a purchase)? Or, to which department should a call be routed (instead of simply whether or not someone is likely to make a call)? In this case, the model generates a predicted probability that a given observation falls into each class; the predicted class is the one with the highest predicted probability (also called argmax). With more class options, multiclass classification lets you ask "which one" questions, which result in more nuanced models and solutions.
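The two decision rules above (a 50% threshold for binary, argmax for multiclass) can be sketched in Python. This is a minimal illustration of the general rules, not DataRobot's internal code; the function names and example probabilities are hypothetical.

```python
def predict_binary(p_positive, threshold=0.5):
    """Binary rule: predict "positive" when its probability is at or above the threshold."""
    return "positive" if p_positive >= threshold else "negative"


def predict_multiclass(class_probs):
    """Multiclass rule (argmax): predict the class with the highest probability."""
    return max(class_probs, key=class_probs.get)


# Hypothetical probabilities for illustration only.
print(predict_binary(0.50))  # positive (50% or greater)
print(predict_binary(0.49))  # negative
print(predict_multiclass({"buy": 0.2, "sell": 0.5, "hold": 0.3}))  # sell
```

With `max`, ties go to whichever class appears first in the dictionary; how DataRobot breaks ties is not covered by this sketch.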

Depending on the number of unique values in the target feature, DataRobot automatically determines the project type and whether the project is standard, extended, or unlimited multiclass. The following table describes how DataRobot assigns a default problem type for numeric and non-numeric target data types:

Target data type	Number of unique target values	Default problem type	Use multiclass?
Numeric	3-10	Regression	Yes, optional
Numeric	> 10	Regression	Yes, optional (extended multiclass)
Non-numeric	2	Binary	No
Non-numeric	3-100	Multiclass	Yes, automatic
Non-numeric, numeric	> 100	Unlimited multiclass	Yes, automatic, if enabled

Build multiclass models

Multiclass modeling uses the same general model building workflow as binary or regression projects.

Import a dataset and specify a target.
Change a regression project to multiclass, if applicable.
For unlimited multiclass projects with more than 1,000 classes, you can modify the aggregation settings. Otherwise, by default, DataRobot keeps the 999 most frequent classes and aggregates the remainder into a single "other" class.
Use the Confusion Matrix to evaluate model performance.

Change regression projects to multiclass

Once you enter a target feature, DataRobot classifies the project type and indicates the default with a tag next to the target feature.

If the project is classified as regression and is eligible for multiclass conversion, DataRobot provides a Switch To Classification link below the target entry box. Clicking the link changes the project to a classification project (values are interpreted as classes instead of continuous values). If the number of unique values falls outside the allowable range, the Switch To Classification link is not available.

What is eligible for multiclass?

Whether a project is eligible for multiclass conversion depends on your organization's settings. If unlimited multiclass is enabled, all projects can be converted. Without unlimited multiclass, you can convert a numeric target to multiclass when it has up to 100 unique values.

Click Switch To Regression to switch the project type from classification back to the default regression setting.

With the problem type set, verify or change the metric, choose a modeling mode, and click Start.

Unlimited multiclass

If enabled for your organization, unlimited multiclass handles projects whose target feature contains more than 100 classes. For projects with a target that contains more than 1,000 classes, DataRobot uses multiclass aggregation to reduce the number of modeled classes to 1,000.

Set unlimited multiclass aggregation

To support more than 1,000 classes, DataRobot automatically aggregates classes, based on frequency, into 1,000 unique labels. You can, however, configure the aggregation parameters to ensure all classes necessary to your project are represented.

DataRobot handles the breakdown based on the number of classes detected:

If there are 101 to 1,000 classes, modeling continues as usual.
If there are more than 1,000 classes, a warning appears below the target entry field.

If this warning appears, you can let DataRobot handle the aggregation. In this case, DataRobot keeps the 999 most frequent classes and bins all other classes into a 1,000th class, "other." You can, however, configure the aggregation settings in the Feature Constraints advanced setting. See Feature Constraints for field descriptions and an aggregation example.
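The default frequency-based aggregation described above (keep the 999 most frequent classes and bin the rest into "other") can be sketched as follows. This illustrates the rule only and is not DataRobot's implementation; the function name, its parameters, and the tie-breaking behavior (Python's `Counter.most_common` keeps first-seen order for equal counts) are assumptions.

```python
from collections import Counter


def aggregate_classes(labels, max_classes=1000, other_label="other"):
    """Keep the (max_classes - 1) most frequent labels and map the rest to other_label.

    If there are max_classes or fewer unique labels, the labels are returned unchanged.
    """
    counts = Counter(labels)
    if len(counts) <= max_classes:
        return list(labels)
    # most_common orders by count; equal counts keep first-seen order (tie-breaking assumption).
    keep = {label for label, _ in counts.most_common(max_classes - 1)}
    return [label if label in keep else other_label for label in labels]


# Small example with max_classes=3: keep the 2 most frequent labels, bin the rest.
labels = ["a", "a", "a", "b", "b", "c", "d"]
print(aggregate_classes(labels, max_classes=3))
# ['a', 'a', 'a', 'b', 'b', 'other', 'other']
```

If a real class is itself named "other," this sketch would merge it with the aggregated bucket; the configurable Feature Constraints settings are not modeled here.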

Note

Aggregation settings are also available for multiclass projects with fewer than 1,000 classes.

Changes to Feature Impact

In projects with more than 100 classes, the Feature Impact visualization charts only the aggregated feature impact, not per-class impact. This is because:

Using only aggregated classes improves runtime.
Each class has a comparatively low row count, which makes per-class scores less reliable than the aggregated score.

As a result, the Select Class dropdown is not available on the chart.

Confusion Matrix overview

For each classification project type, DataRobot builds a confusion matrix to help evaluate model performance. The name "confusion matrix" refers to how a model can confuse two or more classes by consistently mislabeling (confusing) one class as another. The confusion matrix compares actual data values with predicted data values, making it easy to see if any mislabeling has occurred and with which values.

A confusion matrix specific to the problem type is available for both binary classification (in the ROC Curve) and multiclass problems. To access the multiclass confusion matrix, first build your models and then select the Confusion Matrix tab from the Evaluate division.

The tab displays two confusion matrix tables for each multiclass model: the Multiclass Confusion Matrix and the Selected Class Confusion Matrix. Both matrices compare predicted and actual values for each class, based on the project's data for the selected partition, and use graphic elements to illustrate mislabeling of classes. The Multiclass Confusion Matrix provides an overview of every class found for the selected target, while the Selected Class Confusion Matrix analyzes a specific class. From these comparisons, you can determine how well DataRobot models are performing.

The following describes the components available in the Confusion Matrix tab.

	Option	Description
1	Matrix	Overview of every found class.
2	Data selection	Data partition selector.
3	Display modes	Modes that impact display.
4	Display options	Menu for display options.
5	Matrix detail	Numeric frequency details.
6	Class selector	Individual class selector.
7	Selected class confusion matrix	Class-specific matrix.
8	Extended-class confusion matrix thumbnail	Thumbnail for extended classes.

Large confusion matrix

This matrix provides an overview of every class (value) that DataRobot recognized for the selected target in the dataset. It reports class prediction results using circles of different colors and sizes. Color indicates prediction accuracy: green circles represent correct predictions, while red circles represent incorrect predictions. The size of a circle indicates how many rows had that combination of actual and predicted values (for example, the number of rows in which "product problem" was predicted but the actual value was "bad support").

The default size of the matrix changes depending on the type of multiclass:

For up to 100 classes, the matrix is 10 classes by 10 classes.
For more than 100 classes, the matrix is 25 classes by 25 classes.

Click on any of the correct predictions (green circles) in the Multiclass Confusion Matrix to view and analyze additional details for that class in the display to the right of the matrix.

Data selection

The data used to build the Multiclass Confusion Matrix depends on your project type and can be changed using the Data Selection dropdown. The option you choose changes the display to reflect the selected subset of the project's historical (training) data:

For non-time-aware projects, it is sourced from the validation, cross-validation, or holdout (if unlocked) partitions.
For time-aware projects, it is sourced from an individual backtest, all backtests, or holdout (if unlocked).

Additionally, you can add an external test dataset to help evaluate model performance.

Modes

There are three mode options—Global, Actual, and Predicted—that provide detailed information about each class within the target column. Changing the mode updates the full matrix, the selected class matrix, and the details for the selected class.

The following table describes each of the Multiclass Confusion Matrix modes. See the metrics documentation or the Google developers foundation course for descriptions of Recall and Precision.

Mode	Description	Hover over a cell on the matrix grid to display...
Global	Provides F1 Score, Recall, and Precision metrics for the selected class.	The cell's row count, and that count as a percentage of the total row count in the selected partition
Actual	Provides details of the Recall score (of the rows that actually belong to the selected class, the share the model predicted correctly), as well as a partial list of classes that the model confused with the selected class. Click Full List to see the Recall score for all confused classes.	The cell's row count, and that count as a percentage of the total row count of the cell's actual class in the selected partition
Predicted	Provides details of the Precision score (of the rows the model predicted as the selected class, the share that actually belong to it). Click Full List to see the Precision score for all confused classes.	The cell's row count, and that count as a percentage of the total row count of the cell's predicted class in the selected partition
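The three modes differ mainly in the denominator used for a cell's percentage: all rows in the partition (Global), all rows of the cell's actual class (Actual), or all rows of the cell's predicted class (Predicted). The sketch below assumes the matrix is a nested list with actual classes as rows and predicted classes as columns (one possible orientation; the Orientation of Actuals option can swap them in the UI). The function name, class names, and counts are hypothetical.

```python
# Hypothetical 3-class matrix: rows are actual classes, columns are predicted classes.
classes = ["A", "B", "C"]
matrix = [
    [40, 5, 5],   # actual A (50 rows)
    [20, 25, 5],  # actual B (50 rows)
    [0, 10, 40],  # actual C (50 rows)
]


def _pct(part, whole):
    """Percentage helper that returns 0.0 for an empty denominator."""
    return 100 * part / whole if whole else 0.0


def hover_percentages(matrix, actual_idx, predicted_idx):
    """Return a cell's row count and its share of three different denominators."""
    count = matrix[actual_idx][predicted_idx]
    partition_total = sum(sum(row) for row in matrix)            # Global mode
    actual_total = sum(matrix[actual_idx])                       # Actual mode
    predicted_total = sum(row[predicted_idx] for row in matrix)  # Predicted mode
    return {
        "count": count,
        "global_pct": _pct(count, partition_total),
        "actual_pct": _pct(count, actual_total),
        "predicted_pct": _pct(count, predicted_total),
    }


cell = hover_percentages(matrix, classes.index("B"), classes.index("A"))
print({k: round(v, 1) for k, v in cell.items()})
# {'count': 20, 'global_pct': 13.3, 'actual_pct': 40.0, 'predicted_pct': 33.3}
```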

Clicking Full List opens the Feature Misclassification popup, which lists scores for all classes and allows you to switch between the Actual and Predicted modes.

Display options

The gear icon provides a menu of options for sorting and orienting the Multiclass Confusion Matrix into different formats.

Display options include:

Orientation of Actuals: sets the axis (rows or columns) for the Actual values display.
Sort by: sets the sort order, either alphabetically, by actual or predicted frequency, or by F1 Score.
Order: orders the matrix display in either ascending or descending order.

For example, to view the lowest Predicted Frequency values, select the Predicted Frequency and Ascending order options to display those values at the top of the matrix.

Matrix detail

The blue bars that border the right and bottom sides of the Multiclass Confusion Matrix display numeric frequency details for each class and help you assess the model's accuracy. For any class, click a bar opposite the Actual axis to see actual frequency or opposite the Predicted axis to see predicted frequency.

For example, consider the actual frequency for the class [50-60) of the target feature age. In this case, based on the training data, there were 264 instances (at this sample size) in which [50-60) was the value of the target age. Those 264 rows make up 16.5% of the total dataset.

Tip

You can view frequency details for any class, regardless of which class is currently selected, by hovering over any of the blue bars.

Class selector

The dropdown selects an individual class and provides details based on the active mode.

Selected Class Confusion Matrix

The smaller matrix provides accuracy details for a single class. Changing the mode or the selected class, whether through the dropdown or by clicking a green circle in the full matrix, dynamically updates the Selected Class Confusion Matrix. The selected class is simultaneously highlighted on the full matrix, and the frequency percentages are displayed in the labeled quadrants. Hover over a circle in the matrix to view its contribution to the total number of rows in that sample (for the selected partition). The row counts of the four quadrants add up to the total number of rows. For example, there are 1600 instances where Bad Support was the value of the target ChurnReasons. Hover over each quadrant to view the row count for that outcome of the DataRobot prediction.

The Selected Class Confusion Matrix is divided into four quadrants, summarized in the following table:

Quadrant	Description
True Positive	For all rows in the dataset that were actually ClassA, how many (what percent) did DataRobot correctly predict as ClassA? This quadrant equals the ClassA diagonal cell (green circle) in the full matrix.
True Negative	For all rows in the dataset that were not ClassA, how many (what percent) did DataRobot correctly predict as not ClassA? This is the sum of all cells in the full matrix that involve neither actual ClassA nor predicted ClassA.
False Positive	For all rows in the dataset that DataRobot predicted as ClassA, how many (what percent) were not ClassA? This is the sum of the incorrect ClassA predictions in the full matrix (rows predicted as ClassA whose actual value was another class).
False Negative	For all rows in the dataset that were ClassA, how many (what percent) did DataRobot incorrectly predict as something other than ClassA? This is the sum of the rows in the full matrix whose actual value was ClassA but that were predicted as another class.
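The quadrant definitions above reduce a multiclass matrix to a one-vs-rest view of the selected class. A minimal sketch, assuming a nested-list matrix with actual classes as rows and predicted classes as columns; the function name, class names, and counts are hypothetical.

```python
# Hypothetical 3-class matrix: rows are actual classes, columns are predicted classes.
classes = ["A", "B", "C"]
matrix = [
    [40, 5, 5],
    [20, 25, 5],
    [0, 10, 40],
]


def class_quadrants(matrix, k):
    """One-vs-rest quadrant counts for class index k."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]                          # actual k, predicted k
    fn = sum(matrix[k]) - tp                   # actual k, predicted another class
    fp = sum(row[k] for row in matrix) - tp    # predicted k, actually another class
    tn = total - tp - fn - fp                  # neither actual nor predicted k
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}


q = class_quadrants(matrix, classes.index("A"))
print(q)                # {'TP': 40, 'FN': 10, 'FP': 20, 'TN': 80}
print(sum(q.values()))  # 150, the total row count
```

Note that TN also counts rows where two other classes were confused with each other (for example, actual B predicted as C), because those rows are still correctly "not ClassA."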

Extended-class Confusion Matrix thumbnail

For extended-class multiclass projects (11 to 100 classes), DataRobot provides a thumbnail pagination tool to help you inspect your results in more detail. The thumbnail is a smaller representation of the full multiclass matrix. The blue dots in the thumbnail indicate locations that contain the most predictions (whether classified correctly or incorrectly) and therefore might be the most interesting to investigate.

Clicking an area in the thumbnail updates the larger matrix to display the 10x10 area surrounding your selection. The final frame (lower right corner) displays only the classes remaining beyond the last multiple of 10 (for example, a dataset with 83 classes shows only three entries in the final frame). The full matrix functions in the same way as the non-extended multiclass matrix described above. Statistics for each cell shown in the larger 10x10 matrix are calculated across the full confusion matrix represented by the thumbnail.

You can navigate the thumbnail either by using the arrows along the outside or by clicking a specific box; row and column numbers help identify the current matrix position.

A thumbnail with blue dots roughly on the diagonal from upper left to lower right potentially indicates a good model, because there are many correct predictions. However, because classes have no inherent order, dots near the diagonal may also be misclassifications that happen to cluster there by chance, so investigate each square to confirm performance.
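Assuming the thumbnail divides each axis into consecutive, non-overlapping frames of 10 classes (an assumption inferred from the 83-class final-frame example above; the exact paging logic is not documented here), the frame boundaries can be sketched as follows. The function name is hypothetical.

```python
def frame_ranges(n_classes, frame_size=10):
    """Split one axis of the matrix into consecutive frames of up to frame_size classes.

    Returns (start, end) index pairs; end is exclusive.
    """
    return [(start, min(start + frame_size, n_classes))
            for start in range(0, n_classes, frame_size)]


ranges = frame_ranges(83)
print(len(ranges))       # 9 frames per axis
print(ranges[-1])        # (80, 83): the final frame holds only 3 classes
print(len(ranges) ** 2)  # 81 frames in the 2-D thumbnail grid, under this assumption
```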

Feature considerations

The following notes apply to working with multiclass models in general. These sections provide details specific to projects with more than 10 classes:

Working with more than 11 classes
Working with more than 100 classes
If you do not have unlimited multiclass enabled, DataRobot supports up to 100 classes in multiclass projects. If you create a project with more than 100 classes, the Data page will indicate that the target is unsuitable for modeling by displaying an “Invalid target” badge next to its name.
When using the Leaderboard > Lift Chart visualization, selecting a class is not backward compatible; you must retrain any model built before the feature was introduced to see its multiclass insights.
Stratified partitioning and smart downsampling are not supported.
Exposures, offsets, and event counts are not supported.
Advanced preprocessing steps are not supported (e.g., auto-encoders, k-means, cosine similarity, credibility intervals, extra-trees-based feature selection, search for best transform, search for differences/ratios).
The following tabs and tools are not supported:
- ROC Curve
- SHAP-based Prediction Explanations (XEMP is supported)
- Rating Table
- Hotspots and Variable Effects insights
When working with the Text Mining and Word Cloud insights or data from the Coefficients tab, multiclass projects with more than 20 classes will only display insights for the 20 classes that appear the most often in the training data.
User and open-source models are not supported (and are deprecated).
The Confusion Matrix for multiclass projects that are run with slim-run (no stacked predictions) is disabled when the model was trained into Validation.
You cannot use anomaly detection with multiclass models.
Multiclass supports out-of-time validation (OTV) projects but not time series projects.

More than 11 classes

The following considerations apply when your project has 11 or more classes:

Stacked predictions are disabled (if trained into Validation and/or Holdout, those scores display N/A on the Leaderboard).
ExtraTrees Classifier models have a row limit of 500K.
The maximum number of derived text features is set to 20,000 to prevent out-of-memory (OOM) errors on text-heavy datasets.
Some models can take significantly longer to train, depending on the dataset. On average, training time increases with the number of classes.

More than 100 classes

The following considerations apply when your project has more than 100 classes:

Per-class Feature Impact is unavailable.
The Confusion Matrix in DataRobot Classic uses a 25x25 grid; Workbench uses a 10x10 grid.
The public API response for the Confusion Matrix does not include classMetrics due to response size limitations. All of the omitted metrics can be derived from the Confusion Matrix data itself.
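As a sketch of that derivation, assume you already have the matrix counts as a nested list with actual classes as rows and predicted classes as columns (how the API response lays out this data is not shown here, so that layout is an assumption). Per-class Precision, Recall, and F1 then follow from the diagonal and the row and column totals. Returning 0.0 when a denominator is zero is a convention chosen for this sketch, not necessarily DataRobot's; the function name and counts are hypothetical.

```python
# Hypothetical 3-class matrix: rows are actual classes, columns are predicted classes.
classes = ["A", "B", "C"]
matrix = [
    [40, 5, 5],
    [20, 25, 5],
    [0, 10, 40],
]


def class_metrics(matrix, classes):
    """Derive per-class precision, recall, and F1 from confusion matrix counts."""
    metrics = {}
    for k, name in enumerate(classes):
        tp = matrix[k][k]
        actual_total = sum(matrix[k])                    # rows that are actually this class
        predicted_total = sum(row[k] for row in matrix)  # rows predicted as this class
        precision = tp / predicted_total if predicted_total else 0.0
        recall = tp / actual_total if actual_total else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[name] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


for name, m in class_metrics(matrix, classes).items():
    print(name, {key: round(value, 3) for key, value in m.items()})
```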