

Confusion Matrix (for multiclass models)

Availability information

Availability of unlimited classes in multiclass projects depends on your pricing plan. If it is not enabled for your organization, the class limit is set to 100. Contact your DataRobot representative to increase this limit.

For multiclass models, DataRobot provides a multiclass confusion matrix to help evaluate model performance. The confusion matrix compares actual data values with predicted data values, making it easy to see if any mislabeling has occurred and with which values.

Background

In general, there are two types of prediction problems—regression and classification. Regression problems predict continuous values (1.7, 6, 9.8…). Classification problems, by contrast, classify values into discrete, final outputs or classes (buy, sell, hold...).

Classification can be broken down into binary and multiclass problems. In a binary classification problem, there are only two possible classes. Some examples include predicting whether or not a customer will pay their bill on time (yes or no) or if a patient will be readmitted to the hospital (true or false).

Multiclass classification problems, on the other hand, answer questions that have more than two possible outcomes (classes). For example, which of five competitors will a customer turn to (instead of simply whether or not they are likely to make a purchase)? Or, to which department should a call be routed (instead of simply whether or not someone is likely to make a call)? With additional class options for multiclass classification problems, you can ask more “which one” questions, which result in more nuanced models and solutions.

Depending on the number of values for a given target feature, DataRobot automatically determines the project type and whether a project is standard, extended, or unlimited multiclass. The following table describes how DataRobot assigns a default problem type for numeric and non-numeric target data types:

| Target data type | Number of unique target values | Default problem type | Use multiclass? |
|---|---|---|---|
| Numeric | 3-10 | Regression | Yes, optional |
| Numeric | > 10 | Regression | Yes, optional (extended multiclass) |
| Non-numeric | 2 | Binary | No |
| Non-numeric | 3-100 | Multiclass | Yes, automatic |
| Non-numeric, numeric | 100+ | Unlimited multiclass | Yes, automatic, if enabled |

Build multiclass models

Multiclass modeling uses the same general model building workflow as binary or regression projects.

  1. Import a dataset and specify a target.
  2. Change a regression project to multiclass, if applicable.
  3. For unlimited multiclass projects with more than 1000 classes, you can modify the aggregation settings. Otherwise, by default, DataRobot keeps the 999 most frequent classes and aggregates the remainder into a single "other" bucket.
  4. Use the Confusion Matrix to evaluate model performance.

Change regression projects to multiclass

Once you enter a target feature, DataRobot classifies the project type and indicates the default with a tag next to the target feature.

If the project is classified as regression, and eligible for multiclass conversion, DataRobot provides a Switch To Classification link below the target entry box. Clicking the link changes the project to a classification project (values are interpreted as classes instead of continuous values). If the number of unique values falls outside the allowable range, the Switch To Classification link is not available.

Tip

Whether a project is considered "eligible for multiclass" is dependent on settings. If unlimited multiclass is enabled, all projects can be converted. Without unlimited multiclass, you can convert from numeric to multiclass when there are up to 100 unique numeric values.
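
The eligibility rule in this tip can be expressed as a small check. The sketch below is only a restatement of the rule in code; the function name and the `STANDARD_CLASS_LIMIT` constant are hypothetical and are not part of any DataRobot API.

```python
# Hypothetical helper that restates the tip above; not a DataRobot API.
STANDARD_CLASS_LIMIT = 100  # limit that applies when unlimited multiclass is not enabled


def can_switch_to_classification(n_unique_values: int, unlimited_enabled: bool) -> bool:
    """Return True if a numeric target could be converted to multiclass."""
    if unlimited_enabled:
        # With unlimited multiclass enabled, all projects can be converted.
        return True
    # Otherwise conversion is allowed for up to 100 unique numeric values.
    return n_unique_values <= STANDARD_CLASS_LIMIT


print(can_switch_to_classification(100, unlimited_enabled=False))
print(can_switch_to_classification(101, unlimited_enabled=False))
```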

Click Switch To Regression to switch the project type from classification back to the default regression setting.

With the training method set, verify or change the metric, choose a modeling mode, and click Start.

Unlimited multiclass

If enabled for your organization, unlimited multiclass is available to handle projects with a target feature containing more than 100 classes. For projects that contain a target with more than 1000 classes, DataRobot employs multiclass aggregation to bring the modeling class number to 1000.

Set unlimited multiclass aggregation

To support more than 1000 classes, DataRobot automatically aggregates classes, based on frequency, to 1000 unique labels. You can, however, configure the aggregation parameters to ensure all classes necessary to your project are represented.

DataRobot handles the breakdown based on the number of classes detected:

  • If 101-1000 classes are detected, modeling continues as usual.
  • If more than 1000 classes are detected, a warning appears below the target entry field.

If this warning appears, you can allow DataRobot to handle the aggregation. In this case, DataRobot keeps the 999 most frequent classes and bins all other classes into a 1000th class, "other." You can, however, configure the aggregation settings.
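
As a rough illustration of the default behavior, the following sketch keeps the most frequent labels and bins the rest. It is not DataRobot's implementation: the function name, the `other_name` default, and the toy data are assumptions, and the example uses a limit of 3 instead of 1000 so the effect is visible.

```python
from collections import Counter


def aggregate_by_frequency(labels, max_classes=1000, other_name="other"):
    """Keep the (max_classes - 1) most frequent labels and bin the rest."""
    counts = Counter(labels)
    if len(counts) <= max_classes:
        # Within the limit: nothing is aggregated.
        return list(labels)
    keep = {label for label, _ in counts.most_common(max_classes - 1)}
    return [label if label in keep else other_name for label in labels]


# Toy target column with four classes and a limit of three.
toy_labels = ["a"] * 5 + ["b"] * 3 + ["c"] * 2 + ["d"]
toy_result = Counter(aggregate_by_frequency(toy_labels, max_classes=3))
print(dict(toy_result))  # {'a': 5, 'b': 3, 'other': 3}
```

Ties between equally frequent classes are resolved here by first appearance, which is a property of `Counter.most_common` and not something the text above specifies.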

Note

Aggregation settings are also available for multiclass projects with fewer than 1000 classes.

Configure aggregation

To configure aggregation settings, click the Show advanced options link and select Feature Constraints. Scroll to the section Aggregate target classes.

The following table describes each field:

| Element | Description |
|---|---|
| Aggregate target classes | Enables the aggregation functionality. When more than 1000 classes are detected, the selection is on and cannot be changed. If fewer than 1000 classes are detected, the selection is off by default but can be enabled. |
| Name of aggregation class | Sets the name of the "other" bin, which contains all classes that do not fall within the configuration set for this aggregation plan. It represents all the rows for the excluded values in the dataset. The provided name must differ from all existing target values in the column. |
| Min frequency for non-aggregate classes | Sets the minimum number of rows a class must have to avoid being bucketed in the "other" bin. That is, classes with fewer rows are collapsed into the "other" bin. |
| Max number of non-aggregate classes | Sets the final number of classes after aggregation, with the last class being the "other" bin. (For example, if you enter 900, there will be 899 class bins from your data and 1 "other" bin of aggregated classes.) Enter a value between 3 and 1000. |
| Classes to be excluded from aggregation | Specifies a comma-separated list of classes that are preserved from aggregation, ensuring the ability to predict on less frequent classes that are of interest. |

Aggregation example

The target column of a dataset has 8 unique values (classes) with the following row counts:

| Class | Row count |
|---|---|
| A | 1024 |
| B | 512 |
| C | 256 |
| D | 128 |
| E | 64 |
| F | 32 |
| G | 16 |
| H | 8 |

The parameters are set as follows:

| Parameter | Value |
|---|---|
| Name of aggregation class | Other bin |
| Min frequency for non-aggregate classes | 50 |
| Max number of classes | 5 |
| Classes to be excluded from aggregation | [E, H] |

Class mapping will happen as follows:

| Class | Row count | Impact |
|---|---|---|
| A | 1024 | None, above minimum frequency |
| B | 512 | None, above minimum frequency |
| C | 256 | None, above minimum frequency |
| D | 128 | None, above minimum frequency |
| E | 64 | None, above minimum frequency |
| Other bin | 48 | Combined rows of F and G, which did not meet minimum frequency |
| H | 8 | Excluded from aggregation |

So far, the class mapping has resulted in 7 unique values (F and G dropped and replaced with an aggregated class). The "Max number of classes" parameter sets the maximum to five, requiring two more "drops." DataRobot next drops the least frequent classes that are not excluded from aggregation (E and H are excluded), and so drops C and D. As a result, the final distribution of target class values is:

  • Classes A and B are most frequent.
  • Classes E and H are excluded from aggregation.
  • Classes C, D, F, and G are aggregated into a single class, DR_RARE_TARGET_VALUES.
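
The two-stage mapping in this example can be reproduced with a short Python sketch. It only illustrates the logic described above and is not DataRobot's implementation; the function name `aggregate_classes`, its parameters, and the `other_name` default are hypothetical.

```python
def aggregate_classes(counts, min_frequency, max_classes, excluded, other_name="Other bin"):
    """Illustrative only: mimic the two aggregation stages from the example."""
    excluded = set(excluded)

    # Stage 1: keep excluded classes and classes that meet the minimum frequency.
    kept = {c: n for c, n in counts.items() if c in excluded or n >= min_frequency}
    aggregated = [c for c in counts if c not in kept]

    # Stage 2: while there are too many classes (counting the "other" bin),
    # drop the least frequent class that is not excluded from aggregation.
    droppable = sorted((c for c in kept if c not in excluded), key=lambda c: kept[c])
    while len(kept) + (1 if aggregated else 0) > max_classes and droppable:
        dropped = droppable.pop(0)
        aggregated.append(dropped)
        del kept[dropped]

    final = dict(kept)
    if aggregated:
        final[other_name] = sum(counts[c] for c in aggregated)
    return final, sorted(aggregated)


row_counts = {"A": 1024, "B": 512, "C": 256, "D": 128, "E": 64, "F": 32, "G": 16, "H": 8}
final_counts, aggregated_classes = aggregate_classes(
    row_counts, min_frequency=50, max_classes=5, excluded=["E", "H"]
)
print(aggregated_classes)  # ['C', 'D', 'F', 'G']
print(final_counts)        # A, B, E, and H keep their counts; the bin holds 432 rows
```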

Response time when making predictions

When using unlimited multiclass, it is best to use a smaller "chunk" size when making predictions because response time grows linearly with the number of classes and number of rows in a prediction dataset.

Each class prediction can generate up to 10 digits to the right of the decimal point (0.xxxxxxxxxx). For each row, this can result in 13 bytes per class. So, for example, a single dataset prediction for a 1000-class multiclass model on 10,000 rows can yield 13 bytes * 1000 classes * 10,000 rows, or roughly a 130MB response.
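
The estimate above is simple arithmetic, and the same formula can be turned around to pick a chunk size. In the sketch below, the 13-byte figure comes from the text; the function names and the 10MB budget are assumptions chosen for illustration.

```python
BYTES_PER_CLASS_VALUE = 13  # per-class estimate quoted in the text above


def estimated_response_bytes(n_classes, n_rows, bytes_per_value=BYTES_PER_CLASS_VALUE):
    """Approximate response size for a multiclass prediction request."""
    return bytes_per_value * n_classes * n_rows


def max_rows_per_chunk(n_classes, byte_budget, bytes_per_value=BYTES_PER_CLASS_VALUE):
    """Largest row count whose estimated response fits within byte_budget."""
    return max(1, byte_budget // (bytes_per_value * n_classes))


full_size = estimated_response_bytes(n_classes=1000, n_rows=10_000)
chunk_rows = max_rows_per_chunk(n_classes=1000, byte_budget=10_000_000)
print(full_size)   # 130000000 bytes, roughly 130MB
print(chunk_rows)  # 769 rows per chunk for a hypothetical 10MB budget
```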

Changes to Feature Impact

In projects with more than 100 classes, the Feature Impact visualization charts only the aggregated feature impact, not per-class impact. This is because:

  1. Using only aggregated classes improves runtime.
  2. Because each individual class has a comparatively low row count, per-class scores are less reliable than the aggregated score.

As a result, the Select Class dropdown is not available on the chart.

Confusion Matrix overview

For each classification project type, DataRobot builds a confusion matrix to help evaluate model performance. The name "confusion matrix" refers to how a model can confuse two or more classes by consistently mislabeling (confusing) one class as another. The confusion matrix compares actual data values with predicted data values, making it easy to see if any mislabeling has occurred and with which values.
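
To make the idea concrete, the sketch below counts (actual, predicted) pairs for a handful of rows. It is a generic illustration rather than DataRobot code; the class names and rows are made up, and the pair-count dictionary is just one convenient representation of a confusion matrix.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count how often each (actual, predicted) pair occurs."""
    return Counter(zip(actual, predicted))


# Made-up rows for illustration only.
actual = ["bad support", "bad support", "product problem", "price", "price"]
predicted = ["bad support", "product problem", "product problem", "price", "bad support"]

matrix = confusion_counts(actual, predicted)

# Off-diagonal pairs are the mislabeled rows.
mislabeled = {pair: n for pair, n in matrix.items() if pair[0] != pair[1]}
print(mislabeled)
```

Pairs where the two labels match correspond to correct predictions; every other pair shows which class was confused with which.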

A confusion matrix specific to the problem type is available for both binary classification (in the ROC Curve) and multiclass problems. To access the multiclass confusion matrix, first build your models and then select the Confusion Matrix tab from the Evaluate division.

The tab displays two confusion matrix tables for each multiclass model: the Multiclass Confusion Matrix and the Selected Class Confusion Matrix. Both matrices compare predicted and actual values for each class, based on the training data used to build the project, and use graphic elements to illustrate mislabeling of classes. The Multiclass Confusion Matrix provides an overview of every class found for the selected target, while the Selected Class Confusion Matrix analyzes a specific class. From these comparisons, you can determine how well DataRobot models are performing.

The following describes the components available in the Confusion Matrix tab.

| Option | Description |
|---|---|
| Matrix | Overview of every found class. |
| Data selection | Data partition used. |
| Modes | Modes that impact display. |
| Display options | Menu for display options. |
| Matrix detail | Numeric frequency details. |
| Class selector | Individual class selector. |
| Selected Class Confusion Matrix | Class-specific matrix. |
| Extended-class Confusion Matrix thumbnail | Thumbnail for extended classes. |

Large confusion matrix

This matrix provides an overview of every class (value) that DataRobot recognized for the selected target in the dataset. It reports class prediction results using different colored and sized circles. Color indicates prediction accuracy—green circles represent correct predictions while red circles represent incorrect predictions. The size of a circle is a visual indicator of the occurrence (based on row count) of correct and incorrect predictions (for example, the number of rows in which “product problem” was predicted but the actual value was “bad support”).

The default size of the matrix changes depending on the type of multiclass:

  • For up to 100 classes, the matrix is 10 classes by 10 classes.
  • For more than 100 classes, the matrix is 25 classes by 25 classes.

Click on any of the correct predictions (green circles) in the Multiclass Confusion Matrix to view and analyze additional details for that class in the display to the right of the matrix.

Data selection

The data used to build the Multiclass Confusion Matrix is dependent on your project type and can be changed using the Data Selection dropdown. The option you choose changes the display to reflect the selected subset of the project's historical (training) data:

  • For non-time-aware projects, it is sourced from the validation, cross-validation, or holdout (if unlocked) partitions.

  • For time-aware projects, it is sourced from an individual backtest, all backtests, or holdout (if unlocked).

Additionally, you can add an external test dataset to help evaluate model performance.

Modes

There are three mode options—Global, Actual, and Predicted—that provide detailed information about each class within the target column. Changing the mode updates the full matrix, the selected class matrix, and the details for the selected class.

The following table describes each of the Multiclass Confusion Matrix modes.

| Mode | Description | Hover over a cell on the matrix grid to display... |
|---|---|---|
| Global | Provides F1 Score, Recall, and Precision metrics for each selected class. | Total row count; total row count compared to total row count in the selected partition (%) |
| Actual | Provides details of the Recall score as well as a partial list of classes that the model confused with the selected class. Click Full List to see Recall score for all confused classes.* | Total row count; total row count compared to the total row count of actual class values in the selected partition (%) |
| Predicted | Provides details of the Precision score (how often the model accurately predicted the selected class). Click Full List to see Precision score for all confused classes.* | Total row count; total row count compared to the total row count of predicted class values in the selected partition (%) |

* Clicking Full List opens the Feature Misclassification popup, which lists scores for all classes and allows you to switch between the Actual and Predicted modes.
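
The metrics and hover percentages that the modes report follow from the standard definitions of precision, recall, and F1. The sketch below uses those definitions and one reading of the three percentages (cell count divided by the partition total, the actual-class total, or the predicted-class total). The function names and the two-class matrix are assumptions, not DataRobot code.

```python
def class_metrics(matrix, cls):
    """matrix maps (actual, predicted) -> row count."""
    tp = matrix.get((cls, cls), 0)
    actual_total = sum(n for (a, _), n in matrix.items() if a == cls)
    predicted_total = sum(n for (_, p), n in matrix.items() if p == cls)
    recall = tp / actual_total if actual_total else 0.0
    precision = tp / predicted_total if predicted_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def cell_share(matrix, actual, predicted, mode):
    """Percentage shown for one cell under an assumed reading of each mode."""
    count = matrix.get((actual, predicted), 0)
    if mode == "global":
        total = sum(matrix.values())
    elif mode == "actual":
        total = sum(n for (a, _), n in matrix.items() if a == actual)
    elif mode == "predicted":
        total = sum(n for (_, p), n in matrix.items() if p == predicted)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return 100 * count / total if total else 0.0


# Made-up two-class matrix for illustration.
example = {("A", "A"): 8, ("A", "B"): 2, ("B", "A"): 4, ("B", "B"): 6}
metrics_a = class_metrics(example, "A")
shares = {m: cell_share(example, "A", "B", m) for m in ("global", "actual", "predicted")}
print(metrics_a)
print(shares)  # {'global': 10.0, 'actual': 20.0, 'predicted': 25.0}
```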

Display options

The gear icon provides a menu of options for sorting and orienting the Multiclass Confusion Matrix into different formats.

Display options include:

  • Orientation of Actuals: sets the axis (rows or columns) for the Actual values display.
  • Sort by: sets the sort order, either alphabetically, by actual or predicted frequency, or by F1 Score.
  • Order: orders the matrix display in either ascending or descending order.

For example, to view the lowest Predicted Frequency values, select the Predicted Frequency and Ascending order options to display those values at the top of the matrix.
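
The effect of the Sort by and Order options can be imitated on a matrix stored as pair counts. This is a hypothetical sketch: `sort_classes` and the toy matrix are not part of DataRobot, and sorting by F1 Score is omitted for brevity.

```python
def sort_classes(matrix, by="alphabetical", descending=False):
    """matrix maps (actual, predicted) -> row count; returns ordered class names."""
    classes = {name for pair in matrix for name in pair}

    def frequency(cls, axis):
        # axis 0 counts rows where cls is the actual value, axis 1 the predicted value.
        return sum(n for pair, n in matrix.items() if pair[axis] == cls)

    if by == "alphabetical":
        key = lambda cls: cls
    elif by == "actual":
        key = lambda cls: (frequency(cls, 0), cls)
    elif by == "predicted":
        key = lambda cls: (frequency(cls, 1), cls)
    else:
        raise ValueError(f"unsupported sort option: {by}")
    return sorted(classes, key=key, reverse=descending)


# Made-up three-class matrix for illustration.
toy_matrix = {
    ("A", "A"): 50, ("A", "B"): 5, ("A", "C"): 5,
    ("B", "A"): 10, ("B", "B"): 30, ("C", "C"): 20,
}
lowest_predicted_first = sort_classes(toy_matrix, by="predicted", descending=False)
print(lowest_predicted_first)  # ['C', 'B', 'A']
```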

Matrix detail

The blue bars that border the right and bottom sides of the Multiclass Confusion Matrix display numeric frequency details for each class and help determine DataRobot’s accuracy. For any class, click a bar opposite the Actual axis to see actual frequency, or opposite the Predicted axis to see predicted frequency.

The example below reports the actual frequency for the class [50-60) of the feature age. In this case, based on the training data, there were 264 instances (at this sample size) in which the [50-60) class was the value of the target age. Those 264 rows make up 16.5% of the total dataset.

Tip

You can view frequency details for any class, regardless of which class is currently selected, by hovering over any of the blue bars.

Class selector

The dropdown selects an individual class and provides details based on the active mode.

Selected Class Confusion Matrix

The smaller matrix provides accuracy details for a single class. Changing the mode or the selected class, whether through the dropdown or by clicking a green circle in the full matrix, dynamically updates the Selected Class Confusion Matrix. The class displayed on the Selected Class Confusion Matrix is simultaneously highlighted on the full matrix, and the frequency percentages are displayed in the labeled quadrants. Hover over a circle in the matrix to view its contribution to the total number of rows in that sample (for the selected partition). The row counts of the four quadrants sum to the total number of rows in the dataset. For example, there are 1600 instances where Bad Support was the value of the target ChurnReasons. Hover over each quadrant to view a count of each outcome (the accuracy) of the DataRobot prediction.

The Selected Class Confusion Matrix is divided into four quadrants, summarized in the following table:

| Quadrant | Description |
|---|---|
| True Positive | For all rows in the dataset that were actually ClassA, how many (what percent) did DataRobot correctly predict as ClassA? This quadrant is equal to the value reflected in the full matrix. |
| True Negative | For all rows in the dataset that were not ClassA, how many (what percent) did DataRobot correctly predict as not ClassA? This quadrant is equal to the value reflected in the full matrix. |
| False Positive | For all rows in the dataset that DataRobot predicted as ClassA, how many (what percent) were not ClassA? This is the sum of all incorrect predictions for the class in the full matrix. |
| False Negative | For all rows in the dataset that were ClassA, how many (what percent) did DataRobot incorrectly predict as something other than ClassA? This quadrant shows the sum of all rows that should have been the selected class in the full matrix but were not. |
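
The four quadrants are the usual one-vs-rest reduction of a multiclass matrix. The following sketch shows that reduction on made-up counts; the function name and the matrix representation (a dictionary of pair counts) are assumptions for illustration.

```python
def selected_class_quadrants(matrix, cls):
    """Collapse a multiclass matrix of (actual, predicted) -> count into four quadrants."""
    total = sum(matrix.values())
    tp = matrix.get((cls, cls), 0)
    # Rows that were actually cls but predicted as something else.
    fn = sum(n for (a, p), n in matrix.items() if a == cls and p != cls)
    # Rows predicted as cls that were actually something else.
    fp = sum(n for (a, p), n in matrix.items() if a != cls and p == cls)
    # Everything else: not cls and not predicted as cls.
    tn = total - tp - fn - fp
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


# Made-up three-class matrix for illustration.
counts = {
    ("A", "A"): 50, ("A", "B"): 5, ("A", "C"): 5,
    ("B", "A"): 10, ("B", "B"): 30, ("C", "C"): 20,
}
quadrants = selected_class_quadrants(counts, "A")
print(quadrants)  # {'TP': 50, 'TN': 50, 'FP': 10, 'FN': 10}
```

The four values always add up to the total number of rows in the partition, here 120.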

Extended-class Confusion Matrix thumbnail

For extended-class (between 11 and 100 classes) multiclass projects, DataRobot provides a thumbnail pagination tool that allows a more detailed inspection of your results. The thumbnail is a smaller representation of the full multiclass matrix. The blue dots in the thumbnail indicate locations that contain the most predictions (whether classified correctly or incorrectly) and therefore might be the most interesting to investigate.

Clicking on an area in the thumbnail updates the larger matrix to display the 10x10 area surrounding your selection. The final frame (lower right corner) displays only the remaining columns beyond the last 10 boundary (for example, a dataset with 83 classes will show only three entries). The full matrix functions in the same way as the non-extended multiclass matrix described above. Statistics on each cell shown in the larger 10x10 matrix are calculated across the full confusion matrix represented by the thumbnail.
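
The paging arithmetic behind the frames can be sketched as follows. The function name and the half-open index ranges are illustrative choices, not a description of how DataRobot implements the thumbnail.

```python
def frame_bounds(n_classes, frame_size=10):
    """Half-open (start, end) class index ranges along one axis of the matrix."""
    return [
        (start, min(start + frame_size, n_classes))
        for start in range(0, n_classes, frame_size)
    ]


frames = frame_bounds(83)
last_start, last_end = frames[-1]
print(len(frames))            # 9 frames along each axis
print(last_end - last_start)  # 3 entries in the final frame
```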

You can navigate the thumbnail either using the arrows along the outside or by clicking in a specific box; row and column numbers help identify the current matrix position.

A thumbnail displaying blue dots roughly on the diagonal from upper left to lower right potentially indicates a good model—there are many correct predictions. However, it is also possible that, because categories are not ordered, the dots indicate misses that are gathered by chance and so it is important to fully investigate each square to check performance. 


Updated November 30, 2021