

Feature Associations

The sections below include a general discussion about associations, understanding the mutual information and Cramer's V metrics, and how associations are calculated.

What are associations?

There is a lot of terminology for describing the relationship between a pair of features—feature associations, mutual dependence, levels of co-occurrence, and correlations (although correlation is, technically, somewhat different), to name the more common examples. The Feature Association insight helps you visualize these associations, both through a wide-angle lens (the full matrix) and close up (matrix zoom and feature association pair details).

Looking at the matrix, each dot answers the question, "If I know the value of one of these features, how accurately can I guess the value of the other?" The metric puts a number on that answer. The closer the value is to 0, the more independent the features are of each other: knowing one doesn't tell you much about the other. A score of 1, on the other hand, means that if you know X, you know Y. Intermediate values indicate a pattern, but not a completely reliable one. The closer a feature pair is to "perfect mutual information" (a score of 1), the darker its representation on the matrix.

More about metrics

The metric score determines the ordering and positioning of clusters and features in the matrix and the detail pane. You can select either the Mutual Information (the default) or Cramer's V metric. These metrics are well-documented on the internet:

  • A technical overview of Mutual Information on Wikipedia.
  • A longer discussion of Mutual Information on Scholarpedia, with examples.
  • A technical overview of Cramer's V on Wikipedia.
  • A Cramer's V tutorial covering the "what and why."

Both metrics measure dependence between features; which one to select is largely a matter of preference and familiarity. Keep in mind that Cramer's V is more sensitive, so when features depend only weakly on each other, it reports associations that Mutual Information may not.
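To make the two metrics concrete, here is a minimal sketch of how each can be computed for a pair of categorical features from their contingency table. This is an illustration of the standard formulas, not DataRobot's implementation; the function name is hypothetical.

```python
import numpy as np
from scipy.stats import chi2_contingency

def association_scores(x, y):
    """Compute mutual information (in nats) and Cramer's V for two
    categorical arrays, from their joint contingency table."""
    # Build the contingency table of joint counts.
    xs, xi = np.unique(x, return_inverse=True)
    ys, yi = np.unique(y, return_inverse=True)
    table = np.zeros((len(xs), len(ys)))
    np.add.at(table, (xi, yi), 1)

    # Mutual information: sum over cells of p * log(p / (px * py)).
    p = table / table.sum()
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    nz = p > 0
    mi = (p[nz] * np.log(p[nz] / (px @ py)[nz])).sum()

    # Cramer's V: normalize the chi-squared statistic by sample size
    # and the smaller table dimension, giving a value in [0, 1].
    chi2 = chi2_contingency(table, correction=False)[0]
    n = table.sum()
    v = np.sqrt(chi2 / (n * (min(table.shape) - 1)))
    return mi, v

# A feature compared with itself is perfectly dependent: V = 1.0,
# while independent features score near 0 on both metrics.
x = np.array(["a", "a", "b", "b"] * 25)
print(association_scores(x, x))
```

Note that Cramer's V is already normalized to [0, 1], whereas raw mutual information is unbounded (here, a perfectly dependent two-value feature yields log 2 ≈ 0.69 nats), so tools typically rescale it before display.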

How associations are calculated

When calculating associations, DataRobot selects the top 50 numeric and categorical features (or all features if there are fewer than 50). "Top" is defined as the features with the highest importance score—the value that represents a feature's association with the target. Data from those features is then randomly subsampled to a maximum of 10,000 rows.
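The selection and subsampling step described above can be sketched as follows. This is an illustrative approximation, not DataRobot's code; the function name and the `importance` mapping (feature name to importance score) are assumptions.

```python
import pandas as pd

def prepare_for_associations(df, importance, max_features=50,
                             max_rows=10_000, seed=0):
    """Pick the top numeric/categorical features by importance score,
    then randomly subsample rows, mirroring the documented limits."""
    # Keep only numeric and categorical (string-typed) columns.
    cols = [c for c in df.columns
            if pd.api.types.is_numeric_dtype(df[c]) or df[c].dtype == object]
    # "Top" = highest importance, i.e. association with the target.
    top = sorted(cols, key=lambda c: importance.get(c, 0.0),
                 reverse=True)[:max_features]
    subset = df[top]
    # Randomly subsample to the row cap.
    if len(subset) > max_rows:
        subset = subset.sample(max_rows, random_state=seed)
    return subset
```

Capping both the feature count and the row count keeps the pairwise computation tractable: with 50 features there are at most 50 × 49 / 2 = 1,225 pairs to score.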

Note the following:

  • For associations, DataRobot performs quantile binning of numerical features and does no data imputation. Missing values are grouped as a new bin.
  • Outlying values are excluded from correlational analysis.
  • For clustering, features below an association threshold of 0.1 are eliminated.
  • If all features are relatively independent of each other—no distinct families—DataRobot displays the matrix but all dots are white.
  • Features missing over 90% of their values are excluded from calculations.
  • High-cardinality categorical features with more than 2000 values are excluded from calculations.
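The binning and exclusion rules in the notes above can be sketched as below. The bin count and helper names are assumptions for illustration; only the "missing values as a new bin," 90% missing, and 2,000-category thresholds come from the documentation.

```python
import pandas as pd

def bin_numeric(series, n_bins=10):
    """Quantile-bin a numeric column. Missing values are not imputed;
    they are grouped into their own "missing" bin (per the notes above).
    The number of bins is an assumed parameter."""
    binned = pd.qcut(series, q=n_bins, duplicates="drop").astype(str)
    return binned.mask(series.isna(), "missing")

def eligible(series, max_missing=0.9, max_cardinality=2000):
    """Apply the documented exclusion rules to a single feature."""
    # Features missing over 90% of their values are excluded.
    if series.isna().mean() > max_missing:
        return False
    # High-cardinality categoricals (> 2000 values) are excluded.
    if series.dtype == object and series.nunique() > max_cardinality:
        return False
    return True
```

Quantile binning (rather than equal-width binning) gives each bin roughly the same number of rows, which keeps the contingency tables behind both metrics well populated even for skewed distributions.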

Updated March 26, 2025