Analyze feature associations
In this tutorial, you'll learn how to use a Feature Association matrix to visualize relationships among your numeric and categorical features. You can quickly see the top ten associations and the clusters that are present in your data.
How are feature associations calculated?
By default, feature associations are calculated using Mutual Information, but you can switch to Cramer's V. Learn more about these metrics in the Feature Association documentation.
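To make the two metrics concrete, here is a minimal sketch of the textbook definitions of mutual information and Cramer's V for a pair of categorical features. It is not DataRobot's implementation, which may differ in details such as normalization, sampling, or how numeric features are handled. The function names and the toy `insulin` and `change` values are invented for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (count / n) * math.log(count * n / (px[x] * py[y]))
    return mi


def cramers_v(xs, ys):
    """Cramer's V (0 = no association, 1 = perfect association), without bias correction."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    chi2 = 0.0
    for x in px:
        for y in py:
            expected = px[x] * py[y] / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(px), len(py)) - 1
    # A feature with a single distinct value carries no association information
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Toy values, invented for illustration only
insulin = ["No", "Up", "Steady", "No", "Down", "Steady", "No", "Up"]
change = ["No", "Ch", "No", "No", "Ch", "No", "No", "Ch"]
print("Mutual information:", mutual_information(insulin, change))
print("Cramer's V:", cramers_v(insulin, change))
```

Both scores are zero for independent features and grow as one feature becomes more predictable from the other. Cramer's V is bounded at 1, while raw mutual information is bounded by the entropy of the features.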
This tutorial shows how to:
- View the Feature Associations matrix.
- Investigate feature relationships including pairs and clusters of features.
View the Feature Associations tab
The Feature Associations tab is available after your features are analyzed in EDA2.
The sample dataset featured in this tutorial contains patient data.
The goal is to predict the likelihood of patient readmission to the hospital. The target feature is
On the Begin a project page, upload your data, then specify a target and click Start.
DataRobot performs EDA2 prior to generating model blueprints.
Once DataRobot finishes feature analysis, click Feature Associations on the Data tab.
The Feature Associations matrix displays.
The features are listed on the x and y axes of the matrix. The association between two features (a feature pair) is represented by a colored dot.
Investigate clusters of features.
Feature clusters are groups of features that are associated to some degree. The dots in a cluster display in the same general color with association strength represented by the depth of the color—dark (opaque) to light (more transparent). White dots indicate features that are not in a cluster.
Notice the green, red, and blue clusters identified in the chart. The red cluster contains the
metformin features. It makes sense that these features are in a cluster because they all relate to diabetes medications: insulin and metformin are diabetes medications, while the change feature indicates that the patient's medication was changed.
To zoom in, drag the cursor to outline a section of the matrix.
To view the whole matrix again, click Reset zoom below the display.
Explore the features by sorting them using the Sort By dropdown menu.
By default, the list is sorted by Feature Cluster. You can also sort by name and Importance.
Use the Feature List dropdown menu to view the feature associations based on a different feature list.
Explore pairs of features
Select a dot in the matrix to view details about the feature pair.
The Associations tab on the right shows the cluster that contains the feature pair, as well as the value for the selected metric (Mutual Information, in this case). The tab also provides details about the individual features.
Click View Feature Association Pairs at the bottom of the Associations tab.
The window displays a visualization of the association between the two features.
In this case, both features are categorical, so a contingency table shows the frequency distribution of the feature values. For other feature types, different plots display.
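As a rough illustration of what a contingency table holds, the sketch below counts how often each combination of values occurs across two categorical features. The helper name `contingency_table` and the sample values are hypothetical, and the table DataRobot renders may be laid out or binned differently.

```python
from collections import Counter


def contingency_table(xs, ys):
    """Return (row_labels, column_labels, counts) for two categorical sequences."""
    counts = Counter(zip(xs, ys))
    rows = sorted(set(xs))
    cols = sorted(set(ys))
    # One row per value of xs, one column per value of ys
    table = [[counts.get((r, c), 0) for c in cols] for r in rows]
    return rows, cols, table


# Toy values, invented for illustration only
insulin = ["No", "Up", "Steady", "No", "Down", "Steady", "No", "Up"]
change = ["No", "Ch", "No", "No", "Ch", "No", "No", "Ch"]

rows, cols, table = contingency_table(insulin, change)
print("insulin \\ change", cols)
for label, row in zip(rows, table):
    print(label, row)
```

Each cell is the number of rows in the data that have that pair of values, so the cells sum to the total row count.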
Select other pairs of features from the Feature 1 and Feature 2 dropdown menus.
For pairs of numeric features, DataRobot generates scatter plots.
If a pair includes a numeric feature and a categorical feature, DataRobot generates a box and whisker plot.
In this example, the feature pair of admission_type_id (a categorical feature) and time_in_hospital (a numeric feature) generates a box and whisker plot. The plot shows the upper and lower quartiles for the data. The endpoints represent the upper and lower extremes.
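The numbers behind a box and whisker plot can be sketched as a five-value summary per category: minimum, lower quartile, median, upper quartile, and maximum. The example below is an assumption-laden illustration, not DataRobot's plotting code. The helper `box_plot_summary` is hypothetical, the sample values are invented, and the quartile method (`inclusive`) is one of several common conventions.

```python
from collections import defaultdict
from statistics import quantiles


def box_plot_summary(categories, values):
    """Group numeric values by category and compute box plot statistics for each group."""
    groups = defaultdict(list)
    for cat, val in zip(categories, values):
        groups[cat].append(val)

    summary = {}
    for cat, vals in groups.items():
        # Each group needs at least two values for quartiles to be defined
        q1, median, q3 = quantiles(vals, n=4, method="inclusive")
        summary[cat] = {
            "min": min(vals),   # lower whisker endpoint (lower extreme)
            "q1": q1,           # bottom edge of the box
            "median": median,   # line inside the box
            "q3": q3,           # top edge of the box
            "max": max(vals),   # upper whisker endpoint (upper extreme)
        }
    return summary


# Toy values, invented for illustration only
admission_type_id = ["1", "1", "1", "1", "1", "2", "2", "2", "2", "2"]
time_in_hospital = [1, 2, 3, 4, 5, 2, 4, 6, 8, 10]

for category, stats in box_plot_summary(admission_type_id, time_in_hospital).items():
    print(category, stats)
```

Comparing these summaries across categories shows what the plot conveys visually: whether the numeric feature's typical value and spread shift from one category to the next.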