

Cluster Insights

With the Cluster Insights visualization, you can understand and name each cluster in a dataset. Use clustering to capture a latent feature in your data, to surface and communicate actionable insights quickly, or to identify segments in the data for further modeling.

Note

The maximum number of features computed for Cluster Insights is 100. The features are selected from those used to train the model, ranked by Feature Impact (high to low). The remaining features (those not used to train the model) are sorted alphabetically.
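The ordering rule in the note can be sketched in Python. This is a minimal illustration under stated assumptions, not DataRobot's implementation: the function name `select_insight_features`, its inputs (a hypothetical mapping of training features to Feature Impact scores and a list of all dataset features), and breaking impact ties by name are all assumptions made for the example.

```python
def select_insight_features(impact, all_features, limit=100):
    """Order features for Cluster Insights and cap the list at `limit`.

    impact: hypothetical mapping of training feature name -> Feature Impact score.
    all_features: every feature name in the dataset.
    """
    # Training features first, highest Feature Impact first (ties broken by name,
    # which is an assumption of this sketch).
    trained = sorted(impact, key=lambda name: (-impact[name], name))
    # Features not used to train the model follow, in alphabetical order.
    untrained = sorted(f for f in all_features if f not in impact)
    # Keep at most `limit` features (100 per the note above).
    return (trained + untrained)[:limit]


impact = {"income": 0.9, "age": 0.4, "tenure": 0.7}
features = ["zip_code", "age", "income", "channel", "tenure"]
print(select_insight_features(impact, features))
# ['income', 'tenure', 'age', 'channel', 'zip_code']
```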

To analyze the clusters in your data, build a clustering experiment, select a model from the Leaderboard, and open the Cluster Insights visualization.

The Cluster Insights visualization has two main elements:

  1. Visualization controls: Provides tools for working with the display.
  2. Clusters and features: Provides cluster and feature details, including a breakdown of each cluster by feature and a list of features sorted by feature importance. The Informative Features list displays by default; use the Feature list dropdown in the controls to change the display.

Visualization controls

Use the controls in the top bar to work with the display.

Select clusters

Use Select clusters to add or remove clusters from the visualization view (not from the experiment). The visualization displays a maximum of five clusters per screen; use the arrow on the far right to scroll to additional clusters.

Click + Add cluster to display additional clusters, and click the trash can icon to remove a cluster from the display. To reorder clusters, click the cluster in a given position and assign a different cluster to that position.

Rename clusters

You can rename clusters after you gain an understanding of what they represent. The cluster names propagate to other insights and predictions, allowing you to further analyze the clusters. Click Rename clusters, edit cluster names, and click Finish editing when done.

Change or create feature lists

By default, DataRobot builds clustering models using the Informative Features list. Select another feature list, either automatically generated or custom, to explore a different subset of features. Changing the list affects only the display, not the model. Even so, analyzing features that were not used to generate the clusters can be useful for answering questions like "How does income distribute among my clusters, even if I'm not using it for clustering?"

See the custom feature list reference for information on creating new lists.

Use Search to show an individual feature's placement in each cluster.

Download CSV

Click Download CSV to download the cluster insights. The CSV contains the information displayed in the Cluster Insights visualization as well as more detailed feature data.

View more features

By default, features display for each cluster in order of Feature Impact, from most to least important. Four features display per page by default; click the number to change how many features display per page. To page through the features, click the right arrow above the clusters.

Clusters and features

Clusters are groups of similar records that form natural segments in the data. The Cluster Insights visualization helps you understand how those groups were formed. See the reference documentation for details on investigating cluster features.

Clusters display in columns, showing the features in each cluster along with each feature's impact score and values. Use the visualization to evaluate how features are distributed across clusters. Each cluster's size displays as a percentage above the cluster name; the All data cluster contains 100% of the data and serves as a baseline for comparison.
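Cluster sizes are each cluster's share of the rows, so across all clusters they add up to the 100% shown for All data. A minimal sketch, assuming you have one cluster label per row (for example, from clustering predictions); the function name `cluster_sizes` and the labels are hypothetical.

```python
from collections import Counter


def cluster_sizes(labels):
    """Return each cluster's share of rows as a percentage, plus the All data baseline."""
    counts = Counter(labels)
    total = len(labels)
    sizes = {name: 100.0 * n / total for name, n in sorted(counts.items())}
    # Every row belongs to the baseline, so All data is always 100%.
    sizes["All data"] = 100.0
    return sizes


labels = ["Cluster 1"] * 5 + ["Cluster 2"] * 3 + ["Cluster 3"] * 2
print(cluster_sizes(labels))
# {'Cluster 1': 50.0, 'Cluster 2': 30.0, 'Cluster 3': 20.0, 'All data': 100.0}
```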

  • Click the arrow to the right of the cluster names to scroll through the clusters.
  • Click the Impact column name to reverse the sort order.

Hover on a feature within a cluster to see details for the top four features.

Expand a row to see additional values or statistics for that feature within each cluster, depending on the feature type.

For numeric features, see summary statistics for the feature within the cluster.
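The exact statistics DataRobot displays are not listed here. As an assumption, this sketch computes the minimum, median, mean, and maximum of a numeric feature (a hypothetical `income` column, echoing the income question above) for one cluster and for all data, which is the kind of comparison the expanded row supports. The function name and data are illustrative only.

```python
from statistics import mean, median


def numeric_summary(values, labels, cluster):
    """Summarize a numeric feature within one cluster and across all data.

    values and labels are parallel lists: values[i] is the feature value for
    row i and labels[i] is that row's cluster. The statistics chosen here are
    illustrative assumptions; the cluster is assumed to contain at least one row.
    """
    def stats(xs):
        return {"min": min(xs), "median": median(xs), "mean": mean(xs), "max": max(xs)}

    in_cluster = [v for v, label in zip(values, labels) if label == cluster]
    return {cluster: stats(in_cluster), "All data": stats(values)}


income = [30_000, 42_000, 55_000, 61_000, 120_000, 150_000]
labels = ["Cluster 1"] * 3 + ["Cluster 2"] * 3
print(numeric_summary(income, labels, "Cluster 2"))
```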

For categorical features, see a histogram showing the top four categories, with all others bucketed into Other.
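The top-four-plus-Other bucketing can be sketched with `collections.Counter`. This is a minimal illustration, not DataRobot's code: the function name `category_histogram` and the sample values are hypothetical, and to reproduce a single cluster's histogram you would pass only the values from that cluster's rows.

```python
from collections import Counter


def category_histogram(values, top_n=4):
    """Count each category, keep the top_n most frequent, and bucket the rest into Other."""
    counts = Counter(values)
    top = counts.most_common(top_n)
    # Everything outside the top_n categories is summed into a single Other bucket.
    other = sum(counts.values()) - sum(n for _, n in top)
    histogram = dict(top)
    if other:
        histogram["Other"] = other
    return histogram


channels = ["web"] * 6 + ["store"] * 4 + ["phone"] * 3 + ["email"] * 2 + ["kiosk", "fax"]
print(category_histogram(channels))
# {'web': 6, 'store': 4, 'phone': 3, 'email': 2, 'Other': 2}
```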

To drill into all categories, click the gear icon next to the feature name and select High cardinality. Hover over a value to see how many records in the selected cluster contain that value.


Updated August 28, 2024