# Clustering

> Clustering - Learn how to use clustering, a form of unsupervised learning, to separate your samples
> into clusters that help you to better understand your data or to use as segments for time series
> modeling.

This Markdown file sits beside the HTML page at the same path (with a `.md` suffix). It summarizes the topic and lists links for tools and LLM context.

Companion generated at `2026-04-24T16:03:56.608154+00:00` (UTC).

## Primary page

- [Clustering](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/unsupervised/clustering.html): Full documentation for this topic (HTML).

## Sections on this page

- [How to use clustering models](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/unsupervised/clustering.html#how-to-use-clustering-models): In-page section heading.
- [Build a clustering model](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/unsupervised/clustering.html#build-a-clustering-model): In-page section heading.
- [Sample clustering blueprint](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/unsupervised/clustering.html#sample-clustering-blueprint): In-page section heading.
- [Visualizations for exploring clusters](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/unsupervised/clustering.html#visualizations-for-exploring-clusters): In-page section heading.
- [Cluster Insights](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/unsupervised/clustering.html#cluster-insights): In-page section heading.
- [Image Embeddings](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/unsupervised/clustering.html#image-embeddings): In-page section heading.
- [Activation Maps](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/unsupervised/clustering.html#activation-maps): In-page section heading.
- [Feature Impact](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/unsupervised/clustering.html#feature-impact): In-page section heading.
- [Feature Associations](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/unsupervised/clustering.html#feature-associations): In-page section heading.
- [Configure the number of clusters](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/unsupervised/clustering.html#configure-the-number-of-clusters): In-page section heading.
- [Set the number of clusters in Advanced Options](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/unsupervised/clustering.html#set-the-number-of-clusters-in-advanced-options): In-page section heading.
- [Update the number of clusters and rerun a model](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/unsupervised/clustering.html#update-the-number-of-clusters-and-rerun-a-model): In-page section heading.
- [Update the number of clusters and rerun all models](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/unsupervised/clustering.html#update-the-number-of-clusters-and-rerun-all-models): In-page section heading.
- [Feature considerations](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/unsupervised/clustering.html#feature-considerations): In-page section heading.

## Related documentation

- [Classic UI documentation](https://docs.datarobot.com/en/docs/classic-ui/index.html): Linked from this page.
- [Modeling](https://docs.datarobot.com/en/docs/classic-ui/modeling/index.html): Linked from this page.
- [Specialized workflows](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/index.html): Linked from this page.
- [Unsupervised learning](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/unsupervised/index.html): Linked from this page.
- [unsupervised learning](https://docs.datarobot.com/en/docs/reference/glossary/index.html#unsupervised-learning): Linked from this page.
- [Clustering for segmented modeling](https://docs.datarobot.com/en/docs/classic-ui/modeling/time/ts-clustering.html): Linked from this page.
- [image collection](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/visual-ai/index.html): Linked from this page.
- [MLOps](https://docs.datarobot.com/en/docs/api/dev-learning/python/mlops/index.html): Linked from this page.
- [anomaly detection](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/unsupervised/anomaly-detection.html): Linked from this page.
- [Upload data](https://docs.datarobot.com/en/docs/classic-ui/data/import-data/index.html): Linked from this page.
- [Silhouette Score](https://docs.datarobot.com/en/docs/reference/pred-ai-ref/opt-metric.html#silouette-score): Linked from this page.
- [deploy the model](https://docs.datarobot.com/en/docs/classic-ui/mlops/deployment/index.html): Linked from this page.
- [make predictions](https://docs.datarobot.com/en/docs/api/dev-learning/python/predictions/index.html): Linked from this page.
- [Cluster Insights](https://docs.datarobot.com/en/docs/classic-ui/modeling/analyze-models/understand/cluster-insights-classic.html): Linked from this page.
- [Image Embeddings](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/visual-ai/vai-insights.html#image-embeddings): Linked from this page.
- [Granularity](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/visual-ai/vai-tuning.html#granularity): Linked from this page.
- [Visual AI reference](https://docs.datarobot.com/en/docs/reference/pred-ai-ref/vai-reference/vai-ref.html): Linked from this page.
- [Feature Impact](https://docs.datarobot.com/en/docs/classic-ui/modeling/analyze-models/understand/feature-impact-classic.html): Linked from this page.
- [Feature Associations](https://docs.datarobot.com/en/docs/classic-ui/data/analyze-data/feature-assoc.html): Linked from this page.
- [time series-specific clustering considerations](https://docs.datarobot.com/en/docs/reference/pred-ai-ref/ts-reference/ts-consider.html#clustering-considerations): Linked from this page.

## Documentation content


Clustering, an application of [unsupervised learning](https://docs.datarobot.com/en/docs/reference/glossary/index.html#unsupervised-learning), lets you explore your data by grouping rows into natural segments. You can cluster many types of data (numeric, categorical, text, image, and geospatial), either independently or combined. The resulting clusters capture latent behavior that no single column in the dataset records explicitly.

You can also use clustering to generate the segments for a time series segmented modeling project. See [Clustering for segmented modeling](https://docs.datarobot.com/en/docs/classic-ui/modeling/time/ts-clustering.html) for details.

See the associated [considerations](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/unsupervised/clustering.html#feature-considerations) for additional information.

## How to use clustering models

Clustering is useful when data doesn't come with explicit labels and you have to determine what they should be. Because no target is needed, you can upload any dataset to better understand your data. Examples of clustering include:

- Detecting topics, types, taxonomies, and languages in a text collection. You can cluster datasets that mix text features with other feature types, or a dataset with a single text feature for topic modeling.
- Determining appropriate segments for time series segmented modeling.
- Segmenting your customer base before running a predictive marketing campaign. Identify key groups of customers and send different messages to each group.
- Capturing latent categories in an [image collection](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/visual-ai/index.html).
- Deploying a clustering model using [MLOps](https://docs.datarobot.com/en/docs/api/dev-learning/python/mlops/index.html) to serve cluster assignment requests at scale, as one step in a larger pipeline.

## Build a clustering model

The clustering workflow is similar to the [anomaly detection](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/unsupervised/anomaly-detection.html) workflow, also an unsupervised learning application.

To build a clustering model:

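1. [Upload data](https://docs.datarobot.com/en/docs/classic-ui/data/import-data/index.html), click No target?, and select Clusters. Modeling Mode defaults to Comprehensive, and Optimization Metric defaults to [Silhouette Score](https://docs.datarobot.com/en/docs/reference/pred-ai-ref/opt-metric.html#silouette-score).
2. Click Start. DataRobot generates clustering models based on default cluster counts for your dataset size. You can also [configure the number of clusters](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/unsupervised/clustering.html#configure-the-number-of-clusters). For clustering, DataRobot divides the original dataset into training and validation partitions with no holdout partition. When modeling is complete, the Leaderboard displays the generated clustering models ranked by silhouette score. The Clusters column indicates the number of clusters used by the clustering algorithm.
3. Select a model to investigate. By default, the Describe > Blueprint tab displays.
4. Analyze the [visualizations](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/unsupervised/clustering.html#visualizations-for-exploring-clusters) to select a clustering model.
5. After evaluating and selecting a clustering model, [deploy the model](https://docs.datarobot.com/en/docs/classic-ui/mlops/deployment/index.html) and [make predictions](https://docs.datarobot.com/en/docs/api/dev-learning/python/predictions/index.html) on existing or new data as you would with any other model. You can make predictions from the Leaderboard or from the deployment.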
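
Step 2 ranks models by silhouette score. As a rough illustration of what that metric rewards, the following minimal pure-Python sketch computes the standard silhouette coefficient (for each row, `(b - a) / max(a, b)` with Euclidean distance, averaged over rows) and uses it to rank three hypothetical candidate labelings of the same validation rows. The model names and data are invented; DataRobot's own implementation, distance metric, and any row sampling it applies may differ.

```python
import math
from typing import Dict, List, Sequence


def silhouette_score(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean silhouette coefficient with Euclidean distance.

    For each point, a = mean distance to the other members of its own cluster and
    b = lowest mean distance to the members of any other cluster; s = (b - a) / max(a, b).
    Points in singleton clusters contribute s = 0 (a common convention).
    """
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, p in enumerate(points):
        # Distances from point i to every other point, grouped by cluster label.
        dists: Dict[int, List[float]] = {c: [] for c in clusters}
        for j, q in enumerate(points):
            if i != j:
                dists[labels[j]].append(math.dist(p, q))
        own = dists[labels[i]]
        if not own:
            continue  # singleton cluster
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for c, d in dists.items() if c != labels[i])
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Hypothetical validation rows and the cluster labels three candidate models assigned to them.
validation = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
candidates = {
    "model_a (2 clusters)": [0, 0, 0, 1, 1, 1],
    "model_b (3 clusters)": [0, 0, 1, 2, 2, 2],
    "model_c (2 clusters)": [0, 1, 0, 1, 0, 1],
}
scores = {name: silhouette_score(validation, labels) for name, labels in candidates.items()}
leaderboard = sorted(scores, key=scores.get, reverse=True)  # higher silhouette ranks first
for name in leaderboard:
    print(f"{name}: {scores[name]:.3f}")
```

Scores range from -1 to 1: values near 1 mean rows sit much closer to their own cluster than to the nearest other cluster, which is why the well-separated `model_a` labeling ranks first.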

## Sample clustering blueprint

The following is an example of a clustering blueprint.

Click a blueprint node to access documentation on the algorithm or transform. This example shows details on the K-Means Clustering node.

This dataset contains categorical, geospatial location, numeric, image, and text variables. To improve processing speed, the blueprint preprocesses each variable type and reduces its dimensionality before applying the clustering algorithm.
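
As a simplified illustration of "preprocess, then cluster," the following sketch handles numeric columns only: it standardizes each column to z-scores and then runs Lloyd's algorithm, the classic K-Means procedure. It is not DataRobot's K-Means node. It omits the categorical, text, image, and geospatial preprocessing and the dimensionality reduction described above, and the deterministic seeding (first `k` distinct rows) is a simplification chosen so the result is reproducible. The customer data is hypothetical.

```python
import math
import statistics
from typing import List, Sequence, Tuple

Row = List[float]


def standardize(rows: Sequence[Sequence[float]]) -> List[Row]:
    """Scale each column to zero mean and unit (population) standard deviation."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # constant column: leave unscaled
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def kmeans(rows: Sequence[Row], k: int, max_iter: int = 100) -> Tuple[List[int], List[Row]]:
    """Lloyd's algorithm, seeding centroids with the first k distinct rows."""
    centroids: List[Row] = []
    for r in rows:
        if list(r) not in centroids:
            centroids.append(list(r))
        if len(centroids) == k:
            break
    if len(centroids) < k:
        raise ValueError("fewer distinct rows than clusters")
    labels = [-1] * len(rows)
    for _ in range(max_iter):
        # Assignment step: each row goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(r, centroids[c])) for r in rows]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties out
                centroids[c] = [statistics.fmean(col) for col in zip(*members)]
    return labels, centroids


# Hypothetical customers: (annual_spend, visits_per_month)
customers = [[200, 1], [220, 2], [250, 1], [5000, 12], [5200, 10], [4800, 11]]
labels, _ = kmeans(standardize(customers), k=2)
print(labels)
```

Standardizing first matters here: without it, `annual_spend` (thousands) would dominate the Euclidean distance and `visits_per_month` would barely influence the clusters.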

## Visualizations for exploring clusters

The following visualization tools are useful for clustering projects:

### Cluster Insights

The [Cluster Insights](https://docs.datarobot.com/en/docs/classic-ui/modeling/analyze-models/understand/cluster-insights-classic.html) visualization (Understand > Cluster Insights) helps you investigate clusters generated during modeling.

Compare the [feature values of each cluster](https://docs.datarobot.com/en/docs/classic-ui/modeling/analyze-models/understand/cluster-insights-classic.html#investigate-cluster-features) to gain an understanding of the groupings.

### Image Embeddings

If your dataset contains images, use the [Image Embeddings](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/visual-ai/vai-insights.html#image-embeddings) visualization (Understand > Image Embeddings) to see how the images from each cluster are sorted.

For clustering models, the frame of each image displays in a color that represents the cluster containing the image. Hover over an image to view the probability of the image belonging to each cluster.

### Activation Maps

[Activation Maps](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/visual-ai/vai-insights.html#activation-maps) show which image areas the model relies on when making decisions; for clustering, that decision is which cluster each image belongs to. Hover over an image to see the cluster it was assigned to.

> [!NOTE] Note
> For unsupervised projects, the default image preprocessing uses low-level featurization while supervised projects use multi-level featurization. See [Granularity](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/visual-ai/vai-tuning.html#granularity) for details. See also the [Visual AI reference](https://docs.datarobot.com/en/docs/reference/pred-ai-ref/vai-reference/vai-ref.html).

### Feature Impact

Use the [Feature Impact](https://docs.datarobot.com/en/docs/classic-ui/modeling/analyze-models/understand/feature-impact-classic.html) tool (Understand > Feature Impact) to see which features had the most influence on the clustering outcomes.

> [!TIP] How is Feature Impact calculated for clustering projects?
> As with supervised projects, DataRobot permutes each feature and looks at how much the prediction changes based on the [RMSE metric](https://docs.datarobot.com/en/docs/reference/pred-ai-ref/opt-metric.html#rmse-weighted-rmse-rmsle-weighted-rmsle). The larger the change, the higher the impact of the feature.
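
The permutation approach described in the tip can be sketched as follows. This is a minimal illustration, not DataRobot's implementation. It assumes the model's prediction for each row is a vector of per-cluster scores, shuffles one feature column at a time with a fixed seed, and reports the RMSE between the original and permuted predictions. The `predict` function and its scoring rule are hypothetical.

```python
import math
import random
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def rmse(a: Sequence[float], b: Sequence[float]) -> float:
    """Root mean squared difference between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def permutation_impact(
    predict: Callable[[Row], List[float]], rows: List[Row], seed: int = 0
) -> Dict[str, float]:
    """Shuffle one feature at a time and measure how far the predictions move (RMSE)."""
    rng = random.Random(seed)
    baseline = [p for r in rows for p in predict(r)]  # flatten per-cluster scores
    impact: Dict[str, float] = {}
    for feature in rows[0]:
        shuffled = [r[feature] for r in rows]
        rng.shuffle(shuffled)
        permuted = [{**r, feature: v} for r, v in zip(rows, shuffled)]
        after = [p for r in permuted for p in predict(r)]
        impact[feature] = rmse(baseline, after)
    return impact


def predict(row: Row) -> List[float]:
    """Hypothetical 2-cluster soft assignment that depends only on "spend"."""
    p = 1 / (1 + math.exp(-(row["spend"] - 1000) / 200))
    return [1 - p, p]


rows = [{"spend": 100.0 * i, "noise": float((7 * i) % 5)} for i in range(20)]
impact = permutation_impact(predict, rows)
for feature, score in sorted(impact.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{feature}: {score:.4f}")
```

Because the hypothetical model ignores `noise`, shuffling it leaves every prediction unchanged and its impact is exactly zero, while shuffling `spend` moves the cluster scores and yields a positive impact.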

### Feature Associations

Because clustering can be computationally expensive, you might want to use the [Feature Associations](https://docs.datarobot.com/en/docs/classic-ui/data/analyze-data/feature-assoc.html) tool (Data > Feature Associations) to determine if there are redundant features that you can possibly remove.

In this example, features derived from `year_built` and `sold_date` are highly correlated and therefore might add little to the clustering algorithms. If so, you can remove the redundant features and rerun clustering.

> [!NOTE] Note
> To generate feature associations for a clustering project (or any unsupervised learning project), DataRobot uses the first 50 features alphabetically. Unlike supervised learning where the [ACE score](https://docs.datarobot.com/en/docs/reference/glossary/index.html#ace-scores) is used to select features, unsupervised projects don't use targets and therefore cannot compute the ACE score.

## Configure the number of clusters

Some clustering algorithms (e.g., K-Means) require a cluster count before modeling. Others, such as HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise), discover an effective number of clusters dynamically. You can learn more about these clustering algorithms in their [blueprints](https://docs.datarobot.com/en/docs/classic-ui/modeling/special-workflows/unsupervised/clustering.html#sample-clustering-blueprint).
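
To make the distinction concrete, the following minimal sketch implements DBSCAN, a simpler density-based algorithm that HDBSCAN extends. It is neither HDBSCAN nor DataRobot's implementation. It only shows that a density-based method takes a neighborhood radius (`eps`) and a density threshold (`min_pts`) instead of a cluster count, and reports the number of clusters as an output. The points and parameter values are hypothetical.

```python
import math
from typing import List, Optional, Sequence

NOISE = -1
Point = Sequence[float]


def region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Assign cluster ids 0, 1, ... or NOISE; the number of clusters is an output, not an input."""
    labels: List[Optional[int]] = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = region_query(points, i, eps)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point of a cluster
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, but not expanded
                continue
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster
            neighbors = region_query(points, j, eps)
            if len(neighbors) >= min_pts:  # j is also a core point: keep expanding
                queue.extend(neighbors)
        cluster += 1
    return [NOISE if lab is None else lab for lab in labels]


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
n_clusters = len(set(labels) - {NOISE})
print(labels, n_clusters)
```

Nothing in the call says "two clusters"; the count falls out of the data's density, and the isolated point is labeled as noise rather than forced into a cluster, which K-Means would do.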

The following sections discuss how to set the cluster count:

- Prior to modeling
- When rerunning a single model
- When rerunning all clustering models

### Set the number of clusters in Advanced Options

Prior to starting a clustering run, you can customize the number of clusters you want DataRobot to use:

1. After you upload your data and set up clustering mode, click Advanced settings. In the Advanced Options section that displays, click Clustering on the left.
2. Enter one or more numbers in the Number of clusters field. You can enter up to 10 numbers. For each number you enter, DataRobot trains multiple models, one for each algorithm that supports setting a fixed number of clusters (such as K-Means or Gaussian Mixture Model).
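
Step 2 implies that the number of models grows with the number of cluster counts you enter. The following hypothetical sketch validates the field against the limits stated on this page (at most 10 entries, no more than 100 clusters) and expands it into one job per (algorithm, count) pair. The algorithm tuple, the function name, and the lower bound of 2 clusters are assumptions for illustration; DataRobot itself decides which blueprints actually run.

```python
from typing import List, Sequence, Tuple

# Hypothetical: algorithms that accept a fixed cluster count (examples named on this page).
FIXED_K_ALGORITHMS = ("K-Means", "Gaussian Mixture Model")
MAX_ENTRIES = 10    # "You can enter up to 10 numbers."
MAX_CLUSTERS = 100  # documented maximum number of clusters
MIN_CLUSTERS = 2    # assumption: a single cluster would not segment anything


def plan_jobs(cluster_counts: Sequence[int]) -> List[Tuple[str, int]]:
    """Validate the Number of clusters entries and expand them into (algorithm, k) jobs."""
    if not cluster_counts:
        raise ValueError("enter at least one cluster count")
    if len(cluster_counts) > MAX_ENTRIES:
        raise ValueError(f"at most {MAX_ENTRIES} cluster counts are allowed")
    for k in cluster_counts:
        if not MIN_CLUSTERS <= k <= MAX_CLUSTERS:
            raise ValueError(f"cluster count {k} is outside {MIN_CLUSTERS}-{MAX_CLUSTERS}")
    # One model per fixed-k algorithm for each requested count.
    return [(algo, k) for k in cluster_counts for algo in FIXED_K_ALGORITHMS]


jobs = plan_jobs([7, 10, 12, 15])
print(len(jobs), jobs[:2])
```

With four counts and two fixed-k algorithms, this plan contains eight jobs; density-based algorithms such as HDBSCAN would not multiply with the counts because they do not take one.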

### Update the number of clusters and rerun a model

To rerun a model on a different number of clusters:

1. Click the + icon in the Clusters column of the model.
2. Enter the number of clusters to use for the run.

### Update the number of clusters and rerun all models

To update the number of clusters and rerun all models:

1. Click Rerun modeling in the Workers pane on the right.
2. Update the numbers of clusters you want the clustering algorithms to use and click Rerun. In this example, DataRobot runs clustering algorithms using 7, 10, 12, and 15 clusters.

## Feature considerations

When using clustering, consider the following:

- Datasets for clustering projects must be less than 5GB.
- The following is not supported:
- Clustering models can be deployed to dedicated prediction servers, but Portable Prediction Servers (PPS) and monitoring agents are not supported.
- The maximum number of clusters is 100.
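
A quick local check against the dataset size limit can catch an oversized file before upload. The sketch below assumes "5GB" means 5 × 10^9 bytes (this page doesn't say whether decimal or binary gigabytes apply), and the constant and function names are hypothetical.

```python
import os

# Assumption: "5GB" is read as 5 * 10**9 bytes; use 5 * 2**30 if binary gigabytes apply.
MAX_CLUSTERING_DATASET_BYTES = 5 * 10**9


def fits_clustering_limit(path: str) -> bool:
    """True if the file on disk is strictly smaller than the clustering dataset limit."""
    return os.path.getsize(path) < MAX_CLUSTERING_DATASET_BYTES
```

This measures the file as stored on disk; for compressed uploads, the size DataRobot evaluates may differ from the archive size.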

See also the [time series-specific clustering considerations](https://docs.datarobot.com/en/docs/reference/pred-ai-ref/ts-reference/ts-consider.html#clustering-considerations).
