Word Cloud

Text variables often contain words that are highly indicative of the response (target). The Word Cloud insight displays up to 200 of the most impactful words and short phrases (n-grams) as a word cloud.
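
As a rough illustration of "most impactful," the sketch below keeps up to a fixed number of n-grams from a hypothetical list of (n-gram, coefficient) pairs. It assumes that impact means the absolute value of a linear model's coefficient. DataRobot's actual ranking rule is not described on this page, and the names `top_ngrams` and `max_terms` are hypothetical.

```python
from typing import List, Tuple


def top_ngrams(coefs: List[Tuple[str, float]], max_terms: int = 200) -> List[Tuple[str, float]]:
    """Return up to max_terms n-grams with the largest absolute coefficients.

    Assumption: impact is measured as |coefficient|. This is an illustration,
    not DataRobot's documented ranking.
    """
    # Sort by magnitude, largest first; the sign is kept so it can drive color later.
    ranked = sorted(coefs, key=lambda pair: abs(pair[1]), reverse=True)
    # Slicing covers the "up to" part: fewer terms are returned if fewer exist.
    return ranked[:max_terms]


example = [("refund", -1.8), ("great service", 2.3), ("the", 0.01), ("late", -0.9)]
print(top_ngrams(example, max_terms=3))
# [('great service', 2.3), ('refund', -1.8), ('late', -0.9)]
```

Ranking by magnitude rather than by raw value keeps strongly negative terms (blue) alongside strongly positive ones (red).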

Note

See Word Cloud in Workbench for additional capabilities available when viewing an experiment's word cloud in DataRobot NextGen.

Select a model from the Leaderboard and click Understand > Word Cloud to display the chart. Its elements are described in the following table.

| # | Element | Description |
|---|---|---|
| 1 | Selected word | Displays details about the selected word. (Here, "word" means an n-gram, which can be a sequence of words.) Mouse over a word to select it. Words that appear more frequently are displayed in a larger font size; words that appear less frequently are displayed in a smaller font size. |
| 2 | Coefficient | Displays the coefficient value for the selected word. |
| 3 | Color spectrum | Displays a legend that maps word values to a color spectrum from blue to red: blue indicates a negative effect and red indicates a positive effect. |
| 4 | Appears in # rows | Shows the number of rows in which the word appears. |
| 5 | Filter stop words | Removes stop words (commonly used terms that can be excluded from searches) from the display. |
| 6 | Export | Exports the Word Cloud. |
| 7 | Zoom controls | Enlarges or reduces the image displayed on the canvas. Alternatively, double-click the image. To bring other areas of the display into focus, click and drag. |
| 8 | Select class | For multiclass projects, selects the class to investigate in the Word Cloud. |

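
The sketch below illustrates how the elements in the table relate: a coefficient's sign and size pick a position on a blue-to-red scale, the number of rows containing a word drives its font size, and an optional stop-word filter removes common terms. The white midpoint, the linear color and size scaling, the font-size range, and the tiny stop-word set are all assumptions for illustration; `NGram`, `color_for`, `font_size`, and `layout` are hypothetical names, not DataRobot's rendering code.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Hypothetical stop-word list; real stop-word lists are much longer.
STOP_WORDS: Set[str] = {"the", "a", "and", "of"}


@dataclass
class NGram:
    text: str           # the word or phrase (an n-gram)
    coefficient: float  # model coefficient for this n-gram
    rows: int           # number of rows the n-gram appears in


def color_for(coef: float, max_abs: float) -> str:
    """Map a coefficient to a hex color: blue (negative) through white (zero) to red (positive)."""
    t = 0.0 if max_abs == 0 else max(-1.0, min(1.0, coef / max_abs))
    if t < 0:
        # Fade from white toward pure blue as the coefficient becomes more negative.
        fade = round(255 * (1 + t))
        return f"#{fade:02x}{fade:02x}ff"
    # Fade from white toward pure red as the coefficient becomes more positive.
    fade = round(255 * (1 - t))
    return f"#ff{fade:02x}{fade:02x}"


def font_size(rows: int, max_rows: int, smallest: int = 10, largest: int = 48) -> int:
    """Scale font size linearly with how many rows contain the n-gram."""
    if max_rows == 0:
        return smallest
    return round(smallest + (largest - smallest) * rows / max_rows)


def layout(ngrams: List[NGram], filter_stop_words: bool = True) -> List[Dict[str, object]]:
    """Optionally drop stop words, then compute a color and font size for each n-gram."""
    kept = [g for g in ngrams if not (filter_stop_words and g.text in STOP_WORDS)]
    max_abs = max((abs(g.coefficient) for g in kept), default=0.0)
    max_rows = max((g.rows for g in kept), default=0)
    return [
        {"text": g.text, "color": color_for(g.coefficient, max_abs), "size": font_size(g.rows, max_rows)}
        for g in kept
    ]


words = [NGram("great service", 2.0, 40), NGram("refund", -2.0, 20), NGram("the", 0.1, 90)]
for item in layout(words):
    print(item)
# {'text': 'great service', 'color': '#ff0000', 'size': 48}
# {'text': 'refund', 'color': '#0000ff', 'size': 29}
```

Scaling against the largest remaining value means that toggling the stop-word filter can change the relative sizes of the words that remain.
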
Word Cloud availability

You can access the Word Cloud from either the Insights page or the Leaderboard. The Word Cloud behaves the same in both places: use the Leaderboard tab to view the Word Cloud while investigating an individual model, and use the Insights page to access and compare the Word Clouds across a project's models. Word Clouds are also available for multimodal datasets (datasets that mix image, text, categorical, and other feature types); in that case, the Word Cloud is displayed for all text in the data.

The Word Cloud visualization is supported in the following model types and blueprints:

  • Binary classification:

    • All variants of ElasticNet Classifier (linear family models), except the TinyBERT ElasticNet Classifier and FastText ElasticNet Classifier
    • LightGBM on ElasticNet Predictions
    • Text fit on Residuals
    • Extended support for multimodal datasets (with single Auto-Tuned N-gram)
  • Multiclass:

    • Stochastic Gradient Descent models with at least one text column, except the TinyBERT SGD Classifier and FastText SGD Classifier
  • Regression:

    • Ridge Regressor
    • ElasticNet Regressor
    • Lasso Regressor
    • Single Auto-Tuned Multi-Modal
    • LightGBM on ElasticNet Predictions
    • Text fit on Residuals

Note

The Word Cloud for a model is based on the data used to train that model, not on the entire dataset. For example, a model trained on a 32% sample produces a Word Cloud that reflects those same 32% of rows.
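
To make the sampling point concrete, the sketch below counts, for each n-gram, how many rows contain it, first over a full (toy) dataset and then over a training sample. The lowercase whitespace tokenizer, the 1- to 2-word n-gram range, and the simple seeded random sample are illustrative assumptions, not DataRobot's actual text processing or partitioning; `ngrams`, `rows_containing`, and `training_sample` are hypothetical helpers.

```python
import random
from collections import Counter
from typing import Iterable, List, Set


def ngrams(text: str, max_n: int = 2) -> Set[str]:
    """Return the distinct 1..max_n-word n-grams in text (lowercased, split on whitespace)."""
    tokens = text.lower().split()
    found: Set[str] = set()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            found.add(" ".join(tokens[i:i + n]))
    return found


def rows_containing(rows: Iterable[str], max_n: int = 2) -> Counter:
    """Count rows per n-gram; using a set per row means repeats within a row count once."""
    counts: Counter = Counter()
    for row in rows:
        counts.update(ngrams(row, max_n))
    return counts


def training_sample(rows: List[str], fraction: float, seed: int = 0) -> List[str]:
    """Draw a reproducible random sample containing the given fraction of rows."""
    k = round(len(rows) * fraction)
    return random.Random(seed).sample(rows, k)


data = [
    "great service great staff",
    "late delivery",
    "great price",
    "refund requested late",
]
full_counts = rows_containing(data)
sample_counts = rows_containing(training_sample(data, 0.5))
print(full_counts["great"])  # 2
```

Because the counts come only from the sampled rows, an n-gram that is common in the full dataset can show a lower row count, or be missing entirely, in the Word Cloud of a model trained on a smaller sample.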

See Text-based insights for a description of how DataRobot handles single-character words.


Updated February 21, 2024