Understand the Word Cloud

In this tutorial, you'll learn how to draw insights from text features in models using the Word Cloud.

To access the Word Cloud, select a model on the Leaderboard and click Understand > Word Cloud.

Takeaways

This tutorial explains:

  • How to access and interpret the Word Cloud
  • How to export the Word Cloud as raw values

Access the Word Cloud

When a training dataset contains one or more text features, DataRobot specially trains models to generate text-based insights, including the Word Cloud. If a dataset has multiple text features, a Word Cloud is created for each one.

  1. Select a model that supports Word Clouds on the Leaderboard, for example the Auto-Tuned Word N-Gram Text Modeler. (See the Word Cloud description for other model types.)

    Search tip for finding models

    To narrow down Leaderboard results, enter "insights" into the search bar at the top to quickly find models that produce a visualization on the Insights page.

  2. Click Understand > Word Cloud.

Interpret the Word Cloud

After you select Word Cloud, the window visualizes the model's top 200 text features (words), chosen based on their relationship to the target feature.

  1. Mouse over a word. The active word displays in the upper-left corner.

    Stop words

    To prevent common stop words (the, for, was, etc.) from appearing in the Word Cloud, select the box next to Filter stop words.

  2. Look at the size of the word. Size represents the frequency of the word in the dataset—larger words appear more frequently than smaller words.

  3. Look at the color of the word. Color represents how the word relates to the target feature: red indicates a positive effect on the target and blue indicates a negative effect.

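  4. Look at the coefficient value. Its sign and magnitude indicate the direction and strength of the word's effect on the target feature.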
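
The reading rules above (size follows frequency, color follows the sign of the coefficient) can be sketched as a small mapping. This is only an illustration: the function name `word_style`, the point-size range, the `intensity` field, and the assumption that both inputs are normalized (coefficient in [-1, 1], frequency in [0, 1]) are hypothetical and do not describe how DataRobot actually renders the cloud.

```python
def word_style(coefficient, frequency, min_pt=10, max_pt=48):
    """Map a word's coefficient and frequency to illustrative display attributes.

    Assumes (hypothetically) that coefficient lies in [-1, 1] and frequency
    in [0, 1]; out-of-range inputs are clamped.
    """
    coefficient = max(-1.0, min(1.0, coefficient))
    frequency = max(0.0, min(1.0, frequency))

    # Larger words appear more frequently: interpolate the font size linearly.
    size = min_pt + (max_pt - min_pt) * frequency

    # Red marks a positive effect on the target, blue a negative effect.
    if coefficient > 0:
        hue = "red"
    elif coefficient < 0:
        hue = "blue"
    else:
        hue = "neutral"

    # Using the coefficient's magnitude as color intensity is an assumption.
    return {"size_pt": round(size, 1), "hue": hue, "intensity": abs(coefficient)}


print(word_style(0.8, 0.5))
print(word_style(-0.3, 1.0))
```

A large blue word would therefore be one that occurs often and pushes the target down, while a small red word is rare but pushes the target up.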

Export the Word Cloud

You can export Word Cloud insights as raw values in a CSV file. To export, click the Export button and then Download in the resulting dialog.

When the download is complete, open the CSV file.

Fields of the CSV are described below:

Column      Description
name        The word found in the column (in var_name).
var_name    Feature name (name of the column).
resp        Normalized coefficient from the linear model.
freq        Normalized word occurrences.
abs_freq    Total word occurrences (count).
stop_word   Whether stop words are filtered.
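
Once exported, the CSV can be inspected with a short script. The sketch below uses only the Python standard library and ranks words by the absolute value of resp. The sample rows, the function names, and the exact value formats (for example, how stop_word is written) are assumptions made for illustration; check them against your own downloaded file.

```python
import csv
import io

# Hypothetical sample in the exported column layout; real values come from
# the file you download.
SAMPLE_CSV = """name,var_name,resp,freq,abs_freq,stop_word
excellent,review_text,0.82,0.40,120,False
refund,review_text,-0.91,0.25,75,False
the,review_text,0.01,1.00,300,True
"""


def load_word_cloud(file_obj):
    """Read exported rows and convert the numeric columns."""
    rows = []
    for row in csv.DictReader(file_obj):
        rows.append({
            "name": row["name"],
            "var_name": row["var_name"],
            "resp": float(row["resp"]),
            "freq": float(row["freq"]),
            # Tolerate counts written as "120" or "120.0".
            "abs_freq": int(float(row["abs_freq"])),
            # Kept as raw text because its encoding is not documented here.
            "stop_word": row["stop_word"],
        })
    return rows


def strongest_words(rows, top_n=5):
    """Return the rows with the largest absolute coefficient."""
    return sorted(rows, key=lambda r: abs(r["resp"]), reverse=True)[:top_n]


rows = load_word_cloud(io.StringIO(SAMPLE_CSV))
for r in strongest_words(rows, top_n=2):
    direction = "positive" if r["resp"] > 0 else "negative"
    print(f'{r["name"]} ({r["var_name"]}): {direction}, {r["abs_freq"]} occurrences')
```

To read a real export, replace `io.StringIO(SAMPLE_CSV)` with `open(path, newline="")`, where `path` points to the downloaded file.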

Learn more

How does DataRobot handle text features?

If a dataset contains one or more text features, DataRobot uses Natural Language Processing (NLP) tools, such as Auto-Tuned Word N-Gram Text Modelers, to tune models specifically for text and to generate NLP visualizations, including frequency value tables and word clouds.

During model building, DataRobot incorporates a matrix of word-grams in blueprints. The matrix is produced using common techniques, such as TF-IDF values, and can combine multiple text columns.
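
To make the idea of a TF-IDF word-gram matrix concrete, here is a minimal standard-library sketch. It uses a textbook TF-IDF variant (term frequency times log(N / document frequency)) over word unigrams and bigrams. The function names, the whitespace tokenizer, and the weighting formula are assumptions for illustration; the source does not specify DataRobot's exact preprocessing or weighting.

```python
import math
from collections import Counter


def word_ngrams(text, n_max=2):
    """Split text on whitespace and return all word n-grams up to n_max."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf_matrix(documents, n_max=2):
    """Return (vocabulary, matrix) with one row per document."""
    counts = [Counter(word_ngrams(doc, n_max)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    n_docs = len(documents)
    # Number of documents in which each n-gram occurs at least once.
    doc_freq = {g: sum(1 for c in counts if g in c) for g in vocabulary}

    matrix = []
    for c in counts:
        total = sum(c.values()) or 1  # avoid dividing by zero for empty text
        matrix.append([
            (c[g] / total) * math.log(n_docs / doc_freq[g])
            for g in vocabulary
        ])
    return vocabulary, matrix


docs = ["great product great price", "poor product"]
vocab, matrix = tfidf_matrix(docs)
for gram, weight in zip(vocab, matrix[0]):
    print(f"{gram!r}: {weight:.3f}")
```

With this variant, an n-gram that occurs in every document (here, "product") receives a weight of zero, while n-grams specific to one document are weighted up. One simple way to combine several text columns, again as an assumption, is to join their values into a single string per row before calling `tfidf_matrix`.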

For large datasets, DataRobot uses the Auto-Tuned Word N-Gram Text Modelers tool, which looks at one text column at a time. This approach uses a single N-Gram model for each text feature in the input dataset, and then uses the predictions from these models as inputs for other models.
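
The one-model-per-text-column approach can be sketched as a simple stacking step. The per-column model below is a deliberately toy stand-in (each word is scored by the mean target of the rows that contain it), not the Auto-Tuned Word N-Gram Text Modeler itself, and all function names are hypothetical. It also scores the training rows in-sample for brevity; a real stacking pipeline would typically use out-of-fold predictions to avoid leakage.

```python
from collections import defaultdict


def fit_word_model(texts, targets):
    """Toy single-column model: score each word by the mean target of its rows."""
    sums, counts = defaultdict(float), defaultdict(int)
    for text, y in zip(texts, targets):
        for word in set(text.lower().split()):
            sums[word] += y
            counts[word] += 1
    scores = {w: sums[w] / counts[w] for w in sums}
    baseline = sum(targets) / len(targets)
    return scores, baseline


def predict_word_model(model, text):
    """Average the scores of known words; fall back to the baseline."""
    scores, baseline = model
    known = [scores[w] for w in set(text.lower().split()) if w in scores]
    return sum(known) / len(known) if known else baseline


def stacked_text_features(rows, text_columns, targets):
    """Fit one model per text column and return their predictions as features."""
    models = {
        col: fit_word_model([row[col] for row in rows], targets)
        for col in text_columns
    }
    features = [
        [predict_word_model(models[col], row[col]) for col in text_columns]
        for row in rows
    ]
    return models, features


rows = [
    {"title": "great", "body": "works well"},
    {"title": "bad", "body": "broke fast"},
]
models, features = stacked_text_features(rows, ["title", "body"], [1, 0])
print(features)  # one prediction per text column, per row
```

Each inner list in `features` holds one numeric prediction per text column, which a downstream model can consume alongside the dataset's other features.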

Updated May 7, 2022