Understand the Word Cloud
In this tutorial, you'll learn how to draw insights from text features in models using the Word Cloud.
To access the Word Cloud, select a model on the Leaderboard and click Understand > Word Cloud.
Takeaways
This tutorial explains:
- How to access and interpret the Word Cloud
- How to export the Word Cloud as raw values
Access the Word Cloud
When a training dataset contains one or more text features, DataRobot specially trains models to generate text-based insights, including the Word Cloud. If a dataset has multiple text features, a Word Cloud is created for each one.
- Select a model that supports Word Clouds on the Leaderboard, for example the Auto-Tuned Word N-Gram Text Modeler. (See the Word Cloud description for other model types.)
  Search tip for finding models: To narrow down Leaderboard results, enter "insights" into the search bar at the top to quickly find models that produce a visualization on the Insights page.
- Click Understand > Word Cloud.
Interpret the Word Cloud
After you select Word Cloud, the window displays a visualization of the model's top 200 text features, chosen based on their relationship to the target feature.
- Mouse over a word. The active word displays in the upper-left corner.
  Stop words: To prevent common stop words (the, for, was, etc.) from appearing in the Word Cloud, select the box next to Filter stop words.
- Look at the size of the word. Size represents the frequency of the word in the dataset: larger words appear more frequently than smaller words.
- Look at the color of the word. Color represents how closely related the word is to the target feature: red indicates a positive effect on the target feature and blue indicates a negative effect on the target feature.
- Look at the coefficient value.
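The size and color rules above can be sketched in a few lines of Python. This is only an illustration of the reading rules, not DataRobot's rendering code: the `Word` class, the `describe` function, the point-size range, and the "neutral" label for a zero coefficient are all hypothetical, and the sketch assumes a normalized frequency between 0 and 1.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    coefficient: float  # assumed normalized; the sign gives the direction of the effect
    frequency: float    # assumed normalized to the range 0..1


def describe(word, min_pt=10, max_pt=48):
    """Return a (font_size, color) pair for one word.

    min_pt and max_pt are arbitrary display sizes chosen for this sketch.
    """
    # Larger frequency -> larger word.
    size = min_pt + (max_pt - min_pt) * word.frequency
    # Positive effect on the target -> red, negative effect -> blue.
    if word.coefficient > 0:
        color = "red"
    elif word.coefficient < 0:
        color = "blue"
    else:
        color = "neutral"  # assumption: the tutorial does not cover a zero coefficient
    return round(size), color


# Hypothetical words, for illustration only.
for w in [Word("excellent", 0.8, 0.5), Word("refund", -0.6, 1.0)]:
    print(w.text, describe(w))
```

Reading the output the same way as the visualization: a large red word is frequent and associated with higher target values, while a large blue word is frequent and associated with lower ones.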
Export the Word Cloud
You can export Word Cloud insights as raw values in a CSV file. To export, click the Export button and then Download in the resulting dialog.
When the download is complete, open the CSV file.
Fields of the CSV are described below:
Column | Description |
---|---|
name | The word found in the column (in var_name). |
var_name | Feature name (name of the column). |
resp | Normalized coefficient from the linear model. |
freq | Normalized word occurrences. |
abs_freq | Total word occurrences (count). |
stop_word | Whether stop words are filtered. |
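Once exported, the file can be explored with a short script. The sketch below uses only Python's standard `csv` module and the column names from the table above. The sample rows are invented for illustration, the `load_word_cloud` and `strongest` helpers are hypothetical, and the treatment of `stop_word` as a per-row true/false flag is an assumption; check it against your own export.

```python
import csv
import io

# Invented sample rows; a real export contains your own words and values.
SAMPLE = """name,var_name,resp,freq,abs_freq,stop_word
excellent,review_text,0.82,0.40,120,False
refund,review_text,-0.65,0.25,75,False
the,review_text,0.01,1.00,300,True
"""


def load_word_cloud(handle):
    """Parse exported Word Cloud rows into dictionaries with typed values."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({
            "name": row["name"],
            "var_name": row["var_name"],
            "resp": float(row["resp"]),
            "freq": float(row["freq"]),
            "abs_freq": int(float(row["abs_freq"])),
            # Assumption: the flag is written as True/False or 1/0.
            "stop_word": row["stop_word"].strip().lower() in ("true", "1"),
        })
    return rows


def strongest(rows, n=10):
    """Return the n rows with the largest absolute coefficient, skipping flagged rows."""
    kept = [r for r in rows if not r["stop_word"]]
    return sorted(kept, key=lambda r: abs(r["resp"]), reverse=True)[:n]


rows = load_word_cloud(io.StringIO(SAMPLE))
for r in strongest(rows, n=2):
    print(r["name"], r["resp"], r["abs_freq"])
```

To read a downloaded file instead of the sample, pass an open file handle, for example `load_word_cloud(open(path, newline=""))`, where `path` is wherever you saved the export.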
Learn more
How does DataRobot handle text features?
If a dataset contains one or more text features, DataRobot uses natural language processing (NLP) tools, such as Auto-Tuned Word N-Gram Text Modelers, to tune models for text and generate NLP visualizations, including frequency value tables and word clouds.
During model building, DataRobot incorporates a matrix of word-grams in blueprints. The matrix is produced using common techniques, TF-IDF values, and a combination of multiple text columns.
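To make the idea of a word-gram matrix with TF-IDF values concrete, here is a minimal sketch of the textbook technique in plain Python. It is not DataRobot's blueprint implementation: the function names, the whitespace tokenizer, the unigram-plus-bigram range, and the unsmoothed `tf * log(N / df)` weighting are all assumptions made for this example.

```python
import math
from collections import Counter


def word_ngrams(text, n_max=2):
    """Split text on whitespace and return all word n-grams up to n_max words."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf_matrix(docs, n_max=2):
    """Return (vocabulary, matrix) with one row per document and one column per n-gram."""
    counts = [Counter(word_ngrams(d, n_max)) for d in docs]
    vocab = sorted(set().union(*counts))
    n_docs = len(docs)
    # Document frequency: in how many documents each n-gram appears.
    doc_freq = {g: sum(1 for c in counts if g in c) for g in vocab}
    matrix = []
    for c in counts:
        total = sum(c.values()) or 1  # avoid dividing by zero for empty text
        matrix.append([(c[g] / total) * math.log(n_docs / doc_freq[g]) for g in vocab])
    return vocab, matrix


# Hypothetical two-row text column.
vocab, matrix = tfidf_matrix(["great product", "bad product"])
for gram, weights in zip(vocab, zip(*matrix)):
    print(gram, [round(w, 3) for w in weights])
```

In this toy example, "product" appears in every row and so receives a weight of zero, while "great" and "bad" receive positive weights only in the rows that contain them. That is the property that lets a downstream linear model assign a separate coefficient to each distinguishing word or phrase.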
For large datasets, DataRobot uses the Auto-Tuned Word N-Gram Text Modelers tool, which looks at one text column at a time. This approach uses a single N-Gram model for each text feature in the input dataset, and then uses the predictions from these models as inputs for other models.