

Text AI resources

Text AI in DataRobot lets you incorporate text data into your models without being a Natural Language Processing (NLP) expert and without adding extra steps to the model-building process. With models and preprocessing steps designed specifically for NLP, DataRobot supports all languages in ISO 639, the set of standards for representing the names of languages and language groups.

The tools available for working with text are described in the following sections.

| Topic | Describes... |
|---|---|
| **Working with text** | |
| Automated transformations | Learn about automated feature engineering for text, built to enhance model accuracy. |
| Clustering based on text collections | Use clustering to detect topics, types, taxonomies, and languages in a text collection. |
| Aggregation and imputation in time series projects | Set handling for text features in time series projects. |
| Composable ML transformers | Edit model blueprints, including pre-trained transformers, to best represent text features. |
| **Model insights** | |
| Coefficients | See how text preprocessing transforms text in a dataset into a form that a DataRobot model can use. |
| Text Mining | Display the most relevant words and short phrases in any variables detected as text. |
| Word cloud | Display the most relevant words and short phrases found in your dataset in word-cloud format. |
| Text Explanations | Visualize not only which text features are impactful, but also which specific words within a feature are impactful. |
| Multilabel modeling for text categorization | Use multilabel classification for text categorization. |
| Example: Capturing sentiment in text | See an example of improving a model by capturing sentiment in the text. |
| **Text-related feature announcements** | |
| NLP Fine-Tuner blueprints | Read about NLP Fine-Tuner blueprints. |
| FastText for language detection | Read about FastText for language detection at data ingest. |
| TinyBERT featurizer | Read about using TinyBERT, a distilled version of Google's Bidirectional Encoder Representations from Transformers (BERT). |

Updated January 31, 2024