
GCP-based enrichment of modeling datasets

Access this AI accelerator on GitHub

Text data is a valuable input for machine learning models because it lets algorithms extract insights from large volumes of unstructured content. It can come from many sources, such as social media, news articles, and customer feedback. Techniques such as sentiment analysis and topic modeling turn that text into signals that help organizations make informed decisions. However, text is challenging to use in ML models: natural language is complex, the data often contains bias and noise, and it lacks standardization. Text also requires significant preprocessing and feature engineering before a model can use it effectively.

One common application of text mining is sentiment analysis, which assigns a numerical value representing whether a text carries positive, neutral, or negative sentiment. While DataRobot can help build such models efficiently, training them requires a large, accurately labeled corpus, which makes the task challenging for users who lack such a training dataset.

This accelerator demonstrates how to use the Google Cloud Natural Language API for sentiment analysis to enrich a customer churn dataset. The sentiment scores returned by the Google API help improve model performance in predicting each customer's likelihood of churn, without requiring users to train their own sentiment models.
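The enrichment step described above can be sketched roughly as follows. This is a minimal illustration, not the accelerator's actual notebook code: the column name `customer_feedback`, the output column names, and the helper names `google_sentiment` and `enrich_with_sentiment` are assumptions made for the example. It also assumes the `google-cloud-language` client library is installed and that Google Cloud credentials are already configured in the environment.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# A scorer maps one text to (score, magnitude).
Scorer = Callable[[str], Tuple[float, float]]


def google_sentiment(text: str) -> Tuple[float, float]:
    """Score one text with the Google Cloud Natural Language API.

    Returns (score, magnitude): score lies in [-1.0, 1.0] (negative to
    positive) and magnitude is a non-negative measure of emotional strength.
    """
    # Imported lazily so the rest of this sketch runs without the client library.
    from google.cloud import language_v1

    client = language_v1.LanguageServiceClient()
    document = language_v1.Document(
        content=text,
        type_=language_v1.Document.Type.PLAIN_TEXT,
    )
    response = client.analyze_sentiment(request={"document": document})
    sentiment = response.document_sentiment
    return sentiment.score, sentiment.magnitude


def enrich_with_sentiment(
    rows: Iterable[Dict[str, object]],
    text_column: str,
    scorer: Scorer,
) -> List[Dict[str, object]]:
    """Return copies of the rows with two added sentiment columns.

    Rows with empty or missing text get None for both columns, so no API
    call is made for them. The column names are hypothetical.
    """
    enriched = []
    for row in rows:
        text = str(row.get(text_column) or "").strip()
        score: Optional[float] = None
        magnitude: Optional[float] = None
        if text:
            score, magnitude = scorer(text)
        enriched.append(
            {**row, "sentiment_score": score, "sentiment_magnitude": magnitude}
        )
    return enriched
```

With credentials in place, a call such as `enrich_with_sentiment(churn_rows, "customer_feedback", google_sentiment)` would yield rows ready to upload as a modeling dataset, with the two new numeric columns available as features. Passing the scorer as an argument is a design choice for this sketch: it lets the enrichment logic be exercised with a stub function before any API calls are made.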

Updated May 20, 2024