Fundamentals of predictive modeling
This section describes DataRobot's predictive solutions; see GenAI fundamentals for an overview of working with generative AI-related tools and options.
Predictive AI uses automated machine learning (AutoML) to build models that solve real-world problems across domains and industries. DataRobot takes the data you provide, generates multiple machine learning (ML) models, and recommends the best model to put into use. You don't need to be a data scientist to build ML models using DataRobot, but an understanding of the basics will help you build better models. Your domain knowledge, combined with DataRobot's AI expertise, helps you build models that solve problems quickly and accurately.
DataRobot supports many different approaches to ML modeling—supervised learning, unsupervised learning, time series modeling, segmented modeling, multimodal modeling, and more. This section describes these approaches and also provides tips for analyzing and selecting the best models for deployment.
For a generalized discussion of the steps to build predictive models, see the Workbench predictive model training overview.
Predictive modeling methods
ML modeling is the process of developing algorithms that learn by example from historical data. These algorithms predict outcomes and uncover patterns that are not easy to discern. DataRobot supports a variety of modeling methods, each suited to a particular type of data and problem.
Supervised and unsupervised learning
The most basic form of machine learning is supervised learning.
With supervised learning, you provide "labeled" data: each row includes the known outcome that the algorithm learns from. This label, also called the target, is what you're trying to predict.
- In a regression experiment, the target is a numeric value. A regression model estimates a continuous dependent variable from a set of input variables (also referred to as features or columns). Examples of regression problems include financial forecasting, time series forecasting, maintenance scheduling, and weather analysis. A numeric target can also be modeled as a classification problem by switching the experiment type from regression to classification.
- In a classification experiment, the target is a category. A classification model groups observations into categories by identifying the characteristics shared by members of each class. It compares those characteristics to the data you're classifying and estimates how likely it is that each observation belongs to a particular class. Classification experiments can be binary (two classes) or multiclass (three or more classes). DataRobot also supports multilabel modeling, in which the target feature has a variable number of classes or labels; each row of the dataset is associated with one, several, or zero labels.
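The difference between these target types shows up in the shape of the target column. The following is a minimal sketch, not how DataRobot actually detects the target type: the function `describe_target` and its `treat_numeric_as_classes` flag are hypothetical, and the rules (all-numeric values mean regression, exactly two distinct categories mean binary, a per-row collection of labels means multilabel) are simplifying assumptions.

```python
from numbers import Number


def describe_target(values, treat_numeric_as_classes=False):
    """Return a rough experiment type for a list of target values.

    Hypothetical heuristic for illustration; not DataRobot's detection logic.
    """
    if not values:
        raise ValueError("the target column is empty")
    # Multilabel: every row holds a collection of zero, one, or several labels.
    if all(isinstance(v, (list, tuple, set, frozenset)) for v in values):
        return "multilabel"
    # Regression: every value is numeric (booleans are treated as categories).
    numeric = all(isinstance(v, Number) and not isinstance(v, bool) for v in values)
    if numeric and not treat_numeric_as_classes:
        return "regression"
    # Classification: count the distinct classes.
    n_classes = len(set(values))
    if n_classes < 2:
        raise ValueError("a classification target needs at least two classes")
    return "binary" if n_classes == 2 else "multiclass"


print(describe_target([12.5, 40.0, 7.25]))                           # regression
print(describe_target(["churn", "stay", "stay"]))                     # binary
print(describe_target(["cat", "dog", "bird"]))                        # multiclass
print(describe_target([["sports"], [], ["news", "tech"]]))            # multilabel
print(describe_target([1, 2, 3, 1], treat_numeric_as_classes=True))   # multiclass
```

The last call mirrors the option of handling a numeric target as classification: the same integers become three classes instead of a continuous quantity.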
Another form of machine learning is unsupervised learning.
With unsupervised learning, the dataset is unlabeled and the algorithm must infer patterns in the data.
- In an anomaly detection experiment, the algorithm detects unusual data points in your dataset. Potential uses include the detection of fraudulent transactions, faults in hardware, and human error during data entry.
- In a clustering experiment, the algorithm splits the dataset into groups according to similarity. Clustering is useful for gaining intuition about your data. The clusters can also help label your data so that you can then use a supervised learning method on the dataset.
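Both unsupervised approaches can be illustrated with two classic techniques, chosen here for illustration rather than taken from DataRobot: a modified z-score based on the median absolute deviation (MAD) for anomaly detection, and k-means (Lloyd's algorithm) for clustering. The function names `flag_anomalies` and `kmeans`, the commonly used 3.5 cutoff, and the farthest-point initialization are assumptions chosen to keep the sketch short and reproducible.

```python
import math
import statistics


def flag_anomalies(values, threshold=3.5):
    """Anomaly detection: flag values whose modified z-score exceeds threshold.

    The modified z-score uses the median and the median absolute deviation,
    so a single extreme value cannot hide itself by inflating the spread.
    """
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        # MAD is zero when most values equal the median; flag anything different.
        return [v != median for v in values]
    return [abs(0.6745 * (v - median) / mad) > threshold for v in values]


def kmeans(points, k, iterations=100):
    """Clustering: group points into k clusters with Lloyd's algorithm.

    Centroids start from a deterministic farthest-point (maximin) choice so
    the example always gives the same result.
    """
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from its nearest existing centroid.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the clustering has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member points.
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                centroids[i] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return labels


transactions = [20, 22, 19, 21, 500, 23, 20]
print(flag_anomalies(transactions))  # [False, False, False, False, True, False, False]

points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
print(kmeans(points, k=2))  # [0, 0, 0, 1, 1, 1]
```

The cluster IDs returned by `kmeans` could serve as a provisional label column, which is one way clustering can prepare data for a later supervised experiment.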
Time-aware modeling
Time data is a crucial component in solving prediction and forecasting problems. Models that use time-relevant data make row-by-row predictions, time series forecasts, or current-value predictions ("nowcasts"). An experiment becomes time-aware when you set the partitioning method to date/time, which requires data that is suitable for it (for example, data that contains a date feature).
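Date/time partitioning matters because a random split can place later rows in the training data and earlier rows in the validation data, letting the model learn from the future it is supposed to predict. A minimal sketch of a single chronological split follows; the function `date_time_split` and its parameters are hypothetical, and DataRobot's actual date/time partitioning may involve more than one such split.

```python
from datetime import date


def date_time_split(rows, date_key, validation_start):
    """Chronological split: every training row precedes every validation row.

    Hypothetical helper for illustration, not a DataRobot API.
    """
    ordered = sorted(rows, key=lambda row: row[date_key])
    train = [row for row in ordered if row[date_key] < validation_start]
    validation = [row for row in ordered if row[date_key] >= validation_start]
    return train, validation


# Rows arrive out of order; the split still respects time.
sales = [
    {"date": date(2024, 1, 3), "units": 5},
    {"date": date(2024, 1, 1), "units": 3},
    {"date": date(2024, 1, 4), "units": 6},
    {"date": date(2024, 1, 2), "units": 4},
]
train, validation = date_time_split(sales, "date", validation_start=date(2024, 1, 3))
print([row["units"] for row in train])       # [3, 4]
print([row["units"] for row in validation])  # [5, 6]
```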
- With time series modeling, you can generate a forecast: a series of predictions for a period of time in the future. You train time series models on past data to predict future events. Predict a range of values in the future, or use nowcasting to make a prediction at the current point in time. Use cases for time series modeling include predicting pricing and demand in domains such as finance, healthcare, and retail; in short, any domain where problems have a time component.
- You can use time series modeling for a dataset containing a single series, but you can also build a model for a dataset that contains multiple series. In this type of multiseries experiment, one feature serves as the series identifier. For example, a "store location" identifier divides the dataset into one series per location, so four store locations (e.g., Paris, Milan, Dubai, and Tokyo) produce four series for modeling.
- With a multiseries experiment, you can use segmented modeling to generate a model for each series. In this case, DataRobot creates a deployment that uses the best model for each segment.
- Sometimes the dataset for the problem you're solving contains date and time information, but instead of generating a forecast as you do with time series modeling, you predict a target value for each individual row. This approach is called time-aware predictions.
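The multiseries and segmented ideas can be sketched with toy forecasters. This is an assumption-laden illustration, not DataRobot's segmented modeling: the dataset is split by a series identifier (`store`, echoing the store location example), two hypothetical candidate forecasters (`naive_last_value` and `moving_average`) are scored on the last few points of each series by mean absolute error, and the winner for each series produces a forecast, that is, a list of predictions covering the next `horizon` time steps.

```python
from collections import defaultdict
from statistics import mean


def naive_last_value(history, horizon):
    """Forecast every future step as the most recent observed value."""
    return [history[-1]] * horizon


def moving_average(history, horizon, window=3):
    """Forecast every future step as the mean of the last `window` values."""
    return [mean(history[-window:])] * horizon


CANDIDATES = {"naive_last_value": naive_last_value, "moving_average": moving_average}


def mean_absolute_error(actual, predicted):
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def segmented_forecast(rows, series_key, target_key, horizon, holdout):
    """Split rows by series ID, pick the best candidate per series on the
    last `holdout` points, then forecast `horizon` steps ahead."""
    series = defaultdict(list)
    for row in rows:  # rows are assumed to be in time order already
        series[row[series_key]].append(row[target_key])

    results = {}
    for series_id, values in series.items():
        train, test = values[:-holdout], values[-holdout:]
        # Score each candidate on the held-out tail of this series only.
        best_name = min(
            CANDIDATES,
            key=lambda name: mean_absolute_error(test, CANDIDATES[name](train, holdout)),
        )
        # Use the full series history to forecast the future window.
        results[series_id] = (best_name, CANDIDATES[best_name](values, horizon))
    return results


rows = (
    [{"store": "Paris", "sales": v} for v in [10, 11, 12, 13, 14, 15]]
    + [{"store": "Tokyo", "sales": v} for v in [20, 20, 20, 30, 20, 20]]
)
for store, (model_name, forecast) in segmented_forecast(rows, "store", "sales", horizon=3, holdout=2).items():
    print(store, model_name, forecast)
# Paris picks naive_last_value; Tokyo picks moving_average
```

Because each series gets its own winner, the steadily trending Paris series favors the naive forecaster, while the Tokyo series, with its one-off spike, favors the moving average.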
See What is time-aware modeling? for an in-depth discussion of these strategies.
Specialized modeling workflows
DataRobot provides specialized workflows to help you address a wide range of problems.
- Image augmentation allows you to include images as features in your datasets. Use the image data alongside other data types to improve outcomes for various types of modeling experiments, including regression, classification, anomaly detection, clustering, and more.
- A blueprint is the sequence of preprocessing steps (tasks), modeling algorithms, and post-processing steps that go into building a model. With editable blueprints, you can build and edit your own ML blueprints, incorporating DataRobot preprocessing and modeling algorithms as well as your own models.
- For text features in your data, use Text AI insights such as Word Clouds to understand their impact.
- Location AI supports geospatial analysis of modeling data. Use geospatial features to gain insights and visualize data on interactive maps before and after modeling.
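A blueprint can be pictured as an ordered pipeline: preprocessing tasks learn from the training data and transform it, a modeling algorithm fits the transformed data, and a post-processing step adjusts the raw predictions. The sketch below is a generic Python illustration of that structure, not DataRobot's blueprint format or API; the classes `Blueprint`, `MeanImputer`, and `NearestNeighborModel` are hypothetical. Editing a blueprint corresponds to changing the list of tasks, swapping the model, or replacing the post-processing step.

```python
import math
from statistics import mean


class MeanImputer:
    """Preprocessing task: fill missing (None) values with training-column means."""

    def fit(self, X, y):
        self.means = [mean(v for v in col if v is not None) for col in zip(*X)]
        return self

    def transform(self, X):
        return [[self.means[j] if v is None else v for j, v in enumerate(row)] for row in X]


class NearestNeighborModel:
    """Modeling algorithm: predict the target of the closest training row."""

    def fit(self, X, y):
        self.X, self.y = X, y
        return self

    def predict(self, X):
        return [self.y[min(range(len(self.X)), key=lambda i: math.dist(row, self.X[i]))] for row in X]


class Blueprint:
    """Preprocessing tasks, then a modeling algorithm, then post-processing."""

    def __init__(self, tasks, model, postprocess=None):
        self.tasks = tasks
        self.model = model
        self.postprocess = postprocess or (lambda preds: preds)

    def fit(self, X, y):
        for task in self.tasks:
            X = task.fit(X, y).transform(X)  # each task learns from training data only
        self.model.fit(X, y)
        return self

    def predict(self, X):
        for task in self.tasks:
            X = task.transform(X)  # reuse what each task learned during fit
        return self.postprocess(self.model.predict(X))


# Example: impute missing values, fit 1-nearest-neighbor, clip predictions at zero.
blueprint = Blueprint(
    tasks=[MeanImputer()],
    model=NearestNeighborModel(),
    postprocess=lambda preds: [max(0.0, p) for p in preds],
)
blueprint.fit([[1.0, 10.0], [2.0, None], [3.0, 30.0]], [5.0, 7.0, 9.0])
print(blueprint.predict([[2.1, None]]))  # [7.0]
```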
See the generalized discussion of the steps to build predictive models in Workbench. Or, try it yourself with the model building walkthrough.