Data augmentation methods¶
This page summarizes the various ways in which DataRobot augments datasets for different experiment types.
Feature Discovery for derived features¶
Feature Discovery discovers and generates new features from multiple datasets so that you no longer need to perform manual feature engineering to consolidate various datasets into one. It automates the procedure of joining and aggregating datasets, using a variety of heuristics to determine the list of features to derive in a DataRobot project. The results depend on a number of factors such as detected feature types, characteristics of the features, relationships between datasets, data size constraints, and more.
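The join-and-aggregate idea can be illustrated with a minimal sketch in plain Python. This is not DataRobot's implementation: the `customers` and `transactions` datasets, the `derive_features` helper, and the choice of count, sum, and mean as aggregations are all hypothetical, chosen only to show how rows from a secondary dataset can be rolled up into new columns on a primary dataset.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical primary dataset: one row per customer.
customers = [
    {"customer_id": 1, "segment": "retail"},
    {"customer_id": 2, "segment": "business"},
    {"customer_id": 3, "segment": "retail"},
]

# Hypothetical secondary dataset: zero or more rows per customer.
transactions = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 35.0},
    {"customer_id": 2, "amount": 100.0},
]


def derive_features(primary, secondary, key, value):
    """Join secondary rows to primary rows on `key` and aggregate `value`."""
    # Group the secondary values by join key.
    groups = defaultdict(list)
    for row in secondary:
        groups[row[key]].append(row[value])

    enriched = []
    for row in primary:
        values = groups.get(row[key], [])
        enriched.append({
            **row,
            f"{value}_count": len(values),
            f"{value}_sum": sum(values),
            # With no matching rows, leave the mean missing rather than guess.
            f"{value}_mean": mean(values) if values else None,
        })
    return enriched


features = derive_features(customers, transactions, "customer_id", "amount")
for row in features:
    print(row)
```

Each primary row keeps its original columns and gains three derived columns. A customer with no transactions gets a count of zero and a missing mean.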
Time series feature derivation¶
DataRobot time series uses a feature engineering and reduction process to create the time series modeling dataset. The modeling framework extracts relevant features from time-sensitive data, modifies them based on user-configurable forecasting needs, and creates an entirely new dataset derived from the original. DataRobot then uses standard, as well as time series-specific, machine learning algorithms for model building. The feature engineering process includes time series data prep capabilities and the ability to restore features removed by the reduction process.
Time-aware data wrangling¶
Time-aware wrangling creates recipes of operations and applies them first to a sample and then, when verified, to the full time-aware dataset. This way, you can perform time series feature engineering during the data preparation phase. Executing operations like lags and rolling statistics on input data gives you control over which time-based features are generated before modeling. By reviewing the preview that results from adding both time-aware and non-time-aware operations, you can adjust the recipe before publishing, which avoids rerunning modeling when the automatically derived features don't fit your use case.
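As a rough illustration of what lag and rolling-statistic operations compute, the following plain-Python sketch derives both from a short series. The function names, the `sales` values, and the decision to exclude the current row from the rolling window are assumptions made for this example. They do not describe the wrangling operations' actual behavior or parameters.

```python
def lag(series, steps):
    """Shift the series back by `steps`; rows without enough history get None."""
    return [series[i - steps] if i >= steps else None for i in range(len(series))]


def rolling_mean(series, window):
    """Mean of the `window` values that precede each row.

    The current row is excluded (an assumption for this sketch) so that the
    derived feature uses only past information.
    """
    result = []
    for i in range(len(series)):
        if i < window:
            result.append(None)  # Not enough history yet.
        else:
            past = series[i - window:i]
            result.append(sum(past) / window)
    return result


# Hypothetical daily sales, ordered by date.
sales = [10, 12, 11, 15, 14, 18]

sales_lag_1 = lag(sales, 1)
sales_roll_3 = rolling_mean(sales, 3)

for row in zip(sales, sales_lag_1, sales_roll_3):
    print(row)
```

The first rows of each derived column are empty because there is not yet enough history to fill them. That is the kind of detail a preview on a sample makes visible before the recipe is applied to the full dataset.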
Train-time image augmentation¶
Train-time image augmentation is a processing step in the DataRobot blueprint that creates new images for training by randomly transforming existing images, thereby increasing the size of the training data. With the additional images, you can build insightful projects with datasets that might otherwise be too small. In addition, all image projects that use augmentation have the potential for smaller overall loss, because augmentation improves how models generalize to unseen data.
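The idea of growing a training set through random transformations can be sketched in a few lines of plain Python. This is only an illustration, not the blueprint's augmentation step: images are represented as nested lists of pixel values, the only transformations are horizontal and vertical flips, and the names `random_augment`, `augment_dataset`, `flip_probability`, and `copies_per_image` are hypothetical.

```python
import random


def horizontal_flip(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]


def vertical_flip(image):
    """Reverse the order of the rows, copying each row."""
    return [row[:] for row in image[::-1]]


def random_augment(image, rng, flip_probability=0.5):
    """Return a new image with each flip applied at random."""
    out = [row[:] for row in image]  # Never modify the original image.
    if rng.random() < flip_probability:
        out = horizontal_flip(out)
    if rng.random() < flip_probability:
        out = vertical_flip(out)
    return out


def augment_dataset(images, copies_per_image, seed=0):
    """Keep the original images and append randomly transformed copies."""
    rng = random.Random(seed)  # Seeded so the sketch is reproducible.
    augmented = list(images)
    for image in images:
        for _ in range(copies_per_image):
            augmented.append(random_augment(image, rng))
    return augmented


# A tiny hypothetical 2x3 grayscale image.
image = [[1, 2, 3], [4, 5, 6]]
training_set = augment_dataset([image], copies_per_image=3)
print(len(training_set))
```

One original image becomes four training images. Each copy contains the same pixels, possibly rearranged, so a model sees more variation without any new data being collected.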
Automated location feature engineering¶
Location AI's ability to ingest, autorecognize, and transform geospatial data unlocks powerful capabilities for DataRobot model blueprints. For example, geometric properties associated with row-level geometries can be powerful predictors in machine learning models. Location AI automatically derives features from the properties of the input geometries. DataRobot derives features for MultiPoints, Lines/MultiLines, and Polygons/MultiPolygons.
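To make "geometric properties" concrete, the following sketch computes a few such properties for a single polygon in plain Python. It is an illustration only: area, perimeter, and centroid are assumed example properties, not a list of what Location AI derives, and the `polygon_features` helper is hypothetical. The sketch also assumes a simple, non-degenerate polygon in planar coordinates, with vertices listed in order around the boundary.

```python
import math


def polygon_features(vertices):
    """Compute area, perimeter, and centroid of a simple planar polygon.

    `vertices` is a list of (x, y) tuples in boundary order. The polygon is
    assumed to have nonzero area; a degenerate one would divide by zero.
    """
    n = len(vertices)
    twice_area = 0.0
    perimeter = 0.0
    cx = cy = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # Wrap around to close the ring.
        cross = x0 * y1 - x1 * y0       # Shoelace formula term.
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
        perimeter += math.hypot(x1 - x0, y1 - y0)
    signed_area = twice_area / 2.0
    return {
        "area": abs(signed_area),
        "perimeter": perimeter,
        "centroid": (cx / (6.0 * signed_area), cy / (6.0 * signed_area)),
    }


# A hypothetical 4 x 3 rectangle.
rectangle = [(0, 0), (4, 0), (4, 3), (0, 3)]
features = polygon_features(rectangle)
print(features)
```

Each value in the returned dictionary can be treated as one numeric column for the row that holds the geometry. That is the sense in which a geometry's properties become predictors.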