Transform data
DataRobot supports several methods of feature engineering: automatic and manual feature transformations for single datasets, and Feature Discovery for multiple datasets. The table below summarizes the feature transformation options in DataRobot.
Topic | Description | Dataset | Notes |
---|---|---|---|
Automatic transformations | |||
Automatic feature transformations | Understand date-type feature transformations generated by DataRobot. | Primary | Calculated during EDA1. |
Interaction-based transformations | Transform features based on interactions within your primary dataset by enabling an advanced option. | Primary | Enabled in project and calculated during EDA2. |
Feature Discovery | Perform multi-dataset, interaction-based feature creation. | Secondary | Configured in project and calculated during EDA2. |
Automatic modeling transformations | Understand the automated feature engineering DataRobot performs as part of the modeling process. | All | Performed during modeling. |
Manual transformations | |||
Manual feature transformations | Manually transform features in your dataset, including variable type transformations. | Primary | Transformed in project. |
AI Catalog transformations | |||
Prepare data in AI Catalog with Spark SQL | Enrich, transform, shape, and blend together datasets using Spark SQL queries within the AI Catalog. |
What is feature engineering?
Feature engineering is the process of preparing a dataset for machine learning by changing existing features or deriving new features to improve model performance. Automated Feature Engineering uses AI to accelerate the transformation of data into machine learning assets, allowing you to build better machine learning models in less time.
Feature engineering takes place after data preparation and ingest, and before model building.
During EDA1, DataRobot analyzes and profiles every feature in each dataset—detecting feature types, automatically transforming date-type features, and assessing feature quality.
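To make the idea of date-type feature transformation concrete, the following is a minimal sketch in plain Python of deriving calendar features from a date value. It is an illustration only, not DataRobot's implementation: the function name `derive_date_features` and the output feature names are hypothetical, and the set of features DataRobot actually derives may differ.

```python
from datetime import date


def derive_date_features(iso_date: str) -> dict:
    """Derive illustrative calendar features from an ISO date string (YYYY-MM-DD).

    Hypothetical helper for illustration; the feature names are assumptions,
    not the names DataRobot assigns to its derived features.
    """
    d = date.fromisoformat(iso_date)
    return {
        "year": d.year,
        "month": d.month,
        "day_of_month": d.day,
        "day_of_week": d.isoweekday(),  # 1 = Monday ... 7 = Sunday
    }


features = derive_date_features("2024-03-15")
print(features)
```

Each derived value becomes a separate numeric feature that a model can use, which is typically more informative than the raw date string.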
Before model building, you can take further advantage of Automated Feature Engineering by enabling interaction-based transformations for primary datasets or by defining relationships between multiple datasets using Feature Discovery. You can also manually transform features in your dataset by applying functions, including transformations that change a feature's variable type.
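As a rough illustration of what a manual transformation does conceptually, the sketch below applies a variable type transformation (numeric to categorical) and a function-based transformation (natural log) to rows of an in-memory dataset. This is plain Python under stated assumptions, not the DataRobot interface or API: the column names, the sample values, and the `transform_row` helper are all hypothetical.

```python
import math

# Hypothetical sample rows; a ZIP code read as a number loses its leading zero.
rows = [
    {"zip_code": 2139, "income": 52000.0},
    {"zip_code": 94105, "income": 87000.0},
]


def transform_row(row: dict) -> dict:
    """Return a copy of the row with two illustrative derived features."""
    out = dict(row)
    # Variable type transformation: treat the numeric ZIP code as a category.
    out["zip_code_cat"] = str(row["zip_code"]).zfill(5)
    # Function-based transformation: compress a skewed numeric feature.
    out["log_income"] = math.log(row["income"])
    return out


transformed = [transform_row(r) for r in rows]
print(transformed[0]["zip_code_cat"])
```

The original features are kept alongside the derived ones, so either version remains available when building feature lists.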
During EDA2, DataRobot uses these known interactions, or relationships, to discover relevant features for your ML models and automatically transforms them to address the unique requirements of each algorithm in the blueprint library.
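The general idea of deriving features across a defined relationship can be sketched as a join plus aggregation: rows in a secondary dataset are summarized per key and attached to the primary dataset. The example below is a simplified illustration in plain Python, not how Feature Discovery is implemented. The dataset contents, the `customer_id` join key, and the `aggregate_secondary` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical primary dataset (one row per customer, with the target).
primary = [
    {"customer_id": 1, "churned": 0},
    {"customer_id": 2, "churned": 1},
]

# Hypothetical secondary dataset (many transactions per customer).
secondary = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 35.0},
    {"customer_id": 2, "amount": 5.0},
]


def aggregate_secondary(primary_rows, secondary_rows, key, value):
    """Attach per-key sum and count of `value` from the secondary rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in secondary_rows:
        totals[row[key]] += row[value]
        counts[row[key]] += 1

    enriched = []
    for row in primary_rows:
        k = row[key]
        enriched.append({
            **row,
            f"{value}_sum": totals.get(k, 0.0),
            f"{value}_count": counts.get(k, 0),
        })
    return enriched


enriched = aggregate_secondary(primary, secondary, "customer_id", "amount")
print(enriched)
```

Primary rows with no matching secondary rows receive a sum of 0.0 and a count of 0 in this sketch; a real pipeline might prefer to mark those values as missing instead.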
After model building, navigate to the Leaderboard and select a model. You can view the transformations DataRobot performed for an individual model during the modeling process in the following places:
Feature | Description | Location |
---|---|---|
Blueprint | Displays preprocessing, modeling algorithms, and post-processing tasks for the selected model. | Click Describe > Blueprints. |
Data Quality Handling report | Displays feature and imputation information for supported blueprint tasks. | Click Describe > Data Quality Handling. |
Coefficients | Allows you to download coefficients and preprocessing information, including feature transformations, for supported model types. | Click Describe > Coefficients and click Export. |