If your project has no secondary dataset, the Feature Discovery process does not apply. For these cases, use Search for interactions to automatically create new features based on interactions between features in your primary dataset.
These newly engineered features can provide additional insight that might be important for modeling. For example, if you were to provide the year a house was sold and the year a house was built, DataRobot could extract a new feature from the difference. This engineered feature, “age of house at sell date,” may prove more relevant than the build or sale dates alone.
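As a minimal illustration of the kind of derived feature described above, the following sketch computes a difference feature by hand. The column names (`year_built`, `year_sold`) and the derived name `age_at_sale` are hypothetical, and the code is not DataRobot's implementation.

```python
# Hypothetical records; the column names are assumptions for illustration only.
houses = [
    {"year_built": 1990, "year_sold": 2015},
    {"year_built": 2005, "year_sold": 2020},
    {"year_built": 1978, "year_sold": 1999},
]


def add_age_at_sale(rows):
    """Return copies of the rows with a derived difference feature added."""
    return [
        {**row, "age_at_sale": row["year_sold"] - row["year_built"]}
        for row in rows
    ]


houses_with_age = add_age_at_sale(houses)
```

Each row keeps both parent features and gains one new numeric feature, which is the shape of result that a difference interaction produces.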
The search for interactions functionality, run as part of the EDA2 process, results in not only new features but also new feature lists, both default lists and custom. The new features are represented in the following tabs:
- Feature Impact, if they are among the top 50 most impactful features.
- Feature Effects, if they have more than zero influence on the model (based on the feature importance score).
- Prediction Explanations, if applicable to the displayed reasons.
If the search does not create any new features (or you have not enabled the option in Advanced options), there are no changes to the Data page list of features and no new feature lists are created.
See the considerations for feature availability.
DataRobot additionally provides automatic feature transformations for features of type date. This transformation, which occurs during EDA1 and is described as part of the feature transformations section, requires no manual settings.
Search for interactions
To enable interaction search for a primary dataset, after selecting a target, expand the Advanced options link and select the Additional tab. In the Automation Settings section, select Search for interactions:
Return to the top of the page and click Start. As EDA2 runs, you can watch as the newly created features are added to the Data page. New features are named in a way that indicates the operation that created them:
Note the Importance score of the new features, showing the strength of their relationship to the target.
For efficiency, when Search for interactions is enabled, Autopilot does not search for differences or ratios in selected blueprints. This is because Search for interactions, which runs at the EDA2 stage (before Autopilot starts), has already performed a similar search and added new features where applicable.
Feature lists and created features
DataRobot creates new feature lists—"Informative Features" and, if applicable, custom lists—with the created features and marks the lists with a plus (+) sign. Informative features:
A custom list:
When EDA2 completes, if DataRobot found and created new features, the selected modeling mode uses the new list to build models.
A few things to note about feature lists:
- The target feature is automatically added to every feature list.
- If Autopilot is set to run on the "Informative Features" list, DataRobot creates Informative Features +. If set to run on a custom list, DataRobot creates both <Custom_Features> + and Informative Features +.
- For custom lists, DataRobot only adds those features that are relevant to the original contents of the list. Also, DataRobot only creates a new custom list if the original custom list contains the parent of at least one newly derived feature.
- Informative Features + may or may not have the same number of features as the original. This is because, when a new feature is derived from existing ones, keeping both parents may result in redundancy. If that is the case, DataRobot removes one of the parent features.
- Informative Features + is created based on the Informative Features with Leakage Removed feature list.
- <Custom_Features> + is created based on the features in the custom list and any engineered features whose parents are in the custom list.
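The custom-list rules above can be sketched as a small function. This is an illustration only: the function name, the data shapes, and the requirement that every parent be present in the original list are assumptions, and the sketch does not model removal of a redundant parent or the automatic addition of the target.

```python
def build_plus_list(base_features, engineered):
    """Sketch: extend a feature list with qualifying engineered features.

    base_features: feature names in the original list.
    engineered: dict mapping a new feature name to a tuple of parent names.
    Returns the extended list, or None when no engineered feature qualifies.
    """
    base = set(base_features)
    # Assumption: a feature qualifies only if all of its parents are in the list.
    added = [
        name
        for name, parents in engineered.items()
        if all(parent in base for parent in parents)
    ]
    if not added:
        return None  # no "+" list would be created
    return list(base_features) + added


# Hypothetical engineered features and their parents.
engineered = {
    "age_at_sale": ("year_sold", "year_built"),
    "price_per_area": ("price", "area"),
}
plus_list = build_plus_list(["year_sold", "year_built", "zip_code"], engineered)
no_list = build_plus_list(["zip_code"], engineered)
```

Here `plus_list` gains only `age_at_sale`, because the parents of `price_per_area` are not in the custom list, and `no_list` is `None` because no parent of any engineered feature is present.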
Explore new features
Once a new feature is created, the Transformation tab provides insights that explain the relationships. To view:
- From the Data page click on the new feature name.
- Select the Transformation tab. The display compares the transformed feature with the parent features and indicates the interaction (MINUS, EQUAL, or DIVIDED BY):
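To make the three interaction types concrete, the following sketch computes each of them for a pair of numeric columns. The function name and the handling of a zero denominator (returning `None`) are assumptions for illustration; the source does not describe how DataRobot treats such cases.

```python
def pairwise_interactions(a, b):
    """Compute the three interaction types for two equal-length numeric columns."""
    minus = [x - y for x, y in zip(a, b)]
    equal = [x == y for x, y in zip(a, b)]
    # Assumption: an undefined ratio (zero denominator) is recorded as None.
    divided_by = [x / y if y != 0 else None for x, y in zip(a, b)]
    return {"MINUS": minus, "EQUAL": equal, "DIVIDED BY": divided_by}


result = pairwise_interactions([10.0, 4.0, 6.0], [5.0, 4.0, 0.0])
```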
To further investigate the newly engineered features, and how newly derived features affect model predictions, find them in the following insights:
In general, DataRobot considers an interaction between a pair of features "useful" only when the interaction satisfies criteria of both interpretability and accuracy. This is achieved through high correlation and significance checks. DataRobot fits a Generalized Linear Model with each derived feature and then determines the significance of that feature (for example, using p-values or other statistical criteria).
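The idea of combining a correlation check with a significance check can be sketched with a simple linear regression, which is the simplest Gaussian GLM. This is not DataRobot's actual procedure: the function names and both thresholds (`min_abs_r`, `min_abs_t`) are hypothetical, and a t statistic stands in for a full p-value calculation.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def slope_t_statistic(r, n):
    """t statistic for the slope when regressing the target on one feature."""
    return r * math.sqrt((n - 2) / (1 - r * r))


def is_useful(feature, target, min_abs_r=0.5, min_abs_t=2.0):
    """Hypothetical check: require both a strong correlation and a large |t|."""
    r = pearson_r(feature, target)
    if abs(r) >= 1.0:
        return True  # perfect linear fit; the t statistic is unbounded
    t = slope_t_statistic(r, len(feature))
    return abs(r) >= min_abs_r and abs(t) >= min_abs_t


strong = is_useful([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 8.0, 9.8])
weak = is_useful([1, 2, 3, 4, 5], [3, 1, 4, 1, 5])
```

In this sketch the first candidate tracks the target almost linearly and passes both checks, while the second has a correlation of roughly 0.35 and is rejected.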
Search for Interactions typically provides additional insights, but can sometimes make insights slightly less accurate. That change in accuracy can lead to DataRobot selecting a different recommended model and can also change the runtime of the 80% model.
Search for interactions on primary datasets is supported for:
- Pure numeric
- Special numeric (date, percentage, currency, length)
And does not support the following: