Work with feature lists¶
Feature lists control the subset of features that DataRobot uses to build models. You can use one of the automatically created lists or manually add features from the Data page or the menu. You can also review, rename, and delete (some) feature lists. The list used for modeling is called the default modeling feature list; that is, it is the feature list selected when you click the Start button.
If you don't override the selection, DataRobot uses either of the following lists to build models:
- All features that provide information potentially valuable for modeling (the Informative Features list).
- All features that provide information potentially valuable for modeling with any feature(s) at risk of causing target leakage removed (the Informative Features - Leakage Removed list).
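The default choice described above can be summarized in a few lines. This is only an illustrative sketch: the function name and the `leakage_detected` flag are hypothetical and not part of any DataRobot API; only the two list names come from the text.

```python
def default_modeling_list(leakage_detected: bool) -> str:
    """Return the list used for modeling when the selection is not overridden."""
    if leakage_detected:
        # Features at risk of causing target leakage have been removed.
        return "Informative Features - Leakage Removed"
    return "Informative Features"


print(default_modeling_list(leakage_detected=False))
print(default_modeling_list(leakage_detected=True))
```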
You can select features to create a new feature list before or after EDA2. The target feature is automatically added to every feature list. Once created, the new list becomes available in the Feature List dropdown. DataRobot highlights the active list, which controls the display of features on the page, in blue.
Note that the Project Data tab defaults to showing All Features, which is not actually a feature list but instead a way to view every feature in the dataset.
Select a feature list¶
To use a feature list other than the list assigned by DataRobot, select the list to use as the default modeling list from the Feature List dropdown.
To select a different feature list:
Scroll down to the Project Data tab.
By default, the All Features list displays.
Click the Feature List dropdown menu and select a new feature list (Informative Features in this example).
The Informative Features list displays below the Start button.
Create feature lists¶
If you do not want to use one of the automatically created feature lists, you can create customized feature lists and train your models on them to see if they yield a better model. You can create these lists from the Data page or the menu. Additionally, you can create lists based on feature impact from the Feature Impact tab, including lists with redundant features removed. You can later manage these lists from the Feature Lists tab.
Create feature lists from the Data page¶
To create feature lists from the Data page:
Select the Project Data tab.
(Optional) From the Feature List dropdown, select All Features to display all columns (features) in your dataset.
Use the checkboxes to the left of a feature name to select a set of features. When you select the first feature, the Create Feature List link becomes active.
Select each feature you want added to your new list and click Create Feature List.
Enter a name in the resulting dialog box and click Create feature list. The page display updates to show only those features that are part of the new list (highlighted in blue in the Feature List dropdown).
!!! tip Click in the box to select all, or deselect any, selected features.
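Conceptually, the steps above produce a named collection of feature names to which the target is always added. The sketch below models that in plain Python, assuming nothing about the product internals: `FeatureList`, `create_feature_list`, and the column names are hypothetical and are not DataRobot API calls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureList:
    """Hypothetical stand-in for a named feature list."""
    name: str
    features: tuple


def create_feature_list(name, selected, target):
    """Build a list from the checked features, always including the target."""
    if not name.strip():
        raise ValueError("a feature list needs a name")
    # Keep the selection order and drop accidental duplicates.
    ordered = list(dict.fromkeys(selected))
    if target not in ordered:
        ordered.append(target)
    return FeatureList(name=name, features=tuple(ordered))


top_picks = create_feature_list("Top picks", ["age", "income", "age"], target="churned")
print(top_picks.features)
```

The frozen dataclass reflects the idea that a saved list is a fixed set of features; a different selection means creating another list.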
Create feature lists from an existing list¶
Use the menu to select an existing feature list, then add or remove features to create a new feature list.
Click Menu on the top left of the Project Data tab and click Select features by feature list.
Clicking a feature list name causes DataRobot to select all features on the displayed page that are members of the chosen feature list (set by the Feature List dropdown).
Add or remove features using the check boxes to the left of the feature names.
Click + Create feature list and enter the new feature list name to save your custom feature list. The new list is available for selection across the project (from the Feature List dropdown).
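Starting from an existing list and then adding or removing features is essentially a set operation that yields a new list and leaves the original untouched. A minimal sketch, with a hypothetical helper name and made-up column names:

```python
def derive_feature_list(base, add=(), remove=(), target=None):
    """Return a new tuple of features; the base list is left unchanged."""
    removed = set(remove)
    if target is not None:
        removed.discard(target)  # the target stays in every list
    kept = [feature for feature in base if feature not in removed]
    for feature in add:
        if feature not in kept:
            kept.append(feature)
    if target is not None and target not in kept:
        kept.append(target)
    return tuple(kept)


informative = ("age", "income", "zip_code", "churned")
custom = derive_feature_list(
    informative, add=["tenure"], remove=["zip_code"], target="churned"
)
print(custom)
```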
Filter and select by var type¶
Filter and select features by variable data type.
Click Menu on the top left of the Project Data tab and click Select features by var type.
Add or remove features using the check boxes to the left of the feature names.
Click + Create feature list and enter the new feature list name to save your custom feature list.
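Selecting by variable type amounts to filtering the feature table on its type column. The following sketch assumes a simple name-to-type mapping; the helper name, the column names, and the type labels used here are illustrative assumptions rather than values taken from the product.

```python
def select_by_var_type(var_types, wanted):
    """Return the names of features whose variable type is in `wanted`."""
    wanted = set(wanted)
    return [name for name, var_type in var_types.items() if var_type in wanted]


# Hypothetical dataset columns and their detected types.
columns = {
    "age": "Numeric",
    "signup_date": "Date",
    "plan": "Categorical",
    "income": "Numeric",
}
numeric_only = select_by_var_type(columns, {"Numeric"})
print(numeric_only)
```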
Feature Lists tab¶
The Feature Lists tab of the Data page provides a mechanism for managing feature lists. It provides a summary (name, number of features, number of models, created date, and description) of DataRobot-created and custom feature lists and allows you to delete or rename (some) lists to help avoid clutter and confusion. A lock icon next to the name indicates the list cannot be deleted.
After building models, the tab also shows additional automatically created lists as well as any custom lists.
Manage feature lists¶
DataRobot provides several tools for working with feature lists. Depending on how the list was created (automatically by DataRobot or manually by a user), or whether it has been used to create models on your Leaderboard, the actions may behave differently.
The following list describes the actions:
- Exports features that are part of the selected list as a CSV file.
- Opens the selected feature list on the Project Data tab.
- Provides a dialog to let you edit the list name and/or description. (Automatically created feature lists cannot be renamed, although the description can be changed.)*
- Restarts Autopilot using the selected feature list.*
- Deletes the selected list (or indicates it cannot be deleted). Automatically created feature lists cannot be deleted.*
* You must have User-level or above project access to delete or rename feature lists, as well as to restart Autopilot.
You cannot add or remove features from a feature list. Instead, create a new feature list with all desired features.
Delete feature lists¶
Deleting a feature list also deletes any models in the project that were built with that list. Only custom feature lists can be deleted (no lock icon next to the name). If you click to delete a custom feature list that has been used for modeling, DataRobot displays a warning with the number of models affected.
You cannot use the delete function if the feature list is:
- An automatically created list.
- The default modeling list for the project.
- Configured as a monotonic constraint feature list for the project.
- Used as the input feature list to create the modeling dataset for a time series project.
- Used in a model deployment (the model and its feature lists cannot be deleted until after the deployments are deleted).
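The conditions above can be read as a checklist that is evaluated before a delete is allowed. The sketch below encodes that checklist in plain Python; the class, field, and function names are hypothetical and do not correspond to DataRobot API objects.

```python
from dataclasses import dataclass


@dataclass
class FeatureListStatus:
    """Hypothetical flags mirroring the conditions listed above."""
    auto_created: bool = False
    default_modeling_list: bool = False
    monotonic_constraint_list: bool = False
    time_series_input_list: bool = False
    used_in_deployment: bool = False
    models_built: int = 0


def deletion_blockers(status):
    """Return the reasons a list cannot be deleted; empty means deletable."""
    checks = [
        (status.auto_created, "automatically created list"),
        (status.default_modeling_list, "default modeling list for the project"),
        (status.monotonic_constraint_list, "monotonic constraint feature list"),
        (status.time_series_input_list, "input list for a time series modeling dataset"),
        (status.used_in_deployment, "used in a model deployment"),
    ]
    return [reason for blocked, reason in checks if blocked]


def models_lost_on_delete(status):
    """Number of models deleted with the list; raises if deletion is blocked."""
    blockers = deletion_blockers(status)
    if blockers:
        raise ValueError("cannot delete: " + "; ".join(blockers))
    return status.models_built


custom = FeatureListStatus(models_built=3)
print(models_lost_on_delete(custom))  # models the warning would mention
deployed = FeatureListStatus(used_in_deployment=True)
print(deletion_blockers(deployed))
```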
Edit names and descriptions¶
When creating a custom feature list, you simply name the list in the initial dialog. From the Feature Lists tab you can append a description to the list. To add that description, or edit an existing description, highlight the list and click the pencil icon.
You can change a description, but not a name, for a DataRobot-created list.
Rerun Autopilot on a feature list¶
After you build your models, you can rerun Autopilot from the Feature Lists tab. This is helpful if you customized a feature list after running Autopilot and want to generate additional models.
If you restart while models are building for the project, DataRobot stops building new models with the feature list that is currently running and restarts Autopilot, from the beginning, using the selected list.
This is the same action as rerunning Autopilot from the Configure modeling settings link available in the right-panel Worker Queue.
To rerun Autopilot with a custom feature list:
On the Data tab, click the Feature Lists tab.
Click the menu to the right of the feature list you want to use to build new models and select Rerun Autopilot.
In the Rerun Modeling window, select the Modeling mode and click Rerun.
Automatically created feature lists¶
DataRobot automatically creates several feature lists for each project. Note that:
- Time series feature lists differ from AutoML feature lists.
- Features created from a search for interactions result in different lists (appended with a plus (+) sign).
- A project's target feature is automatically added to every feature list.
The following table describes the automatically created feature lists. Not all lists apply to every project.
| Feature list | Description |
|---|---|
| All Features (default) | Includes all dataset features; performs no feature engineering. |
| Informative Features | The default feature list if DataRobot does not detect target leakage. This list includes features that pass a "reasonableness" check that determines whether they contain information useful for building a generalizable model. For example, DataRobot excludes features it determines are low information or redundant, such as duplicate columns, a column containing all ones or reference IDs, a feature with too few values, and others. Informative features are sorted to the top of the Features list. |
| Informative Features - Leakage Removed | The default feature list if DataRobot detects target leakage. This list excludes feature(s) that are at risk of causing target leakage and any features providing little or no information useful for modeling. To determine what was removed, you can see these features labeled in the Data table with All Features selected. |
| Raw Features | All features in the dataset, excluding user-derived features and including those excluded from the Informative Features list (e.g., duplicates, high missing values). |
| Univariate Selections | Features that meet a certain threshold (an ACE score above 0.005) for non-linear correlation with the selected target. DataRobot calculates, for each entry in the Informative Features list, the feature's individual relationship against the target. This list is not available until EDA2 completes. |
| DR Reduced Features | A subset of features, selected based on the Feature Impact calculation of the best non-blender model on the Leaderboard. DataRobot then automatically retrains the best non-blender model with this DR Reduced Features list, creating a new model. DataRobot compares the original and new models, selects the better one, and retrains this model at a higher sample size for model recommendation purposes. DR Reduced Features, in most cases, consists of the features that provide 95% of the accumulated impact for the model. If that number is greater than 100, only the top 100 features are included. If redundant feature identification is supported in the project, redundant features are excluded from DR Reduced Features. Note that this list is not created in Quick mode. |
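Two of the rules in the table are simple enough to sketch: the ACE threshold for non-linear correlation (a score above 0.005) and the 95% accumulated-impact cutoff with a cap of 100 features. The code below is a rough illustration under stated assumptions, not DataRobot's implementation: the function names and sample scores are hypothetical, and dropping redundant features before accumulating impact is an assumption, since the text does not say at which point they are excluded.

```python
ACE_THRESHOLD = 0.005  # threshold quoted in the table above


def above_ace_threshold(ace_scores, threshold=ACE_THRESHOLD):
    """Keep features whose ACE score is strictly above the threshold."""
    return [name for name, score in ace_scores.items() if score > threshold]


def reduced_features(impact, redundant=(), share=0.95, max_features=100):
    """Take top-impact features until they cover `share` of total impact."""
    redundant = set(redundant)
    # Assumption: redundant features are dropped before accumulating impact.
    candidates = {n: s for n, s in impact.items() if n not in redundant and s > 0}
    total = sum(candidates.values())
    if total == 0:
        return []
    selected, running = [], 0.0
    ranked = sorted(candidates.items(), key=lambda item: item[1], reverse=True)
    for name, score in ranked:
        selected.append(name)
        running += score
        # Stop at the impact share or at the size cap, whichever comes first.
        if running >= share * total or len(selected) == max_features:
            break
    return selected


# Hypothetical scores for illustration only.
ace = {"age": 0.12, "row_id": 0.001, "income": 0.005}
impact = {"a": 60, "b": 30, "c": 8, "d": 2}
print(above_ace_threshold(ace))
print(reduced_features(impact))
print(reduced_features(impact, redundant=["b"]))
```

With many low-impact features the cap applies first: 200 features of equal impact would need 190 to reach 95%, so only 100 are kept.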
While not a feature list (not available for use to build models), the All Features selection sets the Project Data display to list all columns in the dataset as well as any additional transformed features.