The following sections describe DataRobot's automatic transformations. Transformed features do not replace the original, raw features; rather, they are provided as new, additional features for building models. For information on automated feature transformations DataRobot performs during the modeling process, see the Modeling process documentation.
Transformed features (including numeric features created as user-defined functions) cannot be used for special variables, such as Weight, Offset, Exposure, and Count of Events.
When DataRobot identifies a feature column as variable type date, it automatically creates transformations for qualifying features after EDA1 completes (the conditions that prevent extraction are listed below the table). When complete, the dataset can have up to four new features for each date column:
| Feature variable | Description | Variable type |
|---|---|---|
| Hour of Day | Numeric value representing a 24-hour period, 0-23. Data must contain one or more date columns and at least three different hours in the date field. | Numeric |
| Day of Week | Numeric and text value representing the day of the week, where 0 corresponds to Monday (for example, 0: Monday, 2: Wednesday, 5: Saturday). Data must contain at least three different weeks. | Categorical |
| Day of Month | The day of the month, 1-31. Data must contain at least three different years. | Numeric |
| Month | Numeric value representing the month, 1-12. Data must contain at least three different years. | Categorical |
| Year | Data must contain at least three different years. | Numeric |
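The qualification rules in the table can be illustrated with a minimal sketch. It is not DataRobot's implementation: the function name `derive_date_features` is hypothetical, and counting "different weeks" by ISO year and week number is an assumption.

```python
from datetime import datetime


def derive_date_features(values):
    """Derive the table's features for one date column (illustrative only).

    `values` is a list of datetime objects. A feature is produced only
    when the column meets the threshold stated in the table.
    """
    hours = {v.hour for v in values}
    # Assumption: "different weeks" means distinct (ISO year, ISO week) pairs.
    weeks = {v.isocalendar()[:2] for v in values}
    years = {v.year for v in values}

    features = {}
    if len(hours) >= 3:
        features["Hour of Day"] = [v.hour for v in values]       # 0-23
    if len(weeks) >= 3:
        features["Day of Week"] = [v.weekday() for v in values]  # 0 = Monday
    if len(years) >= 3:
        features["Day of Month"] = [v.day for v in values]       # 1-31
        features["Month"] = [v.month for v in values]            # 1-12
        features["Year"] = [v.year for v in values]
    return features


# Three years and three weeks of data, but only one distinct hour.
dates = [
    datetime(2019, 1, 7, 9),
    datetime(2020, 6, 15, 9),
    datetime(2021, 12, 25, 9),
]
features = derive_date_features(dates)
print(sorted(features))         # Hour of Day is absent
print(features["Day of Week"])  # [0, 0, 5]
```

In this example the column spans three years, so the year-based features qualify, while Hour of Day is skipped because every timestamp has the same hour.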
Date features are not automatically extracted if:
- there are 10 or more date and/or time columns in the dataset
- transformed features would not be informative (e.g., if there is only 1 year of data there is no need to extract year)
- transformed features risk overfitting (e.g., with 1 year of data, modeling on month cannot identify full seasonal effects)
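The conditions above can be sketched as two simple checks. This is a hedged illustration only: the function names, the `"date"`/`"time"` type labels, and the constant names are hypothetical, and the three-year threshold is taken from the feature table rather than from a documented API.

```python
MAX_DATE_COLUMNS = 10   # "10 or more" date and/or time columns disables extraction
MIN_DISTINCT_YEARS = 3  # threshold from the feature table for year-based features


def extraction_allowed(column_types):
    """Return False when the dataset has 10 or more date/time columns.

    `column_types` maps column name -> type label (labels are assumed).
    """
    date_like = [name for name, kind in column_types.items()
                 if kind in ("date", "time")]
    return len(date_like) < MAX_DATE_COLUMNS


def year_features_informative(years):
    """Return True when there are enough distinct years to extract Year or Month."""
    return len(set(years)) >= MIN_DISTINCT_YEARS


print(extraction_allowed({f"d{i}": "date" for i in range(10)}))  # False
print(year_features_informative([2021, 2021, 2021]))             # False
```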
The new derived features are included in the Informative Features feature list and used for Autopilot. DataRobot also maintains the original date column. Note, however, that the original raw date is excluded from Informative Features if all four features listed above were extracted (that is, the dataset included at least three years of data). For example, for a dataset that contains over 10 years' worth of data, DataRobot creates new features for all four date columns.
If any of the automatically transformed date features are duplicates of existing features in the dataset, they are not included in the Informative Features list. As an example, assume you add a date-type column containing the manufacturing year, "MfgYear", to the dataset prior to ingestion. DataRobot marks the transformed feature, "MfgYear(Year)", as a duplicate and excludes it from Informative Features. If, however, the automatically transformed feature has a different type than the original column, it is included in Informative Features.
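The duplicate rule can be sketched as follows. The text does not say how DataRobot detects duplicates, so the comparison used here (identical values and identical variable type) is an assumption, and `is_duplicate` and the `ModelYear` column are hypothetical names used only for illustration.

```python
def is_duplicate(derived, existing_columns):
    """Return True if a derived feature matches an existing column.

    `derived` is a (variable_type, values) tuple; `existing_columns` maps
    column name -> (variable_type, values). Assumption: a duplicate has
    both the same variable type and the same values.
    """
    derived_type, derived_values = derived
    return any(
        existing_type == derived_type
        and list(existing_values) == list(derived_values)
        for existing_type, existing_values in existing_columns.values()
    )


# Hypothetical dataset with an existing numeric year column.
existing = {"ModelYear": ("numeric", [2015, 2016, 2017])}

# Same values, same type: treated as a duplicate and left out.
print(is_duplicate(("numeric", [2015, 2016, 2017]), existing))      # True
# Same values, different type: not a duplicate, so it is kept.
print(is_duplicate(("categorical", [2015, 2016, 2017]), existing))  # False
```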