Manual transformations
The following sections describe manual, user-created transformations. Transformed features do not replace the original, raw features; rather, they are provided as new, additional features for building models.
Note
Transformed features (including numeric features created as user-defined functions) cannot be used for special variables, such as Weight, Offset, Exposure, and Count of Events.
Create transformations
DataRobot supports different transformations that you can apply to your data, including taking the natural logarithm, squaring, and running functions on numeric data. (You can also change the variable type for features.) These transformations are only available when they are appropriate to the feature type. The following steps describe creating a user transformation.
- Hover over a feature available for transformation and click the orange arrow to the left of the feature name to expose the Transformations menu:
- Select a transformation. If you select the natural log `log(<feature>)` or squaring `<feature>^2` options, the transformation is computed immediately and the new derived feature is created.
- If you select the function option `f(<feature>)`, a dialog for adding a new transformation appears.
  - In the New feature name field, type a name for this transformation. You can create multiple function-based transformations for a feature.
  - Type the function and feature(s), using the supported syntax.
  - Click Create to create the transformation.
Note that you can also access this functionality from the menu:
The transformed feature appears under the original feature in the Data page (all features). It can be included in any new feature lists and can also be used for modeling. When using a model that contains transformed features for predictions, DataRobot automatically includes the new feature in any uploaded dataset.
As with other features, you can view the histogram, charted frequent values, and a table of values by clicking the feature name. However, instead of allowing further variable type transformations, the display compares the transformed feature with the parent feature:
Variable type transformations
DataRobot bases variable type assignment on the values seen during EDA and then lists the variable type for each feature in your dataset on the Data page. There are times, however, when you may need to change the type. For example, area codes may be interpreted as numeric but you would rather they map to categories. Or a categorical feature may be encoded as a number (that is intended to map to a feature value, such as `1=yes, 2=no`) but without transformation is interpreted as a number.
There are certain cases where variable type transforms are not available. These include columns that DataRobot has identified as special columns for both integral and float values. (Date columns are a special case and do support transforms. See the description of single feature transformations.) Additionally, a column that is all numeric except for a single unique non-numeric value is treated as special. In this case, DataRobot converts the unique value to NaN and disallows conversion to prevent losing the value.
Note
When converting from numeric variable types to categorical, be aware that DataRobot drops any values after the decimal point. In other words, the value is truncated to become an integer. The same applies when transforming floats with missing values to categorical: the new feature is truncated, not rounded. For example, 9.9 becomes 9, not 10.
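The truncation described in this note can be sketched in plain Python. This is only an illustration of the stated behavior, not DataRobot's implementation: the helper name `to_category_label` is hypothetical, and keeping missing values as `None` is an assumption made for the sketch.

```python
import math

def to_category_label(value):
    """Hypothetical helper: drop the fractional part, then use the integer as a label."""
    # Assumption for this sketch: missing values stay missing.
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None
    # math.trunc drops everything after the decimal point (no rounding).
    return str(math.trunc(value))

values = [9.9, 9.1, 10.0, float("nan")]
labels = [to_category_label(v) for v in values]
print(labels)

# For contrast, rounding would move 9.9 to a different category.
print(round(9.9))
```

Because 9.1 and 9.9 both end up under the same label, it is worth checking whether the fractional part carries meaning before converting a float feature.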
Tip
When making predictions, DataRobot expects the columns in the prediction data to be the same as the original data. If a model uses the original variable plus the transformed variable, the prediction data must use the original feature name. DataRobot will calculate the derived features internally.
You can transform the variable type of many features at the same time (using a batch transformation), or one feature at a time.
Multiple feature transformations
To modify the variable type for multiple features as a single batch operation, use the Change Variable Types option from the menu. This option is useful, for example, if you want to transform all features of one variable type.
You can transform all features, or a selection of features, of one variable type to another variable type. For example, you can change all Categorical features to Text, or pick specific Categorical features to transform to Text. All new features created using batch variable type transformations are available from the Data page, in the All Features list. Although you can transform features from any feature list, you need to view all features on the Data page once the batch transformation completes. You can add the new features to other feature lists.
Note
Keep the following in mind when transforming multiple features at the same time:
- A feature that is the result of a previous transformation operation cannot be selected for transformation.
- All features selected for batch transformation must be of the same variable type.
If DataRobot does not let you transform the features, you should correct the list of features and try the transformation again.
- First, select the features to transform using one of the following methods:
  - If you want to manually select features: Select each feature you want to transform (1). Make sure you do not select any previously transformed features (2) and that all features you select are of the same variable type (3).
  - If you want to select all features for a variable type: From the menu, under Select Features by Var Type, select the variable type you want to transform for the dataset. Only variable types present in the dataset can be selected.
    All features of that variable type are shown as selected in the Data page (i.e., checks in the left-hand boxes).
- From the menu, under Actions, click Change Variable Types. (If the link is disabled, there is an issue with the selected features. Hover over the disabled link to see the reason DataRobot cannot transform the selected features. See Transform options and syntax for details.)
  The Change Variable Type dialog appears.
Note
DataRobot supports transforming up to 500 features at a time. If you see a message indicating more than 500 features are selected for transformation, you need to deselect features.
- Configure how to create the new features for the selected variable type.
Component | Description |
---|---|
Selected features (1) | Identifies the number of selected features, the variable type for the features, and the names of all features selected for transformation. |
Change variable type option (2) | Shows the selected variable and prompts you to select the target variable type for the transformation. DataRobot performs specific transformations for numeric variable types. |
Prefix for new features (3) | Provides a prefix to apply to the original feature names to create the transformed feature names. You can keep the default prefix (Updated_) or create your own. If creating a prefix, do not include - " . { } / \ . The names of the new features must have a suffix, prefix, or both; if a suffix is defined, then a prefix is not required. |
Suffix for new features (4) | Provides a suffix to apply to the original feature names to create the transformed feature names. You can keep the default suffix, which is the new variable type, or create your own. If creating a suffix, do not include - " . { } / \ . The names of the new features must have a suffix, prefix, or both; if a prefix is defined, then a suffix is not required. |
New Feature Names (5) | Shows how the new (transformed) features will be named: prefix_[original feature name]_suffix (using the actual prefix and/or suffix). |
Change (6) | Creates new features for all selected features, for the target variable type. |
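The naming rules in the table can be sketched as a small Python check. This is a hedged illustration only: the function name `build_feature_name` is hypothetical, and the sketch assumes the prefix and suffix are concatenated to the original name exactly as typed (so any underscore separator is part of the prefix or suffix itself, as in the default `Updated_`).

```python
# Characters the table says must not appear in a custom prefix or suffix.
FORBIDDEN_CHARS = set('-".{}/\\')

def build_feature_name(original, prefix="", suffix=""):
    """Hypothetical helper: compose a new feature name from prefix/original/suffix."""
    # At least one of prefix or suffix is required.
    if not prefix and not suffix:
        raise ValueError("Provide a prefix, a suffix, or both.")
    for label, part in (("prefix", prefix), ("suffix", suffix)):
        bad = sorted(FORBIDDEN_CHARS.intersection(part))
        if bad:
            raise ValueError(f"{label} contains disallowed characters: {bad}")
    # Assumption: plain concatenation, with no separator added automatically.
    return f"{prefix}{original}{suffix}"

print(build_feature_name("area_code", prefix="Updated_"))
print(build_feature_name("area_code", suffix="_Categorical"))
```

Checking names this way before opening the dialog is just a convenience; the New Feature Names preview (5) remains the authoritative view of what will be created.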
When you click Change, DataRobot creates a list of the selected features and submits them for variable type transformation. A message indicates features have been selected and transformation has started:
DataRobot creates the transformed features in the background. As each new (transformed) feature finishes processing, it is shown in the Data page (all features). Depending on the number of features selected for transformation, it may take several minutes for all new features to finish transformation and become available. A message indicates when all transformations are complete:
Feature transformation limit
DataRobot supports transforming up to 500 features at a time and will show a message in the Change Variable Types dialog if you select more than 500:
If this is the case, you need to deselect features so that only 500 or fewer features are selected. To do this, close the dialog and, in the Data page, deselect features:
Then, when 500 or fewer features are selected for transformation, select Change Variable Types again.
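If you keep the list of features to transform outside the application, one way to plan around the limit is to split it into groups of at most 500 before selecting them. The following is a minimal sketch in plain Python; the helper name and the generated feature names are hypothetical, and nothing here calls DataRobot.

```python
MAX_FEATURES_PER_BATCH = 500  # limit stated in this section

def split_into_batches(feature_names, batch_size=MAX_FEATURES_PER_BATCH):
    """Hypothetical helper: split a list of names into chunks of at most batch_size."""
    return [
        feature_names[start:start + batch_size]
        for start in range(0, len(feature_names), batch_size)
    ]

# Illustrative feature names only.
features = [f"feature_{i}" for i in range(1200)]
batches = split_into_batches(features)
print([len(batch) for batch in batches])
```

Each resulting group can then be selected and transformed as its own batch operation.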
Single feature transformations
To modify the variable type for a single feature, use one of the following methods:
- View the Transformations menu for the feature and click Change Var Type, or
- View the histogram for the feature and click Var Type Transform.
Both methods open the same dialog, which will vary depending on the variable type for the selected feature.
The following table explains the settings for a categorical transformation:
Component | Description |
---|---|
Current variable type transformation (1) | Displays the current variable type assigned to the feature. |
Transformation options (2) | Selects a new feature type, via the dropdown, from the available variable types for the current feature. DataRobot performs specific transformations for numeric and categorical variable types. |
New Feature Name (3) | Provides a field to rename the new feature. By default, DataRobot uses the existing feature name with the new variable type appended. |
Feature list application (4) | Selects which feature list the new feature is added to. Select to add to "All Features" or use the dropdown (5) to add it to a specific list instead. |
Feature list selection (5) | Provides a dropdown selection of feature lists from the project, allowing you to select which list to add the feature to. |
Create Feature (6) | Creates the new feature. The new feature is then listed below the original on the Data page. |
You can create any number of transformations from the same feature. By default, DataRobot applies a unique name to each transformation. If you inadvertently create duplicate features, DataRobot marks them as such and ignores them in processing.
The following is an example of date transformation, which allows you to select which date-specific derivations to apply. You can also select whether the result should be considered a categorical or numeric value.
Here's an example of a numeric to categorical transformation:
Transform options and syntax
DataRobot uses a subset of Python's Numexpr package to create user transformations of column values (features). When you select the function option, `f()`, from the Transformations menu, a dialog for entering the user transformation syntax appears. The following describes DataRobot's application of Numexpr and provides some examples.
Note
The DataRobot API supports only variable type transformations.
To create transformations, enter feature name(s) within curly braces `{}` and apply the appropriate function and operator. DataRobot provides auto-completion for feature names; if you click after the initial curly brace, you can select from the list of displayed features.
Note that:
- Feature names are case-sensitive.
- You cannot transform features of variable type date. Instead, create new features out of the derived date features (for example, `Timestamp (Hour of Day)`).
- You cannot do a feature transformation on the target.
Allowed functions | Description |
---|---|
`log({feature})` | natural logarithm |
`sqrt({feature})` | square root |
`abs({feature})` | absolute value |
`where({feature1} operator {feature2}, value-if-true, value-if-false)` | if-then-else functionality |
The following lists the allowed binary arithmetic operators. Use parentheses to group and order operations, for example `(1 + 2) * (3 + 4)`. You can reference multiple features in a single transform, for example `{number_inpatient} + {num_medications}`.
Supported arithmetic operators are:
- `+` (addition)
- `-` (subtraction)
- `*` (multiplication)
- `/` (division)
- `**` (exponentiation)
You can also use comparison operators, but they must be wrapped within a `where()` function (i.e., `where({feature1} operator {feature2}, value-if-true, value-if-false)`). Supported comparison operators are:
- `<` , `>` (less than, greater than)
- `==` (equal to)
- `!=` (not equal to)
- `<=` , `>=` (less than or equal to, greater than or equal to)
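To see how the curly-brace syntax, the allowed functions, and the operators fit together, the following sketch emulates a transform expression on a single row using only the Python standard library. It is an illustration under stated assumptions, not DataRobot's or Numexpr's implementation: the `evaluate_transform` helper is hypothetical, it works on one row at a time rather than on whole columns, and it relies on `eval`, which is acceptable for a demonstration but not for untrusted input.

```python
import math
import re

# Functions listed in the "Allowed functions" table, emulated for scalar values.
ALLOWED_FUNCTIONS = {
    "log": math.log,    # natural logarithm
    "sqrt": math.sqrt,
    "abs": abs,
    "where": lambda condition, if_true, if_false: if_true if condition else if_false,
}

def evaluate_transform(expression, row):
    """Hypothetical helper: evaluate a {feature}-style expression for one row (a dict)."""
    values = {}

    def substitute(match):
        # Feature names are looked up exactly as written (case-sensitive).
        placeholder = f"_feature_{len(values)}"
        values[placeholder] = row[match.group(1)]
        return placeholder

    python_expression = re.sub(r"\{([^{}]+)\}", substitute, expression)
    scope = {**ALLOWED_FUNCTIONS, **values}
    # Built-ins are disabled so only the allowed functions are reachable.
    return eval(python_expression, {"__builtins__": {}}, scope)

row = {"number_inpatient": 2, "num_medications": 5, "YearlyIncome": 2000000}
print(evaluate_transform("{number_inpatient} + {num_medications}", row))
print(evaluate_transform("({number_inpatient} + 1) ** 2", row))
print(evaluate_transform("where({YearlyIncome} > 1000000, 1, 0)", row))
```

Referencing a name that does not match a column exactly, such as `{yearlyincome}` here, raises a `KeyError` in this sketch, which mirrors the case-sensitivity rule above.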
Comparison operators with missing values
If there are missing values (NaN) in the dataset, transformations that compare feature values to input values require special consideration. If you create statements using comparison operators on features that contain NaN values, and the goal is for the derived feature to return NaN wherever the original feature is NaN, be sure to compare the results against the expected behavior.
For example, when transforming the feature `Sales` to `Excellent_Sales` using the following statement, the comparison always evaluates to False when `Sales` is NaN, so the result is 0. Even if the data for `Sales` contains missing values, no missing values are returned in the result:
Excellent_Sales = where({Sales}>300000,1,0)
If this is not the desired result, consider an expression like the following:
Excellent_Sales = where(~({Sales} > 300000) & ~({Sales} <= 300000), {Sales}, where({Sales} > 300000, 1,0))
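The difference between the two statements can be checked with a scalar emulation in plain Python. This is a sketch, not Numexpr itself: the local `where` function stands in for the transform function of the same name, and Python's `not` and `and` replace the element-wise `~` and `&` operators. It relies on the standard floating-point rule that every ordered comparison with NaN is False.

```python
import math

def where(condition, if_true, if_false):
    """Scalar stand-in for the where() transform function (assumption for this sketch)."""
    return if_true if condition else if_false

def excellent_sales_naive(sales):
    # Mirrors: where({Sales}>300000,1,0)
    return where(sales > 300000, 1, 0)

def excellent_sales_nan_aware(sales):
    # Mirrors the nested statement: if neither comparison holds, the value is NaN,
    # so pass it through; otherwise apply the original rule.
    neither_holds = not (sales > 300000) and not (sales <= 300000)
    return where(neither_holds, sales, where(sales > 300000, 1, 0))

for value in (400000, 100, math.nan):
    print(value, excellent_sales_naive(value), excellent_sales_nan_aware(value))
```

The NaN-aware version returns NaN only for missing input and otherwise agrees with the simple statement.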
Some transformation examples:
OrderOfMag = log({NumberOfLabs})
Success = sqrt({sales} + 10)
CostBreakdown = abs({sales} - {costs})
IsRich = where({YearlyIncome} > 1000000, 1, 0)