Pass features into a task
Features of the same data type may need to be processed differently from one another. For example, suppose you are working on a problem with a dataset containing two text features: one lends itself well to word-grams for preprocessing, while the other is better suited to char-grams.
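To make the word-gram versus char-gram distinction concrete, here is a minimal plain-Python sketch. The helper names word_ngrams and char_ngrams and the sample values are hypothetical, chosen only for illustration; they are not part of DataRobot or the Workshop API.

```python
def word_ngrams(text, n):
    """Return every run of n consecutive whitespace-separated words in text."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Return every substring of n consecutive characters in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


# Hypothetical sample values: free text suits word-grams,
# while a short code-like string suits char-grams.
review = "great product fast shipping"
sku = "AB-1029"

print(word_ngrams(review, 2))  # ['great product', 'product fast', 'fast shipping']
print(char_ngrams(sku, 3))     # ['AB-', 'B-1', '-10', '102', '029']
```

Free-form text tends to carry meaning at the word level, whereas identifiers and codes often carry it in character patterns, which is why the two features may call for different preprocessing tasks.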
When using Composable ML in DataRobot, you can pass one or more specific features to another task.
When using project-specific functionality, DataRobot recommends running the following code.
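# Assumes the datarobot-bp-workshop package is installed and the DataRobot client is configured
from datarobot_bp_workshop import Workshop

w = Workshop()
w.set_project(project_id="<project_id>")
# or
# w = Workshop(project_id="<project_id>")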
In this example, select the Age feature, perform missing value imputation, and pass it to the Keras neural network classifier. Note that, as with other pieces of functionality, you can type w.Features.<tab> to auto-complete the available feature names.
features = w.FeatureSelection(w.Features.Age)
pni = w.Tasks.PNI2(features)
keras = w.Tasks.KERASC(pni)
keras_blueprint = w.BlueprintGraph(keras)
If desired, you can link a blueprint to a specific project so that the blueprint is validated against that project, for example, to confirm that the selected features exist in the dataset associated with the project.
# Make sure it is saved at least once, or pass `user_blueprint_id` to `link_to_project`
keras_blueprint.save()
keras_blueprint.link_to_project(project_id="<project_id>")
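As a rough illustration of the kind of check that linking a blueprint to a project makes possible, the sketch below compares selected feature names against a dataset's columns. This is not DataRobot's implementation: the helper missing_features and the column list are hypothetical and use plain Python only.

```python
def missing_features(selected, dataset_columns):
    """Return the selected feature names that are absent from the dataset, in order."""
    available = set(dataset_columns)
    return [name for name in selected if name not in available]


# Hypothetical column names standing in for the linked project's dataset.
dataset_columns = ["Age", "Insurance_Type", "Income"]

print(missing_features(["Age"], dataset_columns))          # []
print(missing_features(["Age", "Agee"], dataset_columns))  # ['Agee']
```

An empty result corresponds to a selection that would pass this kind of validation; any returned names point to features that do not exist in the dataset.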
To pass only a specific column into a task, add the Task Single Column Converter or Multiple Column Converter. Then, pick the column name from the original dataset as the parameter column_name or column_names. The task(s) that follow receive only the selected column(s).
Click Update and then Save Blueprint to see the new task referencing the chosen column. Here's an example of a blueprint performing specific preprocessing on certain columns. Notice that each column name is visible in the blueprint.
Continuing with this example, you can also pass all columns to another task. To do so, add a new connection from Numeric Variables to the desired task.
Features can also be excluded, which is particularly useful when one feature should be processed one way and everything else another way.
without_insurance_type = w.FeatureSelection(w.Features.Insurance_Type, exclude=True)
only_insurance_type = w.FeatureSelection(w.Features.Insurance_Type)
one_hot = w.Tasks.PDM3(without_insurance_type)
ordinal = w.Tasks.ORDCAT2(only_insurance_type)
keras = w.Tasks.KERASC(one_hot, ordinal)
keras_blueprint = w.BlueprintGraph(keras)
To process certain features in different ways, add the Task Multiple Column Converter, which lets you select columns. You can give it a list of the columns that you want to include (using the parameter column_names), and the rest are dropped. Alternatively, you can provide a list of the columns that you want to exclude, and the rest are kept.
Next, create an edge from the categorical data to the modeler, insert the alternative processing task, then add a second Multiple Column Converter, pick the same column name, and change the method to exclude.
Now, one column is processed using one task, and all others are processed with a different task.
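The include and exclude behavior described above can be sketched in plain Python. The function select_columns, its "include" default, and the sample column names are assumptions made for illustration; only the column_names parameter and the exclude method come from the text above, and this is not the converter's actual implementation.

```python
def select_columns(columns, column_names, method="include"):
    """Keep only the named columns ("include") or everything except them ("exclude")."""
    names = set(column_names)
    if method == "include":
        return [c for c in columns if c in names]
    if method == "exclude":
        return [c for c in columns if c not in names]
    raise ValueError(f"unknown method: {method!r}")


# Hypothetical categorical columns in the dataset.
columns = ["Insurance_Type", "Gender", "Region"]

# First converter: route one column to its own preprocessing task.
print(select_columns(columns, ["Insurance_Type"]))                    # ['Insurance_Type']
# Second converter: same column name, method set to exclude.
print(select_columns(columns, ["Insurance_Type"], method="exclude"))  # ['Gender', 'Region']
```

Using the same column name in both converters, once with each method, splits the columns into two non-overlapping groups, so no column is processed twice or dropped.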