Python module in pipelines
The Python module is off by default. Contact your DataRobot representative or administrator for information on enabling the feature.
Feature flags: Enable Pipeline Workspaces in the AI Catalog, Enable Workspace Experimental Modules
The Python module, now available for public preview, provides code-centric users with a built-in API to perform data transformations not currently covered in existing modules, including domain-specific transformations. This module supports multiple input and output ports.
When adding a module to your pipeline, the Python module is listed under Transform on the Add new module window.
With the module selected, click the Details tab to access a text editor with syntax highlighting and auto-indentation. The module provides basic code to read from the input port (1) and write to the output port (2) (similar to the SQL module).
Code in Python using popular dependencies, including a pre-configured DataRobot public API, pandas, and pyarrow, to execute data transformations. See the Python module API reference.
The example below uses the pandas df.groupby function to group rows by the gender column, a transformation not otherwise possible in other modules.
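As a minimal sketch of this kind of grouping, the snippet below runs a pandas groupby on a small, made-up DataFrame. The sample data, the age and income columns, and the summarize_by_gender helper are assumptions for illustration only; in the module, the DataFrame would come from the input port using the starter code in the Details tab, which is not reproduced here.

```python
import pandas as pd

# Hypothetical sample data standing in for what the module reads from its input port.
df = pd.DataFrame(
    {
        "gender": ["F", "M", "F", "M", "F"],
        "age": [34, 45, 29, 52, 41],
        "income": [52000, 61000, 48000, 75000, 58000],
    }
)


def summarize_by_gender(frame: pd.DataFrame) -> pd.DataFrame:
    """Group rows by the gender column and compute per-group aggregates."""
    return frame.groupby("gender", as_index=False).agg(
        count=("age", "size"),          # number of rows in each group
        mean_age=("age", "mean"),       # average age per group
        mean_income=("income", "mean"), # average income per group
    )


# One row per distinct gender value, with the aggregate columns named above.
result = summarize_by_gender(df)
print(result)
```

Keeping the transformation in a plain function that takes and returns a DataFrame makes it easy to test outside the pipeline before pasting it into the module editor.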
After running the pipeline, you can view the results in the Results tab; a separate tab is displayed for each configured output port. If the run results in an error, you can also use the runtime logs to debug your code. The logs are available only while the module is running and for a short time afterward.
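One way to make the logs more useful when debugging is to emit a few diagnostic messages from your own code. The sketch below uses Python's standard logging module and assumes, without confirmation from this page, that messages written to standard output appear in the runtime logs. The logger name and the validate_columns helper are hypothetical.

```python
import logging
import sys

import pandas as pd

# Assumption: text written to standard output is captured in the runtime logs.
logging.basicConfig(
    stream=sys.stdout,
    level=logging.INFO,
    format="%(levelname)s %(message)s",
)
logger = logging.getLogger("python_module_debug")  # hypothetical logger name


def validate_columns(frame, required):
    """Log the frame's shape and return any required columns that are missing."""
    logger.info("rows=%d columns=%s", len(frame), list(frame.columns))
    missing = [name for name in required if name not in frame.columns]
    if missing:
        logger.error("missing columns: %s", missing)
    return missing


# Hypothetical sample data; "income" is deliberately absent to trigger the error message.
sample = pd.DataFrame({"gender": ["F", "M"], "age": [34, 45]})
missing = validate_columns(sample, ["gender", "income"])
```

Checking inputs before the transformation runs means a failed run leaves a short, readable message in the logs instead of only a stack trace.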