Pipelines

DataRobot Pipelines enable data science and engineering teams to build and run machine learning data flows. Teams start by collecting data from various sources, cleaning it, and combining it, then standardize values and perform other data preparation operations to build a dataset at the unit of analysis.

To make data extraction and preparation easier to repeat, teams often build a data pipeline—a set of connected data processing steps—that prepares data for training models, making predictions, and other relevant use cases.

With DataRobot Pipelines, you connect to data sources of varied formats and transform data to build and orchestrate your machine learning data flows.

This section describes how to work with workspaces and pipelines:

Topic | Describes...
----- | ------------
Pipeline workspaces | Add and edit workspaces.
Compose a pipeline | Add and connect modules to build a pipeline.
Run a pipeline | Run successfully compiled modules, either alone or as part of a path.
Modules |
Import data | Bring external data into the pipeline.
Generate batch predictions | Generate batch predictions to use as an input dataset, as well as perform pre- and post-processing steps.
Transform data | Use Spark SQL to create data transformations (see the sketch after this table).
Export data | Export data to configured data sources, for example, the AI Catalog and S3.
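
Transformations in the Transform data module are written in Spark SQL. The sketch below shows the kind of statement such a module might contain; the input table name df1 and the columns customer_id, order_date, and amount are assumptions for illustration only and depend on how the module's inputs are configured.

```sql
-- Hypothetical transform: roll raw orders up to one row per customer,
-- the unit of analysis. The table name df1 and the column names are
-- assumptions; use the names exposed by the module's configured inputs.
SELECT
  customer_id,
  COUNT(*)        AS order_count,
  SUM(amount)     AS total_spend,
  MAX(order_date) AS last_order_date
FROM df1
GROUP BY customer_id
```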

For information on pipeline data processing limits for each module type, see the Dataset requirements page.

