
Fundamentals of working with data

High-quality data is integral to every stage of the ML workflow: importing and cleaning data, transforming and engineering features, scoring prediction datasets, and deploying to a prediction server.

DataRobot provides Pipeline workspaces to help with your data prep, management, and processing needs.

You can build and run ML data flow pipelines—collections of connected modules that process data and pass the output to subsequent modules for further processing.
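The pipeline idea above can be sketched in plain Python. This is an illustrative stand-in, not DataRobot's pipeline API: each "module" is a function that consumes rows and yields processed rows, and the pipeline passes each module's output to the next.

```python
from typing import Callable, Iterable

# Illustrative sketch: a "module" takes rows and returns processed rows.
Row = dict
Module = Callable[[Iterable[Row]], Iterable[Row]]

def drop_missing(rows):
    """Module 1: discard rows that contain any missing (None) values."""
    return (r for r in rows if all(v is not None for v in r.values()))

def normalize_names(rows):
    """Module 2: strip and lowercase string fields."""
    for r in rows:
        yield {k: v.strip().lower() if isinstance(v, str) else v
               for k, v in r.items()}

def run_pipeline(rows, modules):
    """Chain modules: each module's output feeds the next, like connected
    pipeline modules passing data downstream."""
    for module in modules:
        rows = module(rows)
    return list(rows)

raw = [{"name": "  Alice ", "age": 30}, {"name": None, "age": 25}]
clean = run_pipeline(raw, [drop_missing, normalize_names])
```

The key property a real pipeline shares with this sketch is composition: modules know nothing about each other, only about the row format flowing between them.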

DataRobot also provides tools to help you clean and prepare your data, including Data Prep (formerly Paxata), a general purpose data preparation tool, and time series data prep, which helps mitigate the effects of gaps in your time series data.
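To make the gap problem concrete, here is a minimal sketch of one common mitigation, forward-filling missing dates in a daily series. It uses only the standard library and is not DataRobot's time series data prep implementation; the `sales` data is made up.

```python
from datetime import date, timedelta

def fill_daily_gaps(series):
    """Fill gaps in a daily time series by carrying the last observed
    value forward. `series` maps date -> value and must be non-empty.
    Forward-filling is one simple way to mitigate gaps; other strategies
    (interpolation, zero-filling) may suit other data."""
    days = sorted(series)
    filled = {}
    current = days[0]
    last_value = series[current]
    while current <= days[-1]:
        last_value = series.get(current, last_value)  # reuse last value on a gap
        filled[current] = last_value
        current += timedelta(days=1)
    return filled

# Hypothetical example: May 2-3 are missing and get filled with May 1's value.
sales = {date(2022, 5, 1): 10, date(2022, 5, 4): 7}
regular = fill_daily_gaps(sales)
```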

You can transform data directly in DataRobot, and you can discover and generate new features from multiple datasets using Feature Discovery. With Feature Discovery, you define relationships among features in multiple datasets and can then view the lineage of the derived features.

DataRobot helps you assess the quality of the relationship configurations and provides tools to review and manage the derived features, such as the Feature Lineage window.
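The idea behind deriving features across related datasets can be sketched as a join-and-aggregate step. This is a hand-written illustration, not Feature Discovery itself; the dataset names, the `customer_id` key, and the `amount` column are all made-up examples.

```python
from collections import defaultdict

def derive_features(primary, transactions, key="customer_id"):
    """Illustrative sketch of cross-dataset feature derivation: for each row
    of the primary dataset, aggregate the matching rows of a related
    secondary dataset into new features (here, a count and a sum).
    The derived columns' lineage is clear from the code: both come from
    `transactions`, joined on `key`."""
    totals = defaultdict(lambda: {"txn_count": 0, "txn_amount_sum": 0.0})
    for t in transactions:
        agg = totals[t[key]]
        agg["txn_count"] += 1
        agg["txn_amount_sum"] += t["amount"]
    return [{**row, **totals[row[key]]} for row in primary]

# Hypothetical data: two customers, transactions only for customer 1.
customers = [{"customer_id": 1}, {"customer_id": 2}]
txns = [{"customer_id": 1, "amount": 5.0}, {"customer_id": 1, "amount": 2.5}]
enriched = derive_features(customers, txns)
```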

Data sources

To begin the modeling process, you import data from sources such as the AI Catalog, configured data connections, local files, and URLs.

You can:

  • Import data into the AI Catalog, then select a dataset from there.

  • Import data directly from a connected data source.

  • Import a local file or specify a URL from which to pull the data.
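However the data arrives, the import options above all end in tabular data. As a minimal stand-in for preparing a local file before upload, assuming CSV input, you might parse it like this (this is not DataRobot's import API):

```python
import csv
import io

def load_csv(source_text):
    """Parse CSV text into a list of row dicts, keyed by the header row.
    A stand-in for inspecting a local file before importing it."""
    return list(csv.DictReader(io.StringIO(source_text)))

# Hypothetical two-row file.
sample = "name,age\nAlice,30\nBob,25\n"
rows = load_csv(sample)
```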

After you upload the data, DataRobot performs exploratory data analysis, assesses the quality of the data, and displays the results in the Data Quality Assessment window.
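To give a feel for what a data quality assessment looks at, here is a toy summary computing per-column missing-value counts and the number of duplicated rows. This is an illustrative sketch of the kind of checks an EDA pass makes, not DataRobot's actual assessment logic.

```python
def assess_quality(rows):
    """Toy data-quality summary: count missing values per column and count
    fully duplicated rows. Treats None and empty strings as missing."""
    missing = {}
    seen, duplicates = set(), 0
    for row in rows:
        for col, val in row.items():
            if val is None or val == "":
                missing[col] = missing.get(col, 0) + 1
        fingerprint = tuple(sorted(row.items()))
        if fingerprint in seen:
            duplicates += 1
        seen.add(fingerprint)
    return {"missing_by_column": missing, "duplicate_rows": duplicates}

# Hypothetical dataset with one missing name and one exact duplicate row.
data = [
    {"name": "Alice", "age": 30},
    {"name": "", "age": 25},
    {"name": "Alice", "age": 30},
]
report = assess_quality(data)
```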


Updated May 31, 2022