Wrangle data

Video: Data wrangling

Availability information

The ability to perform wrangling and pushdown on datasets stored in the Data Registry is off by default. Contact your DataRobot representative or administrator for information on enabling the feature.

To wrangle Data Registry datasets, you must first add the dataset to your Use Case. Then, you can begin wrangling from the Actions menu next to the dataset. Note that this feature is only available for multi-tenant SaaS users and installations with AWS VPC or Google VPC environments.

Feature flag: Enable Wrangling Pushdown for Data Registry Datasets

DataRobot's wrangling capabilities provide a seamless, scalable, and secure way to access and transform data for modeling. In Workbench, "wrangle" is a visual interface for executing data cleaning at the source, leveraging the compute environment and distributed architecture of your data source.

When you click Wrangle, DataRobot pulls a uniform random sample of 10,000 rows and calculates exploratory data insights on that sample, all while connected to your data source. You then build a recipe of operations to apply to the entire dataset; each transformation is first applied to the live sample so you can verify that it behaves as expected. When the recipe is ready, you publish it: DataRobot pushes the recipe down to the data source, where it is executed to materialize an output dataset.
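The sample-then-publish flow described above can be illustrated with a small, self-contained Python sketch. It is a conceptual model only: it does not call DataRobot's API and performs no real pushdown. The function names, the toy dataset, and the two operations are hypothetical and chosen purely for illustration; only the 10,000-row sample size comes from the text.

```python
import random

SAMPLE_SIZE = 10_000  # sample size stated in the documentation text


def draw_sample(rows, k=SAMPLE_SIZE, seed=0):
    """Return a uniform random sample (without replacement) of up to k rows."""
    rng = random.Random(seed)
    return rng.sample(rows, min(k, len(rows)))


def apply_recipe(rows, recipe):
    """Apply each operation in order; an operation maps a list of rows to a new list."""
    for operation in recipe:
        rows = operation(rows)
    return rows


# Hypothetical operations standing in for wrangling steps.
def drop_missing_amount(rows):
    return [r for r in rows if r["amount"] is not None]


def add_amount_usd(rows):
    return [{**r, "amount_usd": r["amount"] / 100} for r in rows]


recipe = [drop_missing_amount, add_amount_usd]

# Toy stand-in for a table in the data source: every tenth row has no amount.
source_rows = [
    {"id": i, "amount": None if i % 10 == 0 else i * 100}
    for i in range(50_000)
]

# Step 1: preview the recipe on the sample only.
sample = draw_sample(source_rows)
preview = apply_recipe(sample, recipe)

# Step 2: "publish" by applying the same recipe to the full dataset.
# In this sketch that happens locally; in the workflow described above,
# the recipe is executed in the data source instead.
output = apply_recipe(source_rows, recipe)
```

The point of the design is that the recipe is an ordered list of operations defined once, so the preview on the sample and the final run on the full dataset use the same steps.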

Why wrangle data in DataRobot?

  • It's fully integrated in Workbench: find the right datasets, apply transformations, and see the effects of those transformations on your dataset in real time, all in one place.
  • It's pushed down: leverage the scale of your cloud data warehouse or lake.
  • It's secure: limiting data movement means faster results, better performance, and enhanced security.

This section covers the following topics:

Topic              Description
Build a recipe     Build a recipe to interactively prepare data for modeling without moving it from your data source.
Publish a recipe   Publish a recipe to push down transformations to your data source and generate an output dataset.

Updated July 10, 2024