Interact with the live sample

When you click Wrangle, DataRobot pulls a uniform random sample of 10000 rows and calculates exploratory data insights on that sample, all while connected to your data source. You then build a recipe of operations to apply to the entire dataset; each transformation is first applied to the live sample so you can verify that it behaves as expected. When the recipe is ready to be published, it is pushed down to the data source, where it is executed to materialize an output dataset.
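The sample-then-publish flow described above can be pictured with a minimal Python sketch. This is an illustration of the idea only, not DataRobot's API: the helper names (`draw_sample`, `apply_recipe`), the recipe steps, and the in-memory dataset are all hypothetical, and in the real product the published recipe is executed inside the data source rather than in local memory.

```python
import random

def draw_sample(rows, n, seed=0):
    """Uniform random sample without replacement, capped at the dataset size."""
    rng = random.Random(seed)
    return rng.sample(rows, min(n, len(rows)))

def apply_recipe(rows, recipe):
    """Apply each recipe step in order and return the transformed rows."""
    for step in recipe:
        rows = step(rows)
    return rows

# Hypothetical recipe steps, standing in for wrangling operations.
def drop_missing_price(rows):
    return [r for r in rows if r["price"] is not None]

def add_price_with_tax(rows):
    return [{**r, "price_with_tax": round(r["price"] * 1.2, 2)} for r in rows]

recipe = [drop_missing_price, add_price_with_tax]

# Stand-in for the source table (assumption: 50,000 rows, 10% missing prices).
full_dataset = [
    {"id": i, "price": None if i % 10 == 0 else float(i)}
    for i in range(50_000)
]

# 1. Preview: the recipe runs on the live sample only.
live_sample = draw_sample(full_dataset, 10_000)
preview = apply_recipe(live_sample, recipe)

# 2. Publish: the same recipe runs on the entire dataset.
published = apply_recipe(full_dataset, recipe)
```

The point of the sketch is that the recipe is defined once and reused unchanged: the preview gives fast feedback on a small sample, and the published output comes from running the identical steps over all rows.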

You can launch the data wrangler from the following areas in a Use Case:

Modify wrangling settings

In a recipe, you can edit the name and description to make them more descriptive for future use, and change the number of rows included in the live sample.

Edit the recipe metadata

By default, DataRobot assigns a name and description to each wrangling recipe based on the source data; however, you can modify this information to make it more applicable to your specific use case.

To edit the recipe metadata:

  1. Hover over the field you want to edit, either the title or the description. Then, click the field or the pencil icon to the right.

  2. Modify the name or description, and when you're done, click outside the field or the check mark on the right to save your changes.

Configure the live sample

By default, DataRobot retrieves 10000 random rows for the live sample; however, you can change both the number of rows and the sampling method in the wrangling settings. Note that the more rows you retrieve, the longer it takes to render the live sample.

To configure the live sample:

  1. Click Settings in the right panel and open Preview sample.

  2. Select a Sampling method from the dropdown: Random, First-N Rows, or, for wrangling time series data, Date/time.

  3. In Number of rows, enter the number of rows (under 10000) to retrieve from the source data for the live sample, then click Resample. The live sample updates to display the specified number of rows.
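To make the difference between the Random and First-N Rows methods concrete, here is a small Python sketch. It is an assumption-laden illustration, not DataRobot code: `sample_rows` and its `method` values are hypothetical names, and the Date/time method is left out because this page does not describe how it selects rows.

```python
import random

def sample_rows(rows, method, n, seed=0):
    """Return up to n rows using the named sampling method (hypothetical helper)."""
    n = min(n, len(rows))  # never request more rows than exist
    if method == "random":
        # Uniform random sample without replacement.
        return random.Random(seed).sample(rows, n)
    if method == "first_n":
        # The first n rows in their stored order.
        return rows[:n]
    raise ValueError(f"unsupported sampling method: {method}")

rows = list(range(100))  # stand-in for source rows
first_five = sample_rows(rows, "first_n", 5)
random_five = sample_rows(rows, "random", 5)
```

First-N is cheap and deterministic but reflects only the start of the data, which can be misleading if the source is sorted; a random sample is more representative of the whole dataset.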

Analyze the live sample

During data wrangling, DataRobot performs exploratory data analysis on the live sample, generating table- and column-level summary statistics and visualizations that help you profile the dataset and recognize data quality issues as you apply operations. For more information on interacting with the live sample, see the section on exploratory data insights.

Speed up live sample

To reduce the time it takes to retrieve and render the live sample, use the toggle next to Show Insights to hide the feature distribution charts.

Live sample vs. exploratory data insights on the Data tab

Both pages provide similar insights. The live sample differs in two ways: you can specify the number of rows it displays, and it updates each time you add a transformation to your recipe.

Next steps

From here, you can:

Read more

To learn more about the topics discussed on this page, see:


Updated October 25, 2024