# Publish a recipe

> Publish a recipe - Publish a recipe to push down transformations to your data source and generate an
> output dataset.

This Markdown file sits beside the HTML page at the same path (with a `.md` suffix). It summarizes the topic and lists links for tools and LLM context.

Companion generated at `2026-05-06T18:17:10.054620+00:00` (UTC).

## Primary page

- [Publish a recipe](https://docs.datarobot.com/en/docs/workbench/nxt-workbench/dataprep/wrangle-data/pub-recipe.html): Full documentation for this topic (HTML).

## Sections on this page

- [Publish to your data source](https://docs.datarobot.com/en/docs/workbench/nxt-workbench/dataprep/wrangle-data/pub-recipe.html#publish-to-your-data-source): In-page section heading.
- [Configure downsampling](https://docs.datarobot.com/en/docs/workbench/nxt-workbench/dataprep/wrangle-data/pub-recipe.html#configure-downsampling): In-page section heading.
- [Configure smart downsampling](https://docs.datarobot.com/en/docs/workbench/nxt-workbench/dataprep/wrangle-data/pub-recipe.html#configure-smart-downsampling): In-page section heading.
- [Publishing re-wrangled datasets](https://docs.datarobot.com/en/docs/workbench/nxt-workbench/dataprep/wrangle-data/pub-recipe.html#publishing-re-wrangled-datasets): In-page section heading.
- [Read more](https://docs.datarobot.com/en/docs/workbench/nxt-workbench/dataprep/wrangle-data/pub-recipe.html#read-more): In-page section heading.

## Related documentation

- [NextGen UI documentation](https://docs.datarobot.com/en/docs/workbench/index.html): Linked from this page.
- [Workbench](https://docs.datarobot.com/en/docs/workbench/nxt-workbench/index.html): Linked from this page.
- [Data preparation](https://docs.datarobot.com/en/docs/workbench/nxt-workbench/dataprep/index.html): Linked from this page.
- [Prepare data](https://docs.datarobot.com/en/docs/workbench/nxt-workbench/dataprep/wrangle-data/index.html): Linked from this page.
- [considerations](https://docs.datarobot.com/en/docs/workbench/nxt-workbench/dataprep/data-faq/index.html#wrangle-data): Linked from this page.
- [DataRobot's file size requirements](https://docs.datarobot.com/en/docs/reference/data-ref/file-types.html): Linked from this page.

## Documentation content

Once the recipe is built and the live sample looks ready for modeling, you can publish the recipe, pushing it down as a query to the data source. There, the query is executed by applying the recipe to the entire dataset and materializing a new output dataset. The output is sent back to DataRobot and added to the Use Case.
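Conceptually, publishing compiles the recipe's operations into a single query that the data source executes against the full dataset. A minimal sketch of that idea, assuming a simple list of operations (all names here are hypothetical; DataRobot's actual query generation is internal and handles many more operation types and SQL dialects):

```python
def compile_recipe_to_sql(source_table: str, operations: list) -> str:
    """Compile a list of wrangling operations into one pushdown query.

    Illustrative only: shows how derived columns and filters can be
    folded into a single SELECT that runs inside the warehouse.
    """
    select = "*"
    where = []
    for op in operations:
        if op["type"] == "compute":      # derive a new column
            select += f", {op['expression']} AS {op['name']}"
        elif op["type"] == "filter":     # drop rows at the source
            where.append(op["condition"])
    sql = f"SELECT {select} FROM {source_table}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql

query = compile_recipe_to_sql(
    "sales_raw",
    [
        {"type": "compute", "name": "margin", "expression": "revenue - cost"},
        {"type": "filter", "condition": "region = 'EMEA'"},
    ],
)
```

Because the whole recipe reduces to one query, only the materialized output travels back to DataRobot, not the raw source data.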

See the associated [considerations](https://docs.datarobot.com/en/docs/workbench/nxt-workbench/dataprep/data-faq/index.html#wrangle-data) for important additional information.

> [!NOTE] Publishing large datasets
> When publishing a wrangling recipe for input datasets larger than 20GB, you can push the data transformations and analysis down to a DataRobot compute engine for scalable and secure data processing of CSV and Parquet files stored in S3. Note that this is only available for AWS SaaS and VPC installations.
> 
> Feature flag OFF by default: Enable Distributed Spark Support for Data Engine

To publish a recipe:

1. After you're done wrangling a dataset, open the **Recipe actions** dropdown and select **Publish**.
2. Enter a name for the output dataset. DataRobot uses this name to register the dataset in the AI Catalog and Data Registry.
3. (Optional) Configure **Automatic downsampling**.
4. Click **Publish**. DataRobot sends the published recipe to Snowflake, where it is applied to the source data to create a new output dataset. In DataRobot, the output dataset is registered in the Data Registry and added to your Use Case.

## Publish to your data source

When you publish a wrangling recipe, those operations and settings are pushed down into your virtual warehouse, allowing you to leverage the security, compliance, and financial controls specified within its environment. Selecting this option materializes an output dynamic dataset in DataRobot's Data Registry as well as your data source.

> [!WARNING] Required permissions
> You must have `write` access to the selected schema and database.

To enable in-source materialization (for Snowflake in this example):

1. In the **Publishing Settings** modal, click **Publish to Snowflake**.
2. Select the appropriate Snowflake **Database** and **Schema** using the dropdowns.
3. From here, you can configure downsampling (described in the sections that follow) or click **Publish**.

## Configure downsampling

Automatic downsampling is a technique that reduces the size of a dataset by randomly sampling its rows. Consider enabling automatic downsampling if the size of your source data exceeds [DataRobot's file size requirements](https://docs.datarobot.com/en/docs/reference/data-ref/file-types.html).

To configure downsampling:

1. Enable the **Automatic downsampling** toggle in the **Publishing Settings** modal.
2. Specify the **Maximum number of rows** and **Estimated size** in megabytes.
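The random variant can be sketched in a few lines. This is an illustrative assumption about the behavior, not DataRobot's implementation: it caps the output at a hypothetical row count (DataRobot also enforces an estimated size in megabytes, omitted here), and DataRobot performs the equivalent operation inside the data source at publish time.

```python
import random

def downsample(rows, max_rows, seed=0):
    """Randomly sample rows down to at most max_rows.

    Illustrative sketch of random downsampling; a fixed seed is used
    here only so the example is reproducible.
    """
    if len(rows) <= max_rows:
        return list(rows)
    rng = random.Random(seed)
    return rng.sample(rows, max_rows)

# Example: reduce one million rows to ten thousand.
sample = downsample(list(range(1_000_000)), max_rows=10_000)
```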

## Configure smart downsampling

You can use smart downsampling to reduce the size of your output dataset when publishing a wrangling recipe. Smart downsampling is a data science technique to reduce the time it takes to fit a model without sacrificing accuracy; it is particularly useful for imbalanced data. This downsampling technique accounts for class imbalance by stratifying the sample by class. In most cases, the entire minority class is preserved, and sampling only applies to the majority class. Because accuracy is typically more important on the minority class, this technique greatly reduces the size of the training dataset (reducing modeling time and cost), while preserving model accuracy.

To configure smart downsampling:

1. Enable the **Automatic downsampling** toggle and click **Smart**.
2. Populate the fields that appear in the modal.

> [!NOTE] Note
> Any rows with `null` as a value in the target column will be filtered out after smart downsampling.
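The stratified technique described above can be sketched as follows, assuming a binary target and rows represented as dictionaries (the function and its parameters are hypothetical illustrations, not DataRobot's API): rows with a null target are dropped first, the entire minority class is kept, and only the majority class is randomly sampled to fit the row budget.

```python
import random

def smart_downsample(rows, target, max_rows, seed=0):
    """Illustrative smart downsampling for a binary target.

    Keeps the whole minority class and randomly samples the majority
    class so the total stays within max_rows. Rows with a null target
    are filtered out first.
    """
    rows = [r for r in rows if r.get(target) is not None]
    classes = {}
    for r in rows:
        classes.setdefault(r[target], []).append(r)
    minority, majority = sorted(classes.values(), key=len)
    budget = max(0, max_rows - len(minority))
    rng = random.Random(seed)
    if len(majority) > budget:
        majority = rng.sample(majority, budget)
    return minority + majority

# Example: 50 positives, 950 negatives, 5 null targets -> 200 rows,
# with all 50 positives preserved.
data = [{"y": 1}] * 50 + [{"y": 0}] * 950 + [{"y": None}] * 5
out = smart_downsample(data, target="y", max_rows=200)
```

The design choice is the one the section describes: because accuracy usually matters most on the minority class, only the majority class pays the sampling cost.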

## Publishing re-wrangled datasets

If you're publishing a recipe for a dataset that you've previously wrangled and published, there are two additional settings:

| Setting | Description |
| --- | --- |
| Publish as a new dataset version | When published, the output dataset is registered as a new version of the wrangled dataset. |
| Publish as a new dataset | When published, the output dataset is registered as a separate dataset. |

## Read more

To learn more about the topics discussed on this page, see:

- Snowflake documentation on pushdown.
- DataRobot file size requirements.
