Publish a recipe

Once the recipe is built and the live sample looks ready for modeling, you can publish the recipe, pushing it down as a query to the data source. There, the query is executed by applying the recipe to the entire dataset and materializing a new output dataset. The output is sent back to DataRobot and added to the Use Case.

See the associated considerations for important additional information.

To publish a recipe:

  1. After you're done wrangling a dataset, click Publish recipe.

  2. Enter a name for the output dataset. DataRobot will use this name to register the dataset in the AI Catalog and Data Registry.

  3. (Optional) Configure Automatic downsampling.

  4. Click Publish.

    DataRobot sends the published recipe to Snowflake, where it is applied to the source data to create a new output dataset. In DataRobot, the output dataset is registered in the Data Registry and added to your Use Case.
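
You can also script the publish step with the DataRobot Python client. The sketch below is a minimal example assuming a recent client version; the Recipe class and the Dataset.create_from_recipe method reflect the client's public API but may differ in your version, and all IDs and names are placeholders.

```python
# Minimal sketch: publish a wrangling recipe with the DataRobot Python client.
# Assumes a recent client (3.x+); class and method names may vary by version.
import datarobot as dr
from datarobot.models.recipe import Recipe

dr.Client(endpoint="https://app.datarobot.com/api/v2", token="YOUR_API_TOKEN")

# Retrieve the recipe built in Workbench (placeholder ID).
recipe = Recipe.get("65f1c0ffee1234567890abcd")

# Apply the recipe to the source data and register the materialized
# output dataset in the Data Registry.
dataset = dr.Dataset.create_from_recipe(recipe, name="wrangled_output")
print(dataset.id, dataset.name)
```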

Publish to your data source

Public preview

In-source materialization for Databricks and the associated feature flags are on by default.

Feature flags:

  • Enable Databricks Driver
  • Enable Databricks Wrangling
  • Enable Databricks In-Source Materialization in Workbench
  • Enable Dynamic Datasets in Workbench

When you publish a wrangling recipe, its operations and settings are pushed down into your virtual warehouse, allowing you to leverage the security, compliance, and financial controls specified within that environment. Selecting this option materializes an output dynamic dataset both in DataRobot's Data Registry and in your data source.

Required permissions

You must have write access to the selected schema and database.

To enable in-source materialization (for Snowflake in this example):

  1. In the Publishing Settings modal, click Publish to Snowflake.

  2. Select the appropriate Snowflake Database and Schema using the dropdowns.

  3. From here, you can optionally configure automatic or smart downsampling before you publish, as described in the sections below.
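
If you script this step instead, in-source materialization corresponds to an extra destination argument on the publish call. The sketch below extends the earlier client example; MaterializationDestination and its catalog, schema, and table fields are assumptions based on recent client versions, so check your client's reference documentation for the exact names.

```python
# Sketch: publish with in-source materialization (Snowflake example).
# MaterializationDestination and its fields are assumptions; verify the
# exact names against your DataRobot client version.
import datarobot as dr
from datarobot.models.recipe import Recipe
from datarobot.models.dataset import MaterializationDestination

dr.Client(endpoint="https://app.datarobot.com/api/v2", token="YOUR_API_TOKEN")
recipe = Recipe.get("65f1c0ffee1234567890abcd")  # placeholder ID

destination = MaterializationDestination(
    catalog="ANALYTICS_DB",   # Snowflake database (write access required)
    schema="PUBLIC",          # target schema (write access required)
    table="WRANGLED_OUTPUT",  # new table created by the pushed-down query
)

dataset = dr.Dataset.create_from_recipe(
    recipe,
    name="wrangled_output",
    materialization_destination=destination,
)
```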

Configure downsampling

Automatic downsampling is a technique that reduces the size of a dataset by shrinking the majority class through random sampling. Consider enabling automatic downsampling if the size of your source data exceeds DataRobot's file size limit.

To configure downsampling:

  1. Enable the Automatic downsampling toggle in the Publishing Settings modal.

  2. Specify the Maximum number of rows and Estimated size in megabytes.
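
Conceptually, automatic downsampling caps the output at the row limit you set by sampling rows uniformly at random. A rough pandas equivalent (the file name and limit are placeholders):

```python
# Rough pandas equivalent of automatic downsampling: cap the output
# dataset at a maximum row count via uniform random sampling.
import pandas as pd

MAX_ROWS = 1_000_000  # corresponds to "Maximum number of rows"

df = pd.read_csv("wrangled_output.csv")  # stand-in for the materialized output
if len(df) > MAX_ROWS:
    df = df.sample(n=MAX_ROWS, random_state=42)
```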

Configure smart downsampling

Public preview

Smart downsampling in Workbench is on by default.

Feature flag: Enables Smart Downsampling in Wrangle Publishing Settings

You can use smart downsampling to reduce the size of your output dataset when publishing a wrangling recipe. Smart downsampling is a data science technique to reduce the time it takes to fit a model without sacrificing accuracy; it is particularly useful for imbalanced data. This downsampling technique accounts for class imbalance by stratifying the sample by class. In most cases, the entire minority class is preserved, and sampling only applies to the majority class. Because accuracy is typically more important on the minority class, this technique greatly reduces the size of the training dataset (reducing modeling time and cost), while preserving model accuracy.

To configure smart downsampling:

  1. Enable the Automatic downsampling toggle and click Smart.

  2. Select a Target feature—a binary classification or zero-inflated feature. If the dataset does not contain either feature type, the option to apply smart downsampling is unavailable.

  3. (Optional) Enter a name for the Weights feature. This column, which contains downsampling weights, is computed and added to your output dataset as a result of smart downsampling.

  4. Enter the desired Maximum number of rows or Estimated Size (MB). These values are linked, so if you change the value in one field, the other field updates automatically. See DataRobot's dataset requirements to ensure the output dataset is below the file size limit.

Note

Any rows with null as a value in the target column will be filtered out after smart downsampling.
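
To make the behavior concrete, here is a minimal pandas sketch of the technique described above. The target column, row limit, and weight formula are illustrative, not DataRobot's exact implementation:

```python
# Illustrative sketch of smart downsampling for a binary target.
import pandas as pd

TARGET = "churned"    # hypothetical binary target column
MAX_ROWS = 500_000    # corresponds to "Maximum number of rows"

df = pd.read_csv("wrangled_output.csv")

# Rows with a null target are filtered out, as noted above.
df = df.dropna(subset=[TARGET])

# Keep the entire minority class; spend the remaining row budget on a
# random sample of the majority class.
counts = df[TARGET].value_counts()
minority = df[df[TARGET] == counts.idxmin()]
majority = df[df[TARGET] != counts.idxmin()]
budget = max(MAX_ROWS - len(minority), 0)
sampled_majority = majority.sample(n=min(budget, len(majority)), random_state=42)

out = pd.concat([minority, sampled_majority])

# Weights column: each sampled majority row stands in for several original
# rows, so it is weighted up; minority rows keep weight 1.
weight = len(majority) / max(len(sampled_majority), 1)
out["sample_weight"] = 1.0
out.loc[out[TARGET] != counts.idxmin(), "sample_weight"] = weight
```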


Updated February 1, 2024