

Data connections

In Workbench, you can easily configure and reuse secure connections to predefined data sources. Not only does this allow you to interactively browse, preview, and profile your data, it also gives you access to DataRobot's integrated data preparation capabilities.

See the associated considerations for important additional information.

Source IP addresses for allowlisting

Before setting up a data connection, make sure your data source allows connections from DataRobot's source IP addresses.
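If your data source restricts inbound traffic, one way to confirm coverage is to check each DataRobot source IP against the CIDR ranges your firewall allows. The Python sketch below does this with the standard-library `ipaddress` module. The addresses and ranges are hypothetical placeholders taken from reserved documentation ranges; substitute the published source IPs for your deployment and your own allowlist.

```python
import ipaddress

# Hypothetical placeholder values: replace with the source IPs published
# for your DataRobot deployment and the CIDR ranges your firewall allows.
DATAROBOT_SOURCE_IPS = ["203.0.113.10", "203.0.113.11", "198.51.100.7"]
FIREWALL_ALLOWLIST = ["203.0.113.0/24"]


def missing_from_allowlist(source_ips, allowlist):
    """Return the source IPs that no allowed CIDR range covers."""
    networks = [ipaddress.ip_network(cidr) for cidr in allowlist]
    missing = []
    for ip in source_ips:
        addr = ipaddress.ip_address(ip)
        # An address is covered if it falls inside at least one allowed network.
        if not any(addr in net for net in networks):
            missing.append(ip)
    return missing


if __name__ == "__main__":
    gaps = missing_from_allowlist(DATAROBOT_SOURCE_IPS, FIREWALL_ALLOWLIST)
    if gaps:
        print("Not yet allowed:", ", ".join(gaps))
    else:
        print("All source IPs are allowed.")
```

Run as written, it prints `Not yet allowed: 198.51.100.7`, because that address falls outside `203.0.113.0/24`.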

Public preview

Support for dynamic datasets in Workbench is on by default.

When this feature is enabled:

  • Datasets added via a data connection are registered as dynamic datasets in the Data Registry and the Use Case.
  • Dynamic datasets added via a connection are available for selection in the Data Registry.
  • DataRobot pulls a new live sample when you view Exploratory Data Insights for a dynamic dataset.

Feature flag: Enable Dynamic Datasets in Workbench

Supported connections

Workbench currently supports the following connections:

Connection                     Notes
Snowflake                      See the documentation for required parameters and additional information.
BigQuery                       See the documentation for required parameters and additional information.
Databricks (public preview)    See the documentation for required parameters and additional information.
S3 (public preview)            See the documentation for required parameters and additional information.
ADLS Gen2 (public preview)     See the documentation for required parameters and additional information.

Public preview

Support for Databricks in Workbench is on by default.

Feature flag(s): Enable Databricks Driver

Public preview

Support for AWS S3 in Workbench is on by default.

Feature flag(s): Enable Native S3 Driver

Public preview

Support for ADLS Gen2 in Workbench is on by default.

Feature flag(s): Enable ADLS Gen2 Connector

For a complete list of available connections in Workbench and which features they support, see the connection capabilities table.

Connect to a data source

Creating a data connection lets you explore external source data and then add it to your Use Case.

To create a data connection:

  1. In a Use Case, click Add new > Add datasets. The Add data modal opens.

  2. Click Connect.

  3. Select the data source (Snowflake in this example).

    Now, you can configure the data connection.

Configure the connection

Note

When configuring your data connection, configuration types, authentication options, and required parameters are based on the selected data source. The example below shows how to configure Snowflake with OAuth using new credentials.

To configure the data connection:

  1. On the Configuration page, select a configuration method—either Parameters or JDBC URL.

  2. Enter the required parameters for the selected configuration method.

  3. Click New Credentials and select an authentication method—in this case, either Basic or OAuth.

    Saved credentials

    If you previously saved credentials for the selected data source, click Saved credentials and select the appropriate credentials from the dropdown.

  4. Click Save in the upper right corner.

    If you selected OAuth as your authentication method, you will be prompted to sign in before you can select a dataset. See the DataRobot Classic documentation for more information about supported authentication methods and required parameters.
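When you choose JDBC URL instead of Parameters, the individual parameters are folded into a single connection string. As a rough illustration, assuming the standard Snowflake JDBC URL form `jdbc:snowflake://<account_identifier>.snowflakecomputing.com/?<connection_params>`, the hypothetical helper below builds that URL from common Snowflake connection parameters (`warehouse`, `db`, `schema`, `role`). The exact fields DataRobot requires may differ, so check the Snowflake connection documentation for your version. Credentials are deliberately left out, because they are supplied separately through New Credentials or Saved credentials.

```python
from urllib.parse import urlencode


def snowflake_jdbc_url(account, warehouse=None, db=None, schema=None, role=None):
    """Hypothetical helper: build a Snowflake JDBC URL from connection parameters.

    `account` is the Snowflake account identifier. Optional values are
    appended as query-string connection parameters only when provided.
    Credentials are intentionally not part of the URL.
    """
    params = {"warehouse": warehouse, "db": db, "schema": schema, "role": role}
    # Drop parameters that were not supplied so the URL stays minimal.
    query = urlencode({key: value for key, value in params.items() if value})
    base = f"jdbc:snowflake://{account}.snowflakecomputing.com/"
    return f"{base}?{query}" if query else base
```

For example, with the hypothetical account `myorg-myaccount`, `snowflake_jdbc_url("myorg-myaccount", warehouse="COMPUTE_WH", db="SALES", schema="PUBLIC")` returns `jdbc:snowflake://myorg-myaccount.snowflakecomputing.com/?warehouse=COMPUTE_WH&db=SALES&schema=PUBLIC`.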

Select a dataset

Once you've set up a data connection, you can add datasets by browsing the database schemas and tables you have access to.

To select a dataset:

  1. Select the schema associated with the table you want to add.

  2. Select the checkbox to the left of the table you want to add.

    With a dataset selected, you can:

      • Click Wrangle to prepare the dataset before adding it to your Use Case.
      • Click Preview to open a snapshot preview that helps you determine whether the dataset is relevant to your Use Case and whether it needs to be wrangled.
      • Click Add to Use Case to add the dataset to your Use Case, making it available to you and other team members on the Datasets tab.

    Large datasets

    If you want to decrease the size of the dataset before adding it to your Use Case, click Wrangle. When you publish a recipe, you can configure automatic downsampling to control the number of rows when Snowflake materializes the output dataset.
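Conceptually, downsampling keeps a random subset of rows up to a configured limit. The Python sketch below illustrates that idea on an in-memory list with a hypothetical `downsample` helper. It is not how DataRobot or Snowflake implements downsampling (the output dataset is materialized in Snowflake); it only shows what a row cap does to the data.

```python
import random


def downsample(rows, max_rows, seed=0):
    """Hypothetical helper: return at most `max_rows` rows chosen uniformly at random.

    Selected rows keep their original order. If the input already fits
    within the limit, a copy of it is returned unchanged.
    """
    if len(rows) <= max_rows:
        return list(rows)
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    # Pick distinct row positions, then sort them to preserve input order.
    keep = sorted(rng.sample(range(len(rows)), max_rows))
    return [rows[i] for i in keep]
```

Fixing the seed returns the same subset on repeated runs, which makes it easier to compare results across iterations.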


Updated February 16, 2024