Data connections¶
In Workbench, you can easily configure and reuse secure connections to predefined data sources, allowing you to interactively browse, preview, and profile your data before using DataRobot's integrated data preparation capabilities.
See the associated considerations for important additional information.
Allowing source IP addresses
Before setting up a data connection, make sure the source IPs have been allowed.
Connection support¶
You can connect to and add data from all connectors and JDBC drivers that are currently supported in DataRobot. For a full list of supported data connections, see Supported data stores.
Note that Snowflake, BigQuery, and Databricks connections use pushdown wrangling—all other connections use Spark wrangling.
The table below highlights the capabilities supported by each wrangling method:
| Wrangling method | Snapshot datasets | Dynamic datasets | Live preview | Wrangling | In-source materialization |
|---|---|---|---|---|---|
| Pushdown wrangling: Snowflake, BigQuery, Databricks | ✔ | ✔ | ✔ | ✔ | |
| Spark wrangling: snapshots uploaded from local files, public URLs, all supported connections | ✔ | ✔ |
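The rule stated above (Snowflake, BigQuery, and Databricks use pushdown wrangling; everything else uses Spark wrangling) can be written as a small lookup. This is only an illustrative sketch: the helper name `wrangling_method` and the lowercase connection labels are assumptions, not part of any DataRobot API.

```python
# Connection types that the text above lists as using pushdown wrangling.
PUSHDOWN_CONNECTIONS = {"snowflake", "bigquery", "databricks"}


def wrangling_method(connection_type: str) -> str:
    """Return 'pushdown' or 'spark' for a connection type (hypothetical helper)."""
    normalized = connection_type.strip().lower()
    return "pushdown" if normalized in PUSHDOWN_CONNECTIONS else "spark"


print(wrangling_method("Snowflake"))   # pushdown
print(wrangling_method("PostgreSQL"))  # spark
```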
JDBC driver capabilities
You can only add snapshot datasets from a JDBC driver connection.
Connect to a data source¶
Creating a data connection lets you explore external source data—from both connectors and JDBC drivers—and then add it to your Use Case.
To create a data connection:
- From the Data assets tile, click Add data > Browse data in the upper-right corner to open the Browse data modal.
- Click + Add connection.
- Choose either Structured, for connections that support structured data, or Unstructured, for connections that support unstructured data. Then, select a data store.
Now, you can configure the data connection.
Configure the connection¶
Note
When configuring your data connection, configuration types, authentication options, and required parameters are based on the selected data source. The example below shows how to configure Snowflake with OAuth using new credentials.
To configure the data connection:
- With the Connection Configuration tab selected in the Edit Connection modal, choose a configuration method: either Parameters or JDBC URL.
- Enter the required parameters for the selected configuration method.
- Click New Credentials and select an authentication method. The available authentication methods depend on the selected connection.
Saved credentials
If you previously saved credentials for the selected data source, click Saved credentials and select the appropriate credentials from the dropdown.
- Click Save in the upper-right corner. If your browser window is small, you may need to scroll up.
If you selected OAuth as your authentication method, you will be prompted to sign in before you can select a dataset. See the list of supported data stores for more information about supported authentication methods and required parameters.
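As a rough illustration of how the Parameters and JDBC URL configuration methods relate, the sketch below assembles a Snowflake-style JDBC URL from individual parameters. The helper name `snowflake_jdbc_url` and the account, database, warehouse, and schema values are hypothetical. The URL shape follows Snowflake's documented JDBC format; the parameters DataRobot actually requires are listed in Supported data stores.

```python
from urllib.parse import urlencode


def snowflake_jdbc_url(account: str, **params: str) -> str:
    """Build a Snowflake-style JDBC URL from individual connection parameters."""
    base = f"jdbc:snowflake://{account}.snowflakecomputing.com/"
    # Sort the parameters so the resulting URL is stable and easy to compare.
    query = urlencode(sorted(params.items()))
    return f"{base}?{query}" if query else base


# Placeholder values only; substitute your own account and object names.
url = snowflake_jdbc_url(
    "myorg-myaccount",
    db="SALES",
    warehouse="COMPUTE_WH",
    schema="PUBLIC",
)
print(url)
```

Credentials are deliberately left out of the URL, since the steps above attach them separately through New Credentials or Saved credentials.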
Select a dataset¶
Once you've set up a data connection, you can add datasets by browsing the database schemas and tables you have access to.
To select a dataset:
- Select the schema associated with the table you want to add.
- Select the box to the left of the appropriate table.
With a dataset selected, you can:
| | Element | Description |
|---|---|---|
| 1 | Add to Use Case | Adds the data asset to your Use Case, making it available to you and other team members. |
| 2 | Add from SQL query | Allows you to use SQL queries to add data. |
| 3 | Settings | Allows you to show, hide, and/or pin columns. |
| 4 | Actions menu | Provides access to the Preview, Open in Wrangler, and Open in SQL Editor actions. |

The Actions menu provides access to the following actions:
- Preview: Open a snapshot preview to help determine if the dataset is relevant to your Use Case and/or if it needs to be modified in either Wrangler or the SQL Editor.
- Open in Wrangler: Perform data preparation before adding the asset to your Use Case.
- Open in SQL Editor: Create a recipe composed of SQL queries that enrich, transform, shape, and blend datasets together.
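To make the browse, preview, and Add from SQL query ideas concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for a real data connection. The in-memory database, the `orders` table, and its columns are all hypothetical; a DataRobot connection would run the query against your own data source instead.

```python
import sqlite3

# In-memory database standing in for an external data source (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "EMEA", 120.0), (2, "APAC", 75.5), (3, "EMEA", 310.25)],
)

# Browse: list the tables visible through this connection.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

# Preview: look at the first few rows before deciding to add the table.
preview = conn.execute("SELECT * FROM orders ORDER BY id LIMIT 2").fetchall()

# Add from SQL query: keep only the rows and columns you need.
query = "SELECT id, amount FROM orders WHERE region = 'EMEA' ORDER BY id"
subset = conn.execute(query).fetchall()

print(tables)
print(preview)
print(subset)
```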
Large datasets
If you want to decrease the size of the dataset before adding it to your Use Case, click Wrangle. When you publish a recipe, you can configure automatic downsampling to control the number of rows when Snowflake materializes the output dataset.
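The effect of downsampling to a row limit can be sketched as follows. This assumes simple uniform random sampling; the sampling method DataRobot applies when publishing a recipe is not described here, and the function name `downsample` and its parameters are illustrative only.

```python
import random


def downsample(rows, max_rows, seed=0):
    """Return at most max_rows rows, sampled uniformly and kept in original order."""
    if max_rows < 0:
        raise ValueError("max_rows must be non-negative")
    if len(rows) <= max_rows:
        # Already within the limit: nothing to drop.
        return list(rows)
    rng = random.Random(seed)  # Fixed seed keeps the sample reproducible.
    keep = sorted(rng.sample(range(len(rows)), max_rows))
    return [rows[i] for i in keep]


rows = list(range(1000))
sample = downsample(rows, max_rows=100)
print(len(sample))
```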
Edit a connection¶
From the Browse data modal, you can modify existing data connections, including configuration parameters, as well as associated credentials and data sources.
To edit a connection, hover over the data connection and click the edit icon.
See below for a description of each tab—what information is displayed on each and the available edit options:
On the Connection Configuration tab, you can modify connection parameters, including adding new parameters and selecting or creating new credentials.
The Data Sources tab displays all data assets associated with the connection. This includes data sources added manually through the data connection as well as those added automatically whenever a dataset or file is added from the connection to the Data Registry or a Use Case. From here, you can:
| | Element | Description |
|---|---|---|
| 1 | Search | Allows you to search for specific data sources. |
| 2 | Columns | Displays the name and date when the data was last updated. |
| 3 | Actions menu | Provides access to additional actions. |
The Credentials tab displays all Snowflake credentials that have been added by or shared with you. From here, you can:
| | Element | Description |
|---|---|---|
| 1 | Search | Allows you to search for specific credentials. |
| 2 | Columns | Displays the name, credential type, and date the credentials were first added. |
| 3 | Selected badge | Indicates the credentials currently in use by the data connection. |
| 4 | Actions menu | Provides access to additional actions. |
When you're done editing the connection, click Save.