
Databricks

Connecting to Databricks is currently certified on Azure and AWS.

Public preview

Support for Databricks is enabled by default.

Feature flag(s):

  • Enable Databricks Driver: Allows you to connect to and add snapshotted data from Databricks.
  • Enable Databricks Wrangling: Allows you to perform data wrangling on Databricks datasets in Workbench.
  • Enable Databricks In-Source Materialization in Workbench: Allows you to materialize wrangled datasets in Databricks as well as the Data Registry.
  • Enable Dynamic Datasets in Workbench: Allows you to add dynamic Databricks data to a Use Case, enabling you to view live samples, perform data wrangling, and initiate in-source materialization.

Supported authentication

  • Personal access token

Prerequisites

The following is required before connecting to Databricks in DataRobot:

Generate a personal access token

Azure: In the Azure Portal app, generate a personal access token for your Databricks workspace. This token is used to authenticate your connection to Databricks in DataRobot. See the Azure Databricks documentation.

AWS: In AWS, generate a personal access token for your Databricks workspace. This token is used to authenticate your connection to Databricks in DataRobot. See the Databricks on AWS documentation.
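
If you'd like to sanity-check the token outside DataRobot, the short Python sketch below calls the Databricks Token API with the requests package. This is an illustrative check, not part of the DataRobot setup; the hostname and token values are placeholders for your own.

```python
# Hedged sketch: verify a Databricks personal access token before
# configuring DataRobot. Requires the `requests` package.
import requests

WORKSPACE_HOST = "adb-1234567890123456.7.azuredatabricks.net"  # placeholder
ACCESS_TOKEN = "dapi..."  # placeholder: the personal access token you generated

# The Token API lists tokens for the authenticated user; a 200 response
# confirms the token is valid for this workspace.
resp = requests.get(
    f"https://{WORKSPACE_HOST}/api/2.0/token/list",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    timeout=30,
)
resp.raise_for_status()
print("Token is valid and the workspace is reachable.")
```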

Set up a connection in DataRobot

To connect to Databricks in DataRobot (note that this example uses Azure):

  1. Open Workbench and select a Use Case.
  2. Follow the instructions for connecting to a data source.
  3. With the information retrieved in the previous section, fill in the required configuration parameters.

  4. Under Authentication, click New credentials, then enter your access token and a unique display name. If you've previously added credentials for this data source, you can select them from your saved credentials.

  5. Click Save.

Required parameters

The following fields are the minimum required to establish a connection with Databricks:

  • Server Hostname: The address of the server to connect to. For details, see the Azure Databricks or Databricks on AWS documentation.
  • HTTP Path: The URL of the compute resource (a SQL warehouse or cluster). For details, see the Azure Databricks or Databricks on AWS documentation.

SQL warehouses are dedicated to executing SQL and, as a result, have less overhead than clusters and often provide better performance. It is recommended that you use a SQL warehouse if possible.
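
To confirm that a given Server Hostname, HTTP Path, and token work together before entering them in DataRobot, you can use the open source databricks-sql-connector package, as in the minimal sketch below. This is not DataRobot code, and every value shown is a placeholder.

```python
# Minimal connectivity check using the databricks-sql-connector package
# (pip install databricks-sql-connector). Substitute your own values.
from databricks import sql

with sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",  # Server Hostname
    # HTTP Path: /sql/1.0/warehouses/<warehouse-id> for a SQL warehouse, or
    # sql/protocolv1/o/<org-id>/<cluster-id> for an all-purpose cluster.
    http_path="/sql/1.0/warehouses/abc123def456",
    access_token="dapi...",  # the personal access token from the prerequisites
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT 1")
        print(cursor.fetchone())  # (1,) confirms the round trip works
```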

Note

If the catalog parameter is specified in a connection configuration, Workbench only shows the schemas in that catalog. If this parameter is not specified, Workbench lists all catalogs you have access to.
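
As an illustration, the same connector used above accepts an optional catalog argument, which scopes a session the way the connection-level catalog parameter scopes Workbench browsing. "my_catalog" and the other values are placeholders.

```python
# With catalog set, SHOW SCHEMAS returns only the schemas inside that
# catalog, mirroring the Workbench behavior described in the note above.
from databricks import sql

with sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/abc123def456",
    access_token="dapi...",
    catalog="my_catalog",  # omit to browse all catalogs you have access to
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SHOW SCHEMAS")
        print(cursor.fetchall())
```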

Troubleshooting

Problem: When attempting to execute an operation in DataRobot, the firewall prompts you to clear the IP address each time.
Solution: Add all of DataRobot's allowed source IP addresses to your firewall; see Allowed source IP addresses. If you've already added the allowed IPs, check the existing entries for completeness.
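
As a quick first check, the sketch below opens a plain TCP connection to the workspace on port 443. Note that it verifies only that the egress IP of the host running the script can reach Databricks; it says nothing about credentials, permissions, or DataRobot's own source IPs. The hostname is a placeholder.

```python
# Network reachability check (illustrative only). Confirms the host running
# this script can open an HTTPS connection to the Databricks workspace.
import socket

WORKSPACE_HOST = "adb-1234567890123456.7.azuredatabricks.net"  # placeholder

try:
    with socket.create_connection((WORKSPACE_HOST, 443), timeout=10):
        print(f"Reached {WORKSPACE_HOST}:443; the network path is open.")
except OSError as exc:
    print(f"Could not reach {WORKSPACE_HOST}:443: {exc}")
```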
