Databricks¶
Connecting to Databricks is currently certified for Azure Databricks and Databricks on AWS.
Public preview
Support for Databricks is on by default.
Feature flag(s):
- Enable Databricks Driver: Allows you to connect to and add snapshotted data from Databricks.
- Enable Databricks Wrangling: Allows you to perform data wrangling on Databricks datasets in Workbench.
- Enable Databricks In-Source Materialization in Workbench: Allows you to materialize wrangled datasets in Databricks as well as the Data Registry.
- Enable Dynamic Datasets in Workbench: Allows you to add dynamic Databricks data to a Use Case, enabling you to view live samples, perform data wrangling, and initiate in-source materialization.
Supported authentication¶
- Personal access token
Prerequisites¶
The following is required before connecting to Databricks in DataRobot:
For Azure:
- A Databricks workspace in the Azure portal
- Data stored in an Azure Databricks database
For AWS:
- A Databricks workspace in AWS
- Data stored in an AWS Databricks database
Generate a personal access token¶
In your Databricks workspace (Azure or AWS), generate a personal access token. This token is used to authenticate your connection to Databricks in DataRobot.
- For Azure, see the Azure Databricks documentation.
- For AWS, see the Databricks on AWS documentation.
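Before entering the token in DataRobot, you can optionally confirm that it works. The following is a minimal sketch, assuming the open-source `databricks-sql-connector` Python package and placeholder values for the hostname, HTTP path, and token; it is a sanity check, not part of the DataRobot setup itself.

```python
# Optional sanity check: confirm the personal access token can open a
# connection to Databricks before configuring DataRobot.
# Assumes: pip install databricks-sql-connector
from databricks import sql

# Placeholder values -- replace with the Server Hostname, HTTP Path,
# and personal access token for your own workspace.
with sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/abcdef1234567890",
    access_token="dapiXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT 1")
        print(cursor.fetchone())  # (1,) means the token and endpoint work
```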
Set up a connection in DataRobot¶
To connect to Databricks in DataRobot (this example uses Azure):
- Open Workbench and select a Use Case.
- Follow the instructions for connecting to a data source.
- With the information retrieved in the previous section, fill in the required configuration parameters.
- Under Authentication, click New credentials, then enter your access token and a unique display name. If you've previously added credentials for this data source, you can select them from your saved credentials.
- Click Save.
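The same connection can also be created programmatically. Below is a minimal sketch, assuming the DataRobot Python client (`datarobot` package) and its generic JDBC data store path; the driver ID and JDBC URL are placeholders that depend on your DataRobot installation, and the Workbench flow above remains the documented path.

```python
# Hypothetical sketch: creating a Databricks connection with the DataRobot
# Python client instead of the Workbench UI. The driver ID and JDBC URL
# below are placeholders -- look up the Databricks driver available in
# your DataRobot installation before using this.
import datarobot as dr

dr.Client(
    token="YOUR_DATAROBOT_API_TOKEN",               # placeholder credentials
    endpoint="https://app.datarobot.com/api/v2",
)

data_store = dr.DataStore.create(
    data_store_type="jdbc",
    canonical_name="Databricks (Azure)",
    driver_id="<databricks-driver-id>",             # placeholder: installation-specific
    jdbc_url=(
        "jdbc:databricks://adb-1234567890123456.7.azuredatabricks.net:443;"
        "httpPath=/sql/1.0/warehouses/abcdef1234567890"  # placeholder values
    ),
)
print(data_store.id)
```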
Required parameters¶
The tables below list the minimum required fields to establish a connection with Databricks.
For Azure:
Required field | Description | Documentation |
---|---|---|
Server Hostname | The address of the server to connect to. | Azure Databricks documentation |
HTTP Path | The URL of the compute resource. | Azure Databricks documentation |
For AWS:
Required field | Description | Documentation |
---|---|---|
Server Hostname | The address of the server to connect to. | Databricks on AWS documentation |
HTTP Path | The URL of the compute resource. | Databricks on AWS documentation |
SQL warehouses are dedicated to executing SQL and, as a result, have less overhead than clusters and often provide better performance. DataRobot recommends using a SQL warehouse when possible.
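For illustration, the two required values typically take the shapes below; the IDs are placeholders, and the exact values come from your compute resource's connection details in the Databricks workspace.

```python
# Illustrative shapes of the two required fields (placeholder IDs only):
server_hostname = "adb-1234567890123456.7.azuredatabricks.net"  # Azure format
# server_hostname = "dbc-a1b2c3d4-e5f6.cloud.databricks.com"    # AWS format

# HTTP Path for a SQL warehouse (recommended):
http_path = "/sql/1.0/warehouses/abcdef1234567890"
# HTTP Path for an all-purpose cluster:
# http_path = "/sql/protocolv1/o/1234567890123456/0123-456789-abcdefgh"
```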
Note
If the `catalog` parameter is specified in a connection configuration, Workbench only shows the schemas in that catalog. If this parameter is not specified, Workbench lists all of the catalogs you have access to.
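The sketch below illustrates the difference in scope, again assuming the open-source `databricks-sql-connector` package and placeholder connection values; the SQL statements are standard Databricks SQL.

```python
# Illustrates what the `catalog` setting scopes: with a catalog, you browse
# its schemas; without one, you browse all catalogs you can access.
from databricks import sql

with sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",  # placeholder
    http_path="/sql/1.0/warehouses/abcdef1234567890",              # placeholder
    access_token="dapiXXXXXXXXXXXXXXXXXXXXXXXXXXXX",               # placeholder
) as connection:
    with connection.cursor() as cursor:
        # Comparable to a configuration with `catalog` set to "main":
        cursor.execute("SHOW SCHEMAS IN main")
        print(cursor.fetchall())

        # Comparable to a configuration with no `catalog` specified:
        cursor.execute("SHOW CATALOGS")
        print(cursor.fetchall())
```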
Troubleshooting¶
Problem | Solution | Instructions |
---|---|---|
When attempting to execute an operation in DataRobot, the firewall prompts you to clear (allow) the IP address each time. | Add all of the allowed IPs for DataRobot to your firewall. | See Allowed source IP addresses. If you've already added the allowed IPs, check the existing entries for completeness. |