Connecting to Databricks is currently certified through Azure and AWS.
Support for Databricks is on by default and controlled by the following feature flags:
- Enable Databricks Driver: Allows you to connect to and add snapshotted data from Databricks.
- Enable Databricks Wrangling: Allows you to perform data wrangling on Databricks datasets in Workbench.
- Enable Databricks In-Source Materialization in Workbench: Allows you to materialize wrangled datasets in Databricks as well as the Data Registry.
- Enable Dynamic Datasets in Workbench: Allows you to add dynamic Databricks data to a Use Case—enabling the ability to view live samples, perform data wrangling, and initiate in-source materialization.
The following is required before connecting to Databricks in DataRobot:

- Personal access token
## Generate a personal access token
In the Azure Portal app, generate a personal access token for your Databricks workspace; DataRobot uses this token to authenticate your connection to Databricks. See the Azure Databricks documentation.

In AWS, generate a personal access token for your Databricks workspace. See the Databricks on AWS documentation.
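Once generated, the token is passed as a standard bearer credential. The sketch below illustrates the header a personal access token ultimately produces; the token value is a placeholder, not a real credential:

```python
# Sketch: how a Databricks personal access token is used for authentication.
# The token below is a placeholder, not a real credential.

def auth_headers(access_token: str) -> dict:
    """Build the Authorization header used with a personal access token."""
    return {"Authorization": f"Bearer {access_token}"}

# Example (Databricks personal access tokens begin with the "dapi" prefix):
headers = auth_headers("dapi0123456789abcdef")
print(headers["Authorization"])  # Bearer dapi0123456789abcdef
```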
## Set up a connection in DataRobot
To connect to Databricks in DataRobot (note that this example uses Azure):
- Open Workbench and select a Use Case.
- Follow the instructions for connecting to a data source.
- Under Authentication, click New credentials, then enter your access token and a unique display name. If you've previously added credentials for this data source, you can select them from your saved credentials.
The table below lists the minimum required fields to establish a connection with Databricks:

| Field | Description | More information |
|-------|-------------|------------------|
| Server Hostname | The address of the server to connect to. | Azure Databricks documentation |
| HTTP Path | The URL of the compute resources. | Azure Databricks documentation |
SQL warehouses are dedicated to executing SQL and, as a result, have less overhead than clusters and often provide better performance. It is recommended that you use a SQL warehouse when possible.
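For reference, the two required fields take recognizable shapes. The sketch below uses made-up placeholder values (not real endpoints) to show what each field typically looks like for an Azure Databricks SQL warehouse:

```python
# Sketch of the minimum connection fields for a Databricks SQL warehouse.
# The hostname and path are illustrative placeholders, not real endpoints.

connection_params = {
    # Azure workspace hostnames look like adb-<workspace-id>.<n>.azuredatabricks.net
    "server_hostname": "adb-1234567890123456.7.azuredatabricks.net",
    # SQL warehouse HTTP paths look like /sql/1.0/warehouses/<warehouse-id>
    "http_path": "/sql/1.0/warehouses/abc123def456",
}

# Quick sanity checks on field shapes before entering them in DataRobot:
assert not connection_params["server_hostname"].startswith("https://"), \
    "Server Hostname is a bare hostname, without a scheme"
assert connection_params["http_path"].startswith("/sql/"), \
    "SQL warehouse paths begin with /sql/"
```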
If the catalog parameter is specified in a connection configuration, Workbench only shows the schemas in that catalog. If this parameter is not specified, Workbench lists all catalogs you have access to.
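The catalog behavior described above amounts to a simple filter. A minimal sketch, using made-up catalog and schema names for illustration:

```python
# Sketch of Workbench's browsing behavior with and without a catalog parameter.
# Catalog and schema names here are made-up examples.
from typing import Optional

WORKSPACE = {
    "main": ["sales", "marketing"],
    "dev": ["sandbox"],
}

def browse(catalog: Optional[str] = None) -> list:
    """Return schemas in the given catalog, or all accessible catalogs if none is set."""
    if catalog is not None:
        return WORKSPACE[catalog]  # only schemas in that catalog
    return list(WORKSPACE)         # all catalogs you have access to

print(browse("main"))  # ['sales', 'marketing']
print(browse())        # ['main', 'dev']
```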
| Issue | Solution | More information |
|-------|----------|------------------|
| When attempting to execute an operation in DataRobot, the firewall requests that you clear the IP address each time. | Add all allowed IPs for DataRobot. | See Allowed source IP addresses. If you've already added the allowed IPs, check the existing IPs for completeness. |