
Spark SQL Connector for Data Prep

User Persona: Data Prep User, Data Prep Admin, or Data Source Admin

Note

This document covers all configuration fields available during connector setup. Some fields may have already been filled out by your Administrator at an earlier step of configuration and may not be visible to you. For more information on Data Prep's connector framework, see Data Prep Connector setup. Also, your Admin may have named this connector something else in the list of Data Sources.

Configure Data Prep

This connector allows you to connect to Spark SQL for browsing, importing, and exporting available data. The following fields are used to define the connection parameters.

General

  • Name: Name of the data source as it will appear to users in the UI.
  • Description: Description of the data source as it will appear to users in the UI.

Tip

You can connect Data Prep to multiple Spark SQL instances. A descriptive name helps users identify the appropriate data source.

Spark SQL Server Configuration

  • Spark SQL Server: The hostname or IP address of the server hosting the Spark SQL database.
  • Spark SQL Port: The port for the Spark SQL database.

  • Use SSL: Set this property to the value specified in the 'hive.server2.use.SSL' property of your Hive configuration file (hive-site.xml).

  • Transport Mode: Set this property to the value specified in the 'hive.server2.transport.mode' property of your Hive configuration file (hive-site.xml).

  • HTTP Path: The path component of the URL endpoint when using the HTTP transport mode. Set this to the value specified in the 'hive.server2.thrift.http.path' property of your Hive configuration file (hive-site.xml).
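
The server fields above describe a HiveServer2/Spark Thrift endpoint. As a hedged sketch of how they fit together (the dataclass and the endpoint string are illustrative assumptions, not Data Prep internals), the SSL and transport-mode settings determine the shape of the endpoint the connector talks to:

```python
# Sketch: combine the Spark SQL server fields into one connection descriptor.
# The class, field names, and endpoint format are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class SparkSqlServerConfig:
    server: str          # hostname or IP of the Spark SQL server
    port: int            # Spark SQL / Thrift server port
    use_ssl: bool        # mirrors hive.server2.use.SSL in hive-site.xml
    transport_mode: str  # mirrors hive.server2.transport.mode ("binary" or "http")
    http_path: str = ""  # mirrors hive.server2.thrift.http.path; HTTP mode only

    def endpoint(self) -> str:
        """Render the endpoint implied by the transport-mode settings."""
        if self.transport_mode.lower() == "http":
            scheme = "https" if self.use_ssl else "http"
            return f"{scheme}://{self.server}:{self.port}/{self.http_path}"
        # Binary transport mode uses a plain Thrift socket.
        return f"thrift://{self.server}:{self.port}"

cfg = SparkSqlServerConfig("spark.example.com", 10001, True, "http", "cliservice")
print(cfg.endpoint())  # → https://spark.example.com:10001/cliservice
```

Note that HTTP Path only matters when Transport Mode is HTTP; in binary mode it is ignored.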

Spark SQL Server Authentication Configuration

  • User: The username used to authenticate with Spark SQL. For Databricks, set to 'token'.
  • Password: The password used to authenticate with Spark SQL. For Databricks, set to your personal access token (value can be obtained by navigating to the User Settings page of your Databricks instance and selecting the Access Tokens tab).
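
For Databricks, the User field is the literal string 'token' and the Password field holds the personal access token itself. A minimal sketch of that rule (the helper function and its validation are illustrative assumptions, and the token value shown is a placeholder):

```python
# Sketch: build the User/Password pair for the connector's authentication fields.
# The helper name and validation rules are illustrative assumptions.
def spark_sql_credentials(user: str, password: str, databricks: bool = False) -> dict:
    if databricks and user != "token":
        # Databricks expects the literal username 'token', not an account name.
        raise ValueError("Databricks connections must use the literal user 'token'")
    if not password:
        raise ValueError("Password (or personal access token) must not be empty")
    return {"User": user, "Password": password}

# Placeholder token value; a real personal access token comes from the
# User Settings > Access Tokens page of your Databricks instance.
creds = spark_sql_credentials("token", "dapiXXXXXXXX", databricks=True)
```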

Data Import Information

Via Browsing

  • Navigate to a table and click Select to import it.

Via SQL Query

  • Supports importing data using a valid SQL SELECT query.
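
As an illustration of the kind of query accepted for import (the table and column names below are hypothetical):

```python
# Sketch: a valid SELECT query of the kind accepted for SQL-query import.
# Table and column names are hypothetical.
query = """
SELECT customer_id, region, total_spend
FROM sales.transactions
WHERE total_spend > 100
"""

# Minimal sanity check: only SELECT statements are accepted for import.
assert query.strip().upper().startswith("SELECT")
```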

Updated October 28, 2021