
Hortonworks HDP2 Hive Connector for Data Prep

User Persona: Data Prep Admin, Data Source Admin, or IT/DevOps

Availability information

This Connector is not available to Data Prep SaaS customers.

Note

This document covers all configuration fields available during connector setup. Some fields may have already been filled out by your Administrator at an earlier step of configuration and may not be visible to you. For more information on Data Prep's connector framework, see Data Prep Connector setup. Also, your Admin may have named this connector something else in the list of Data Sources.

Configure Data Prep

This connector allows you to connect to Hive on a Hortonworks HDP 2.6.5 cluster for import and export. The following fields are used to define the connection parameters.

Note

Configuring this connector requires file system access on the Data Prep Server and a core-site.xml file containing the Hadoop cluster configuration. Contact your Customer Success representative for assistance with this step.
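
For reference, a minimal core-site.xml typically identifies the cluster's default file system, as in the sketch below. The NameNode hostname and port are placeholders; use the values from your own cluster.

    <?xml version="1.0" encoding="UTF-8"?>
    <configuration>
      <!-- Placeholder NameNode address; substitute your cluster's value -->
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://namenode.example.com:8020</value>
      </property>
    </configuration>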

General

  • Name: Name of the data source as it will appear to users in the UI.
  • Description: Description of the data source as it will appear to users in the UI.

Tip

You can connect Data Prep to multiple Hive directories. Using a descriptive name helps users identify the appropriate data source.

Hadoop Cluster

  • HDFS User: The username on the HDFS cluster used to write files for export to Hive.

Kerberos Configuration

The following parameters are required for Kerberos authentication.

  • Principal: Kerberos Principal.
  • Realm: Kerberos Realm.
  • KDC Hostname: Kerberos Key Distribution Center Hostname.
  • Kerberos Configuration File: Fully-qualified path of the Kerberos configuration file on the web server.
  • Keytab File: Fully-qualified path of the Kerberos keytab file on the web server.
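
For illustration, a completed Kerberos configuration might look like the following; every value is a placeholder for your environment.

    Principal:                    dataprep@EXAMPLE.COM
    Realm:                        EXAMPLE.COM
    KDC Hostname:                 kdc.example.com
    Kerberos Configuration File:  /etc/krb5.conf
    Keytab File:                  /etc/security/keytabs/dataprep.keytab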

The Proxy User and Use Application User options allow you to specify the account to impersonate. See the Apache Hadoop documentation for more information about impersonation (proxy users) in HDFS. You have three options: a specific proxy user, a proxy user with modifiers, or the individual application user.

  • Proxy User: Specify the user account to impersonate for all connections, or check the Use Application User box to impersonate the account of the individual Data Prep user who runs the connector. The Proxy User field is disabled while Use Application User is checked. Entering ${user.name} as the proxy user works similarly to selecting Use Application User, but offers more flexibility because you can add modifiers or additional text. For example:
    • To add a domain to the user’s credentials, enter \domain_name\${user.name} in the Proxy User field. Data Prep will pass the username and the domain.
      • Example: \Accounts\${user.name} results in Accounts\Joe (assuming Joe is the username).
    • To apply a text modifier to the username, add .modifier to the key ${user.name}. The acceptable modifiers are: toLower, toUpper, toLowerCase, toUpperCase, and trim.
      • Example: ${user.name.toLowerCase} converts Joe into joe (assuming Joe is the username).

Hive Configuration

When you export data using the Hive connector, a file is written into HDFS and an external table is then created in Hive through the Hive JDBC driver. The Proxy User field specifies the user account to impersonate when writing the file into HDFS; to impersonate a user in Hive as well, you must also specify the user in the JDBC URL.
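
Conceptually, the registration step issues a statement over JDBC similar to the sketch below. The table name, columns, delimiter, and HDFS path are hypothetical, and the exact statement the connector generates may differ.

    -- Register the file already written to HDFS as an external Hive table
    CREATE EXTERNAL TABLE exported_dataset (
      id   INT,
      name STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION '/dataprep/exports/exported_dataset';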

  • JDBC URL: The URL used to access Hive for import and registration of external tables. If Kerberos authentication is used, the following string must be appended to the URL: ;auth=kerberos;hive.server2.proxy.user=${user.name} (see the example after this list).

  • If a proxy user is used, then the string ${user.name} must be replaced with the proxy username.

  • Hive File Location: The location within HDFS used to store Hive files for external tables.
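
For example, a Kerberos-enabled JDBC URL that impersonates the application user might look like the following. The hostname, port, database, and Hive service principal are placeholders; the principal parameter is shown as it commonly appears in Hive JDBC URLs.

    jdbc:hive2://hiveserver.example.com:10000/default;principal=hive/hiveserver.example.com@EXAMPLE.COM;auth=kerberos;hive.server2.proxy.user=${user.name}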

Credentials

  • Hive User: The username used to access Hive for Simple and Hybrid authentication.
  • Hive Password: The password used to access Hive for Simple authentication.

Hive Options

  • Pre-Import SQL: SQL executed before the import process. This SQL may execute multiple times (for preview and import) and may contain multiple SQL statements, delimited by newlines (see the example at the end of this section).
  • Post-Import SQL: SQL executed after the import process. This SQL may execute multiple times (for preview and import) and may contain multiple SQL statements, delimited by newlines.

Note

Because the Pre- and Post-Import SQL may be executed multiple times throughout the import process, take care when specifying these values in the connector/data source configuration: they will be executed for every import performed with this configuration.

  • Pre-Export SQL: SQL executed before the export process. This SQL executes once and may contain multiple SQL statements, delimited by newlines.
  • Post-Export SQL: SQL executed after the export process. This SQL executes once and may contain multiple SQL statements, delimited by newlines.
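
For illustration, a Pre-Import SQL value containing two newline-delimited statements (both hypothetical) might look like:

    SET hive.execution.engine=tez
    USE analytics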

Data Import Information

Via Browsing

Not Supported

Via SQL Query

Supported using SQL SELECT queries.


Updated October 28, 2021