Cloudera CDH5 HDFS Connector for Data Prep
User Persona: Data Prep Admin, Data Source Admin, or IT/DevOps
Availability information
This Connector is not available to Data Prep SaaS customers.
Note
This document covers all configuration fields available during connector setup. Your Administrator may have already filled out some fields at an earlier step of configuration, in which case they may not be visible to you. For more information on Data Prep's connector framework, see Data Prep Connector setup. Your Admin may also have given this connector a different name in the list of Data Sources.
Configure Data Prep
This connector allows you to connect to the Hadoop Distributed File System (HDFS) on Cloudera CDH 5.16 for import and export. The following fields define the connection parameters.
Note
Configuring this connector requires file system access on the Data Prep server and a core-site.xml file containing the Hadoop cluster configuration. Please reach out to your Customer Success representative for assistance with this step.
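Before handing a core-site.xml to Data Prep, it can help to confirm which cluster and authentication mode the file describes. The following Python sketch uses only the standard library to parse a Hadoop-style core-site.xml and report two standard Hadoop properties: fs.defaultFS (the cluster's default file system URI) and hadoop.security.authentication (simple or kerberos, corresponding to the Simple and Kerberos configuration sections below). The function names, the sample file contents, and the hostname are placeholders; this is an optional sanity check, not part of Data Prep's own setup.

```python
import xml.etree.ElementTree as ET


def read_hadoop_properties(xml_text: str) -> dict[str, str]:
    """Return the name/value pairs from a Hadoop *-site.xml document (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    props: dict[str, str] = {}
    # Each <property> element holds a <name> child and a <value> child.
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def summarize(props: dict[str, str]) -> tuple[str, str]:
    """Return (default file system URI, authentication mode)."""
    fs_uri = props.get("fs.defaultFS", "")
    # Hadoop's built-in default for hadoop.security.authentication is "simple".
    auth = props.get("hadoop.security.authentication", "simple").lower()
    return fs_uri, auth


# Placeholder contents; on the Data Prep server you would read the real file instead.
SAMPLE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
</configuration>
"""

if __name__ == "__main__":
    uri, auth = summarize(read_hadoop_properties(SAMPLE))
    print(f"fs.defaultFS = {uri}")
    print(f"authentication = {auth}")
```

To check a real file, pass its contents (for example, the result of `open(path).read()`) to `read_hadoop_properties`, where `path` is wherever your installation keeps core-site.xml.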
General
- Name: Name of the data source as it will appear to users in the UI.
- Description: Description of the data source as it will appear to users in the UI.
Tip
You can connect Data Prep to multiple HDFS clusters. A descriptive name helps users identify the appropriate data source.
Simple Configuration (only for Simple authentication)
- Username: The application web server will connect to your HDFS cluster as the username you provide here.
Configuration
- Data Store Root Directory: The "parent directory" on your cluster that the connector reads from and writes to for import and export operations. Import and export are also supported for subdirectories of the root.
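The root-directory rule can be pictured as a containment check: a requested path is in scope if it is the root itself or lies somewhere beneath it. The Python sketch below illustrates that rule for POSIX-style HDFS paths. The function name `is_within_root` and the example paths are hypothetical, and this is not a description of how Data Prep itself enforces the restriction.

```python
import posixpath


def is_within_root(root: str, requested: str) -> bool:
    """Return True if `requested` is the root directory or one of its subdirectories.

    Hypothetical helper: paths are normalized first so that ".." segments
    cannot step outside the root.
    """
    root_norm = posixpath.normpath(root)
    # Relative requests are resolved against the root; absolute ones are kept as-is.
    req_norm = posixpath.normpath(posixpath.join(root_norm, requested))
    # Compare against "root/" so that a sibling such as /data/prepared
    # is not mistaken for a child of /data/prep.
    return req_norm == root_norm or req_norm.startswith(root_norm.rstrip("/") + "/")


if __name__ == "__main__":
    print(is_within_root("/data/prep", "/data/prep/sales/2024"))
    print(is_within_root("/data/prep", "../other"))
```

The trailing-slash comparison is the design point worth noting: a plain prefix test would wrongly accept a sibling directory whose name merely starts with the root's name.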
Kerberos Configuration
The following parameters are required for Kerberos authentication.
- Principal: Kerberos Principal.
- Realm: Kerberos Realm.
- KDC Hostname: Kerberos Key Distribution Center Hostname.
- Kerberos Configuration File: Fully qualified path of the Kerberos configuration file on the web server.
- Keytab File: Fully qualified path of the Kerberos keytab file on the web server.
- Use Application User: Select this check box to read and write as the logged-in application user, or clear it to use the Proxy User instead.
- Proxy User: The proxy user for authenticating with the cluster. Enter ${user.name} as the proxy user. ${user.name} works similarly to selecting Use Application User, but allows more flexibility. For example:
  - To add a domain to the user's credentials, enter \domain_name\${user.name} in the Proxy User field. Data Prep will pass the username and the domain.
    - Example: \Accounts\${user.name} results in \Accounts\Joe (assuming Joe is the username).
  - To apply a text modifier to the username, append .modifier to the key ${user.name}. The acceptable modifiers are toLower, toUpper, toLowerCase, toUpperCase, and trim.
    - Example: ${user.name.toLowerCase} converts Joe into joe (assuming Joe is the username).
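To make the placeholder rules concrete, the following Python sketch expands a Proxy User template: ${user.name} becomes the logged-in username, and ${user.name.<modifier>} applies one of the listed modifiers first. The function name `expand_proxy_user` is hypothetical, and mapping the modifiers to Python's `str.lower`, `str.upper`, and `str.strip` is an assumption about their behavior; Data Prep's actual substitution logic is not documented here.

```python
import re
from typing import Callable

# Modifiers listed for the Proxy User field, mapped to Python string methods.
# ASSUMPTION: toLower/toLowerCase lowercase, toUpper/toUpperCase uppercase,
# and trim strips leading and trailing whitespace.
MODIFIERS: dict[str, Callable[[str], str]] = {
    "toLower": str.lower,
    "toLowerCase": str.lower,
    "toUpper": str.upper,
    "toUpperCase": str.upper,
    "trim": str.strip,
}

# Matches ${user.name} or ${user.name.<modifier>}; group 1 captures the modifier.
PLACEHOLDER = re.compile(r"\$\{user\.name(?:\.(\w+))?\}")


def expand_proxy_user(template: str, username: str) -> str:
    """Substitute a username into a Proxy User template (hypothetical helper)."""

    def replace(match: re.Match) -> str:
        modifier = match.group(1)
        if modifier is None:
            return username
        if modifier not in MODIFIERS:
            # ASSUMPTION: unknown modifiers are rejected rather than ignored.
            raise ValueError(f"unsupported modifier: {modifier}")
        return MODIFIERS[modifier](username)

    # Text outside the placeholder (such as a domain prefix) is kept literally.
    return PLACEHOLDER.sub(replace, template)


if __name__ == "__main__":
    print(expand_proxy_user("${user.name}", "Joe"))              # Joe
    print(expand_proxy_user("${user.name.toLowerCase}", "Joe"))  # joe
    print(expand_proxy_user(r"\Accounts\${user.name}", "Joe"))   # \Accounts\Joe
```

Because the sketch substitutes text literally, any characters around the placeholder, such as a domain prefix and its backslashes, pass through unchanged.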
Data Import Information
Via Browsing
Supported
Via SQL Query
Not supported