Data ingestion from Hive with Kerberos Authentication
On-premises installations of DataRobot support data ingestion from Hive with Kerberos authentication using a JDBC driver.
Prerequisites

- Hive JDBC driver `.jar` files
- Kerberos principal and keytab file
- `krb5.conf` file from the target Hive/Hadoop cluster
- TLS truststore `.jks` file used to access Hive over TLS
- Network connectivity from the DataRobot cluster to the KDC (Kerberos Key Distribution Center) server
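Before installing, it can help to confirm that the `krb5.conf` you received actually points at a KDC you can reach. The following is a minimal sketch (not DataRobot tooling) that pulls the default realm and its KDC hosts out of a `krb5.conf`; the realm and host names below are illustrative examples, not values from your cluster.

```python
# Minimal sketch (not DataRobot code): extract the default realm and its KDC
# hosts from krb5.conf text, so you can then check connectivity to each KDC
# (for example with `nc -vz <kdc-host> 88`). Realm/host values are examples.
import re

def parse_krb5_conf(text):
    """Return (default_realm, {realm: [kdc, ...]}) from krb5.conf text."""
    default_realm = None
    kdcs = {}
    current_realm = None
    in_realms = False
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("["):              # section header, e.g. [realms]
            in_realms = line == "[realms]"
            current_realm = None
            continue
        if in_realms:
            m = re.match(r"(\S+)\s*=\s*{", line)
            if m:                             # start of a realm block
                current_realm = m.group(1)
                kdcs[current_realm] = []
            elif line == "}":                 # end of a realm block
                current_realm = None
            elif current_realm:
                key, _, value = line.partition("=")
                if key.strip() == "kdc":
                    kdcs[current_realm].append(value.strip())
        elif line.split("=")[0].strip() == "default_realm":
            default_realm = line.split("=", 1)[1].strip()
    return default_realm, kdcs

sample = """
[libdefaults]
  default_realm = EXAMPLE.COM

[realms]
  EXAMPLE.COM = {
    kdc = kdc1.example.com:88
    admin_server = kdc1.example.com
  }
"""
realm, kdcs = parse_krb5_conf(sample)
print(realm, kdcs[realm])  # prints the realm and its KDC hosts
```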
Configuration
- Create a Kubernetes secret from the `krb5.conf`, keytab, and truststore `.jks` files:

```shell
kubectl -n <DR_CORE_NAMESPACE> create secret generic hive-configuration \
    --from-file=krb5.conf=./krb5.conf \
    --from-file=datarobot.keytab=./datarobot.keytab \
    --from-file=cacerts.jks=./cacerts.jks
```
- Modify your `values.yaml`:

```yaml
core:
  config_env_vars:
    # Kerberos-related settings
    KERBEROS_ENABLE: "True"
    ENABLE_DIRECT_KERBEROS_AUTHENTICATION: "True"
    KRB5_CONFIG: "/etc/krb5.conf"
    KERBEROS_KEYTAB_FILE: "/etc/datarobot.keytab"
    KERBEROS_PRINCIPAL: "admin/cluster1@EXAMPLE.COM"
    KERBEROS_INIT_ENABLE: "False"
    # Mount the Kerberos items
    DSS_EXTRA_SECRET_MOUNTS: '[{"secret_name": "hive-configuration", "mount_path": "/etc/krb5.conf", "sub_path": "krb5.conf"},{"secret_name": "hive-configuration", "mount_path": "/etc/datarobot.keytab", "sub_path": "datarobot.keytab"},{"secret_name": "hive-configuration", "mount_path": "/opt/datarobot/etc/hadoop/cacerts.jks", "sub_path": "cacerts.jks"}]'

# Parameters for the DataRobot datasets-service sub-chart
datasets-service:
  component:
    api:
      # Number of replicas
      replicaCount: 1
      extraVolumes:
        - name: dss-extra-secret-mounts
          secret:
            secretName: hive-configuration
            defaultMode: 420
      extraVolumeMounts:
        - name: dss-extra-secret-mounts
          mountPath: /etc/krb5.conf
          subPath: krb5.conf
          readOnly: true
        - name: dss-extra-secret-mounts
          mountPath: /etc/datarobot.keytab
          subPath: datarobot.keytab
          readOnly: true
        - name: dss-extra-secret-mounts
          mountPath: /opt/datarobot/etc/hadoop/cacerts.jks
          subPath: cacerts.jks
          readOnly: true
```
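Note that `DSS_EXTRA_SECRET_MOUNTS` and the `extraVolumeMounts` entries must describe the same three files, or the `datasets-service` pods will look for Kerberos material at paths that were never mounted. As a minimal sketch (not part of DataRobot or its Helm chart), both settings can be generated from a single list of `(mount_path, sub_path)` pairs so they cannot drift apart:

```python
# Minimal sketch (not DataRobot tooling): derive DSS_EXTRA_SECRET_MOUNTS and
# the matching extraVolumeMounts entries from one list of mount definitions.
import json

SECRET_NAME = "hive-configuration"
MOUNTS = [
    ("/etc/krb5.conf", "krb5.conf"),
    ("/etc/datarobot.keytab", "datarobot.keytab"),
    ("/opt/datarobot/etc/hadoop/cacerts.jks", "cacerts.jks"),
]

def dss_extra_secret_mounts():
    """JSON string for core.config_env_vars.DSS_EXTRA_SECRET_MOUNTS."""
    return json.dumps(
        [{"secret_name": SECRET_NAME, "mount_path": mp, "sub_path": sp}
         for mp, sp in MOUNTS]
    )

def extra_volume_mounts():
    """Entries for datasets-service.component.api.extraVolumeMounts."""
    return [{"name": "dss-extra-secret-mounts", "mountPath": mp,
             "subPath": sp, "readOnly": True}
            for mp, sp in MOUNTS]

print(dss_extra_secret_mounts())
```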
Creating the JDBC driver
After installing DataRobot with these values, open User Settings in the DataRobot UI and select Data Connections > Drivers.
Click + Add new driver, and then create a driver with the following settings:
- Configuration: Custom
- Class name: `org.apache.hive.jdbc.HiveDriver`
- Driver file(s): the Hive JDBC `.jar` files
Configure a new data connection
Navigate to the Data Connections tab and click + Add new data connection.
- Click My Drivers.
- Select the Hive driver from the dropdown.
- JDBC URL: `jdbc:hive2://<host>:10000/default;principal=admin/cluster1@EXAMPLE.COM;ssl=true;sslTrustStore=/opt/datarobot/etc/hadoop/cacerts.jks;trustStorePassword=changeit`
- Click Add New Connection.
- Click Test Connection. A Test Data Connection window appears.
Note: The `username` and `password` fields can contain any value, as that information is not passed to Hive.
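The JDBC URL above is just a `jdbc:hive2://` base followed by semicolon-separated session properties. A minimal sketch (not a DataRobot API; the function name and host are illustrative) that assembles it from its parts:

```python
# Minimal sketch (not a DataRobot API): build a Kerberos + TLS Hive JDBC URL.
# The host is a placeholder; principal and truststore match the example above.
def hive_jdbc_url(host, port=10000, database="default", *,
                  principal, truststore, password="changeit"):
    props = {
        "principal": principal,       # HiveServer2 service principal
        "ssl": "true",
        "sslTrustStore": truststore,  # truststore path inside the container
        "trustStorePassword": password,
    }
    suffix = ";".join(f"{k}={v}" for k, v in props.items())
    return f"jdbc:hive2://{host}:{port}/{database};{suffix}"

url = hive_jdbc_url(
    "hive.example.com",
    principal="admin/cluster1@EXAMPLE.COM",
    truststore="/opt/datarobot/etc/hadoop/cacerts.jks",
)
print(url)
```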