Data ingestion from Hive with Kerberos Authentication

On-premises installations of DataRobot support data ingestion from a Hive metastore with Kerberos authentication, using a JDBC driver.

Prerequisites

  • Hive JDBC driver .jar files
  • Kerberos principal and keytab file
  • krb5.conf file from the target Hive/Hadoop cluster
  • TLS truststore .jks file, used when accessing Hive over TLS
  • Network connectivity from the DataRobot cluster to the Kerberos Key Distribution Center (KDC) server
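Before proceeding, it can save time to verify the last prerequisite up front. The sketch below is a minimal reachability check, not part of the DataRobot installation; the KDC hostname is a placeholder, and port 88 is the Kerberos default.

```python
# Minimal pre-flight check that the KDC is reachable over TCP.
# Kerberos KDCs listen on port 88 by default; the host is a placeholder
# that should come from your krb5.conf.
import socket


def kdc_reachable(host: str, port: int = 88, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the KDC can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example (hypothetical KDC host):
# kdc_reachable("kdc.example.com")
```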

Configuration

  • Create a Kubernetes secret from the krb5.conf, keytab, and truststore .jks files:
kubectl -n <DR_CORE_NAMESPACE> create secret generic hive-configuration --from-file=krb5.conf=./krb5.conf --from-file=datarobot.keytab=./datarobot.keytab --from-file=cacerts.jks=./cacerts.jks
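The kubectl command above is equivalent to applying a Secret manifest with each file base64-encoded. As a sketch of what that command produces (file paths and the helper name are illustrative, not part of DataRobot):

```python
# Build a Kubernetes Secret body equivalent to
# `kubectl create secret generic --from-file=...`:
# each file's contents are base64-encoded under its key.
import base64
from pathlib import Path


def hive_secret_manifest(files: dict, name: str = "hive-configuration") -> dict:
    """Map {secret key -> local file path} into a Secret manifest dict."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",
        "data": {
            key: base64.b64encode(Path(path).read_bytes()).decode("ascii")
            for key, path in files.items()
        },
    }


# The keys must match the sub_path values referenced later in values.yaml:
# hive_secret_manifest({
#     "krb5.conf": "./krb5.conf",
#     "datarobot.keytab": "./datarobot.keytab",
#     "cacerts.jks": "./cacerts.jks",
# })
```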
  • Modify your values.yaml:
core:
  config_env_vars:
    # kerberos related settings
    KERBEROS_ENABLE: "True"
    ENABLE_DIRECT_KERBEROS_AUTHENTICATION: "True"
    KRB5_CONFIG: "/etc/krb5.conf"
    KERBEROS_KEYTAB_FILE: "/etc/datarobot.keytab"
    KERBEROS_PRINCIPAL: "admin/cluster1@EXAMPLE.COM"
    KERBEROS_INIT_ENABLE: "False"

    # mounting the kerberos items
    DSS_EXTRA_SECRET_MOUNTS: '[{"secret_name": "hive-configuration", "mount_path": "/etc/krb5.conf", "sub_path": "krb5.conf"},{"secret_name": "hive-configuration", "mount_path": "/etc/datarobot.keytab", "sub_path": "datarobot.keytab"},{"secret_name": "hive-configuration", "mount_path": "/opt/datarobot/etc/hadoop/cacerts.jks", "sub_path": "cacerts.jks"}]'

# -- Parameters for the DataRobot `datasets-service` sub-chart
datasets-service:
  component:
    api:
      # -- Set number of replicas
      replicaCount: 1
  extraVolumes:
  - name: dss-extra-secret-mounts
    secret:
      secretName: hive-configuration
      defaultMode: 420
  extraVolumeMounts:
  - name: dss-extra-secret-mounts
    mountPath: /etc/krb5.conf
    subPath: krb5.conf
    readOnly: true
  - name: dss-extra-secret-mounts
    mountPath: /etc/datarobot.keytab
    subPath: datarobot.keytab
    readOnly: true
  - name: dss-extra-secret-mounts
    mountPath: /opt/datarobot/etc/hadoop/cacerts.jks
    subPath: cacerts.jks
    readOnly: true
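A common source of deployment failures here is a malformed DSS_EXTRA_SECRET_MOUNTS value, since it is JSON embedded in a YAML string. A quick sanity check before deploying might look like this (the validation helper is illustrative, not a DataRobot utility):

```python
# Sanity-check a DSS_EXTRA_SECRET_MOUNTS value: it must parse as a JSON
# array, and each entry needs secret_name, mount_path, and sub_path.
import json


def validate_secret_mounts(raw: str) -> list:
    """Parse the JSON string and verify each mount entry's required keys."""
    mounts = json.loads(raw)
    required = {"secret_name", "mount_path", "sub_path"}
    for entry in mounts:
        missing = required - entry.keys()
        if missing:
            raise ValueError(f"mount entry {entry} missing keys: {missing}")
    return mounts


# Example with the first mount from the values above:
mounts = validate_secret_mounts(
    '[{"secret_name": "hive-configuration", '
    '"mount_path": "/etc/krb5.conf", "sub_path": "krb5.conf"}]'
)
```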

Creating the JDBC driver

After installing DataRobot with the given values, in the DataRobot UI, open User Settings and select Data Connections > Drivers.

Click + Add new driver, and then create a driver with the following settings:

  • Configuration: Custom
  • Class name: org.apache.hive.jdbc.HiveDriver
  • Driver file(s): the Hive JDBC driver .jar files

Configure New Data Connection

Navigate to the Data Connections tab and click + Add new data connection.

  • Click My Drivers.
  • Select the Hive driver from the dropdown.
  • JDBC URL:
jdbc:hive2://<host>:10000/default;principal=admin/cluster1@EXAMPLE.COM;ssl=true;sslTrustStore=/opt/datarobot/etc/hadoop/cacerts.jks;trustStorePassword=changeit
  • Click Add New Connection.
  • Click Test Connection. A Test Data Connection window will appear.
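The JDBC URL above packs several settings into one string, which makes typos easy. A small helper like the following can assemble it from its parts (the function name and example host are illustrative; the option names match the URL shown above):

```python
# Assemble a hive2 JDBC URL with Kerberos and TLS options, in the same
# format as the connection string used above.
def hive_jdbc_url(host: str, principal: str, truststore: str,
                  truststore_password: str, port: int = 10000,
                  database: str = "default") -> str:
    """Build a jdbc:hive2 URL with principal, ssl, and truststore options."""
    return (
        f"jdbc:hive2://{host}:{port}/{database}"
        f";principal={principal}"
        f";ssl=true"
        f";sslTrustStore={truststore}"
        f";trustStorePassword={truststore_password}"
    )


# Example (hypothetical host; principal and truststore path as configured above):
url = hive_jdbc_url(
    "hive.example.com",
    "admin/cluster1@EXAMPLE.COM",
    "/opt/datarobot/etc/hadoop/cacerts.jks",
    "changeit",
)
```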

Note: The username and password fields can contain any values, as that information is not passed to Hive; authentication is handled by the Kerberos principal and keytab.