Security Best Practices

Introduction

This guide is designed to help you configure DataRobot following security best practices. DataRobot is a complex product with a wealth of configuration options -- at the time of writing (Jan 2023), this guide covers a small number of them, but we hope to extend it over time.

To configure these options, refer to the Tuning DataRobot Environment Variables section of this guide.

Session expiration

Leaving web UI sessions active indefinitely is considered a security risk. DataRobot offers flexible control over the session expiration mechanism, allowing you to configure the session lifetime either relative to the last user action or as a fixed number of seconds after user login. Once a web UI session has expired, the user must log in and re-authenticate before they can access the application again.

Session expiration logic is controlled by the boolean configuration option WEB_UI_SESSION_ABSOLUTE_EXPIRATION_ENABLED, which determines whether the session lifetime is measured relative to the last user action (request) or as an absolute interval from user login. The lifetime (in seconds) of the session is configured by the WEB_UI_SESSION_LIFE_TIME configuration option. The default value is 0, which means the session never expires.

Configuration examples

The following configuration options in core.config_env_vars set the session lifetime to 10 minutes relative to the last user action:

# helm chart values snippet
core:
  config_env_vars:
    WEB_UI_SESSION_LIFE_TIME: 600 

The following configuration options in core.config_env_vars set the session lifetime to 10 minutes from the moment of user login:

# helm chart values snippet
core:
  config_env_vars:
    WEB_UI_SESSION_LIFE_TIME: 600
    WEB_UI_SESSION_ABSOLUTE_EXPIRATION_ENABLED: true 

Host header protection

To prevent host header poisoning, API servers validate the host header against an allowed list.

By default, the ALLOWED_HOSTS setting includes:

  • EXTERNAL_WEB_SERVER_URL for the external load balancer (and for situations where there is a proxy in front of the application).
  • DATAROBOT_PUBLIC_LB for the application hostname (the global.domain chart value) used when the application is accessed via Ingress.
  • DATAROBOT_INTERNAL_LB for the service hostname within the Kubernetes cluster (e.g. datarobot-nginx) used by internal services.
  • CUSTOM_MODEL_WEB_SERVER_URL for a configurable hostname used by Custom Models to connect to the API server.

If necessary, the ALLOWED_HOSTS setting can be configured in chart values, using a comma-delimited string of hostnames:

core:
  config_env_vars:
    ALLOWED_HOSTS: "datarobot-nginx,datarobot.example.com" 

Although not recommended, you can disable host header protection:

core:
  config_env_vars:
    ALLOWED_HOSTS: "*" 

Integrating with a customer HTTP proxy

It may be necessary to use an HTTP proxy to provide the cluster with access to specific resources over the internet or within the corporate network. To accomplish this, customers may have an HTTP proxy such as Squid configured to route cluster traffic.

Please note that this configuration requires significant customization and ongoing maintenance to keep up to date with DataRobot upgrades and with changes to your internal infrastructure.

To achieve this, network traffic from application services needs to be routed to the proxy while ensuring that internal traffic to services within the cluster remains internal and skips the proxy. The following environment variables can be configured globally for all containers:

  • HTTP_PROXY for routing HTTP traffic from cluster services through the external proxy
  • HTTPS_PROXY for routing HTTPS traffic from cluster services through the external proxy
  • NO_PROXY for excluding certain hostnames from routing through the proxy. This should be a comma-separated list of hostnames.

These can be configured in the Helm chart values:

global:
  extraEnvVars:
  - name: HTTP_PROXY
    value: http://proxy.example.com:3128
  - name: HTTPS_PROXY
    value: http://proxy.example.com:3128
  - name: NO_PROXY
    value: .dr-app,.svc,.local,.internal,localhost,127.0.0.1,kubernetes,auth-server-hydra-admin,auth-server-hydra-public,blob-view-service,browser-adls,browser-bigquery,browser-controller,browser-databricks,browser-datasphere,browser-s3,browser-snowflake,build-service,buzok-llm-gateway,buzok-web,cfx-session-port-proxy,compute-jobs-service,compute-spark,custom-apps-websocket-proxy,datarobot-analyticsbroker-api,datarobot-analyticsbroker-ingest,datarobot-apigateway-apigateway,datarobot-apps-builder-api,datarobot-apps-builder-internal-api,datarobot-auth-server,datarobot-datasets-service-api,datarobot-internal-api,datarobot-mmapp,datarobot-mmqueue,datarobot-nginx,datarobot-otel-collector,datarobot-pngexport,datarobot-prediction-server,datarobot-prediction-spooler,datarobot-predictions-gateway,datarobot-public-api,datarobot-rsyslog-master,datarobot-tileservergl,datarobot-upload,datavolt-service,identity-resource-service,nbx-audit-logs,nbx-code-assistance,nbx-code-nuggets,nbx-data-retention,nbx-env-vars,nbx-exec-envs,nbx-filesystems,nbx-ingress,nbx-notebook-import,nbx-notebook-revisions,nbx-notebooks,nbx-orchestrator,nbx-permissions,nbx-scheduling,nbx-session-port-proxy,nbx-terminals,nbx-usage-tracking,nbx-users,nbx-vcs,nbx-websockets,network-policy-consumer-service,notification-service,oauth-providers-service,ocr-service,pcs-elasticsearch,pcs-elasticsearch-master-hl,pcs-mongo-0,pcs-mongo-1,pcs-mongo-2,pcs-mongo-arbiter-headless,pcs-mongo-headless,pcs-pgpool,pcs-postgresql,pcs-postgresql-headless,pcs-rabbitmq,pcs-rabbitmq-headless,pcs-redis,pcs-redis-headless,pred-environments-api,service-registration-controller 

Make sure to replace proxy.example.com:3128 with your desired proxy address.

For internal service communications in Kubernetes, this will cover all <service>.<namespace>.svc.cluster.local fully-qualified names for DataRobot services.

You MUST also add the following to NO_PROXY based on your configuration:

  • The namespace where DataRobot will be installed (e.g. .dr-app by default; replace it if another namespace is used instead).
  • The ClusterIP associated with the Kubernetes API accessed by pods, as returned by kubectl get svc kubernetes (e.g. 10.100.0.1 for EKS by default).
  • The global.domain host where your DataRobot application will be accessed by users (e.g. datarobot.example.com).
  • Hostnames for cloud service endpoints when deployed into a VPC.

For cloud providers' service endpoints:

  • AWS: you can specify .amazonaws.com for all regions or .<region>.amazonaws.com to limit to a specific region such as us-east-1. See AWS Service Endpoints for the supported list of hostnames.
  • GCP: you can specify .googleapis.com to allowlist all services; otherwise, refer to Access Google APIs through endpoints for the supported list of hostnames.
  • Azure: see Azure Private Endpoints for a list of supported hostnames.

Additionally, you SHOULD add the following if they apply to your situation (a combined example follows this list):

  • Internal IP addresses for corporate networks that should not be accessed through the proxy (e.g. 1.2.3.4). Note that CIDR ranges are NOT supported due to limitations in Python.
  • Internal hostnames for resources that are accessible without the tunnel proxy (e.g. .example.com).
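
For example, an installation into the default dr-app namespace on EKS, accessed at datarobot.example.com, might extend NO_PROXY like this (the IPs and hostnames here are illustrative; keep the full default service list from the example above in place of the placeholder):

# helm chart values snippet (illustrative)
global:
  extraEnvVars:
  - name: NO_PROXY
    value: .dr-app,10.100.0.1,datarobot.example.com,.amazonaws.com,1.2.3.4,.example.com,<default service list from above>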

When upgrading to 10.2 or later, make sure to remove the HTTP_PROXY, HTTPS_PROXY, and NO_PROXY settings from other sections of the values YAML (e.g. core.common_env_vars). When set in global.extraEnvVars, the settings are applied to all pods.
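
For example, a values file from an earlier release might still contain leftover entries like the following, which should be deleted in favor of the global.extraEnvVars settings shown above (the values here are illustrative):

# pre-10.2 per-section proxy settings to remove
core:
  common_env_vars:
    HTTP_PROXY: http://proxy.example.com:3128
    HTTPS_PROXY: http://proxy.example.com:3128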

Network Policy for custom workloads on Long Running Services (LRS)

Custom workloads (custom models, custom jobs, custom apps) running on LRS have a deny-by-default NetworkPolicy setup, accompanied by Network Policies that allow specific ingress or egress with external services as needed.
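
For illustration, a deny-by-default policy for LRS pods has roughly the following shape (a sketch only; the policy name is hypothetical and the actual manifests shipped with DataRobot may differ):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-lrs
  namespace: <installation_namespace>
spec:
  podSelector:
    matchLabels:
      datarobot-type: lrs
  # No ingress or egress rules are listed, so all traffic to and from
  # the selected pods is denied unless another policy allows it.
  policyTypes:
    - Ingress
    - Egress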

Refer to the documentation pages for each workload type for workload-specific details.

To restrict egress for LRS to specific CIDR ranges, the CUSTOM_WORKLOADS_PUBLIC_ACCESS_IGNORE_CIDRS setting can be configured in chart values:

core:
  config_env_vars:
    CUSTOM_WORKLOADS_PUBLIC_ACCESS_IGNORE_CIDRS: '10.0.0.0/8,172.16.0.0/12,192.168.0.0/16' 

This should be a comma-delimited list of CIDR ranges.

It is important that LRS can egress to the datarobot-nginx pod so that workloads can use the Public API. Workloads should use the internal server name datarobot-nginx instead of the external hostname (e.g. datarobot.example.com). For LRS to communicate with the Public API via Ingress instead, the Network Policy would need to be modified to allow egress to the CIDR range covering the ingress controller, as sketched below.
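
A sketch of such a modification, assuming the ingress controller pods are reachable within 10.0.16.0/24 (the policy name and CIDR are illustrative; replace them with your own):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-lrs-egress-to-ingress-controller
  namespace: DATAROBOT-NAMESPACE
spec:
  podSelector:
    matchLabels:
      datarobot-type: lrs
  egress:
    - ports:
        - protocol: TCP
          port: 443
      to:
        - ipBlock:
            # Replace with the CIDR range covering your ingress controller
            cidr: 10.0.16.0/24
  policyTypes:
    - Egress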

Wrangler on Spark with external object storage

If your cluster is configured with an external S3-compatible object store (e.g. MinIO), you will need to ensure that the LRS pods hosting the interactive Spark session for Data Wrangling have egress to the object store.

If your cluster uses Cilium CNI, create the following CiliumNetworkPolicy. Make sure to replace DATAROBOT-NAMESPACE and MINIO-API-HOSTNAME-GOES-HERE accordingly.

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-lrs-spark-app-access-to-external-object-store
  namespace: DATAROBOT-NAMESPACE
spec:
  endpointSelector:
    matchLabels:
      datarobot-type: lrs
  egress:
  - toFQDNs:
    - matchName: "MINIO-API-HOSTNAME-GOES-HERE"
    toPorts:
    - ports:
      - port: "443"
        protocol: TCP 

Without Cilium, you will need to configure a NetworkPolicy to allow egress to the IP addresses that the DNS hostname resolves to. Make sure to replace DATAROBOT-NAMESPACE and X.X.X.X accordingly.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-lrs-spark-app-access-to-external-object-store
  namespace: DATAROBOT-NAMESPACE
spec:
  podSelector:
    matchLabels:
      datarobot-instance-name: datarobot-lrs
      datarobot-lrs-type: spark_app
  egress:
    - ports:
        - protocol: TCP
          port: 443
      to:
        - ipBlock:
            cidr: X.X.X.X/32
  policyTypes:
    - Egress 

Setting up custom DNS policies

The default configuration creates two policies for DNS access from LRS pods. The first policy covers standard DNS access in a Kubernetes cluster, allowing all egress from LRS pods to kube-dns pods on ports 53 and 5353 over TCP and UDP. The second policy covers the same traffic for OpenShift installations. Both policies are shown below:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-kube-dns-egress
  namespace: <installation_namespace>
spec:
  egress:
    - ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
        - port: 5353
          protocol: UDP
        - port: 5353
          protocol: TCP
      to:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
  podSelector:
    matchLabels:
      datarobot-type: lrs
  policyTypes:
    - Egress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-openshift-dns-egress
  namespace: <installation_namespace>
spec:
  egress:
    - ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
        - port: 5353
          protocol: UDP
        - port: 5353
          protocol: TCP
      to:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              dns.operator.openshift.io/daemonset-dns: default
  podSelector:
    matchLabels:
      datarobot-type: lrs
  policyTypes:
    - Egress 

These policies should cover most cases. If an additional custom policy is required, lrsEgressNetworkPolicies can be overridden. For example, to define an egress DNS policy targeting pods with the custom-label-name=custom-label-value selector on UDP port 53 in the kube-system namespace, the override would look like this:

operator:
  lrsEgressNetworkPolicies:
    - name: "allow-custom-label-name-egress"
      podSelectorLabel: "custom-label-name"
      podSelectorValue: "custom-label-value"
      namespaceSelector: "kube-system"
      ports:
      - protocol: "UDP"
        port: 53 

Please note that the lrsEgressNetworkPolicies field can define any custom egress policy for LRS pods, not just DNS policies.
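
For instance, a hypothetical non-DNS policy that allows LRS egress to pods labeled app=internal-metrics on TCP port 9090 in the monitoring namespace would look like this (the label, namespace, and port are illustrative):

operator:
  lrsEgressNetworkPolicies:
    - name: "allow-internal-metrics-egress"
      podSelectorLabel: "app"
      podSelectorValue: "internal-metrics"
      namespaceSelector: "monitoring"
      ports:
      - protocol: "TCP"
        port: 9090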