Security Best Practices

Introduction

This guide is designed to help you configure DataRobot following security best practices. DataRobot is a complex product with a wealth of configuration options -- at the time of writing (Jan 2023), this guide covers a small number of them, but we hope to extend it over time.

To configure these options, refer to the Tuning DataRobot Environment Variables section of this guide.

Session expiration

Leaving web UI sessions active indefinitely is considered a security risk. DataRobot offers flexible control over the session expiration mechanism, allowing you to configure the session lifetime either relative to the last user action or as a fixed number of seconds after user login. Once a web UI session has expired, the user must log in and re-authenticate before they can access the application again.

Session expiration logic is controlled by the boolean configuration option WEB_UI_SESSION_ABSOLUTE_EXPIRATION_ENABLED, which determines whether the session lifetime is measured relative to the last user action (request) or as an absolute interval from user login. The lifetime (in seconds) of the session is configured by the WEB_UI_SESSION_LIFE_TIME configuration option. The default value is 0, which means the session never expires.

Configuration examples

The following configuration options in core.config_env_vars set the session lifetime to 10 minutes relative to the last user action:

# helm chart values snippet
core:
  config_env_vars:
    WEB_UI_SESSION_LIFE_TIME: 600 

The following configuration options in core.config_env_vars set the session lifetime to 10 minutes from the moment of user login:

# helm chart values snippet
core:
  config_env_vars:
    WEB_UI_SESSION_LIFE_TIME: 600
    WEB_UI_SESSION_ABSOLUTE_EXPIRATION_ENABLED: true 

Host header protection

To prevent host header poisoning, API servers validate the host header against an allowed list.

By default, the ALLOWED_HOSTS setting includes:

  • EXTERNAL_WEB_SERVER_URL for the external load balancer (and for situations where there is a proxy in front of the application).
  • DATAROBOT_PUBLIC_LB for the application hostname (the global.domain chart value) used when the application is accessed via Ingress.
  • DATAROBOT_INTERNAL_LB for the service hostname within the Kubernetes cluster (e.g. datarobot-nginx) used by internal services.
  • CUSTOM_MODEL_WEB_SERVER_URL for a configurable hostname used by Custom Models to connect to the API server.

If necessary, the ALLOWED_HOSTS setting can be configured in chart values, using a comma-delimited string of hostnames:

core:
  config_env_vars:
    ALLOWED_HOSTS: "datarobot-nginx,datarobot.example.com" 

Although not recommended, you can disable host header protection:

core:
  config_env_vars:
    ALLOWED_HOSTS: "*" 

Integrating with a customer HTTP proxy

It may be necessary to use an HTTP proxy to provide the cluster with access to specific resources over the internet or within the corporate network. To accomplish this, customers may have an HTTP proxy such as Squid configured to route cluster traffic.

Please note that this configuration requires significant customization and ongoing maintenance to keep up to date with DataRobot upgrades and with changes to your internal infrastructure.

To achieve this, network traffic from application services needs to be routed to the proxy while ensuring that internal traffic to services within the cluster remains internal and skips the proxy. The following environment variables can be configured globally for all containers:

  • HTTP_PROXY for routing HTTP traffic from cluster services through the external proxy
  • HTTPS_PROXY for routing HTTPS traffic from cluster services through the external proxy
  • NO_PROXY for excluding certain hostnames from routing through the proxy. This should be a comma-separated list of hostnames.

These can be configured in the Helm chart values:

global:
  extraEnvVars:
  - name: HTTP_PROXY
    value: http://proxy.example.com:3128
  - name: HTTPS_PROXY
    value: http://proxy.example.com:3128
  - name: NO_PROXY
    value: .dr-app,.svc,.local,.internal,localhost,127.0.0.1,kubernetes,auth-server-hydra-admin,auth-server-hydra-public,blob-view-service,browser-adls,browser-bigquery,browser-controller,browser-databricks,browser-datasphere,browser-s3,browser-snowflake,build-service,buzok-llm-gateway,buzok-web,cfx-session-port-proxy,compute-jobs-service,compute-spark,custom-apps-websocket-proxy,datarobot-analyticsbroker-api,datarobot-analyticsbroker-ingest,datarobot-apigateway-apigateway,datarobot-apps-builder-api,datarobot-apps-builder-internal-api,datarobot-auth-server,datarobot-datasets-service-api,datarobot-internal-api,datarobot-mmapp,datarobot-mmqueue,datarobot-nginx,datarobot-otel-collector,datarobot-pngexport,datarobot-prediction-server,datarobot-prediction-spooler,datarobot-predictions-gateway,datarobot-public-api,datarobot-rsyslog-master,datarobot-tileservergl,datarobot-upload,datavolt-service,identity-resource-service,nbx-audit-logs,nbx-code-assistance,nbx-code-nuggets,nbx-data-retention,nbx-env-vars,nbx-exec-envs,nbx-filesystems,nbx-ingress,nbx-notebook-import,nbx-notebook-revisions,nbx-notebooks,nbx-orchestrator,nbx-permissions,nbx-scheduling,nbx-session-port-proxy,nbx-terminals,nbx-usage-tracking,nbx-users,nbx-vcs,nbx-websockets,network-policy-consumer-service,notification-service,oauth-providers-service,ocr-service,pcs-elasticsearch,pcs-elasticsearch-master-hl,pcs-mongo-0,pcs-mongo-1,pcs-mongo-2,pcs-mongo-arbiter-headless,pcs-mongo-headless,pcs-pgpool,pcs-postgresql,pcs-postgresql-headless,pcs-rabbitmq,pcs-rabbitmq-headless,pcs-redis,pcs-redis-headless,pred-environments-api,service-registration-controller 

Make sure to replace proxy.example.com:3128 with your desired proxy address.

For internal service communications in Kubernetes, this will cover all <service>.<namespace>.svc.cluster.local fully-qualified names for DataRobot services.

You MUST also add the following to NO_PROXY based on your configuration:

  • The namespace where DataRobot will be installed (e.g. .dr-app by default; replace it if another namespace is used instead).
  • The ClusterIP associated with the Kubernetes API accessed by pods, as returned by kubectl get svc kubernetes (e.g. 10.100.0.1 for EKS by default).
  • The global.domain host where your DataRobot application will be accessed by users (e.g. datarobot.example.com).
  • Hostnames for cloud service endpoints when deployed into a VPC.

For cloud providers' service endpoints:

  • AWS: you can specify .amazonaws.com for all regions or .<region>.amazonaws.com to limit to a specific region such as us-east-1. See AWS Service Endpoints for the supported list of hostnames.
  • GCP: you can specify .googleapis.com to allowlist all services; otherwise, refer to Access Google APIs through endpoints for the supported list of hostnames.
  • Azure: see Azure Private Endpoints for a list of supported hostnames.

Additionally, you SHOULD add the following if they apply to your situation (a combined example follows this list):

  • Internal IP addresses for corporate networks that should not be accessed through the proxy (e.g. 1.2.3.4). Note that CIDR ranges are NOT supported due to limitations in Python.
  • Internal hostnames for resources that are accessible without the tunnel proxy (e.g. .example.com).
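
For example, an installation into the default dr-app namespace on EKS, accessed at datarobot.example.com, might extend NO_PROXY like this (the IPs and hostnames here are illustrative; keep the full default service list from the example above in place of the placeholder):

# helm chart values snippet (illustrative)
global:
  extraEnvVars:
  - name: NO_PROXY
    value: .dr-app,10.100.0.1,datarobot.example.com,.amazonaws.com,1.2.3.4,.example.com,<default service list from above>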

When upgrading to 10.2 or later, make sure to remove the HTTP_PROXY, HTTPS_PROXY, and NO_PROXY settings from other sections of the values YAML (e.g. core.common_env_vars). When set in global.extraEnvVars, the settings are applied to all pods.
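
For example, a values file from an earlier release might still contain leftover entries like the following, which should be deleted in favor of the global.extraEnvVars settings shown above (the values here are illustrative):

# pre-10.2 per-section proxy settings to remove
core:
  common_env_vars:
    HTTP_PROXY: http://proxy.example.com:3128
    HTTPS_PROXY: http://proxy.example.com:3128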

Network Policy for custom workloads on Long Running Services (LRS)

Custom workloads (custom models, custom jobs, custom apps) running on LRS have a deny-by-default NetworkPolicy setup, accompanied by Network Policies that allow specific ingress or egress with external services as needed.
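
For illustration, a deny-by-default policy for LRS pods has roughly the following shape (a sketch only; the policy name is hypothetical and the actual manifests shipped with DataRobot may differ):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-lrs
  namespace: <installation_namespace>
spec:
  podSelector:
    matchLabels:
      datarobot-type: lrs
  # No ingress or egress rules are listed, so all traffic to and from
  # the selected pods is denied unless another policy allows it.
  policyTypes:
    - Ingress
    - Egress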

Refer to the documentation pages for each workload type for workload-specific details.

To restrict egress for LRS to specific CIDR ranges, the CUSTOM_WORKLOADS_PUBLIC_ACCESS_IGNORE_CIDRS setting can be configured in chart values:

core:
  config_env_vars:
    CUSTOM_WORKLOADS_PUBLIC_ACCESS_IGNORE_CIDRS: '10.0.0.0/8,172.16.0.0/12,192.168.0.0/16' 

This should be a comma-delimited list of CIDR ranges.

It is important that LRS can egress to the datarobot-nginx pod so that workloads can use the Public API. Workloads should use the internal server name datarobot-nginx instead of the external hostname (e.g. datarobot.example.com). For LRS to communicate with the Public API via Ingress instead, the Network Policy would need to be modified to allow egress to the CIDR range covering the ingress controller, as sketched below.
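
A sketch of such a modification, assuming the ingress controller pods are reachable within 10.0.16.0/24 (the policy name and CIDR are illustrative; replace them with your own):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-lrs-egress-to-ingress-controller
  namespace: DATAROBOT-NAMESPACE
spec:
  podSelector:
    matchLabels:
      datarobot-type: lrs
  egress:
    - ports:
        - protocol: TCP
          port: 443
      to:
        - ipBlock:
            # Replace with the CIDR range covering your ingress controller
            cidr: 10.0.16.0/24
  policyTypes:
    - Egress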

Wrangler on Spark with external object storage

If your cluster is configured with an external S3-compatible object store (e.g. MinIO), you will need to ensure that the LRS pods hosting the interactive Spark session for Data Wrangling have egress to the object store.

If your cluster uses Cilium CNI, create the following CiliumNetworkPolicy. Make sure to replace DATAROBOT-NAMESPACE and MINIO-API-HOSTNAME-GOES-HERE accordingly.

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-lrs-spark-app-access-to-external-object-store
  namespace: DATAROBOT-NAMESPACE
spec:
  endpointSelector:
    matchLabels:
      datarobot-type: lrs
  egress:
  - toFQDNs:
    - matchName: "MINIO-API-HOSTNAME-GOES-HERE"
    toPorts:
    - ports:
      - port: "443"
        protocol: TCP 

Without Cilium, you will need to configure a NetworkPolicy to allow egress to the IP addresses that the DNS hostname resolves to. Make sure to replace DATAROBOT-NAMESPACE and X.X.X.X accordingly.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-lrs-spark-app-access-to-external-object-store
  namespace: DATAROBOT-NAMESPACE
spec:
  podSelector:
    matchLabels:
      datarobot-instance-name: datarobot-lrs
      datarobot-lrs-type: spark_app
  egress:
    - ports:
        - protocol: TCP
          port: 443
      to:
        - ipBlock:
            cidr: X.X.X.X/32
  policyTypes:
    - Egress 

Setting up custom DNS policies

The default configuration creates two policies for DNS access from LRS pods. The first policy covers standard DNS access in a Kubernetes cluster, allowing all egress from LRS pods to kube-dns pods on ports 53 and 5353 over TCP and UDP. The second policy covers the same traffic for OpenShift installations. Both policies are shown below:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-kube-dns-egress
  namespace: <installation_namespace>
spec:
  egress:
    - ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
        - port: 5353
          protocol: UDP
        - port: 5353
          protocol: TCP
      to:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
  podSelector:
    matchLabels:
      datarobot-type: lrs
  policyTypes:
    - Egress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-openshift-dns-egress
  namespace: <installation_namespace>
spec:
  egress:
    - ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
        - port: 5353
          protocol: UDP
        - port: 5353
          protocol: TCP
      to:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              dns.operator.openshift.io/daemonset-dns: default
  podSelector:
    matchLabels:
      datarobot-type: lrs
  policyTypes:
    - Egress 

These policies should cover most cases. If an additional custom policy is required, lrsEgressNetworkPolicies can be overridden. For example, to define an egress DNS policy targeting pods with the custom-label-name=custom-label-value selector on UDP port 53 in the kube-system namespace, the override would look like this:

operator:
  lrsEgressNetworkPolicies:
    - name: "allow-custom-label-name-egress"
      podSelectorLabel: "custom-label-name"
      podSelectorValue: "custom-label-value"
      namespaceSelector: "kube-system"
      ports:
      - protocol: "UDP"
        port: 53 

Please note that the lrsEgressNetworkPolicies field can define any custom egress policy for LRS pods, not just DNS policies.
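
For instance, a hypothetical non-DNS policy that allows LRS egress to pods labeled app=internal-metrics on TCP port 9090 in the monitoring namespace would look like this (the label, namespace, and port are illustrative):

operator:
  lrsEgressNetworkPolicies:
    - name: "allow-internal-metrics-egress"
      podSelectorLabel: "app"
      podSelectorValue: "internal-metrics"
      namespaceSelector: "monitoring"
      ports:
      - protocol: "TCP"
        port: 9090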