
Custom Models Configuration

Custom Models can be configured through the environment variables and Helm chart values described in this section.

Dependencies

  • Image build service installed.
  • NGINX ingress controller installed with a valid hostname for the NGINX service (with HTTPS trusted by DR Core).

Configuration Values

To configure these options, refer to the Tuning Datarobot Environment Variables section of this guide.

| Setting | Description | Default |
| --- | --- | --- |
| IMAGE_BUILDER_SERVICE_URL | URL for the image build service | "http://build-service" |
| IMAGE_BUILDER_CUSTOM_MODELS_REGISTRY_HOST | Hostname of the registry used for custom model images | DOCKER_REGISTRY_URL installation variable |
| IMAGE_BUILDER_CUSTOM_MODELS_REGISTRY_REPO | Repository in the registry for custom model images | managed-image |
| IMAGE_BUILDER_EPHEMERAL_CUSTOM_MODELS_REGISTRY_REPO | Repository in the registry for ephemeral custom model images | ephemeral-image |
| IMAGE_BUILDER_CUSTOM_MODELS_ENVIRONMENT_REGISTRY_REPO | Repository in the registry for environment images | base-image |
| IMAGE_BUILDER_WORKERS | Number of concurrent image build workers | 4 |
| CUSTOM_MODELS_FQDN | NGINX Ingress controller service hostname (1) | Value of global.domain in the umbrella chart |
| LONG_RUNNING_SERVICES_OPERATOR_INSTANCE_NAME | Namespace where the LRS operator / chart is installed | Namespace of the DR Core installation |
| LONG_RUNNING_SERVICES_RESOURCES_NAMESPACE | Namespace where the model deployments are created | Value of LONG_RUNNING_SERVICES_OPERATOR_INSTANCE_NAME |
| LONG_RUNNING_SERVICES_SECURITY_CONTEXT_USER_ID | User ID for built model images and runAsUser on the deployment | 1000 |
| LONG_RUNNING_SERVICES_SECURITY_CONTEXT_GROUP_ID | Group ID for built model images and runAsGroup on the deployment | 1000 |
| LONG_RUNNING_SERVICES_READY_TIMEOUT_SECONDS | Number of seconds to wait for an LRS resource to become ready. This can include waiting for a new node to be spun up (if cluster auto-scaling is supported), for the image to be pulled (if it is not already cached on the node), and finally for any configured health checks to succeed. | 1920 |
| LRS_CUSTOM_MODEL_STARTUP_TIMEOUT_SECONDS (2) | Number of seconds to wait for the LRS health status to return success | 960 |
| LRS_CUSTOM_MODEL_STARTUP_PROBE_DISABLED | Boolean that disables the startup probe | False |
| LRS_CUSTOM_MODEL_STARTUP_PROBE_PORT | Port for the startup probe | 8080 |
| LRS_CUSTOM_MODEL_STARTUP_PROBE_INITIAL_DELAY_SECONDS | How long the startup probe waits before sending requests | 1 |
| LRS_CUSTOM_MODEL_STARTUP_PROBE_TIMEOUT_SECONDS | Timeout for each startup probe request | 5 |
| LRS_CUSTOM_MODEL_STARTUP_PROBE_PERIOD_SECONDS | Period between consecutive startup probe requests | 10 |
| LRS_CUSTOM_MODEL_STARTUP_PROBE_SUCCESS_THRESHOLD | Number of successful responses required to consider the container started | 1 |
| LRS_CUSTOM_MODEL_STARTUP_PROBE_FAILURE_THRESHOLD | Number of failed responses required to consider the container failed | 360 |
| LRS_CUSTOM_MODEL_READINESS_PROBE_DISABLED | Boolean that disables the readiness probe | False |
| LRS_CUSTOM_MODEL_READINESS_PROBE_PORT | Port for the readiness probe | 8080 |
| LRS_CUSTOM_MODEL_READINESS_PROBE_INITIAL_DELAY_SECONDS | How long the readiness probe waits before sending requests | 1 |
| LRS_CUSTOM_MODEL_READINESS_PROBE_TIMEOUT_SECONDS | Timeout for each readiness probe request | 5 |
| LRS_CUSTOM_MODEL_READINESS_PROBE_PERIOD_SECONDS | Period between consecutive readiness probe requests | 5 |
| LRS_CUSTOM_MODEL_READINESS_PROBE_SUCCESS_THRESHOLD | Number of successful responses required to consider the container ready | 1 |
| LRS_CUSTOM_MODEL_READINESS_PROBE_FAILURE_THRESHOLD | Number of failed responses required to consider the container failed | 6 |
| LRS_CUSTOM_MODEL_LIVENESS_PROBE_DISABLED | Boolean that disables the liveness probe | False |
| LRS_CUSTOM_MODEL_LIVENESS_PROBE_PORT | Port for the liveness probe | 8080 |
| LRS_CUSTOM_MODEL_LIVENESS_PROBE_INITIAL_DELAY_SECONDS | How long the liveness probe waits before sending requests | 3 |
| LRS_CUSTOM_MODEL_LIVENESS_PROBE_TIMEOUT_SECONDS | Timeout for each liveness probe request | 5 |
| LRS_CUSTOM_MODEL_LIVENESS_PROBE_PERIOD_SECONDS | Period between consecutive liveness probe requests | 10 |
| LRS_CUSTOM_MODEL_LIVENESS_PROBE_SUCCESS_THRESHOLD | Number of successful responses required to consider the container alive | 1 |
| LRS_CUSTOM_MODEL_LIVENESS_PROBE_FAILURE_THRESHOLD | Number of failed responses required to consider the container failed | 30 |
| LRS_CUSTOM_MODEL_DEFAULT_NIM_HEALTH_ENDPOINT | Default Nvidia NIM health check endpoint used for all probes if a custom endpoint is not provided through the templates | /v1/health/ready |
| LRS_CUSTOM_MODEL_DEFAULT_NIM_PORT | Default Nvidia NIM health check port used for all probes if a custom port is not provided through the templates | 8000 |
| ENABLE_PDB_FOR_CUSTOM_MODELS_LRS_WITH_MULTIPLE_REPLICAS | Boolean that enables a PDB for newly created model deployment LRSes with more than one replica | False |
| ENABLE_HA_FOR_ALL_MODEL_DEPLOYMENTS | Boolean that enables HA affinity rules for all newly created model deployment LRSes | False |

(1) See Ingress Settings for details on ingress-nginx. The hostname must support HTTPS.
(2) The LRS_CUSTOM_MODEL_STARTUP_TIMEOUT_SECONDS variable was removed. Instead, use the LRS_CUSTOM_MODEL_STARTUP_PROBE_PERIOD_SECONDS and LRS_CUSTOM_MODEL_STARTUP_PROBE_FAILURE_THRESHOLD variables to derive the same value (startup_timeout_seconds = startup_probe_period_seconds x startup_probe_failure_threshold).
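
For example, these environment variables can be overridden at install time through the umbrella chart's core.config_env_vars block, the same mechanism used in the GPU configuration example later in this section. The values below are purely illustrative; the Tuning Datarobot Environment Variables section remains the authoritative reference.

# datarobot umbrella chart values.yaml excerpt (illustrative)
core:
  config_env_vars:
    CUSTOM_MODELS_FQDN: "custom-models.example.com"            # hypothetical ingress hostname
    LRS_CUSTOM_MODEL_READINESS_PROBE_FAILURE_THRESHOLD: "12"   # example override of the default (6)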

Image Build Performance Tuning

The image build worker (queue-exec-manager-build-service) starts build jobs by making calls to the image build service. Relevant configurations and chart values for image building:

  • By default, each worker pod is set to run a maximum of four concurrent builds. The concurrency level can be tuned using the environment variable IMAGE_BUILDER_WORKERS listed in the Configuration Values section.
  • The replica count for queue-exec-manager-build-service can be changed by setting the value queue-exec-manager.component.build-service.replicaCount to the desired amount at install time in the umbrella chart, as sketched below.
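
A minimal values.yaml sketch for raising the replica count; the YAML nesting is inferred from the dotted value path above, so confirm it against your chart version:

# datarobot umbrella chart values.yaml excerpt (illustrative)
queue-exec-manager:
  component:
    build-service:
      replicaCount: 2   # example: run two build-service worker pods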

The image build service manages the resources and timeouts for build pods. The resources can be tuned by setting the build-service.imageBuilder values in the umbrella chart. The following values are relevant for tuning image build performance:

build-service:
  imageBuilder:
    podTimeout: "14400000"  # maximum time a build pod is allowed to run before it is terminated
    resources:              # CPU/memory allocated to each build pod
      limits:
        memory: "4G"
        cpu: "3"
      requests:
        memory: "4G"
        cpu: "3"

Prediction Server Performance Tuning

For installations that are mostly used for serving custom models, as opposed to native DataRobot models, we recommend setting the number of API workers to 2-3x the number of cores allocated to the prediction server, since the workers will be mostly I/O-bound.

# datarobot umbrella chart values.yaml excerpt

prediction-server:
  component:
    server:
      # provided that prediction-server.component.server.computeResources.cpu is 2
      predictionApiWorkers: "6" 

For mixed native/custom model usage with autoscaling enabled for the prediction server, we recommend lowering the CPU threshold for smoother scaling:

# datarobot umbrella chart values.yaml excerpt

prediction-server:
  autoscaling:
    targetCPUUtilizationPercentage: 35 

Service Account for Model Deployments

A default service account is used for all custom model deployments / LRS resources. The service account can be changed by setting the value lrs-operator.operator.config.defaultServiceAccountName at install time in the umbrella chart.
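
A minimal values.yaml sketch; the service account name below is hypothetical and must exist in the namespace where the model deployments are created:

# datarobot umbrella chart values.yaml excerpt (illustrative)
lrs-operator:
  operator:
    config:
      defaultServiceAccountName: "custom-model-runner"   # hypothetical service account name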

Network Access

Custom Models support the configuration of network isolation for models. This functionality is based on standard Kubernetes network policies. The cluster must be running a plugin that supports network policies for this to be enforced (e.g. Project Calico).

If you have Calico, Cilium, or another plugin that enforces network policies, the DataRobot-provided policies for custom models require no further configuration. By default, a deny-all policy is created as part of installing custom models:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  annotations:
    ...
  labels:
    ...
  name: default-lrs-resource-denyall-policy
  namespace: ..
spec:
  podSelector:
    matchLabels:
      datarobot-type: lrs
  policyTypes:
  - Egress
  - Ingress 

GPU Configuration

Starting with version 10.0, DataRobot supports GPU inference in Custom Models as a premium entitlement. After ensuring all the steps below are complete, the customer also needs the ENABLE_CUSTOM_MODEL_GPU_INFERENCE feature flag (and its dependencies) enabled to use GPU resource bundles with custom models.

GPU Requirements

For simple deep learning models or LLMs that are no larger than 8 billion parameters, DataRobot recommends using the following hardware bundles:

| Bundle Name | GPU Device | GPU Memory | RAM | CPU Cores | Storage | VM Types |
| --- | --- | --- | --- | --- | --- | --- |
| GPU - S | Nvidia T4 | 16GiB | 16GiB | 4 | 200GiB | Azure: Standard_NC4as_T4_v3, AWS: g4dn.xlarge, Google Cloud: n1-standard-4 |
| GPU - M | Nvidia T4 or Nvidia L4 | 16GiB | 32GiB | 8 | 200GiB | Azure: Standard_NC8as_T4_v3, AWS: g4dn.2xlarge, Google Cloud: g2-standard-8 |
| GPU - L | Nvidia A10 or Nvidia A100 | 24GiB | 32GiB | 8 | 200GiB | Azure: Standard_NC24ads_A100_v4, AWS: g5.2xlarge, Google Cloud: a2-highgpu-1g |

If your hardware is substantially different from the recommended bundles above or you are interested in deploying LLMs that are larger than 8 billion parameters, please reach out to your DataRobot CFDS representative to identify an optimal configuration.

Enable the use of GPUs

After you have prepared your Kubernetes cluster and your GPU nodes (see the general GPU instructions), enable the use of GPU nodes for Custom Models in the values.yaml file of the DataRobot Helm chart. Use the configuration example below as a starting point.

core:
  config_env_vars:
    LRS_GPU_CONTAINER_SIZES: |-
      [
        {
          "id": "gpu.medium",
          "name": "GPU - M",
          "description": "1 x NVIDIA A10 | 24GB VRAM | 8 CPU | 32GB RAM",
          "gpu_maker": "nvidia",
          "gpu_type_label": "nvidia-a10g-1x",
          "gpu_count": 1,
          "gpu_memory_mb": 24576,
          "cpu_count": 8,
          "memory_mb": 32768,
          "use_cases": ["customModel"]
        },
        {
          "id": "h100.one",
          "name": "GPU - L",
          "description": "1 x NVIDIA H100 | 80GB VRAM | 8 CPU | 32GB RAM",
          "gpu_maker": "nvidia",
          "gpu_type_label": "nvidia-h100-1x",
          "gpu_count": 1,
          "gpu_memory_mb": 81920,
          "cpu_count": 8,
          "memory_mb": 32768,
          "use_cases": ["customModel"]
        },
        {
          "id": "h100.two",
          "name": "GPU - 2XL",
          "description": "2 x NVIDIA H100 | 160GB VRAM | 16 CPU | 64GB RAM",
          "gpu_maker": "nvidia",
          "gpu_type_label": "nvidia-h100-2x",
          "gpu_count": 2,
          "gpu_memory_mb": 163840,
          "cpu_count": 16,
          "memory_mb": 65536,
          "use_cases": ["customModel"]
        }
      ]
lrs-operator:
  operator:
    config:
      nodeGroups: |-
        {
          "labelToNodeGroupMap": {
            "datarobot-gpu-type=nvidia-h100-1x": "gpu-h100-1x",
            "datarobot-gpu-type=nvidia-h100-2x": "gpu-h100-2x",
            "datarobot-gpu-type=nvidia-a10g-1x": "gpu-a10-1x"
          },
          "nodeGroupsByName":  {
            "gpu-h100-1x": {
              "taint": "nvidia.com/gpu=true:NoExecute",
              "affinityLabel": "intent=1xH100GpuWorker"
            },
            "gpu-h100-2x": {
              "taint": "nvidia.com/gpu=true:NoExecute",
              "affinityLabel": "intent=2xH100GpuWorker"
            },
            "gpu-a10-1x": {
              "taint": "nvidia.com/gpu=true:NoExecute",
              "affinityLabel": "intent=1xA10GpuWorker"
            }
          }
        } 

Notes:

  • Field id must be unique across all items in the list.
  • Fields name and description are displayed in the UI.
  • Field gpu_maker must be "nvidia".
  • Field gpu_type_label must have a matching item under labelToNodeGroupMap.
  • Keys under nodeGroupsByName should match the values in labelToNodeGroupMap.
  • The value of the taint sub-field within nodeGroupsByName is applied as a toleration to the corresponding Kubernetes pods. If more than one toleration needs to be applied, the tolerations must be comma-separated.
  • The value of the affinityLabel sub-field within nodeGroupsByName is applied as a requiredDuringSchedulingIgnoredDuringExecution node affinity. Only one affinity item is supported.
  • Field gpu_memory_mb is used to help recommend appropriate resource bundles for a given LLM based on heuristics.
  • LRS_GPU_CONTAINER_SIZES and nodeGroups are JSON strings, so make sure they are valid (i.e., no trailing commas). Incorrect syntax results in an error being logged, but the application will still start.

LLM Startup Tuning

In 10.2, custom model startup moved to a dedicated queue so that slow-to-start models do not block other operations in the application. As such, we believe the LONG_RUNNING_SERVICES_READY_TIMEOUT_SECONDS and LRS_CUSTOM_MODEL_STARTUP_TIMEOUT_SECONDS configuration values are tuned appropriately. However, if you find custom models being killed while they are still starting up, consider increasing these values (note that LRS_CUSTOM_MODEL_STARTUP_TIMEOUT_SECONDS has since been replaced by the startup probe settings; see note (2) in Configuration Values). See the table above for a description of each configuration. The values can be changed on a live system using System Configuration or via values.yaml.

Note: the LRS probe configuration changed in the 11.2.0 release. See the Configuration Values section.
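
For example, the timeouts can be raised with the same core.config_env_vars pattern shown in the GPU configuration example above; the numbers below are illustrative only:

# datarobot umbrella chart values.yaml excerpt (illustrative)
core:
  config_env_vars:
    LONG_RUNNING_SERVICES_READY_TIMEOUT_SECONDS: "3600"      # example: allow up to an hour for the LRS to become ready
    LRS_CUSTOM_MODEL_STARTUP_PROBE_FAILURE_THRESHOLD: "720"  # example: 720 x 10s period = 2 hours of startup budget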

Serving LLMs over multiple GPUs

In 10.2 and beyond, it is possible to serve a single model spread across multiple GPUs, as long as the GPUs are present on the same node. In addition, it is recommended to serve an LLM over a homogeneous set of GPUs. To run an LLM over multiple GPUs, simply select a resource bundle with more than one GPU allocated (i.e. "gpu_count": 2); the GPUs will automatically be reserved by Kubernetes, and the inference server will make use of them.

Be aware that a single, large GPU will have better throughput performance than multiple small GPUs but you may see better latency characteristics using multiple, smaller GPUs.

Model-level training data assignment removal in 10.1

Starting in version 10.1, the model-level training data assignment API (api/v2/customModels/<custom model ID>/trainingData/) is removed after 3 deprecation cycles. Requests to the API return a 404. The new model version-level training data assignment must be used instead.

If your organization has critical flows using this API, it can be re-enabled for a 1-month grace period to migrate to the new logic. To enable the deprecated API, set the following feature flag configuration: DISABLE_CUSTOM_MODEL_DISABLE_MODEL_LEVEL_TRAINING_DATA_ASSIGNMENT = True.

High Availability scenarios for Custom Models

To enable high availability within the cluster, two configuration flags are available. These settings leverage Kubernetes Pod Disruption Budgets (PDBs) and pod anti-affinity rules.

Pod Disruption Budget Configuration

ENABLE_PDB_FOR_CUSTOM_MODELS_LRS_WITH_MULTIPLE_REPLICAS controls the application of PDB policies for custom model deployments. When enabled, any custom model with two or more replicas is assigned a PDB with maxUnavailable = 1. This ensures that Kubernetes will allow only a single replica to be taken down at any given time. By default, this setting is disabled.
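
Conceptually, the resulting policy corresponds to a Kubernetes PodDisruptionBudget along the lines of the sketch below. The name and selector label are hypothetical; the operator scopes the actual object to the replicas of a single model deployment.

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-custom-model-pdb                  # hypothetical name
spec:
  maxUnavailable: 1                               # only one replica may be voluntarily disrupted at a time
  selector:
    matchLabels:
      datarobot-model-deployment: example-model   # hypothetical label identifying this model's replicas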

Pod Anti-Affinity Configuration

Pod anti-affinity behavior is governed by the ENABLE_HA_FOR_ALL_MODEL_DEPLOYMENTS flag. When enabled, each replica of a custom model is scheduled onto a distinct node. The configuration uses requiredDuringSchedulingIgnoredDuringExecution, meaning Kubernetes can schedule only as many replicas as there are nodes available. For example, if a deployment requests four replicas but the cluster has only three nodes, only three replicas will be scheduled. By default, this setting is disabled.
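
In standard Kubernetes terms, this corresponds to a pod anti-affinity rule along the lines of the sketch below. The label key and value are hypothetical, and the exact spec generated by the operator may differ:

# pod spec excerpt (illustrative)
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - topologyKey: kubernetes.io/hostname               # one replica per node
        labelSelector:
          matchLabels:
            datarobot-model-deployment: example-model     # hypothetical label identifying this model's replicas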