Custom Models Configuration¶
This section describes the configuration options available for Custom Models, including image building, model deployment (LRS) settings, probes, GPU support, and high availability.
Dependencies¶
- Image build service installed.
- NGINX ingress controller installed with a valid hostname for the NGINX service (with HTTPS trusted by DR Core).
Configuration Values¶
To configure these options, refer to the Tuning DataRobot Environment Variables section of this guide.
| Setting | Description | Default |
|---|---|---|
| IMAGE_BUILDER_SERVICE_URL | URL for the image build service | "http://build-service" |
| IMAGE_BUILDER_CUSTOM_MODELS_REGISTRY_HOST | Hostname of the registry used for custom model images | DOCKER_REGISTRY_URL installation variable |
| IMAGE_BUILDER_CUSTOM_MODELS_REGISTRY_REPO | Repository in the registry for custom model images | managed-image |
| IMAGE_BUILDER_EPHEMERAL_CUSTOM_MODELS_REGISTRY_REPO | Repository in the registry for ephemeral custom model images | ephemeral-image |
| IMAGE_BUILDER_CUSTOM_MODELS_ENVIRONMENT_REGISTRY_REPO | Repository in the registry for environment images | base-image |
| IMAGE_BUILDER_WORKERS | Number of concurrent image build workers | 4 |
| CUSTOM_MODELS_FQDN | NGINX Ingress controller service hostname(1) | Value of global.domain in the umbrella chart |
| LONG_RUNNING_SERVICES_OPERATOR_INSTANCE_NAME | Namespace where the LRS operator / chart is installed | Namespace of the DR Core installation |
| LONG_RUNNING_SERVICES_RESOURCES_NAMESPACE | Namespace where the model deployments are created | Value of LONG_RUNNING_SERVICES_OPERATOR_INSTANCE_NAME |
| LONG_RUNNING_SERVICES_SECURITY_CONTEXT_USER_ID | User ID for built model images and runAsUser on the deployment | 1000 |
| LONG_RUNNING_SERVICES_SECURITY_CONTEXT_GROUP_ID | Group ID for built model images and runAsGroup on the deployment | 1000 |
| LONG_RUNNING_SERVICES_READY_TIMEOUT_SECONDS | Number of seconds to wait for an LRS resource to become ready. This time can include waiting for a new node to be spun up (if cluster auto-scaling is supported), for the image to be pulled (if it is not already cached on the node), and finally for any configured health checks to succeed. | 1920 |
| LRS_CUSTOM_MODEL_STARTUP_TIMEOUT_SECONDS | Number of seconds to wait for the LRS health status to return success(2) | 960 |
| LRS_CUSTOM_MODEL_STARTUP_PROBE_DISABLED | Boolean to disable the startup probe | False |
| LRS_CUSTOM_MODEL_STARTUP_PROBE_PORT | Port for the startup probe | 8080 |
| LRS_CUSTOM_MODEL_STARTUP_PROBE_INITIAL_DELAY_SECONDS | How long the startup probe waits before sending its first request | 1 |
| LRS_CUSTOM_MODEL_STARTUP_PROBE_TIMEOUT_SECONDS | Timeout for each startup probe request | 5 |
| LRS_CUSTOM_MODEL_STARTUP_PROBE_PERIOD_SECONDS | Period between consecutive startup probe requests | 10 |
| LRS_CUSTOM_MODEL_STARTUP_PROBE_SUCCESS_THRESHOLD | Number of successful responses required to consider the container started | 1 |
| LRS_CUSTOM_MODEL_STARTUP_PROBE_FAILURE_THRESHOLD | Number of failed responses after which the container is considered failed | 360 |
| LRS_CUSTOM_MODEL_READINESS_PROBE_DISABLED | Boolean to disable the readiness probe | False |
| LRS_CUSTOM_MODEL_READINESS_PROBE_PORT | Port for the readiness probe | 8080 |
| LRS_CUSTOM_MODEL_READINESS_PROBE_INITIAL_DELAY_SECONDS | How long the readiness probe waits before sending its first request | 1 |
| LRS_CUSTOM_MODEL_READINESS_PROBE_TIMEOUT_SECONDS | Timeout for each readiness probe request | 5 |
| LRS_CUSTOM_MODEL_READINESS_PROBE_PERIOD_SECONDS | Period between consecutive readiness probe requests | 5 |
| LRS_CUSTOM_MODEL_READINESS_PROBE_SUCCESS_THRESHOLD | Number of successful responses required to consider the container ready | 1 |
| LRS_CUSTOM_MODEL_READINESS_PROBE_FAILURE_THRESHOLD | Number of failed responses after which the container is considered failed | 6 |
| LRS_CUSTOM_MODEL_LIVENESS_PROBE_DISABLED | Boolean to disable the liveness probe | False |
| LRS_CUSTOM_MODEL_LIVENESS_PROBE_PORT | Port for the liveness probe | 8080 |
| LRS_CUSTOM_MODEL_LIVENESS_PROBE_INITIAL_DELAY_SECONDS | How long the liveness probe waits before sending its first request | 3 |
| LRS_CUSTOM_MODEL_LIVENESS_PROBE_TIMEOUT_SECONDS | Timeout for each liveness probe request | 5 |
| LRS_CUSTOM_MODEL_LIVENESS_PROBE_PERIOD_SECONDS | Period between consecutive liveness probe requests | 10 |
| LRS_CUSTOM_MODEL_LIVENESS_PROBE_SUCCESS_THRESHOLD | Number of successful responses required to consider the container alive | 1 |
| LRS_CUSTOM_MODEL_LIVENESS_PROBE_FAILURE_THRESHOLD | Number of failed responses after which the container is considered failed | 30 |
| LRS_CUSTOM_MODEL_DEFAULT_NIM_HEALTH_ENDPOINT | Default NVIDIA health check endpoint used for all probes if a custom endpoint is not provided through the templates | /v1/health/ready |
| LRS_CUSTOM_MODEL_DEFAULT_NIM_PORT | Default NVIDIA health check port used for all probes if a custom port is not provided through the templates | 8000 |
| ENABLE_PDB_FOR_CUSTOM_MODELS_LRS_WITH_MULTIPLE_REPLICAS | Boolean to enable a PDB for newly created model deployment LRSes with more than one replica | False |
| ENABLE_HA_FOR_ALL_MODEL_DEPLOYMENTS | Boolean to enable HA affinity rules for all newly created model deployment LRSes | False |
(1) See Ingress Settings for details on ingress-nginx. The hostname must support HTTPS.
(2) The LRS_CUSTOM_MODEL_STARTUP_TIMEOUT_SECONDS variable was removed. Instead, use the LRS_CUSTOM_MODEL_STARTUP_PROBE_PERIOD_SECONDS and LRS_CUSTOM_MODEL_STARTUP_PROBE_FAILURE_THRESHOLD variables to define the startup timeout (startup_timeout_seconds = startup_probe_period_seconds x startup_probe_failure_threshold).
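As an illustration, a values.yaml sketch setting a couple of these variables, assuming environment variables are supplied through core.config_env_vars as in the GPU Configuration example later in this section; the values shown are examples only, not recommendations:

core:
  config_env_vars:
    # Hostname served by the NGINX ingress controller (must support HTTPS); example hostname
    CUSTOM_MODELS_FQDN: "models.example.com"
    # Effective startup timeout = period x failure threshold = 15 x 360 = 5400 seconds
    LRS_CUSTOM_MODEL_STARTUP_PROBE_PERIOD_SECONDS: "15"
    LRS_CUSTOM_MODEL_STARTUP_PROBE_FAILURE_THRESHOLD: "360"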
Image Build Performance Tuning¶
The image build worker (queue-exec-manager-build-service) starts build jobs by making calls to the image build service. Relevant configurations and chart values for image building:
- By default, each worker pod runs a maximum of four concurrent builds. The concurrency level can be tuned using the IMAGE_BUILDER_WORKERS environment variable listed in the Configuration Values section.
- The replica count for queue-exec-manager-build-service can be changed by setting the value queue-exec-manager.component.build-service.replicaCount to the desired amount at install time in the umbrella chart (see the example below).
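A minimal values.yaml sketch combining both knobs, assuming environment variables are set through core.config_env_vars as in the GPU Configuration example below; the numbers are illustrative, not recommendations:

# umbrella chart values.yaml excerpt (illustrative values)
core:
  config_env_vars:
    IMAGE_BUILDER_WORKERS: "4"   # max concurrent builds per worker pod
queue-exec-manager:
  component:
    build-service:
      replicaCount: 2            # number of image build worker pods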
The image build service manages resources and timeout for build pods. The resources can be tuned by setting the build-service.imageBuilder values in the umbrella chart. The following are the relevant values for tweaking the performance of image builds:
build-service:
  imageBuilder:
    podTimeout: "14400000"
    resources:
      limits:
        memory: "4G"
        cpu: "3"
      requests:
        memory: "4G"
        cpu: "3"
Prediction Server Performance Tuning¶
For installations that are mostly used for serving custom models as opposed to native DataRobot models, we recommend tuning the number of API workers to 2-3x the number of cores allocated to the prediction server, since the workers will be mostly I/O-heavy.
# datarobot umbrella chart values.yaml excerpt
prediction-server:
  component:
    server:
      # provided that prediction-server.component.server.computeResources.cpu is 2
      predictionApiWorkers: "6"
If usage is mixed between native and custom models and autoscaling is enabled for the prediction server, we recommend lowering the CPU threshold for smoother scaling:
# datarobot umbrella chart values.yaml excerpt
prediction-server:
  autoscaling:
    targetCPUUtilizationPercentage: 35
Service Account for Model Deployments¶
A default service account is used for all custom model deployments / LRS resources. The service account can be changed by setting the value lrs-operator.operator.config.defaultServiceAccountName at install time in the umbrella chart.
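A minimal values.yaml sketch, assuming a pre-existing service account; the name custom-models-sa is illustrative:

lrs-operator:
  operator:
    config:
      # Service account applied to all custom model deployments / LRS resources
      defaultServiceAccountName: "custom-models-sa"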
Network Access¶
Custom Models support the configuration of network isolation for models. This functionality is based on standard Kubernetes network policies. The cluster must be running a plugin that supports network policies for this to be enforced (e.g. Project Calico).
If you have Calico, Cilium, or anything else that enforces network policies, the DataRobot-provided policies for custom models require no further configuration. By default, a deny-all policy is created as part of installing custom models:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  annotations:
    ...
  labels:
    ...
  name: default-lrs-resource-denyall-policy
  namespace: ..
spec:
  podSelector:
    matchLabels:
      datarobot-type: lrs
  policyTypes:
  - Egress
  - Ingress
GPU Configuration¶
Starting from version 10.0, DataRobot supports inference on GPU in Custom Models as a premium entitlement. After ensuring all the steps below are complete, the customer also needs the ENABLE_CUSTOM_MODEL_GPU_INFERENCE feature flag (and its dependencies) enabled to use GPU resource bundles with custom models.
GPU Requirements¶
For simple deep learning models or LLMs that are no larger than 8 billion parameters, DataRobot recommends using the following hardware bundles:
| Bundle Name | GPU Device | GPU Memory | RAM | CPU Cores | Storage | VM Types |
|---|---|---|---|---|---|---|
| GPU - S | Nvidia T4 | 16GiB | 16GiB | 4 | 200GiB | Azure: Standard_NC4as_T4_v3 AWS: g4dn.xlarge Google Cloud: n1-standard-4 |
| GPU - M | Nvidia T4 or Nvidia L4 | 16GiB | 32GiB | 8 | 200GiB | Azure: Standard_NC8as_T4_v3 AWS: g4dn.2xlarge Google Cloud: g2-standard-8 |
| GPU - L | Nvidia A10 or Nvidia A100 | 24GiB | 32GiB | 8 | 200GiB | Azure: Standard_NC24ads_A100_v4 AWS: g5.2xlarge Google Cloud: a2-highgpu-1g |
If your hardware is substantially different from the recommended bundles above or you are interested in deploying LLMs that are larger than 8 billion parameters, please reach out to your DataRobot CFDS representative to identify an optimal configuration.
Enable the use of GPUs¶
After you have prepared your Kubernetes cluster and your GPU nodes (see
general GPU instructions), enable the usage of GPU nodes for
the Custom Models in the values.yaml file of the DataRobot helm chart. Please
use the configuration example below as a starting point.
core:
  config_env_vars:
    LRS_GPU_CONTAINER_SIZES: |-
      [
        {
          "id": "gpu.medium",
          "name": "GPU - M",
          "description": "1 x NVIDIA A10 | 24GB VRAM | 8 CPU | 32GB RAM",
          "gpu_maker": "nvidia",
          "gpu_type_label": "nvidia-a10g-1x",
          "gpu_count": 1,
          "gpu_memory_mb": 24576,
          "cpu_count": 8,
          "memory_mb": 32768,
          "use_cases": ["customModel"]
        },
        {
          "id": "h100.one",
          "name": "GPU - L",
          "description": "1 x NVIDIA H100 | 80GB VRAM | 8 CPU | 32GB RAM",
          "gpu_maker": "nvidia",
          "gpu_type_label": "nvidia-h100-1x",
          "gpu_count": 1,
          "gpu_memory_mb": 81920,
          "cpu_count": 8,
          "memory_mb": 32768,
          "use_cases": ["customModel"]
        },
        {
          "id": "h100.two",
          "name": "GPU - 2XL",
          "description": "2 x NVIDIA H100 | 160GB VRAM | 16 CPU | 64GB RAM",
          "gpu_maker": "nvidia",
          "gpu_type_label": "nvidia-h100-2x",
          "gpu_count": 2,
          "gpu_memory_mb": 163840,
          "cpu_count": 16,
          "memory_mb": 65536,
          "use_cases": ["customModel"]
        }
      ]
lrs-operator:
  operator:
    config:
      nodeGroups: |-
        {
          "labelToNodeGroupMap": {
            "datarobot-gpu-type=nvidia-h100-1x": "gpu-h100-1x",
            "datarobot-gpu-type=nvidia-h100-2x": "gpu-h100-2x",
            "datarobot-gpu-type=nvidia-a10g-1x": "gpu-a10-1x"
          },
          "nodeGroupsByName": {
            "gpu-h100-1x": {
              "taint": "nvidia.com/gpu=true:NoExecute",
              "affinityLabel": "intent=1xH100GpuWorker"
            },
            "gpu-h100-2x": {
              "taint": "nvidia.com/gpu=true:NoExecute",
              "affinityLabel": "intent=2xH100GpuWorker"
            },
            "gpu-a10-1x": {
              "taint": "nvidia.com/gpu=true:NoExecute",
              "affinityLabel": "intent=1xA10GpuWorker"
            }
          }
        }
Notes:
- Field id must be unique across all items in the list.
- Fields name and description are displayed in the UI.
- Field gpu_maker must be "nvidia".
- Field gpu_type_label must have a matching item under labelToNodeGroupMap.
- Keys under nodeGroupsByName should match the values in labelToNodeGroupMap.
- The value of the taint sub-field within nodeGroupsByName is applied as a toleration to the corresponding Kubernetes pods. If more than one toleration needs to be applied, the tolerations must be comma-separated.
- The value of the affinityLabel sub-field within nodeGroupsByName is applied as a requiredDuringSchedulingIgnoredDuringExecution node affinity. Only one affinity item is supported.
- Field gpu_memory_mb is used to help recommend appropriate resource bundles for a given LLM based on heuristics.
- LRS_GPU_CONTAINER_SIZES and nodeGroups are JSON strings, so take care that they are valid (i.e. no trailing commas). Incorrect syntax results in an error being logged, but the application will still start.
LLM Startup Tuning¶
In 10.2 we moved custom model startup to a dedicated queue so slow-to-start models do not block other operations in the application. As such, we believe we have tuned the LONG_RUNNING_SERVICES_READY_TIMEOUT_SECONDS and LRS_CUSTOM_MODEL_STARTUP_TIMEOUT_SECONDS configuration values appropriately. However, if custom models are being killed while they are still starting up, consider increasing these values. See the table above for a description of each configuration. The values can be changed on a live system using System Configuration or via values.yaml.
Note: the LRS probe configuration changed in the 11.2.0 release. Please see the Configuration Values section.
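For example, a values.yaml sketch (illustrative values only) that gives slow-starting models more time; on 11.2.0 and later, the startup window is determined by the startup probe period multiplied by the failure threshold:

core:
  config_env_vars:
    # Overall wait for the LRS resource to become ready (node spin-up, image pull, health checks)
    LONG_RUNNING_SERVICES_READY_TIMEOUT_SECONDS: "3600"
    # Startup window = 10s default period x 600 failures = 6000 seconds
    LRS_CUSTOM_MODEL_STARTUP_PROBE_FAILURE_THRESHOLD: "600"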
Serving LLMs over multiple GPUs¶
In 10.2 and beyond, it is possible to serve a single model spread across multiple GPUs, as long as the GPUs are on the same node. It is also recommended to serve an LLM over a homogeneous set of GPUs. To run an LLM over multiple GPUs, simply select a resource bundle with more than one GPU allocated (e.g. "gpu_count": 2); the GPUs will automatically be reserved by Kubernetes, and the inference server will make use of them.
Be aware that a single, large GPU will have better throughput performance than multiple small GPUs but you may see better latency characteristics using multiple, smaller GPUs.
Model-level training data assignment removal in 10.1¶
Starting in version 10.1, the model-level training data assignment API (api/v2/customModels/<custom model ID>/trainingData/) is removed after 3 deprecation cycles. Requests to the API return a 404. The new model version-level training data assignment must be used instead.
If your organization has critical flows using this API, it can be re-enabled for a 1-month grace period to migrate to the new logic. To enable the deprecated API, set the following feature flag configuration: DISABLE_CUSTOM_MODEL_DISABLE_MODEL_LEVEL_TRAINING_DATA_ASSIGNMENT = True.
High Availability scenarios for Custom Models¶
To enable high availability within the cluster, two configuration flags are available. These settings leverage Kubernetes Pod Disruption Budgets (PDBs) and pod anti-affinity rules.
Pod Disruption Budget Configuration¶
ENABLE_PDB_FOR_CUSTOM_MODELS_LRS_WITH_MULTIPLE_REPLICAS controls the application of PDB policies for custom model deployments. When enabled, any custom model with two or more replicas is assigned a PDB with maxUnavailable = 1. This ensures that Kubernetes will allow only a single replica to be taken down at any given time. By default, this setting is disabled.
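For reference, a sketch of the kind of PodDisruptionBudget this setting produces for a multi-replica model deployment; the object name is illustrative, and the selector reuses the datarobot-type: lrs label seen in the network policy above rather than the exact labels DataRobot applies:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-custom-model-pdb   # illustrative name
spec:
  maxUnavailable: 1                # only one replica may be voluntarily disrupted at a time
  selector:
    matchLabels:
      datarobot-type: lrs          # illustrative selector for the model deployment's pods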
Pod Anti-Affinity Configuration¶
Pod anti-affinity behavior is governed by the ENABLE_HA_FOR_ALL_MODEL_DEPLOYMENTS flag. When enabled, each replica of a custom model is scheduled onto a distinct node. The configuration uses requiredDuringSchedulingIgnoredDuringExecution, meaning Kubernetes can schedule only as many replicas as there are nodes available. For example, if a deployment requests four replicas but the cluster has only three nodes, only three replicas will be scheduled. By default, this setting is disabled.
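For reference, a sketch of the kind of pod anti-affinity rule this implies on each model deployment's pod spec; the selector label is illustrative, while the topology key is the standard Kubernetes hostname label:

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: example-custom-model        # illustrative: selects replicas of one model deployment
      topologyKey: kubernetes.io/hostname  # forces each replica onto a distinct node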