Runtime settings

A running Workload's configuration is mutable, but the mutations split across two endpoints with different semantics. Identity fields—name, description, importance—update in place via PATCH /workloads/{id}. Runtime and resource changes—replica count, autoscaling, CPU, memory, GPU, and bundle selection—go through PATCH /workloads/{id}/settings and queue a rolling replacement so the Workload stays available during the swap. The sections that follow cover the fields exposed at each layer and the PATCH body shapes.

Importance and metadata

Use PATCH /workloads/{id} to update name, description, and importance. The importance field is mutable on a running Workload:

curl -X PATCH "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}" \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"importance": "critical"}' 

Runtime, autoscaling, and replicas

Update a Workload's runtime configuration (replica count, autoscaling policies, per-container resource allocation, resource bundles) with PATCH /workloads/{workload_id}/settings. The PATCH returns a Replacement (HTTP 202) and queues a rolling replacement onto the new runtime; see Replace and roll out.

All runtime fields are nested under runtime.containerGroups[], with each entry matched to the artifact's container group by name:

curl -X PATCH "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}/settings" \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "runtime": {
      "containerGroups": [{
        "name": "default",
        "replicaCount": 5,
        "autoscaling": {"enabled": false}
      }]
    }
  }' 
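One way to confirm that a settings PATCH was accepted is to capture the HTTP status code and compare it with the 202 described above. The following is a minimal sketch, not an official client: the helper names `patch_settings_status` and `describe_settings_status` are hypothetical, and the sketch assumes the same `DATAROBOT_ENDPOINT`, `DATAROBOT_API_TOKEN`, and `WORKLOAD_ID` environment variables used in the example above.

```shell
# Hypothetical helper: send a settings PATCH with the JSON body in $1
# and print only the HTTP status code (the response body is discarded).
patch_settings_status() {
  curl -sS -o /dev/null -w '%{http_code}' -X PATCH \
    "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}/settings" \
    -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$1"
}

# Hypothetical helper: interpret the status code passed in $1.
# 202 means the rolling replacement was queued.
describe_settings_status() {
  case "$1" in
    202) echo "replacement queued" ;;
    *)   echo "unexpected status: $1" ;;
  esac
}

# Example usage (requires network access and valid credentials):
#   status=$(patch_settings_status '{"runtime": {"containerGroups": [{"name": "default", "replicaCount": 5}]}}')
#   describe_settings_status "$status"
```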

You can also set autoscaling policies at Workload creation time inside the initial POST /workloads/ payload—see Tutorial: Deploy a production-ready container.

Scaling metrics

Each AutoscalingPolicy sets scalingMetric to one of the predefined values in the following table, or (NIM 2.0 only) a custom metric name string such as vllm:kv_cache_usage_perc.

| scalingMetric | Applies to | Behavior |
| --- | --- | --- |
| cpuAverageUtilization | All artifacts | Scales replicas to maintain a target average CPU utilization across pods. Does not scale to zero; minCount must be at least 1. |
| httpRequestsConcurrency | All artifacts | Scales replicas based on concurrent HTTP requests. Scales to zero replicas when the Workload is idle; set minCount: 0 to allow scale-to-zero. Other metrics keep at least one replica. |
| gpuCacheUtilization | NIM artifacts only | Scales on model GPU memory cache utilization when the runtime exposes it. |
| gpuRequestQueueDepth | NIM artifacts only | Scales on inference request queue depth. |

For scale-to-zero examples, see Replace with autoscaling settings. The Console exposes the same values under Configure autoscaling.
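The minCount rule in the table can be checked before a PATCH is sent. The sketch below uses a hypothetical helper, `min_count_allowed`, which is not part of any DataRobot tooling. It assumes that a minCount of 0 is accepted only with httpRequestsConcurrency and that every other metric, including custom NIM metric names, needs at least 1; the server-side validation may differ.

```shell
# Hypothetical pre-flight check.
# $1 = scalingMetric, $2 = minCount
# Returns 0 when the combination looks valid, 1 otherwise.
min_count_allowed() {
  # Reject anything that is not a non-negative integer.
  case "$2" in
    ''|*[!0-9]*) return 1 ;;
  esac
  # Any metric may keep one or more replicas.
  if [ "$2" -ge 1 ]; then
    return 0
  fi
  # minCount is 0: assumed valid only for the scale-to-zero metric.
  [ "$1" = "httpRequestsConcurrency" ]
}

# Example usage:
#   min_count_allowed cpuAverageUtilization 0 || echo "minCount must be at least 1"
```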

Resource allocation and bundles

All resource configuration is runtime-side—the artifact describes container topology only and carries no CPU, memory, or GPU fields. The runtime declares resources at two layers:

| Layer | Field | What it declares |
| --- | --- | --- |
| Per-container | runtime.containerGroups[].containers[].resourceAllocation (a ContainerOverride.resourceAllocation) with cpu, memory, gpu. | What an individual container in the group gets at runtime. Required for multi-container groups. |
| Per-group | runtime.containerGroups[].resourceBundles (array of bundle IDs; supply exactly one) and runtime.containerGroups[].bundleSelectionPolicy (only availability is supported). | The bundle the scheduler places the group on; the applied bundle is reflected in the read-only resolvedBundle field. Passing more than one bundle ID returns a validation error. |

memory accepts either a human-readable string with one of B, KB, MB, GB (1000-based—for example, "4GB", "512MB") or a raw byte integer. cpu has a minimum of 0.1. The gpu field controls GPU count only; GPU model and VRAM are determined entirely by the selected resource bundle—there is no per-container GPU type or VRAM field. Each override's name follows DNS-label syntax (lowercase letters, digits, and hyphens; must start with a letter and end with a letter or digit; up to 63 characters) and must match a container declared in the artifact group.
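To make the 1000-based units and the naming rule concrete, the sketch below converts a memory string to bytes and checks a container name against the DNS-label syntax described above. Both helpers, `memory_to_bytes` and `valid_container_name`, are hypothetical illustrations rather than part of the API. The converter handles whole-number values only, because the text above does not say whether fractional strings such as "1.5GB" are accepted.

```shell
# Hypothetical helper: convert a memory value such as "4GB" or "512MB"
# to bytes using 1000-based units. A bare integer is treated as bytes.
memory_to_bytes() {
  # Two-letter suffixes are matched before the single-letter "B" suffix.
  case "$1" in
    *KB) num="${1%KB}"; mult=1000 ;;
    *MB) num="${1%MB}"; mult=1000000 ;;
    *GB) num="${1%GB}"; mult=1000000000 ;;
    *B)  num="${1%B}";  mult=1 ;;
    *)   num="$1";      mult=1 ;;
  esac
  # Only whole numbers are handled in this sketch.
  case "$num" in
    ''|*[!0-9]*) echo "unsupported memory value: $1" >&2; return 1 ;;
  esac
  echo $(( num * mult ))
}

# Hypothetical helper: check DNS-label syntax for a container name.
# Lowercase letters, digits, and hyphens; starts with a letter; ends
# with a letter or digit; at most 63 characters.
valid_container_name() {
  [ "${#1}" -le 63 ] || return 1
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z]([a-z0-9-]*[a-z0-9])?$'
}

# Example usage:
#   memory_to_bytes "4GB"        # prints 4000000000
#   valid_container_name "agent" && echo "name ok"
```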

Set both layers on the Workload's runtime. Changing them on a running Workload via PATCH /workloads/{workload_id}/settings queues a rolling replacement using the strategies in Replace and roll out.

curl -X PATCH "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}/settings" \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "runtime": {
      "containerGroups": [{
        "name": "default",
        "resourceBundles": ["cpu.medium"],
        "containers": [{
          "name": "agent",
          "resourceAllocation": {"cpu": 2, "memory": "4GB"}
        }]
      }]
    }
  }'