Runtime settings
A running Workload's configuration is mutable, but changes are split across two endpoints with different semantics. Metadata fields (name, description, and importance) update in place via PATCH /workloads/{workload_id}. Runtime and resource changes (replica count, autoscaling, CPU, memory, GPU, and bundle selection) go through PATCH /workloads/{workload_id}/settings and queue a rolling replacement, so the Workload stays available during the swap. The following sections cover the fields each endpoint accepts and the shape of each PATCH body.
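When scripting changes, it helps to know up front which endpoint takes a given field. The sketch below encodes the split described above as a small POSIX shell function. endpoint_for is a hypothetical helper, not part of any DataRobot tool, and the nested runtime fields are listed by name only for lookup; in the request body they still belong under runtime.containerGroups[].

```shell
#!/bin/sh
# endpoint_for FIELD
# Hypothetical helper: print the PATCH path that accepts FIELD, following the
# metadata/settings split described on this page. Returns 1 for unknown fields.
endpoint_for() {
  case "$1" in
    name|description|importance)
      # Metadata: applied in place, no replacement.
      echo "/workloads/{workload_id}" ;;
    runtime|replicaCount|autoscaling|resourceAllocation|resourceBundles|bundleSelectionPolicy)
      # Runtime and resources: queues a rolling replacement.
      echo "/workloads/{workload_id}/settings" ;;
    *)
      echo "unknown field: $1" >&2
      return 1 ;;
  esac
}
```

For example, endpoint_for replicaCount prints /workloads/{workload_id}/settings, while endpoint_for importance prints /workloads/{workload_id}.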
Importance and metadata
Use PATCH /workloads/{workload_id} to update name, description, and importance. These fields change in place on a running Workload and do not trigger a replacement. For example, to mark a Workload as critical:
```shell
curl -X PATCH "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}" \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"importance": "critical"}'
```
Runtime, autoscaling, and replicas
Update a Workload's runtime configuration (replica count, autoscaling policies, per-container resource allocation, and resource bundles) with PATCH /workloads/{workload_id}/settings. The request returns HTTP 202 with a Replacement and queues a rolling replacement onto the new runtime; see Replace and roll out.
All runtime fields are nested under runtime.containerGroups[], and each entry is matched to the artifact's container group by name. The following example pins the default group at five replicas with autoscaling disabled:
```shell
curl -X PATCH "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}/settings" \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "runtime": {
      "containerGroups": [{
        "name": "default",
        "replicaCount": 5,
        "autoscaling": {"enabled": false}
      }]
    }
  }'
```
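To script replica changes, you can generate the body and confirm that a replacement was queued. This is a sketch: replica_body and replacement_queued are hypothetical helper names, and the check relies only on the HTTP 202 status described above, not on the shape of the Replacement object.

```shell
#!/bin/sh
# replica_body GROUP COUNT
# Hypothetical helper: print a settings PATCH body that fixes GROUP at COUNT
# replicas with autoscaling disabled (same shape as the curl example above).
replica_body() {
  printf '{"runtime":{"containerGroups":[{"name":"%s","replicaCount":%s,"autoscaling":{"enabled":false}}]}}' "$1" "$2"
}

# replacement_queued STATUS
# Succeeds only when the settings PATCH answered 202, i.e. a Replacement was queued.
replacement_queued() {
  [ "$1" = "202" ]
}

# Usage (requires DATAROBOT_ENDPOINT, DATAROBOT_API_TOKEN, and WORKLOAD_ID):
# status=$(curl -s -o replacement.json -w '%{http_code}' -X PATCH \
#   "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}/settings" \
#   -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
#   -H "Content-Type: application/json" \
#   -d "$(replica_body default 5)")
# replacement_queued "$status" || echo "settings PATCH not accepted: HTTP $status" >&2
```

Writing the response to a file with -o keeps the Replacement available for inspection while -w '%{http_code}' captures only the status code.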
You can also set autoscaling policies at Workload creation time inside the initial POST /workloads/ payload—see Tutorial: Deploy a production-ready container.
Scaling metrics
Each AutoscalingPolicy sets scalingMetric to one of the predefined values in the following table or, for NIM 2.0 only, to a custom metric name such as vllm:kv_cache_usage_perc.
| scalingMetric | Applies to | Behavior |
|---|---|---|
| cpuAverageUtilization | All artifacts | Scales replicas to maintain a target average CPU utilization across pods. Does not scale to zero; minCount must be at least 1. |
| httpRequestsConcurrency | All artifacts | Scales replicas based on concurrent HTTP requests. Can scale to zero replicas when the Workload is idle; set minCount: 0 to allow scale-to-zero. All other metrics keep at least one replica. |
| gpuCacheUtilization | NIM artifacts only | Scales on the model's GPU memory cache utilization, when the runtime exposes that metric. |
| gpuRequestQueueDepth | NIM artifacts only | Scales on inference request queue depth. |
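The constraints in this table can be checked locally before a PATCH is sent. The sketch below is illustrative: check_policy is a hypothetical helper, the caller supplies the artifact kind, it does not distinguish NIM 2.0 from earlier NIM versions when accepting custom metric names, and it conservatively rejects minCount: 0 for every metric except httpRequestsConcurrency.

```shell
#!/bin/sh
# check_policy METRIC MIN_COUNT KIND
# Hypothetical pre-flight check for one autoscaling policy, based on the
# scaling-metrics table. KIND is "nim" for NIM artifacts, anything else otherwise.
check_policy() {
  metric=$1 min=$2 kind=$3
  case "$metric" in
    cpuAverageUtilization|httpRequestsConcurrency)
      ;;  # available for all artifacts
    *)
      # gpuCacheUtilization, gpuRequestQueueDepth, and custom metric names
      # are NIM-only (the NIM version is not checked here).
      if [ "$kind" != "nim" ]; then
        echo "$metric is only available for NIM artifacts" >&2
        return 1
      fi
      ;;
  esac
  # Only httpRequestsConcurrency may scale to zero; require minCount >= 1 otherwise.
  if [ "$min" -lt 1 ] && [ "$metric" != "httpRequestsConcurrency" ]; then
    echo "$metric cannot scale to zero; minCount must be at least 1" >&2
    return 1
  fi
}
```

For example, check_policy httpRequestsConcurrency 0 generic succeeds, while check_policy cpuAverageUtilization 0 generic fails with a message on stderr.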
For scale-to-zero examples, see Replace with autoscaling settings. The Console exposes the same metrics under Configure autoscaling.
Resource allocation and bundles
All resource configuration is set on the runtime: the artifact describes container topology only and carries no CPU, memory, or GPU fields. The runtime declares resources at two layers:
| Layer | Field | What it declares |
|---|---|---|
| Per-container | runtime.containerGroups[].containers[].resourceAllocation (a ContainerOverride.resourceAllocation) with cpu, memory, gpu | The CPU, memory, and GPU count an individual container in the group receives at runtime. Required for multi-container groups. |
| Per-group | runtime.containerGroups[].resourceBundles (array of bundle IDs; supply exactly one) and runtime.containerGroups[].bundleSelectionPolicy (only availability is supported) | The bundle the scheduler places the group on. The applied bundle is reflected in the read-only resolvedBundle field. Passing more than one bundle ID returns a validation error. |
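Because a list with more than one bundle ID is rejected, a script that assembles group settings can fail fast on the same rule before any request is sent. A minimal sketch; require_one_bundle is a hypothetical helper name.

```shell
#!/bin/sh
# require_one_bundle BUNDLE_ID...
# Hypothetical pre-check: succeed only when exactly one bundle ID is passed,
# mirroring the "supply exactly one" rule for resourceBundles.
require_one_bundle() {
  if [ "$#" -ne 1 ]; then
    echo "resourceBundles needs exactly one bundle ID, got $#" >&2
    return 1
  fi
}
```

Calling require_one_bundle "$@" at the top of a deployment script stops it early when the bundle list is empty or has extra entries.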
memory accepts either a human-readable string with a B, KB, MB, or GB suffix (decimal, 1000-based units; for example, "4GB" or "512MB") or a raw integer number of bytes. cpu has a minimum of 0.1. gpu sets the GPU count only: the GPU model and VRAM are determined entirely by the selected resource bundle, and there is no per-container GPU type or VRAM field. Each override's name must follow DNS-label syntax (lowercase letters, digits, and hyphens; starts with a letter and ends with a letter or digit; at most 63 characters) and must match a container declared in the artifact's container group.
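These field rules translate into simple local checks. The POSIX shell sketch below uses hypothetical helper names; memory_bytes handles integer amounts only (no decimals such as "1.5GB"), and passing these checks does not guarantee that the API accepts the request.

```shell
#!/bin/sh
# memory_bytes VALUE
# Print VALUE in bytes. Accepts a B/KB/MB/GB suffix (1000-based) or a raw
# byte integer. Integer amounts only in this sketch.
memory_bytes() {
  case "$1" in
    *KB) num=${1%KB}; mult=1000 ;;
    *MB) num=${1%MB}; mult=1000000 ;;
    *GB) num=${1%GB}; mult=1000000000 ;;
    *B)  num=${1%B};  mult=1 ;;
    *)   num=$1;      mult=1 ;;   # raw byte integer
  esac
  case "$num" in
    ''|*[!0-9]*) echo "invalid memory value: $1" >&2; return 1 ;;
  esac
  echo $((num * mult))
}

# cpu_ok VALUE
# Succeed when VALUE is a plain decimal number >= 0.1 (awk does the float comparison).
cpu_ok() {
  awk -v c="$1" 'BEGIN { exit !(c ~ /^[0-9]*\.?[0-9]+$/ && c + 0 >= 0.1) }'
}

# container_name_ok NAME
# DNS-label syntax: lowercase letters, digits, hyphens; starts with a letter,
# ends with a letter or digit, at most 63 characters.
container_name_ok() {
  printf '%s\n' "$1" | grep -Eq '^[a-z]([a-z0-9-]{0,61}[a-z0-9])?$'
}
```

For example, memory_bytes 4GB prints 4000000000, and container_name_ok Agent fails because of the uppercase letter.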
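Both layers are set on the Workload's runtime. Changing either on a running Workload via PATCH /workloads/{workload_id}/settings queues a rolling replacement that uses the strategies in Replace and roll out. The following example places the default group on the cpu.medium bundle and allocates 2 CPUs and 4GB of memory to its agent container: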
```shell
curl -X PATCH "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}/settings" \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "runtime": {
      "containerGroups": [{
        "name": "default",
        "resourceBundles": ["cpu.medium"],
        "containers": [{
          "name": "agent",
          "resourceAllocation": {"cpu": 2, "memory": "4GB"}
        }]
      }]
    }
  }'
```
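In a multi-container group, each container needs its own resourceAllocation entry. The sketch below assembles the same body shape as the example above for any number of containers. resources_body and its name:cpu:memory argument format are hypothetical; values are inserted without JSON escaping or validation, and memory is always sent as a string such as 4GB.

```shell
#!/bin/sh
# resources_body GROUP BUNDLE NAME:CPU:MEMORY...
# Hypothetical helper: print a settings PATCH body that places GROUP on one
# BUNDLE and gives each listed container a resourceAllocation.
resources_body() {
  [ "$#" -ge 2 ] || { echo "usage: resources_body GROUP BUNDLE NAME:CPU:MEMORY..." >&2; return 1; }
  group=$1 bundle=$2
  shift 2
  containers=""
  for spec in "$@"; do
    name=${spec%%:*}          # text before the first colon
    rest=${spec#*:}
    cpu=${rest%%:*}           # text between the first and second colon
    mem=${rest#*:}            # text after the second colon
    entry=$(printf '{"name":"%s","resourceAllocation":{"cpu":%s,"memory":"%s"}}' "$name" "$cpu" "$mem")
    containers="${containers:+$containers,}$entry"   # comma-join entries
  done
  printf '{"runtime":{"containerGroups":[{"name":"%s","resourceBundles":["%s"],"containers":[%s]}]}}' \
    "$group" "$bundle" "$containers"
}
```

Passing -d "$(resources_body default cpu.medium agent:2:4GB)" to the curl command above sends the same body as the example, without the whitespace.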