Runtime settings

A running Workload's configuration is mutable, but changes are split across two endpoints with different semantics. Identity fields (name, description, importance) update in place via PATCH /workloads/{id}. Runtime and resource changes (replica count, autoscaling, CPU, memory, GPU, and bundle selection) go through PATCH /workloads/{id}/settings and queue a rolling replacement, so the Workload stays available during the swap. The sections that follow cover the fields exposed at each layer and the PATCH body shapes.

Importance and metadata

Use PATCH /workloads/{id} to update name, description, and importance. The importance field is mutable on a running Workload:

curl -X PATCH "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}" \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"importance": "critical"}'

Runtime, autoscaling, and replicas

Update a Workload's runtime configuration (replica count, autoscaling policies, per-container resource allocation, resource bundles) with PATCH /workloads/{workload_id}/settings. The PATCH returns a Replacement (HTTP 202) and queues a rolling replacement onto the new runtime; see Replace and roll out.

All runtime fields are nested under runtime.containerGroups[], with each entry matched to the artifact's container group by name:

# Fixed scaling
curl -X PATCH "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}/settings" \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "runtime": {
      "containerGroups": [{
        "name": "default",
        "replicaCount": 5
      }]
    }
  }'

# Dynamic scaling
curl -X PATCH "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}/settings" \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "runtime": {
      "containerGroups": [{
        "name": "default",
        "autoscaling": {
          "enabled": true,
          "minReplicaCount": 2,
          "maxReplicaCount": 10,
          "policies": [{
            "scalingMetric": "httpRequestsConcurrency",
            "target": 20
          }]
        }
      }]
    }
  }'

You can also set autoscaling policies at Workload creation time inside the initial POST /workloads/ payload—see Tutorial: Deploy a production-ready container.

Scaling metrics

The autoscaling object accepts minReplicaCount and maxReplicaCount at the same level as policies—these set the replica bounds for the Workload regardless of how many policies are defined. Each AutoscalingPolicy then sets scalingMetric and target only. maxReplicaCount must be at least 1; minReplicaCount must be less than or equal to maxReplicaCount. Custom metric names (NIM 2.0 only) follow Prometheus/OpenMetrics naming conventions: up to 63 characters, matching the pattern [a-zA-Z_:][a-zA-Z0-9_:]*.
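
The bounds and naming rules above can be checked client-side before sending a PATCH. The following is a minimal POSIX shell sketch, not part of the API: valid_replica_bounds and valid_custom_metric_name are hypothetical helper names, and the server remains the authority on validation.

```shell
# Hypothetical client-side checks; the API performs its own validation.

# valid_replica_bounds MIN MAX
# Succeeds when both values are whole numbers, MAX >= 1, and MIN <= MAX.
valid_replica_bounds() {
  case "$1$2" in ''|*[!0-9]*) return 1 ;; esac   # digits only
  [ -n "$1" ] && [ -n "$2" ] || return 1          # both arguments required
  [ "$2" -ge 1 ] && [ "$1" -le "$2" ]
}

# valid_custom_metric_name NAME
# Succeeds when NAME is 1 to 63 characters and matches
# [a-zA-Z_:][a-zA-Z0-9_:]*. The body runs in a subshell so that forcing
# the C locale (ASCII-only ranges) does not leak into the caller.
valid_custom_metric_name() (
  LC_ALL=C
  [ "${#1}" -le 63 ] || return 1
  case "$1" in
    ''|[!a-zA-Z_:]*|*[!a-zA-Z0-9_:]*) return 1 ;;
  esac
  return 0
)
```

For example, `valid_replica_bounds 2 10 && valid_custom_metric_name "my_app:queue_depth"` succeeds, while `valid_replica_bounds 5 3` fails because the minimum exceeds the maximum. The metric name here is a made-up illustration, not a name the platform is known to expose.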

replicaCount and autoscaling are mutually exclusive

Setting replicaCount alongside autoscaling.enabled: true returns a 422. Use one or the other: replicaCount for fixed scaling, autoscaling for dynamic scaling.
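
One way to avoid the 422 is to build the PATCH body with a helper that can emit only one scaling mode at a time. The sketch below is illustrative: scaling_payload is a hypothetical function name, it assumes whole-number counts and targets, and it inserts the group name and metric verbatim, so those values must not contain characters that need JSON escaping.

```shell
# scaling_payload NAME fixed COUNT
# scaling_payload NAME auto MIN MAX METRIC TARGET
# Prints a /settings PATCH body that contains either replicaCount or
# autoscaling, never both. Any other mode or argument count fails.
scaling_payload() {
  case "$2" in
    fixed)
      [ "$#" -eq 3 ] || return 1
      printf '{"runtime":{"containerGroups":[{"name":"%s","replicaCount":%d}]}}\n' \
        "$1" "$3"
      ;;
    auto)
      [ "$#" -eq 6 ] || return 1
      printf '{"runtime":{"containerGroups":[{"name":"%s","autoscaling":{"enabled":true,"minReplicaCount":%d,"maxReplicaCount":%d,"policies":[{"scalingMetric":"%s","target":%d}]}}]}}\n' \
        "$1" "$3" "$4" "$5" "$6"
      ;;
    *)
      return 1
      ;;
  esac
}

# Example use (not executed here):
# curl -X PATCH "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}/settings" \
#   -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
#   -H "Content-Type: application/json" \
#   -d "$(scaling_payload default fixed 5)"
```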

`scalingMetric`	Applies to	Behavior
`cpuAverageUtilization`	All artifacts	Scales replicas to maintain a target average CPU utilization across pods. Does not scale to zero; `minReplicaCount` must be at least 1.
`httpRequestsConcurrency`	All artifacts	Scales replicas based on concurrent HTTP requests. Can scale to zero replicas when the proton is idle, but only if `minReplicaCount` is set to 0. All other metrics keep at least one replica.
`gpuCacheUtilization`	NIM artifacts only	Scales on model GPU memory cache utilization when the runtime exposes it.
`gpuRequestQueueDepth`	NIM artifacts only	Scales on inference request queue depth.
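
The replica floor implied by the table can be expressed as a small pre-flight check. This is a sketch under one assumption: it treats a minReplicaCount of 0 as invalid for every metric other than httpRequestsConcurrency, which is an interpretation of "keep at least one replica". The API may handle that case differently (for example, by clamping rather than rejecting). min_replicas_allowed is a hypothetical helper name.

```shell
# min_replicas_allowed METRIC MIN
# Succeeds when MIN is a whole number that is acceptable for METRIC:
# only httpRequestsConcurrency may use 0; every other metric needs >= 1.
min_replicas_allowed() {
  case "$2" in ''|*[!0-9]*) return 1 ;; esac   # digits only
  case "$1" in
    httpRequestsConcurrency) return 0 ;;
    *) [ "$2" -ge 1 ] ;;
  esac
}
```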

For scale-to-zero examples, see Configure autoscaling in the same call. The Console maps the same values in Configure autoscaling.

Self-managed clusters

Scale-to-zero for protons (httpRequestsConcurrency with minReplicaCount: 0) requires the KEDA HTTP Add-on v0.12.0 or later. See KEDA HTTP Add-on version requirements.

Resource allocation and bundles

All resource configuration is runtime-side—the artifact describes container topology only and carries no CPU, memory, or GPU fields. The runtime declares resources at two layers:

Layer	Field	What it declares
Per-container	`runtime.containerGroups[].containers[].resourceAllocation` (a `ContainerOverride.resourceAllocation`) with `cpu`, `memory`, `gpu`.	What an individual container in the group gets at runtime. Required for multi-container groups.
Per-group	`runtime.containerGroups[].resourceBundles` (array of bundle IDs; supply exactly one) and `runtime.containerGroups[].bundleSelectionPolicy` (only `availability` is supported).	The bundle the scheduler places the group on; the applied bundle is reflected in the read-only `resolvedBundle` field. Passing more than one bundle ID returns a validation error.

resourceAllocation is requested; resolvedBundle is actual

The resourceAllocation fields in an API response carry the values you requested when you configured the runtime. The read-only resolvedBundle field shows the bundle the scheduler actually placed the container group on, which determines the CPU, memory, and GPU the pod receives. The two may differ. For example, requesting cpu: 0.5 may resolve to a cpu.micro bundle that provides 1.0 core. Always read resolvedBundle to verify what was actually provisioned.

To list the resource bundle IDs available for Workloads, query GET /mlops/compute/bundles/ with useCases=workload:

curl -s "${DATAROBOT_ENDPOINT}/mlops/compute/bundles/?useCases=workload" \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}"

Only bundles returned by this query are valid values for runtime.containerGroups[].resourceBundles.

memory accepts either a human-readable string with one of the suffixes B, KB, MB, or GB (1000-based; for example, "4GB" or "512MB") or a raw byte integer. Kubernetes-style binary suffixes (Mi, Gi) are not supported and return a validation error.

cpu has a minimum of 0.1. The gpu field controls GPU count only. GPU model and VRAM are determined entirely by the selected resource bundle; there is no per-container GPU type or VRAM field.

Each container override's name follows DNS-label syntax (lowercase letters, digits, and hyphens; must start with a letter and end with a letter or digit; up to 63 characters) and must match a container declared in the artifact's container group.
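
The memory format and container-name rules above lend themselves to a quick local check. The following POSIX shell sketch is illustrative only: memory_to_bytes and valid_container_name are hypothetical helpers, and the memory parser handles whole numbers only, which is an assumption of this sketch rather than a documented restriction.

```shell
# memory_to_bytes VALUE
# Prints the byte count for values such as "4GB", "512MB", "64KB", "100B",
# or a raw integer, using 1000-based units. Fails (non-zero status) for
# anything else, including Kubernetes-style Mi/Gi suffixes.
# Note: uses the global variables n and mult (POSIX sh has no "local").
memory_to_bytes() {
  case "$1" in
    *GB) n=${1%GB}; mult=1000000000 ;;
    *MB) n=${1%MB}; mult=1000000 ;;
    *KB) n=${1%KB}; mult=1000 ;;
    *B)  n=${1%B};  mult=1 ;;
    *)   n=$1;      mult=1 ;;
  esac
  case "$n" in
    ''|*[!0-9]*|0?*) return 1 ;;   # digits only, no leading zeros
  esac
  echo "$(( n * mult ))"
}

# valid_container_name NAME
# DNS-label check: lowercase letters, digits, and hyphens; starts with a
# letter; ends with a letter or digit; at most 63 characters.
# Runs in a subshell so the C locale (ASCII-only ranges) stays local.
valid_container_name() (
  LC_ALL=C
  [ "${#1}" -le 63 ] || return 1
  case "$1" in
    ''|[!a-z]*|*[!a-z0-9-]*|*-) return 1 ;;
  esac
  return 0
)
```

With these definitions, `memory_to_bytes 4GB` prints 4000000000 and `memory_to_bytes 4Gi` fails. The name check covers syntax only; whether the name matches a container declared in the artifact can only be confirmed against the artifact itself.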

Set both layers on the Workload's runtime. Changing them on a running Workload via PATCH /workloads/{workload_id}/settings queues a rolling replacement using the strategies in Replace and roll out.

curl -X PATCH "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}/settings" \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "runtime": {
      "containerGroups": [{
        "name": "default",
        "resourceBundles": ["cpu.medium"],
        "containers": [{
          "name": "agent",
          "resourceAllocation": {"cpu": 2, "memory": "4GB"}
        }]
      }]
    }
  }'