# Runtime settings

> Runtime settings - Change importance, metadata, runtime, autoscaling, and resources on a Workload.

This Markdown file sits beside the HTML page at the same path (with a `.md` suffix). It summarizes the topic and lists links for tools and LLM context.

Companion generated at `2026-06-22T16:50:38.254345+00:00` (UTC).

## Primary page

- [Runtime settings](https://docs.datarobot.com/en/docs/workload-api/operate-workloads/runtime-settings.html.md): Full documentation for this topic (Markdown sidecar).

## Sections on this page

- [Importance and metadata](https://docs.datarobot.com/en/docs/workload-api/operate-workloads/runtime-settings.html.md#importance-metadata): In-page section heading.
- [Runtime, autoscaling, and replicas](https://docs.datarobot.com/en/docs/workload-api/operate-workloads/runtime-settings.html.md#runtime-autoscaling): In-page section heading.
- [Scaling metrics](https://docs.datarobot.com/en/docs/workload-api/operate-workloads/runtime-settings.html.md#scaling-metrics): In-page section heading.
- [Resource allocation and bundles](https://docs.datarobot.com/en/docs/workload-api/operate-workloads/runtime-settings.html.md#resource-bundles): In-page section heading.

## Related documentation

- [Workload API](https://docs.datarobot.com/en/docs/workload-api/index.html.md): Linked from this page.
- [Operate running Workloads](https://docs.datarobot.com/en/docs/workload-api/operate-workloads/index.html.md): Linked from this page.
- [Replace and roll out](https://docs.datarobot.com/en/docs/workload-api/update-workloads/replace-artifact-rollouts.html.md): Linked from this page.
- [Tutorial: Deploy a production-ready container](https://docs.datarobot.com/en/docs/workload-api/create-workloads/tutorial-production-ready-container.html.md): Linked from this page.
- [Replace with autoscaling settings](https://docs.datarobot.com/en/docs/workload-api/update-workloads/tutorial-replace-artifacts.html.md#replace-with-autoscaling-settings): Linked from this page.
- [Configure autoscaling](https://docs.datarobot.com/en/docs/workload-api/operate-workloads/operate-ui.html.md#configure-autoscaling): Linked from this page.

## Documentation content

A running Workload's configuration is mutable, but changes are split across two endpoints with different semantics. Identity fields (name, description, importance) update in place via `PATCH /workloads/{workload_id}`. Runtime and resource changes (replica count, autoscaling, CPU, memory, GPU, and bundle selection) go through `PATCH /workloads/{workload_id}/settings` and queue a rolling replacement so the Workload stays available during the swap. The sections that follow cover the fields exposed at each layer and the PATCH body shapes.

## Importance and metadata

Use `PATCH /workloads/{workload_id}` to update `name`, `description`, and `importance`. These fields update in place on a running Workload; the following request changes `importance`:

```
curl -X PATCH "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}" \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"importance": "critical"}'
```

## Runtime, autoscaling, and replicas

Update a Workload's runtime configuration (replica count, autoscaling policies, per-container resource allocation, resource bundles) with `PATCH /workloads/{workload_id}/settings`. The PATCH returns a `Replacement` (HTTP 202) and queues a rolling replacement onto the new runtime; see [Replace and roll out](https://docs.datarobot.com/en/docs/workload-api/update-workloads/replace-artifact-rollouts.html.md).

All runtime fields are nested under `runtime.containerGroups[]`, with each entry matched to the artifact's container group by `name`:

```
curl -X PATCH "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}/settings" \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "runtime": {
      "containerGroups": [{
        "name": "default",
        "replicaCount": 5,
        "autoscaling": {"enabled": false}
      }]
    }
  }'
```
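Because the settings PATCH is documented to return HTTP 202 with a `Replacement`, a script may want to confirm that status before it treats the change as queued. The following is a minimal sketch, not an official client: `patch_settings` and `is_accepted` are hypothetical helper names, and the environment variables are the same ones used in the request above.

```shell
# Hypothetical helper: send a settings PATCH and print only the HTTP status code.
# It is only defined here, not called. It expects DATAROBOT_ENDPOINT,
# DATAROBOT_API_TOKEN, and WORKLOAD_ID to be set, and takes the JSON body as $1.
patch_settings() {
  curl -s -o /dev/null -w '%{http_code}' -X PATCH \
    "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}/settings" \
    -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$1"
}

# Hypothetical helper: succeed only when the status is the documented 202.
is_accepted() {
  [ "$1" = "202" ]
}
```

One possible use is `status=$(patch_settings "$body")` followed by `is_accepted "$status" || echo "settings change was not queued (HTTP $status)" >&2`.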

You can also set autoscaling policies at Workload creation time inside the initial `POST /workloads/` payload—see [Tutorial: Deploy a production-ready container](https://docs.datarobot.com/en/docs/workload-api/create-workloads/tutorial-production-ready-container.html.md).

### Scaling metrics

Each `AutoscalingPolicy` sets `scalingMetric` to one of the predefined values in the following table, or (NIM 2.0 only) a custom metric name string such as `vllm:kv_cache_usage_perc`.

| `scalingMetric` | Applies to | Behavior |
| --- | --- | --- |
| `cpuAverageUtilization` | All artifacts | Scales replicas to maintain a target average CPU utilization across pods. Does not scale to zero; `minCount` must be at least 1. |
| `httpRequestsConcurrency` | All artifacts | Scales replicas based on concurrent HTTP requests. Scales to zero replicas when the Workload is idle; set `minCount: 0` to allow scale-to-zero. Other metrics keep at least one replica. |
| `gpuCacheUtilization` | NIM artifacts only | Scales on model GPU memory cache utilization when the runtime exposes it. |
| `gpuRequestQueueDepth` | NIM artifacts only | Scales on inference request queue depth. |

For scale-to-zero examples, see [Replace with autoscaling settings](https://docs.datarobot.com/en/docs/workload-api/update-workloads/tutorial-replace-artifacts.html.md#replace-with-autoscaling-settings). The Console maps the same values in [Configure autoscaling](https://docs.datarobot.com/en/docs/workload-api/operate-workloads/operate-ui.html.md#configure-autoscaling).
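The `minCount` rule in the table can be checked locally before sending a policy. The sketch below is a reading of that table rather than the API's own validation: `allows_scale_to_zero` and `valid_min_count` are hypothetical helpers, and any metric other than `httpRequestsConcurrency` (including custom metric names) is assumed to need at least one replica.

```shell
# Hypothetical helper: succeed only for the metric documented as able to scale to zero.
allows_scale_to_zero() {
  [ "$1" = "httpRequestsConcurrency" ]
}

# Hypothetical helper: valid_min_count METRIC MIN_COUNT
# Succeeds when MIN_COUNT is a non-negative integer that the metric permits.
valid_min_count() {
  metric=$1
  min_count=$2
  # Reject empty values and anything containing a non-digit character.
  case "$min_count" in
    ''|*[!0-9]*) return 1 ;;
  esac
  # Zero replicas is assumed to be valid only for the scale-to-zero metric.
  if [ "$min_count" -eq 0 ]; then
    allows_scale_to_zero "$metric"
    return $?
  fi
  return 0
}
```

For example, `valid_min_count cpuAverageUtilization 0` fails, while `valid_min_count httpRequestsConcurrency 0` succeeds.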

## Resource allocation and bundles

All resource configuration is runtime-side—the artifact describes container topology only and carries no CPU, memory, or GPU fields. The runtime declares resources at two layers:

| Layer | Field | What it declares |
| --- | --- | --- |
| Per-container | `runtime.containerGroups[].containers[].resourceAllocation` (a `ContainerOverride.resourceAllocation`) with `cpu`, `memory`, `gpu`. | What an individual container in the group gets at runtime. Required for multi-container groups. |
| Per-group | `runtime.containerGroups[].resourceBundles` (array of bundle IDs; supply exactly one) and `runtime.containerGroups[].bundleSelectionPolicy` (only `availability` is supported). | The bundle the scheduler places the group on; the applied bundle is reflected in the read-only `resolvedBundle` field. Passing more than one bundle ID returns a validation error. |

`memory` accepts either a human-readable string with one of the suffixes `B`, `KB`, `MB`, `GB` (1000-based; for example, `"4GB"`, `"512MB"`) or a raw byte integer. `cpu` has a minimum of `0.1`. The `gpu` field controls GPU count only; GPU model and VRAM are determined entirely by the selected resource bundle, and there is no per-container GPU type or VRAM field. Each override's `name` follows DNS-label syntax (lowercase letters, digits, and hyphens; must start with a letter and end with a letter or digit; up to 63 characters) and must match a container declared in the artifact group.
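The value rules above can be turned into local pre-flight checks. This is a rough sketch under stated assumptions, not the service's validation logic: the three function names are hypothetical, `memory_to_bytes` handles whole-number magnitudes only (whether fractional values such as `"1.5GB"` are accepted is not stated here), and `cpu_at_least_minimum` relies on `awk` for the decimal comparison.

```shell
# Hypothetical helper: print the byte count for "4GB", "512MB", "1KB", "10B",
# or a raw integer, using the documented 1000-based units.
memory_to_bytes() {
  value=$1
  case "$value" in
    *GB) number=${value%GB}; factor=1000000000 ;;
    *MB) number=${value%MB}; factor=1000000 ;;
    *KB) number=${value%KB}; factor=1000 ;;
    *B)  number=${value%B};  factor=1 ;;
    *)   number=$value;      factor=1 ;;
  esac
  # Accept plain decimal integers only; leading zeros are rejected so the
  # shell does not parse the number as octal.
  case "$number" in
    ''|*[!0-9]*|0[0-9]*) return 1 ;;
  esac
  echo $((number * factor))
}

# Hypothetical helper: succeed when the cpu value meets the documented 0.1 minimum.
cpu_at_least_minimum() {
  awk -v cpu="$1" 'BEGIN { exit !(cpu + 0 >= 0.1) }'
}

# Hypothetical helper: succeed when the name is a DNS label as described above.
# The body runs in a subshell so the LC_ALL change stays local to this check.
is_dns_label() (
  LC_ALL=C
  name=$1
  # Length must be between 1 and 63 characters.
  [ "${#name}" -ge 1 ] && [ "${#name}" -le 63 ] || return 1
  case "$name" in
    *[!a-z0-9-]*) return 1 ;;  # only lowercase letters, digits, and hyphens
    [!a-z]*)      return 1 ;;  # must start with a letter
    *-)           return 1 ;;  # must end with a letter or digit
  esac
  return 0
)
```

For example, `memory_to_bytes 4GB` prints `4000000000`, and `is_dns_label agent-` fails because the name ends with a hyphen.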

Set both layers on the Workload's `runtime`. Changing them on a running Workload via `PATCH /workloads/{workload_id}/settings` queues a rolling replacement using the strategies in [Replace and roll out](https://docs.datarobot.com/en/docs/workload-api/update-workloads/replace-artifact-rollouts.html.md).

```
curl -X PATCH "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}/settings" \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "runtime": {
      "containerGroups": [{
        "name": "default",
        "resourceBundles": ["cpu.medium"],
        "containers": [{
          "name": "agent",
          "resourceAllocation": {"cpu": 2, "memory": "4GB"}
        }]
      }]
    }
  }'
```
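When the same request is issued for several Workloads, the body can be generated instead of hand-written. The sketch below mirrors the request above; `build_resource_body` is a hypothetical helper that takes exactly one bundle ID (matching the exactly-one rule) and performs no JSON escaping, so it assumes plain names and values.

```shell
# Hypothetical helper: build_resource_body GROUP BUNDLE_ID CONTAINER CPU MEMORY
# Prints a settings body with one bundle and one container override.
# Arguments are inserted as-is; no JSON escaping is performed.
build_resource_body() {
  # Exactly five arguments, so a second bundle ID cannot slip in.
  [ "$#" -eq 5 ] || return 1
  fmt='{"runtime":{"containerGroups":[{"name":"%s","resourceBundles":["%s"],'
  fmt="$fmt"'"containers":[{"name":"%s","resourceAllocation":{"cpu":%s,"memory":"%s"}}]}]}}\n'
  printf "$fmt" "$1" "$2" "$3" "$4" "$5"
}
```

The output can be passed to the request above with `-d "$(build_resource_body default cpu.medium agent 2 4GB)"`.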
