Runtime settings¶

A running Workload's configuration is mutable, but the mutations split across two endpoints with different semantics. Identity fields—name, description, importance—update in place via PATCH /workloads/{id}. Runtime and resource changes—replica count, autoscaling, CPU, memory, GPU, and bundle selection—go through PATCH /workloads/{id}/settings and queue a rolling replacement so the Workload stays available during the swap. The sections that follow cover the fields exposed at each layer and the PATCH body shapes.

Importance and metadata¶

Use PATCH /workloads/{id} to update name, description, and importance. The importance field is mutable on a running Workload:

curl -X PATCH "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}" \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"importance": "critical"}'

Runtime, autoscaling, and replicas¶

Update a Workload's runtime configuration (replica count, autoscaling policies, per-container resource allocation, resource bundles) with PATCH /workloads/{workload_id}/settings. The PATCH returns a Replacement (HTTP 202) and queues a rolling replacement onto the new runtime; see Replace and roll out.

All runtime fields are nested under runtime.containerGroups[], with each entry matched to the artifact's container group by name:

# Fixed scaling
curl -X PATCH "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}/settings" \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "runtime": {
      "containerGroups": [{
        "name": "default",
        "replicaCount": 5
      }]
    }
  }'

# Dynamic scaling
curl -X PATCH "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}/settings" \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "runtime": {
      "containerGroups": [{
        "name": "default",
        "autoscaling": {
          "enabled": true,
          "minReplicaCount": 2,
          "maxReplicaCount": 10,
          "policies": [{
            "scalingMetric": "httpRequestsConcurrency",
            "target": 20
          }]
        }
      }]
    }
  }'

You can also set autoscaling policies at Workload creation time inside the initial POST /workloads/ payload—see Tutorial: Deploy a production-ready container.

Scaling metrics¶

The autoscaling object accepts minReplicaCount and maxReplicaCount at the same level as policies. These set the lower and upper bounds on the Workload's replicas regardless of how many policies are defined. Each AutoscalingPolicy sets only scalingMetric and target. maxReplicaCount must be at least 1, and minReplicaCount must be less than or equal to maxReplicaCount. Custom metric names (NIM 2.0 only) follow the Prometheus/OpenMetrics naming convention: at most 63 characters, matching the pattern [a-zA-Z_:][a-zA-Z0-9_:]*.
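The constraints above can be checked locally before sending a PATCH. The following is a minimal sketch in POSIX sh, not part of the Workload API: the function names valid_replica_bounds and valid_metric_name are hypothetical, and the sketch assumes both replica counts are non-negative integers. The API remains the authoritative validator, and per-metric rules (such as the minimum for cpuAverageUtilization) are not covered here.

```shell
# Hypothetical pre-flight helpers (not part of the Workload API).
LC_ALL=C  # keep [a-z]-style ranges ASCII-only in pattern matching

# Succeeds when the argument is a non-empty string of decimal digits.
is_uint() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Usage: valid_replica_bounds MIN MAX
# Succeeds when MAX >= 1 and MIN <= MAX.
# Assumption: both values are non-negative integers.
valid_replica_bounds() {
  is_uint "$1" && is_uint "$2" || return 1
  [ "$2" -ge 1 ] && [ "$1" -le "$2" ]
}

# Usage: valid_metric_name NAME
# Succeeds when NAME is at most 63 characters long and matches
# [a-zA-Z_:][a-zA-Z0-9_:]*
valid_metric_name() {
  name=$1
  [ "${#name}" -le 63 ] || return 1
  case "$name" in
    ''|[!a-zA-Z_:]*|*[!a-zA-Z0-9_:]*) return 1 ;;
  esac
  return 0
}
```

For example, `valid_replica_bounds 2 10` succeeds and `valid_replica_bounds 3 2` fails; `valid_metric_name "has-dash"` fails because a hyphen is outside the allowed character set.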

replicaCount and autoscaling are mutually exclusive

Setting replicaCount together with autoscaling.enabled: true returns a 422. Use one or the other: replicaCount for fixed scaling, autoscaling for dynamic scaling.

`scalingMetric`	Applies to	Behavior
`cpuAverageUtilization`	All artifacts	Scales replicas to maintain a target average CPU utilization across pods. Cannot scale to zero; `minReplicaCount` must be at least 1.
`httpRequestsConcurrency`	All artifacts	Scales replicas based on concurrent HTTP requests. Scales the replica count to zero when Proton is idle; set `minReplicaCount: 0` to allow scale-to-zero. Other metrics keep at least one replica.
`gpuCacheUtilization`	NIM artifacts only	Scales on model GPU memory cache utilization when the runtime exposes it.
`gpuRequestQueueDepth`	NIM artifacts only	Scales on inference request queue depth.

For a scale-to-zero example, see Configure autoscaling in the same call. The Console maps the same values in Configure autoscaling.
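As a rough illustration of the httpRequestsConcurrency row above, the sketch below builds a settings payload with minReplicaCount: 0. The helper name scale_to_zero_payload is hypothetical, and the maximum replica count and concurrency target in the usage comment are placeholder values, not recommendations. The field names follow the dynamic scaling example earlier on this page.

```shell
# Hypothetical helper: prints a settings payload that permits scale-to-zero.
# Usage: scale_to_zero_payload GROUP_NAME MAX_REPLICAS CONCURRENCY_TARGET
scale_to_zero_payload() {
  cat <<EOF
{
  "runtime": {
    "containerGroups": [{
      "name": "$1",
      "autoscaling": {
        "enabled": true,
        "minReplicaCount": 0,
        "maxReplicaCount": $2,
        "policies": [{
          "scalingMetric": "httpRequestsConcurrency",
          "target": $3
        }]
      }
    }]
  }
}
EOF
}

# Usage (shown as a comment so nothing is sent when this file is sourced):
# curl -X PATCH "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}/settings" \
#   -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
#   -H "Content-Type: application/json" \
#   -d "$(scale_to_zero_payload default 10 20)"
```

Because replicaCount is omitted from the payload, the request stays clear of the 422 returned when fixed and dynamic scaling are combined.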

Self-managed clusters

Proton scale-to-zero (httpRequestsConcurrency with minReplicaCount: 0) requires KEDA HTTP add-on v0.12.0 or later. See the KEDA HTTP add-on version requirements.

Resource allocation and bundles¶

All resource configuration is runtime-side—the artifact describes container topology only and carries no CPU, memory, or GPU fields. The runtime declares resources at two layers:

Layer	Field	What it declares
Per-container	`runtime.containerGroups[].containers[].resourceAllocation` (a `ContainerOverride.resourceAllocation`) with `cpu`, `memory`, `gpu`.	What an individual container in the group gets at runtime. Required for multi-container groups.
Per-group	`runtime.containerGroups[].resourceBundles` (array of bundle IDs; supply exactly one) and `runtime.containerGroups[].bundleSelectionPolicy` (only `availability` is supported).	The bundle the scheduler places the group on; the applied bundle is reflected in the read-only `resolvedBundle` field. Passing more than one bundle ID returns a validation error.

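resourceAllocation is the requested value; resolvedBundle is the actual value

The resourceAllocation field in API responses contains the values you requested when configuring the runtime. The read-only resolvedBundle field shows the bundle the scheduler actually placed the container group on, which determines the CPU, memory, and GPU allocated to the pods. The two can differ: for example, a request for cpu: 0.5 may resolve to a cpu.micro bundle that provides 1.0 core. Always check resolvedBundle to confirm what was actually provisioned.

To list the resource bundle IDs available to Workloads, query GET /mlops/compute/bundles/ with useCases=workload: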

curl -s "${DATAROBOT_ENDPOINT}/mlops/compute/bundles/?useCases=workload" \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}"

Only the bundles returned by this query are valid values for runtime.containerGroups[].resourceBundles.

memory accepts either a human-readable string with one of the suffixes B, KB, MB, or GB (1000-based; for example, "4GB" or "512MB") or a raw byte integer. Kubernetes-style binary suffixes (Mi, Gi) are not supported and return a validation error. cpu has a minimum of 0.1. The gpu field controls GPU count only; GPU model and VRAM are determined entirely by the selected resource bundle, and there is no per-container GPU type or VRAM field. Each override's name follows DNS-label syntax (lowercase letters, digits, and hyphens; must start with a letter and end with a letter or digit; up to 63 characters) and must match a container declared in the artifact group.
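To make the memory and name rules concrete, here is a minimal POSIX sh sketch. The function names memory_to_bytes and valid_container_name are hypothetical and not part of the API. The sketch assumes whole-number quantities with no whitespace between number and suffix; whether the API accepts fractional values such as "1.5GB" is not stated here, so the sketch rejects them.

```shell
# Hypothetical local helpers (not part of the Workload API).
LC_ALL=C  # keep [a-z]-style ranges ASCII-only in pattern matching

# Usage: memory_to_bytes VALUE
# Prints VALUE in bytes using 1000-based units (B, KB, MB, GB) or a raw
# byte integer. Returns 1 for anything else, including "Mi"/"Gi" suffixes.
memory_to_bytes() {
  value=$1
  case "$value" in
    *KB) digits=${value%KB}; factor=1000 ;;
    *MB) digits=${value%MB}; factor=1000000 ;;
    *GB) digits=${value%GB}; factor=1000000000 ;;
    *B)  digits=${value%B};  factor=1 ;;
    *)   digits=$value;      factor=1 ;;
  esac
  case "$digits" in
    # Reject empty, non-numeric, and leading-zero input (the last would be
    # read as octal by shell arithmetic).
    ''|*[!0-9]*|0[0-9]*) return 1 ;;
  esac
  echo $((digits * factor))
}

# Usage: valid_container_name NAME
# Succeeds when NAME is a DNS label as described above: lowercase letters,
# digits, and hyphens; starts with a letter; ends with a letter or digit;
# at most 63 characters.
valid_container_name() {
  name=$1
  [ "${#name}" -le 63 ] || return 1
  case "$name" in
    ''|[!a-z]*|*[!a-z0-9-]*|*-) return 1 ;;
  esac
  return 0
}
```

With these definitions, `memory_to_bytes 4GB` prints 4000000000 and `memory_to_bytes 4Gi` fails, mirroring the 1000-based rule. A local check like this does not confirm that the name matches a container declared in the artifact group; only the API can verify that.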

Set both layers on the Workload's runtime. Changing them on a running Workload via PATCH /workloads/{workload_id}/settings queues a rolling replacement using the strategies in Replace and roll out.

curl -X PATCH "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}/settings" \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "runtime": {
      "containerGroups": [{
        "name": "default",
        "resourceBundles": ["cpu.medium"],
        "containers": [{
          "name": "agent",
          "resourceAllocation": {"cpu": 2, "memory": "4GB"}
        }]
      }]
    }
  }'