# Configure Workload capacity

> Configure Workload capacity - Manage and enforce throughput, reserved capacity, and rate limits for
> a deployed Workload.

This Markdown file sits beside the HTML page at the same path (with a `.md` suffix). It summarizes the topic and lists links for tools and LLM context.

Companion generated at `2026-06-22T16:50:38.252332+00:00` (UTC).

## Primary page

- [Configure Workload capacity](https://docs.datarobot.com/en/docs/workload-api/operate-workloads/capacity-ui.html.md): Full documentation for this topic (Markdown sidecar).

## Sections on this page

- [Capacity configuration](https://docs.datarobot.com/en/docs/workload-api/operate-workloads/capacity-ui.html.md#capacity-configuration): In-page section heading.
- [Reserved capacity for entities](https://docs.datarobot.com/en/docs/workload-api/operate-workloads/capacity-ui.html.md#reserved-capacity-for-entities): In-page section heading.
- [Set rate limits](https://docs.datarobot.com/en/docs/workload-api/operate-workloads/capacity-ui.html.md#set-rate-limits): In-page section heading.
- [Per-entity exceptions](https://docs.datarobot.com/en/docs/workload-api/operate-workloads/capacity-ui.html.md#per-entity-exceptions): In-page section heading.

## Related documentation

- [Workload API](https://docs.datarobot.com/en/docs/workload-api/index.html.md): Linked from this page.
- [Operate running Workloads](https://docs.datarobot.com/en/docs/workload-api/operate-workloads/index.html.md): Linked from this page.

## Documentation content

The Capacity tab provides controls for managing and enforcing usage on Workloads. Workload owners can protect shared Workload infrastructure and guarantee minimum throughput for critical agents and users when multiple consumers share one Workload.

Set capacity and the utilization threshold for the Workload as a whole; these settings are global to the Workload. Quotas (the default rules and optional per-entity limits below) define what happens when utilization reaches that threshold:

- Default throughput configuration: Configure a Workload's capacity, utilization threshold, and baseline usage rules that apply to any entity that can access the Workload. Entities without their own overrides use these defaults.
- Entity rate limits: Optional settings that give specific Workloads, users, or groups higher priority. Use reserved capacity to guarantee each entity a share of Workload capacity, or per-entity rate limits to control how much of the total Workload throughput an entity can consume.

> [!TIP] Learn more
> For decision guidance, load test examples, and sizing recommendations—including when rate limits alone are sufficient—see [Rate limiting vs. quota reservations: a practical guide for platform teams](https://www.datarobot.com/blog/rate-limiting-quota-reservations/) on the DataRobot blog.

> [!NOTE] Rate limit application
> Rate limit changes may take a few minutes to apply.

## Capacity configuration

Capacity is the throughput you expect a Workload to sustain, expressed as units per time window (for example, requests per minute or tokens per minute). It defines the baseline “pipe size” used for Workload-wide quota enforcement and for sizing reservations.

When choosing capacity values, common inputs include:

- Load tests that measure how the Workload behaves under target traffic.
- Model or hosting limits imposed by the model, runtime, or infrastructure.
- Latency budgets you need to meet at expected concurrency and payload sizes.
- Operational experience from comparable Workloads or historical usage.
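
As one way to combine the first and third inputs above, the following minimal sketch picks the highest load-tested throughput that still met a latency budget. The function name `pick_capacity`, the choice of p95 latency, and the sample numbers are illustrative assumptions, not measurements or a DataRobot-provided method.

```python
def pick_capacity(load_test_results, latency_budget_ms):
    """Return the highest tested throughput whose p95 latency met the budget.

    load_test_results: list of (requests_per_minute, p95_latency_ms) pairs.
    """
    passing = [rpm for rpm, p95 in load_test_results if p95 <= latency_budget_ms]
    if not passing:
        raise ValueError("no tested throughput met the latency budget")
    return max(passing)


# Illustrative numbers only; replace with results from your own load tests.
results = [(200, 310.0), (400, 420.0), (600, 780.0), (800, 1650.0)]
print(pick_capacity(results, latency_budget_ms=800))  # 600
```

Hosting limits and operational experience may still argue for a lower value than the one this sketch returns.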

Throughput configuration governs how a Workload applies limits:

- It sets capacity as the overall ceiling for requests or tokens.
- It sets the utilization threshold as how full that capacity can get before the Workload enforces its default quota behavior.
- Below the threshold, it relaxes enforcement so the gateway can allow short bursts and treat traffic more permissively.
- Above the threshold, it applies the Workload's quota rules dynamically as utilization rises.
- With reserved capacity, it guarantees entitled entities a share when consumers compete for the Workload.
- Under sustained overload, it can reject excess traffic to protect the model and shared infrastructure.
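
The relationship between capacity, the utilization threshold, and enforcement can be sketched as a simple classification. This is a conceptual model only: the names `ThroughputConfig` and `enforcement_mode`, the exact boundary conditions, and the treatment of overload as any rate above capacity are assumptions made for illustration, not the gateway's actual logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThroughputConfig:
    """Hypothetical container for the two Workload-wide settings."""
    capacity: float               # units per minute (requests or tokens)
    utilization_threshold: float  # fraction of capacity, e.g. 0.8 for 80%


def enforcement_mode(config: ThroughputConfig, current_rate: float) -> str:
    """Classify the current traffic level against capacity and threshold."""
    utilization = current_rate / config.capacity
    if utilization < config.utilization_threshold:
        return "relaxed"      # below threshold: short bursts are tolerated
    if utilization <= 1.0:
        return "enforcing"    # at or above threshold: quota rules apply
    return "overloaded"       # beyond capacity: excess traffic may be rejected


config = ThroughputConfig(capacity=1000, utilization_threshold=0.8)
print(enforcement_mode(config, 500))   # relaxed
print(enforcement_mode(config, 900))   # enforcing
print(enforcement_mode(config, 1200))  # overloaded
```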

To configure capacity:

1. Click Set throughput to configure the capacity settings for a Workload.
2. Choose a metric to track (requests per minute or tokens per minute).
3. Enter the capacity as a number of requests or tokens per minute. DataRobot does not infer this value automatically, so plan it based on expected Workload usage.
4. Set the utilization threshold as a percentage of the capacity. This leaves room for bursts of usage before enforcement tightens.
5. After configuring each capacity setting, click Save.
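
To see what a percentage threshold means in absolute terms, the arithmetic behind steps 2 through 4 can be sketched as follows. The metric identifiers and the function `threshold_in_units` are hypothetical names chosen for this example; the validation rules are assumptions rather than documented UI behavior.

```python
# Hypothetical identifiers for the two metrics offered in step 2.
VALID_METRICS = {"requests_per_minute", "tokens_per_minute"}


def threshold_in_units(metric: str, capacity: float, threshold_percent: float) -> float:
    """Convert a percentage threshold into the same units as capacity."""
    if metric not in VALID_METRICS:
        raise ValueError(f"unknown metric: {metric}")
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    if not 0 < threshold_percent <= 100:
        raise ValueError("threshold must be greater than 0 and at most 100")
    return capacity * threshold_percent / 100


enforce_at = threshold_in_units("tokens_per_minute", 60_000, 75)
print(enforce_at)           # 45000.0
print(60_000 - enforce_at)  # 15000.0 tokens per minute of burst room
```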

### Reserved capacity for entities

Reserved capacity is configured per entity (agent Workload, user, or group). It defines how much of the Workload’s capacity you guarantee to the selected entity when utilization is above the utilization threshold and consumers compete for the Workload.

- Floor, not a ceiling: A reservation guarantees a minimum share; an entity can often use more than its reserved portion when spare capacity exists.
- Leave unreserved headroom: Keep part of Workload capacity unreserved so ad-hoc traffic, new consumers, and overflow still have room.

To configure reserved capacity, you must already have the Capacity settings configured.

1. Once capacity settings are configured, click Add entity.
2. Select an entity from the Workloads, Users, or Groups list.
3. Set the percentage of the capacity to reserve for the selected entity.
4. Perform this process for one or more entities (depending on your organization's needs) and click Save.
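
When planning reservation percentages, it can help to translate them into absolute units and check how much headroom remains unreserved. The sketch below does that arithmetic; the function `reserved_units`, the entity names, and the rule that reservations may not exceed 100% are assumptions for illustration.

```python
def reserved_units(capacity, reservations_percent):
    """Translate per-entity reservation percentages into capacity units.

    Returns (shares, unreserved), where shares maps each entity to its
    guaranteed minimum and unreserved is the capacity left for everyone else.
    """
    total = sum(reservations_percent.values())
    if total > 100:
        raise ValueError(f"reservations add up to {total}%, which exceeds 100%")
    shares = {
        entity: capacity * pct / 100
        for entity, pct in reservations_percent.items()
    }
    unreserved = capacity * (100 - total) / 100
    return shares, unreserved


# Hypothetical entities sharing a Workload with capacity 1000 requests per minute.
shares, unreserved = reserved_units(1000, {"support-agent": 40, "analytics-group": 25})
print(shares)      # {'support-agent': 400.0, 'analytics-group': 250.0}
print(unreserved)  # 350.0
```

Because a reservation is a floor, the shares here are guaranteed minimums under contention, not caps on what each entity can use.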

## Set rate limits

On the Capacity page, manage per-entity settings in the Rate limits section:

1. Click Add policy to modify the rate limit settings for the Workload.
2. Click Add metric to begin configuration.

   > [!NOTE] Adding metrics
   > A new policy row appears each time you click Add metric, until a row is present for every metric available.

3. In the new row, select a Metric, enter a Limit, and choose a time Interval.

   | Metric | Description |
   | --- | --- |
   | Requests | Controls the number of requests a Workload can handle in the selected time window, defined by the resolution setting. |
   | Tokens | Controls how many tokens a Workload can process in the selected time window, defined by the resolution setting. This limit includes all types of tokens (input and output). |
   | Input sequence length | Controls the number of tokens in the prompt or query sent to the model. |
   | Concurrent requests | Controls the number of requests a Workload can process concurrently. |

4. Perform this process for one or more metrics (depending on your organization's needs) and click Save.
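
To illustrate how a limit and a time interval combine for a metric such as Requests, here is a generic fixed-window counter. This is a textbook technique used for explanation only; the class `FixedWindowLimiter` is hypothetical, and the platform's actual enforcement algorithm is not described in this page and may differ.

```python
class FixedWindowLimiter:
    """Allow at most `limit` requests per window of `interval_seconds`."""

    def __init__(self, limit: int, interval_seconds: float):
        self.limit = limit
        self.interval_seconds = interval_seconds
        self._window_start = None
        self._count = 0

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` (seconds) is allowed."""
        # Start a new window on the first request or once the interval elapsed.
        if self._window_start is None or now - self._window_start >= self.interval_seconds:
            self._window_start = now
            self._count = 0
        if self._count < self.limit:
            self._count += 1
            return True
        return False


# A limit of 3 requests per 60-second interval, with explicit timestamps.
limiter = FixedWindowLimiter(limit=3, interval_seconds=60)
decisions = [limiter.allow(t) for t in (0, 10, 20, 30, 61)]
print(decisions)  # [True, True, True, False, True]
```

The fourth request is rejected because the window already holds three requests; the fifth arrives after the interval has elapsed and starts a new window.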

### Per-entity exceptions

You can make exceptions to rate limits for specific entities.

To configure per-entity exceptions:

1. Click Add entity.
2. Select an entity from the Workloads, Users, or Groups list.
3. Click Add metric to begin configuration.
4. In the new row, select a Metric, enter a Limit, and choose a time Interval. The selected resolution applies to each metric-based quota defined here. For more information, see [Set rate limits](#set-rate-limits).
5. Perform this process for one or more metrics (depending on the entity's required configuration) and click Save.
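
Conceptually, a per-entity exception overrides the Workload's default limit for that entity, and entities without an override use the defaults. The lookup below sketches that precedence. The function `effective_limit`, the dictionary layout, the entity names, and the per-metric fallback are assumptions made for illustration, not a documented API.

```python
def effective_limit(metric, entity, default_policy, entity_exceptions):
    """Return the limit that applies to `entity` for `metric`.

    An entity-specific exception wins; otherwise the default policy applies.
    Returns None when neither defines a limit for the metric.
    """
    overrides = entity_exceptions.get(entity, {})
    if metric in overrides:
        return overrides[metric]
    return default_policy.get(metric)


# Hypothetical default policy and one exception for a batch-processing user.
default_policy = {"requests": 600, "tokens": 100_000}
entity_exceptions = {"batch-user": {"requests": 1200}}

print(effective_limit("requests", "batch-user", default_policy, entity_exceptions))  # 1200
print(effective_limit("tokens", "batch-user", default_policy, entity_exceptions))    # 100000
print(effective_limit("requests", "other-user", default_policy, entity_exceptions))  # 600
```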
