# Configure quota settings

> Configure quota settings - For deployed models, you can access the Quota tab to edit usage limit
> settings.

This Markdown file sits beside the HTML page at the same path (with a `.md` suffix). It summarizes the topic and lists links for tools and LLM context.

Companion generated at `2026-04-24T16:03:56.725447+00:00` (UTC).

## Primary page

- [Configure quota settings](https://docs.datarobot.com/en/docs/workbench/nxt-console/nxt-settings/nxt-quota-settings.html): Full documentation for this topic (HTML).

## Sections on this page

- [Set the default quota configuration](https://docs.datarobot.com/en/docs/workbench/nxt-console/nxt-settings/nxt-quota-settings.html#set-the-default-quota-configuration): In-page section heading.
- [Set entity rate limits](https://docs.datarobot.com/en/docs/workbench/nxt-console/nxt-settings/nxt-quota-settings.html#set-entity-rate-limits): In-page section heading.
- [Agent API keys](https://docs.datarobot.com/en/docs/workbench/nxt-console/nxt-settings/nxt-quota-settings.html#agent-api-keys): In-page section heading.

## Related documentation

- [NextGen UI documentation](https://docs.datarobot.com/en/docs/workbench/index.html): Linked from this page.
- [Console](https://docs.datarobot.com/en/docs/workbench/nxt-console/index.html): Linked from this page.
- [Deployment settings](https://docs.datarobot.com/en/docs/workbench/nxt-console/nxt-settings/index.html): Linked from this page.
- [API keys and tools section of the user settings](https://docs.datarobot.com/en/docs/platform/acct-settings/api-key-mgmt.html): Linked from this page.

## Documentation content

# Configure quota settings

The Settings > Quota tab provides controls for managing and enforcing usage limits on DataRobot and external deployments. This allows deployment owners to control access to shared deployment infrastructure, ensure fair resource allocation across different agents, and prevent a single agent from monopolizing the resources. Two different quota configuration methods are available:

- Default quota configuration: Baseline usage limits that apply to all agents (referred to as "entities") that have access to the deployment. If an agent does not have a specific limit set, these default rules will apply to them.
- Entity rate limits (optional): Individual usage limits that are a higher priority than the default limit configuration. Deployment owners can override the default limits for specific agents by creating individual rate limits.
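
The priority rule above can be illustrated with a minimal sketch. The dictionaries, the entity name `agent-a`, and the function `effective_limit` are hypothetical and are not part of any DataRobot API. The sketch also assumes the fallback to the default happens per metric, which the description above does not specify.

```python
from typing import Dict, Optional

# Hypothetical in-memory representation of the two quota layers.
# The default values mirror the documented defaults for Requests
# (300 per minute) and Concurrent requests (50).
DEFAULT_LIMITS: Dict[str, int] = {"requests": 300, "concurrent_requests": 50}

# Hypothetical entity overrides, keyed by an illustrative entity name.
ENTITY_LIMITS: Dict[str, Dict[str, int]] = {
    "agent-a": {"requests": 1000},
}


def effective_limit(entity_id: str, metric: str) -> Optional[int]:
    """Return the entity-specific limit if one is set, else the default.

    Returns None when neither layer defines a limit for the metric.
    Assumption: the fallback is evaluated per metric.
    """
    override = ENTITY_LIMITS.get(entity_id, {})
    if metric in override:
        return override[metric]
    return DEFAULT_LIMITS.get(metric)
```

With these sample values, `agent-a` gets its own request limit, while any entity without an override falls back to the default.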

> [!NOTE] Quota policy application
> Quota policy changes may take up to 5 minutes to apply. This delay occurs because the gateway updates its quota cache every 5 minutes.
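
To see why a periodic cache refresh produces a delay of up to 5 minutes, consider the following sketch of a time-based cache. It is an illustration only: the class `CachedQuota`, its `load` callback, and the injectable `clock` are hypothetical, and the gateway's actual implementation is not described in this documentation.

```python
import time
from typing import Callable, Dict

REFRESH_SECONDS = 5 * 60  # refresh interval stated in the note above


class CachedQuota:
    """Hypothetical cache that reloads a quota policy every REFRESH_SECONDS."""

    def __init__(
        self,
        load: Callable[[], Dict[str, int]],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load
        self._clock = clock
        self._policy = load()
        self._loaded_at = clock()

    def get(self) -> Dict[str, int]:
        # Serve the cached policy until the refresh interval has elapsed.
        if self._clock() - self._loaded_at >= REFRESH_SECONDS:
            self._policy = self._load()
            self._loaded_at = self._clock()
        return self._policy
```

A policy saved just after a refresh is not visible until the next refresh, so the worst-case delay in this model equals the refresh interval.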

## Set the default quota configuration

On the Quota settings page, manage the default quota limits in the Default quota configuration section:

1. Click Edit to modify the quota settings for the deployment.
2. Set a time Resolution for the time-based metrics: Minute, Hour, or Day. The selected resolution applies to each metric-based quota defined here.
3. If a default quota configuration isn't set, click Add metric to begin configuration.

   > [!NOTE] Adding metrics
   > A new quota row appears each time you click Add metric, until a row is present for every metric available. To remove a row, click the delete icon.

4. In the new quota row, select a Metric and enter a Limit. You can define limits on the following metrics:

   | Metric | Description |
   | --- | --- |
   | Requests | Controls the number of prediction requests a deployed model can handle in the selected time window, defined by the resolution setting. The default is 300 requests per minute. |
   | Tokens | Controls how many tokens a deployed model can process in the selected time window, defined by the resolution setting. This limit includes all types of tokens (input and output). |
   | Input sequence length | Controls the number of tokens in the prompt or query sent to the model. |
   | Concurrent requests | Controls the number of prediction requests a deployed model can process at the same time. The default is 50 concurrent requests. |
   | Output sequence length | Controls the number of tokens generated by the model as a response. |

5. Repeat this process for one or more metrics (depending on your organization's needs) and click Save.
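
The shape of a default quota configuration (one resolution plus one limit per metric) can be modeled as a small validated data structure. This is a sketch for reasoning about the settings, not a DataRobot API: the class `QuotaConfig` and its field names are hypothetical, and the requirement that limits be positive integers is an assumption.

```python
from dataclasses import dataclass
from typing import Dict

# Resolution and metric names as they appear in the Quota tab.
RESOLUTIONS = {"Minute", "Hour", "Day"}
METRICS = {
    "Requests",
    "Tokens",
    "Input sequence length",
    "Output sequence length",
    "Concurrent requests",
}


@dataclass(frozen=True)
class QuotaConfig:
    """Hypothetical model of one quota configuration."""

    resolution: str
    limits: Dict[str, int]  # metric -> limit; dict keys allow one row per metric

    def __post_init__(self) -> None:
        if self.resolution not in RESOLUTIONS:
            raise ValueError(f"unknown resolution: {self.resolution!r}")
        if not self.limits:
            raise ValueError("at least one metric is required")
        for metric, limit in self.limits.items():
            if metric not in METRICS:
                raise ValueError(f"unknown metric: {metric!r}")
            # Assumption: limits are positive integers.
            if not isinstance(limit, int) or limit <= 0:
                raise ValueError(f"limit for {metric!r} must be a positive integer")
```

For example, `QuotaConfig("Minute", {"Requests": 300, "Concurrent requests": 50})` reproduces the documented defaults, while an unsupported resolution such as `"Week"` raises `ValueError`.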

## Set entity rate limits

On the Quota settings page, manage the entity limits in the Entity rate limits (optional) section:

1. Click Edit to modify the entity-based quota settings for the deployment.
2. Select an entity from the Deployments, Users, or Groups list.
3. Set a time Resolution for the time-based metrics: Minute, Hour, or Day. The selected resolution applies to each metric-based quota defined here.
4. Click Add metric to begin configuration.

   > [!NOTE] Adding metrics
   > A new quota row appears each time you click Add metric, until a row is present for every metric available.

5. In the new quota row, select a Metric and enter a Limit. You can define limits on the following metrics:

   | Metric | Description |
   | --- | --- |
   | Requests | Controls the number of prediction requests a deployed model can handle in the selected time window, defined by the resolution setting. The default is 300 requests per minute. |
   | Tokens | Controls how many tokens a deployed model can process in the selected time window, defined by the resolution setting. This limit includes all types of tokens (input and output). |
   | Input sequence length | Controls the number of tokens in the prompt or query sent to the model. |
   | Concurrent requests | Controls the number of prediction requests a deployed model can process at the same time. The default is 50 concurrent requests. |
   | Output sequence length | Controls the number of tokens generated by the model as a response. |

6. Repeat this process for one or more metrics (depending on your organization's needs) and click Save.
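
As a rough mental model of how a time-window Requests limit with per-entity overrides behaves, the sketch below counts requests per entity in fixed windows. The fixed-window algorithm is an assumption chosen for simplicity; this documentation does not state which algorithm the gateway uses. The class `WindowedRequestCounter` and the entity names are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Optional, Tuple

# Window length in seconds for each resolution offered in the Quota tab.
WINDOW_SECONDS = {"Minute": 60, "Hour": 3600, "Day": 86400}


class WindowedRequestCounter:
    """Hypothetical fixed-window request counter with entity overrides."""

    def __init__(
        self,
        resolution: str,
        default_limit: int,
        entity_limits: Optional[Dict[str, int]] = None,
    ) -> None:
        self._window = WINDOW_SECONDS[resolution]
        self._default_limit = default_limit
        self._entity_limits = dict(entity_limits or {})
        # (entity, window index) -> requests seen; old windows are never
        # pruned here, which is acceptable only for a short illustration.
        self._counts: DefaultDict[Tuple[str, int], int] = defaultdict(int)

    def allow(self, entity_id: str, now: float) -> bool:
        """Record a request at time `now` (seconds); return False if over limit."""
        key = (entity_id, int(now // self._window))
        # An entity rate limit takes priority over the default limit.
        limit = self._entity_limits.get(entity_id, self._default_limit)
        if self._counts[key] >= limit:
            return False
        self._counts[key] += 1
        return True
```

In this model, an entity with an override is measured against its own limit, every other entity is measured against the default, and all counts reset when a new window begins.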

## Agent API keys

To differentiate between various applications and agents using a deployment, agent API keys are generated automatically when a new Agentic workflow deployment is created. These keys appear in the [API keys and tools section of the user settings](https://docs.datarobot.com/en/docs/platform/acct-settings/api-key-mgmt.html), on the Agent API keys tab. The Agent API keys tab displays a table with the key's name, the key, the connected deployment, the creation date, and the last used date. These keys can be edited (renamed) or deleted.

> [!NOTE] Important
> When a key is deleted, all agents using it will be disabled.
