
Configure quota settings

The Settings > Quota tab provides controls for managing and enforcing usage limits on DataRobot and external deployments. These limits let deployment owners control access to shared deployment infrastructure, allocate resources fairly across agents, and prevent any single agent from monopolizing resources. Two quota configuration methods are available:

Default quota configuration: Baseline usage limits that apply to all agents (referred to as "entities") with access to the deployment. If an agent has no entity-specific limit, the default limits apply to it.
Entity rate limits (optional): Individual usage limits that take precedence over the default quota configuration. Deployment owners can override the default limits for specific agents by creating entity rate limits.
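The precedence rule above can be sketched as a lookup that prefers an entity's own configuration and falls back to the default. This is a minimal Python sketch, not DataRobot's implementation: the `QuotaConfig` class, the `resolve_quota` function, the lowercase metric keys, and the string entity IDs are all hypothetical. It also assumes that an entity rate limit replaces the default configuration as a whole (including its resolution) rather than merging metric by metric; the text above does not say which behavior applies.

```python
from dataclasses import dataclass, field
from typing import Dict, Mapping


@dataclass
class QuotaConfig:
    """Hypothetical container for one quota configuration."""

    resolution: str  # "minute", "hour", or "day"
    limits: Dict[str, int] = field(default_factory=dict)  # metric -> limit


def resolve_quota(
    default: QuotaConfig,
    entity_limits: Mapping[str, QuotaConfig],
    entity_id: str,
) -> QuotaConfig:
    """Return the configuration that applies to one entity.

    Assumption: an entity rate limit replaces the default configuration
    wholesale; entities without their own limit get the default.
    """
    return entity_limits.get(entity_id, default)


default = QuotaConfig("minute", {"requests": 300})
overrides = {"agent-a": QuotaConfig("hour", {"requests": 10_000, "tokens": 2_000_000})}

print(resolve_quota(default, overrides, "agent-a").limits["requests"])  # 10000
print(resolve_quota(default, overrides, "agent-b").limits["requests"])  # 300
```

Merging per metric would also be plausible, but it would require each metric to carry its own resolution, because the default and entity configurations each set a resolution independently.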

Quota policy application

Quota policy changes can take up to 5 minutes to take effect, because the gateway refreshes its quota cache every 5 minutes.
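The delay can be illustrated with a cache that reloads policies only after its refresh interval has elapsed. In this Python sketch, `QuotaPolicyCache`, the `load_policies` callback, and the reload-on-read behavior are assumptions for illustration; the actual gateway may refresh on a background timer instead, which has the same worst-case delay of 5 minutes (300 seconds).

```python
import time
from typing import Callable, Dict, Optional


class QuotaPolicyCache:
    """Hypothetical gateway-side cache of quota policies.

    Policies are reloaded from the source of truth only when the cached copy
    is at least `refresh_interval` seconds old, so a newly saved policy can
    take up to that long to be enforced.
    """

    def __init__(
        self,
        load_policies: Callable[[], Dict[str, dict]],
        refresh_interval: float = 300.0,  # 5 minutes
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load_policies
        self._interval = refresh_interval
        self._clock = clock
        self._policies: Dict[str, dict] = {}
        self._loaded_at: Optional[float] = None

    def get(self, deployment_id: str) -> Optional[dict]:
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._interval:
            self._policies = self._load()  # refresh the whole cache at once
            self._loaded_at = now
        return self._policies.get(deployment_id)


# Simulate an owner changing a limit, using a fake clock instead of real time.
store = {"dep-1": {"requests": 300}}
fake_now = [0.0]
cache = QuotaPolicyCache(lambda: dict(store), clock=lambda: fake_now[0])

print(cache.get("dep-1"))  # {'requests': 300}
store["dep-1"] = {"requests": 600}  # new policy saved
fake_now[0] = 120.0
print(cache.get("dep-1"))  # {'requests': 300}  (still cached)
fake_now[0] = 300.0
print(cache.get("dep-1"))  # {'requests': 600}  (cache refreshed)
```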

Set the default quota configuration

On the Quota settings page, manage the default quota limits in the Default quota configuration section:

Click Edit to modify the quota settings for the deployment.
Set a time Resolution for the time-based metrics: Minute, Hour, or Day. The selected resolution applies to each metric-based quota defined here.
If a default quota configuration isn't set, click Add metric to begin configuration.

Adding metrics

A new quota row appears each time you click Add metric, until a row exists for every available metric. To remove a row, click the delete icon.

In the new quota row, select a Metric and enter a Limit. You can define limits on three metrics:

Metric	Description
Requests	Controls the number of prediction requests a deployed model can handle in the selected time window, defined by the resolution setting. The default is 300 requests per minute.
Tokens	Controls how many tokens a deployed model can process in the selected time window, defined by the resolution setting. This limit includes all types of tokens (input and output).
Input sequence length	Controls the number of tokens in the prompt or query sent to the model.

Repeat this process for each metric your organization needs to limit, then click Save.
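To make the three metrics concrete, the following Python sketch enforces them with a fixed-window counter. It is an illustration under stated assumptions, not how DataRobot enforces quotas: the `FixedWindowQuota` class and its methods are hypothetical, windows are assumed to align to clock boundaries, tokens are assumed to be charged after a response completes (output tokens are unknown until then), and Input sequence length is treated as a per-request cap that does not depend on the resolution.

```python
import time
from typing import Callable, Optional

# Window length, in seconds, for each Resolution option in the Quota tab.
RESOLUTION_SECONDS = {"minute": 60, "hour": 3_600, "day": 86_400}


class FixedWindowQuota:
    """Hypothetical fixed-window enforcement of Requests, Tokens, and
    Input sequence length. A limit of None means the metric has no row."""

    def __init__(
        self,
        resolution: str,
        requests: Optional[int] = None,
        tokens: Optional[int] = None,
        input_sequence_length: Optional[int] = None,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._window = RESOLUTION_SECONDS[resolution]
        self._max_requests = requests
        self._max_tokens = tokens
        self._max_input = input_sequence_length
        self._clock = clock
        self._window_start: Optional[int] = None
        self._requests = 0
        self._tokens = 0

    def _roll_window(self) -> None:
        # Windows start on clock boundaries (for example, the top of each minute).
        start = int(self._clock() // self._window) * self._window
        if start != self._window_start:
            self._window_start = start
            self._requests = 0
            self._tokens = 0

    def admit(self, input_tokens: int) -> bool:
        """Return True and count the request if it fits within every limit."""
        self._roll_window()
        if self._max_input is not None and input_tokens > self._max_input:
            return False  # prompt too long; independent of the window
        if self._max_requests is not None and self._requests >= self._max_requests:
            return False  # request budget for this window is spent
        if self._max_tokens is not None and self._tokens >= self._max_tokens:
            return False  # token budget for this window is spent
        self._requests += 1
        return True

    def record_tokens(self, input_tokens: int, output_tokens: int) -> None:
        """Charge input and output tokens once the response is complete."""
        self._roll_window()
        self._tokens += input_tokens + output_tokens


t = [0.0]  # fake clock, in seconds
quota = FixedWindowQuota("minute", requests=2, tokens=100,
                         input_sequence_length=50, clock=lambda: t[0])
print(quota.admit(10), quota.admit(10), quota.admit(10))  # True True False
```

A fixed window is the simplest scheme, but it can admit up to twice the limit across a window boundary; sliding-window or token-bucket limiters smooth that out at the cost of more state.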

Set entity rate limits

On the Quota settings page, manage the entity limits in the Entity rate limits (optional) section:

Click Edit to modify the entity-based quota settings for the deployment.
Select an entity from the Deployments, Users, or Groups list.
Set a time Resolution for the time-based metrics: Minute, Hour, or Day. The selected resolution applies to each metric-based quota defined here.
Click Add metric to begin configuration.

Adding metrics

A new quota row appears each time you click Add metric, until a row is present for every metric available.

In the new quota row, select a Metric and enter a Limit. The available metrics (Requests, Tokens, and Input sequence length) are the same as those described for the default quota configuration.

Repeat this process for each metric you want to limit for this entity, then click Save.
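An entity rate limit combines one entity, a single resolution shared by all of its metrics, and at most one row per metric. The Python sketch below models that shape with validation. The `EntityRateLimit` class, the lowercase identifiers, and the rule that limits must be positive are assumptions for illustration, not DataRobot's data model.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

ENTITY_TYPES = {"deployment", "user", "group"}
RESOLUTIONS = {"minute", "hour", "day"}
METRICS = {"requests", "tokens", "input_sequence_length"}


@dataclass
class EntityRateLimit:
    """Hypothetical in-memory form of one entity rate limit."""

    entity_type: str  # "deployment", "user", or "group"
    entity_id: str    # hypothetical identifier of the entity
    resolution: str   # applies to every metric row below
    rows: List[Tuple[str, int]] = field(default_factory=list)  # (metric, limit)

    def validate(self) -> None:
        if self.entity_type not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {self.entity_type!r}")
        if self.resolution not in RESOLUTIONS:
            raise ValueError(f"unknown resolution: {self.resolution!r}")
        seen = set()
        for metric, limit in self.rows:
            if metric not in METRICS:
                raise ValueError(f"unknown metric: {metric!r}")
            if metric in seen:
                # The Quota tab offers at most one row per metric.
                raise ValueError(f"duplicate row for metric: {metric!r}")
            if limit <= 0:
                raise ValueError(f"limit for {metric!r} must be positive")
            seen.add(metric)


limit = EntityRateLimit("user", "u-1", "hour", [("requests", 1_000), ("tokens", 50_000)])
limit.validate()  # passes silently
```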

Agent API keys

Agent API keys let you distinguish between the applications and agents that use a deployment. An agent API key is generated automatically when you create a new Agentic workflow deployment. These keys appear on the Agent API keys tab of the API keys and tools section in user settings. The tab displays a table listing each key's name, the key itself, the connected deployment, the creation date, and the last used date. You can edit (rename) or delete these keys.

Important

Deleting a key disables all agents that use it.
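The relationship between agent API keys, deployments, and usage tracking can be sketched as a small registry. In this hypothetical Python model (not a DataRobot API), each request is attributed to the key it presents, which is what lets a deployment tell agents apart; deleting a key removes it, so any agent still sending that key can no longer authenticate. The class names, the `request_count` field, the `secrets.token_urlsafe` key format, and the in-memory storage are all assumptions.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class AgentApiKey:
    """One row of the (hypothetical) Agent API keys table."""

    name: str
    key: str
    deployment_id: str
    created_at: datetime
    last_used_at: Optional[datetime] = None
    request_count: int = 0  # per-key usage, used to tell agents apart


class AgentKeyRegistry:
    """Hypothetical in-memory store of agent API keys."""

    def __init__(self) -> None:
        self._keys: Dict[str, AgentApiKey] = {}

    def create(self, name: str, deployment_id: str) -> AgentApiKey:
        record = AgentApiKey(
            name=name,
            key=secrets.token_urlsafe(32),  # random, URL-safe key value
            deployment_id=deployment_id,
            created_at=datetime.now(timezone.utc),
        )
        self._keys[record.key] = record
        return record

    def authenticate(self, key: str) -> Optional[AgentApiKey]:
        """Attribute a request to its key; return None for unknown keys."""
        record = self._keys.get(key)
        if record is not None:
            record.last_used_at = datetime.now(timezone.utc)
            record.request_count += 1
        return record

    def rename(self, key: str, new_name: str) -> None:
        self._keys[key].name = new_name

    def delete(self, key: str) -> None:
        # After deletion, agents presenting this key fail authentication.
        self._keys.pop(key, None)


registry = AgentKeyRegistry()
agent_key = registry.create("support-agent", "dep-1")
registry.authenticate(agent_key.key)
registry.delete(agent_key.key)
print(registry.authenticate(agent_key.key))  # None
```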