# Deployment usage

> Deployment usage - Monitor quota usage, tokens, and rate limits for agentic workflow and related
> serverless deployments.

This Markdown file sits beside the HTML page at the same path (with a `.md` suffix). It summarizes the topic and lists links for tools and LLM context.

Companion generated at `2026-04-24T16:03:56.232673+00:00` (UTC).

## Primary page

- [Deployment usage](https://docs.datarobot.com/en/docs/agentic-ai/agentic-monitor/agent-usage.html): Full documentation for this topic (HTML).

## Sections on this page

- [Quota usage monitoring](https://docs.datarobot.com/en/docs/agentic-ai/agentic-monitor/agent-usage.html#quota-usage-monitoring): In-page section heading.
- [Quota monitoring charts](https://docs.datarobot.com/en/docs/agentic-ai/agentic-monitor/agent-usage.html#quota-monitoring-charts): In-page section heading.
- [Request tracing table](https://docs.datarobot.com/en/docs/agentic-ai/agentic-monitor/agent-usage.html#request-tracing-table): In-page section heading.
- [Filter tracing logs](https://docs.datarobot.com/en/docs/agentic-ai/agentic-monitor/agent-usage.html#filter-tracing-logs): In-page section heading.
- [Tracing table OTel attributes](https://docs.datarobot.com/en/docs/agentic-ai/agentic-monitor/agent-usage.html#tracing-table-otel-attributes): In-page section heading.
- [Rate limited requests table](https://docs.datarobot.com/en/docs/agentic-ai/agentic-monitor/agent-usage.html#rate-limited-requests-table): In-page section heading.

## Related documentation

- [Agentic AI](https://docs.datarobot.com/en/docs/agentic-ai/index.html): Linked from this page.
- [Monitor](https://docs.datarobot.com/en/docs/agentic-ai/agentic-monitor/index.html): Linked from this page.
- [Usage](https://docs.datarobot.com/en/docs/workbench/nxt-console/nxt-monitoring/nxt-usage.html): Linked from this page.
- [Data Exploration tab](https://docs.datarobot.com/en/docs/agentic-ai/agentic-monitor/agent-data-exploration.html#explore-deployment-data-tracing): Linked from this page.
- [full deployment logs](https://docs.datarobot.com/en/docs/workbench/nxt-console/nxt-activity-log/nxt-otel-logs.html): Linked from this page.
- [Implement tracing](https://docs.datarobot.com/en/docs/agentic-ai/agentic-develop/agentic-tracing-code.html#surface-tool-names-in-the-tracing-table): Linked from this page.

## Documentation content

# Deployment usage

For text generation, VDB, and MCP custom model deployments, the Usage tab follows the standard prediction-processing views described in [Usage](https://docs.datarobot.com/en/docs/workbench/nxt-console/nxt-monitoring/nxt-usage.html). For agentic workflow (and NIM) deployments, the Quota monitoring experience below is the primary usage view.

## Quota usage monitoring

On the Monitoring > Usage tab for agentic workflow and NIM deployments, the Quota monitoring dashboard visualizes historical usage segmented by user or agent. Other serverless generative deployments use the same Console controls when quota monitoring applies.

The Quota monitoring dashboard displays four key metric tiles at the top of the page:

| Metric | Description |
| --- | --- |
| Total requests | The total number of requests made during the selected time range, along with the average requests per minute. |
| Total rate limited requests | The total number of requests that were rate limited during the selected time range, along with the average rate limited requests per minute. |
| Total token count | The total number of tokens consumed during the selected time range, along with the average tokens per minute. |
| Average concurrent requests | The average number of simultaneous API calls processed by the agent service over the defined interval, tracked as a key metric for observability and used to enforce the system's quota limit on simultaneous operations. |

Each metric displays the value for the selected time frame and the average per minute in green. Click the metric tile to review the corresponding chart below:

- Total requests
- Total rate limited requests
- Total token count
- Average concurrent requests
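As a rough illustration of how a tile's per-minute average relates to its total, the sketch below divides a total by the number of minutes in the selected range. This is an assumption about the arithmetic, not DataRobot's documented implementation; the function name `average_per_minute` is hypothetical.

```python
from datetime import datetime, timezone


def average_per_minute(total: float, start: datetime, end: datetime) -> float:
    """Divide a total by the number of minutes between start and end.

    Assumption: the tile average is a plain total / minutes ratio over the
    selected range. The product may compute it differently.
    """
    minutes = (end - start).total_seconds() / 60
    if minutes <= 0:
        raise ValueError("end must be after start")
    return total / minutes


# Example: 600 requests over a two-hour (120-minute) range.
start = datetime(2026, 4, 1, 0, 0, tzinfo=timezone.utc)
end = datetime(2026, 4, 1, 2, 0, tzinfo=timezone.utc)
print(average_per_minute(600, start, end))
```

The same ratio applies to rate limited requests and token counts when those totals are passed in place of the request total.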

You can configure the Quota monitoring dashboard to focus the visualized statistics on specific entities and time frames. The following controls are available:

| Filter | Description |
| --- | --- |
| Model | Select the model version to monitor. The Current option displays data for the active model version. |
| Range (UTC) | Select the date and time range for the data displayed. Use the date pickers to set the start and end times in UTC. |
| Resolution | Select the time resolution for aggregating data: Hourly, Daily, or Weekly. |
| Entity | Filter by entity type: All, User, or Agent. |
| Refresh | Updates the dashboard with the latest data based on the current filter settings. |
| Reset | Resets all filters to their default values. |
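To make the Resolution setting concrete, the following sketch groups UTC timestamps into hourly, daily, or weekly buckets and counts the requests in each. It is only an illustration: the helper names are hypothetical, and the Monday week start is an assumption (the source does not say where weekly buckets begin).

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def bucket_start(ts: datetime, resolution: str) -> datetime:
    """Return the start of the bucket that contains a timezone-aware timestamp."""
    ts = ts.astimezone(timezone.utc)
    if resolution == "hourly":
        return ts.replace(minute=0, second=0, microsecond=0)
    day = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if resolution == "daily":
        return day
    if resolution == "weekly":
        # Assumption: weeks start on Monday (ISO convention).
        return day - timedelta(days=day.weekday())
    raise ValueError(f"unknown resolution: {resolution!r}")


def count_by_bucket(timestamps: list[datetime], resolution: str) -> Counter:
    """Count how many timestamps fall into each bucket."""
    return Counter(bucket_start(ts, resolution) for ts in timestamps)


requests = [
    datetime(2026, 4, 1, 10, 15, tzinfo=timezone.utc),
    datetime(2026, 4, 1, 10, 45, tzinfo=timezone.utc),
    datetime(2026, 4, 1, 11, 5, tzinfo=timezone.utc),
]
for bucket, count in sorted(count_by_bucket(requests, "hourly").items()):
    print(bucket.isoformat(), count)
```

A coarser resolution produces fewer, larger buckets over the same range, which smooths short spikes in the charts.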

### Quota monitoring charts

The Quota monitoring charts display the metric selected in the tiles over time: requests, rate limited requests, token count, or average concurrent requests. Each chart is a stacked area chart, which layers one data series per entity (user or agent) on top of the others to show how each entity contributes to the total over time. Each entity is represented by a different color in the chart legend.

|  | Chart element | Description |
| --- | --- | --- |
| (1) | Entity filter | Selects which entities (users or agents) from the selected time range are shown in the chart. |
| (2) | Entity legend | Displays all entities (users or agents) included in the selected time range. Each entity is represented by a dot that matches the area in the chart. |
| (3) | Time range (X-axis) | Displays the time range selected in the filters, showing the date range from start to end. |
| (4) | Metric (Y-axis) | Displays the number of requests, rate limited requests, or tokens on the vertical axis. |
| (5) | Request areas | Stacked areas show the volume of requests per entity over time. The height of each area at any point represents the number of requests for that entity at that time. |
| (6) | Tracing | Click Show tracing to view tracing data for the requests. |
| (7) | Export | Click Export to download a .csv file. |

Hover over the chart to view detailed information about the number of requests for each entity at specific time points.
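The stacking described above can be sketched as a running baseline: each entity's area starts where the previous entity's area ends, so the top edge of the last area is the total. This is a generic illustration of a stacked area chart, with a hypothetical `stack_series` helper and made-up entity names, not the Console's rendering code.

```python
def stack_series(series: dict[str, list[float]]) -> dict[str, list[tuple[float, float]]]:
    """Return (lower, upper) boundaries per entity for each time bucket."""
    lengths = {len(values) for values in series.values()}
    if len(lengths) > 1:
        raise ValueError("all series must cover the same time buckets")
    bucket_count = lengths.pop() if lengths else 0

    baseline = [0.0] * bucket_count
    stacked = {}
    for entity, values in series.items():
        # Each entity's area sits on top of the running baseline.
        upper = [low + value for low, value in zip(baseline, values)]
        stacked[entity] = list(zip(baseline, upper))
        baseline = upper
    return stacked


# Hypothetical request counts for two entities over two time buckets.
example = {"alice@example.com": [1, 2], "agent-1": [3, 4]}
for entity, bounds in stack_series(example).items():
    print(entity, bounds)
```

The height of an entity's area (upper minus lower) is its own value for that bucket, which is why the hover details report per-entity counts rather than cumulative ones.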

### Request tracing table

> [!NOTE] Premium
> Tracing is a premium feature. Contact your DataRobot representative or administrator for information on enabling this feature.

On any Quota monitoring chart, click Show tracing to view tracing data for the deployment. This tracing chart functions similarly to the tracing chart on the [Data Exploration tab](https://docs.datarobot.com/en/docs/agentic-ai/agentic-monitor/agent-data-exploration.html#explore-deployment-data-tracing).

Traces represent the path taken by a request to a model or agentic workflow. DataRobot uses the [OpenTelemetry framework for tracing](https://opentelemetry.io/docs/concepts/signals/traces/). A trace follows the entire end-to-end path of a request, from origin to resolution. Each trace contains one or more spans, starting with the root span. The root span represents the entire path of the request and contains a child span for each individual step in the process. The root (or parent) span and each child span share the same Trace ID.

> [!NOTE] Access and retention
> The tracing table is available for all custom and external model deployments. Tracing data is stored for a retention period of 30 days, after which it is automatically deleted.

In the Tracing table, you can review the following fields related to each trace:

| Column | Description |
| --- | --- |
| Timestamp | The date and time of the trace in YYYY-MM-DD HH:MM format. |
| Status | The overall status of the trace, including all spans. The Status will be Error if any dependent task fails. |
| Trace ID | A unique identifier for the trace. |
| Duration | The amount of time, in milliseconds, it took for the trace to complete. This value is equal to the duration of the root span (rounded) and includes all actions represented by child spans. |
| Spans count | The number of completed spans (actions) included in the trace. |
| Cost | If cost data is provided, the total cost of the trace. |
| Prompt | The user prompt related to the trace. |
| Completion | The agent or model response (completion) associated with the prompt for the trace. |
| Tools | The tool or tools called during the request represented by the trace. |
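The relationship between spans and the Status, Duration, and Spans count columns can be sketched as follows. The span dictionaries and their field names (`trace_id`, `parent_id`, `start_ns`, `end_ns`, `status`) are hypothetical, the non-error label `"OK"` is an assumption, and the sketch assumes every span in the list has completed.

```python
def summarize_trace(spans: list[dict]) -> dict:
    """Derive trace-level fields from the spans that share one trace ID."""
    trace_ids = {span["trace_id"] for span in spans}
    if len(trace_ids) != 1:
        raise ValueError("all spans must share exactly one trace ID")

    roots = [span for span in spans if span["parent_id"] is None]
    if len(roots) != 1:
        raise ValueError("expected exactly one root span")
    root = roots[0]

    return {
        "trace_id": root["trace_id"],
        # The trace is an error if any span failed.
        "status": "Error" if any(s["status"] == "ERROR" for s in spans) else "OK",
        # Duration is the root span's duration, rounded to whole milliseconds.
        "duration_ms": round((root["end_ns"] - root["start_ns"]) / 1_000_000),
        "spans_count": len(spans),
    }


spans = [
    {"trace_id": "t1", "span_id": "a", "parent_id": None,
     "start_ns": 0, "end_ns": 1_250_400_000, "status": "OK"},
    {"trace_id": "t1", "span_id": "b", "parent_id": "a",
     "start_ns": 100_000_000, "end_ns": 900_000_000, "status": "OK"},
    {"trace_id": "t1", "span_id": "c", "parent_id": "a",
     "start_ns": 900_000_000, "end_ns": 1_200_000_000, "status": "ERROR"},
]
print(summarize_trace(spans))
```

Because the root span covers the whole request, child span durations are not added to it; they are already included in the root's start-to-end interval.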

Click Filter to filter by Min span duration, Max span duration, Min trace cost, and Max trace cost. Span duration filters take values in nanoseconds (ns), while the chart displays span durations in milliseconds (ms).
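Because the filter and the chart use different units, a small conversion helps when copying a duration from the chart into a filter field. One millisecond is 1,000,000 nanoseconds; the helper names below are hypothetical.

```python
NS_PER_MS = 1_000_000  # 1 millisecond = 1,000,000 nanoseconds


def ms_to_ns(milliseconds: float) -> int:
    """Convert a duration read from the chart (ms) into a filter value (ns)."""
    return round(milliseconds * NS_PER_MS)


def ns_to_ms(nanoseconds: int) -> float:
    """Convert a filter value (ns) back into the unit shown in the chart (ms)."""
    return nanoseconds / NS_PER_MS


# A span shown as 250 ms in the chart corresponds to this filter value:
print(ms_to_ns(250))
print(ns_to_ms(1_500_000))
```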

> [!TIP] Filter accessibility
> The Filter button is hidden when a span is expanded to detail view. To return to the chart view with the filter, click Hide details panel.

To review the [spans](https://opentelemetry.io/docs/concepts/signals/traces/#spans) contained in a trace, along with trace details, click a trace row in the Tracing table. The span colors correspond to a Span service, usually a deployment. Restricted span appears when you don’t have access to the deployment or service associated with the span. You can view spans in Chart format or List format.

> [!TIP] Span detail controls
> From either view, you can click Hide table to collapse the Timestamps table or Hide details panel to return to the expanded Tracing table view.

**Chart view:**
[https://docs.datarobot.com/en/docs/images/nxt-tracing-table-spans.png](https://docs.datarobot.com/en/docs/images/nxt-tracing-table-spans.png)

**List view:**
[https://docs.datarobot.com/en/docs/images/nxt-tracing-table-spans-list.png](https://docs.datarobot.com/en/docs/images/nxt-tracing-table-spans-list.png)

> [!NOTE] Trace details
> In list view, you can click Trace details to view the Input/Output (Prompt and Completion) and Evaluation details about the trace associated with the current span.


For either view, click the Span service name to access the deployment or resource (if you have access). Additional information, dependent on the configuration of the generative AI model or agentic workflow, is available on the Info, Resources, Events, Input/Output, Error, and Logs tabs. The Error tab only appears when an error occurs in a trace.

**Chart view:**
[https://docs.datarobot.com/en/docs/images/nxt-tracing-table-span-tabs.png](https://docs.datarobot.com/en/docs/images/nxt-tracing-table-span-tabs.png)

**List view:**
[https://docs.datarobot.com/en/docs/images/nxt-tracing-table-span-tabs-list-view.png](https://docs.datarobot.com/en/docs/images/nxt-tracing-table-span-tabs-list-view.png)


### Filter tracing logs

From the list view, you can display OTel logs for a span. The results shown are a subset of the [full deployment logs](https://docs.datarobot.com/en/docs/workbench/nxt-console/nxt-activity-log/nxt-otel-logs.html), and are accessed as follows:

1. Open the list view and select a span under Trace details.
2. Click the Logs tab.
3. Click Show logs.

### Tracing table OTel attributes

For Cost, Prompt, Completion, and Tools, DataRobot reads specific span attributes across all spans that belong to the trace. Other columns (such as Timestamp and Duration) come from trace and span metadata rather than these attributes.

| Column | OpenTelemetry mapping |
| --- | --- |
| Cost | Sums numeric values from the datarobot.moderation.cost attribute on spans in the trace (when that attribute is present). |
| Prompt | Uses the gen_ai.prompt attribute. If more than one span includes gen_ai.prompt, the first value encountered in trace order is shown. |
| Completion | Uses the gen_ai.completion attribute. If more than one span includes gen_ai.completion, the last value encountered in trace order is shown. |
| Tools | Collects every distinct value of the tool_name attribute found on spans in the trace and lists those tool names in the column. |

Attribute keys must match exactly (including the underscore in `gen_ai`). Names such as `genai.prompt` or `GenAI.prompt` are not read for the Prompt and Completion columns.

Automatic instrumentation (including DataRobot agent templates) often sets `gen_ai.prompt`, `gen_ai.completion`, and sometimes `tool_name`. For custom or external models, frameworks differ: tool execution may not emit `tool_name` even when tools run (for example, some LangGraph callback flows). In that case Prompt and Completion can populate while Tools remains empty until `tool_name` is configured on a span that runs inside the tool—see [Implement tracing](https://docs.datarobot.com/en/docs/agentic-ai/agentic-develop/agentic-tracing-code.html#surface-tool-names-in-the-tracing-table).
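The column rules in the table above can be sketched in plain Python. This is an illustration of the described behavior, not DataRobot's implementation: spans are modeled as hypothetical dictionaries with an `attributes` mapping, and the list is assumed to already be in trace order.

```python
def tracing_columns(spans: list[dict]) -> dict:
    """Apply the Cost, Prompt, Completion, and Tools rules to a trace's spans."""
    cost_total = 0.0
    cost_seen = False
    prompt = None
    completion = None
    tools: list[str] = []

    for span in spans:  # assumed to be in trace order
        attrs = span.get("attributes", {})

        # Cost: sum numeric values of datarobot.moderation.cost when present.
        cost = attrs.get("datarobot.moderation.cost")
        if isinstance(cost, (int, float)) and not isinstance(cost, bool):
            cost_total += cost
            cost_seen = True

        # Prompt: the first gen_ai.prompt value wins.
        if prompt is None and "gen_ai.prompt" in attrs:
            prompt = attrs["gen_ai.prompt"]

        # Completion: the last gen_ai.completion value wins.
        if "gen_ai.completion" in attrs:
            completion = attrs["gen_ai.completion"]

        # Tools: every distinct tool_name, kept in first-seen order.
        tool = attrs.get("tool_name")
        if tool is not None and tool not in tools:
            tools.append(tool)

    return {
        "Cost": cost_total if cost_seen else None,
        "Prompt": prompt,
        "Completion": completion,
        "Tools": tools,
    }


spans = [
    {"attributes": {"gen_ai.prompt": "Find flights", "datarobot.moderation.cost": 0.25}},
    {"attributes": {"tool_name": "search", "genai.prompt": "ignored: key does not match"}},
    {"attributes": {"tool_name": "search", "gen_ai.completion": "draft"}},
    {"attributes": {"gen_ai.completion": "Here are 3 flights", "datarobot.moderation.cost": 0.5}},
]
print(tracing_columns(spans))
```

Note that the misspelled `genai.prompt` key on the second span is never read, which mirrors the exact-match requirement for attribute keys.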

### Rate limited requests table

The Rate limited requests table provides a detailed breakdown of rate limiting by entity:

|  | Table element | Description |
| --- | --- | --- |
| (1) | Entity type filter | Filter the table by entity type (user or agent). |
| (2) | Rate limited percentage filter | Filter entities by their rate limited percentage threshold (zero, low, medium, or high). |
| (3) | Search box | Search for specific entities by name or identifier. |
| (4) | Entity column | Displays the entity identifier (user email or agent name). |
| (5) | Rate limited requests column | Shows the number of rate limited requests and the percentage of total requests that were rate limited. The percentage is highlighted in red when it exceeds a threshold, or displayed in gray when it is 0%. |
| (6) | Requests column | Displays the number of requests that were rate limited due to exceeding the request quota. |
| (7) | Token count column | Displays the number of requests that were rate limited due to exceeding the token quota. |
| (8) | Concurrent requests column | Displays the number of requests that were rate limited due to exceeding the concurrent requests quota. |

The table helps identify which entities are experiencing rate limiting and to what extent, allowing you to adjust quotas or usage patterns accordingly.
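As a sketch of the percentage shown in the Rate limited requests column, the example below divides each entity's rate limited requests by its total requests and ranks entities by the result. The function names and sample data are hypothetical, and no cutoffs for the low, medium, or high bands are assumed because the source does not state them.

```python
def rate_limited_percentage(rate_limited: int, total: int) -> float:
    """Return the share of requests that were rate limited, as a percentage."""
    if total == 0:
        return 0.0  # no requests means nothing was rate limited
    return 100.0 * rate_limited / total


def rank_entities(usage: dict[str, tuple[int, int]]) -> list[tuple[str, float]]:
    """Sort entities by rate limited percentage, highest first.

    `usage` maps an entity identifier to (rate_limited_requests, total_requests).
    """
    ranked = [
        (entity, rate_limited_percentage(limited, total))
        for entity, (limited, total) in usage.items()
    ]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


usage = {
    "alice@example.com": (25, 200),
    "agent-1": (0, 500),
    "bob@example.com": (30, 60),
}
for entity, pct in rank_entities(usage):
    print(f"{entity}: {pct:.1f}%")
```

Ranking by percentage rather than by raw count surfaces entities whose quota is too tight for their workload, even when their absolute request volume is small.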
