# Monitoring concepts

> Monitoring concepts - Monitoring surfaces, capabilities, and retention semantics for Workloads.

This Markdown file sits beside the HTML page at the same path (with a `.md` suffix). It summarizes the topic and lists links for tools and LLM context.

Companion generated at `2026-06-22T16:50:38.250232+00:00` (UTC).

## Primary page

- [Monitoring concepts](https://docs.datarobot.com/en/docs/workload-api/monitor-workloads/monitoring-concepts.html.md): Full documentation for this topic (Markdown sidecar).

## Sections on this page

- [Which surface answers which question](https://docs.datarobot.com/en/docs/workload-api/monitor-workloads/monitoring-concepts.html.md#which-surface): In-page section heading.
- [Monitoring capabilities](https://docs.datarobot.com/en/docs/workload-api/monitor-workloads/monitoring-concepts.html.md#monitoring-capabilities): In-page section heading.
- [Console visibility](https://docs.datarobot.com/en/docs/workload-api/monitor-workloads/monitoring-concepts.html.md#console-visibility): In-page section heading.
- [Access telemetry](https://docs.datarobot.com/en/docs/workload-api/monitor-workloads/monitoring-concepts.html.md#access-telemetry): In-page section heading.
- [Retention summary](https://docs.datarobot.com/en/docs/workload-api/monitor-workloads/monitoring-concepts.html.md#retention-summary): In-page section heading.
- [Instrument your container with OpenTelemetry](https://docs.datarobot.com/en/docs/workload-api/monitor-workloads/monitoring-concepts.html.md#instrument-with-otel): In-page section heading.
- [Reset Workload stats](https://docs.datarobot.com/en/docs/workload-api/monitor-workloads/monitoring-concepts.html.md#reset-workload-stats): In-page section heading.

## Related documentation

- [Workload API](https://docs.datarobot.com/en/docs/workload-api/index.html.md): Linked from this page.
- [Monitor telemetry and health](https://docs.datarobot.com/en/docs/workload-api/monitor-workloads/index.html.md): Linked from this page.
- [Instrument a Workload with OpenTelemetry](https://docs.datarobot.com/en/docs/workload-api/monitor-workloads/instrument-with-otel.html.md): Linked from this page.
- [REST: Monitor Workloads](https://docs.datarobot.com/en/docs/workload-api/monitor-workloads/monitoring-rest-endpoints.html.md): Linked from this page.
- [Monitor deployed Workloads](https://docs.datarobot.com/en/docs/workload-api/monitor-workloads/monitoring-ui/index.html.md): Linked from this page.
- [View deployed Workload activity](https://docs.datarobot.com/en/docs/workload-api/monitor-workloads/activity-logs-ui/index.html.md): Linked from this page.

## Documentation content

Workload monitoring is split across two complementary surfaces. The Workload API itself owns request statistics, lifecycle events, per-replica status, and the audit trail, all keyed by Workload ID. Application OpenTelemetry (OTel) telemetry (the traces, logs, and metrics emitted by your container) is exposed through a separate observability surface; the Workload API does not proxy it.

Draft and locked Workloads have the same monitoring capabilities. The only difference is retention period.

## Which surface answers which question

Pick the surface that answers the question. Most production debugging touches both.

| If you want to know… | Use |
| --- | --- |
| How many requests is the Workload serving? What's the error rate? p50/p95 latency? | Workload API stats: `GET /workloads/{id}/stats` |
| Did the Workload restart, scale, or have its artifact replaced? When? | Workload API events: `GET /workloads/{id}/events` |
| Why is a specific replica unhealthy? What does its log tail say? | Workload API per-replica detail: `GET /workloads/{id}/protons/{proton_id}/statusDetails` |
| How long did this LLM call take? Which tool was invoked? What was the prompt? | OTel traces: `GET /api/v2/otel/workload/{id}/traces/` |
| How many tokens were consumed this hour? What's the cache hit rate? | OTel metrics |
| What did my application code log? | OTel logs (pushed via the OTel logging handler; see [Instrument a Workload with OpenTelemetry](https://docs.datarobot.com/en/docs/workload-api/monitor-workloads/instrument-with-otel.html.md)) |
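
As a rough illustration of the table above, the following sketch maps each question to its request path. The helper name `monitoring_paths`, the dictionary keys, and the placeholder IDs are hypothetical; only the path templates come from the table, and the base URL they are appended to depends on your deployment.

```python
def monitoring_paths(workload_id: str, proton_id: str) -> dict:
    """Return the request path for each monitoring question in the table.

    The keys are illustrative labels, not API names.
    """
    return {
        # Request counts, error rate, latency percentiles
        "stats": f"/workloads/{workload_id}/stats",
        # Restarts, scaling, artifact replacement
        "events": f"/workloads/{workload_id}/events",
        # Health and log tail of one replica
        "replica_status": f"/workloads/{workload_id}/protons/{proton_id}/statusDetails",
        # Application traces on the separate OTel surface
        "otel_traces": f"/api/v2/otel/workload/{workload_id}/traces/",
    }


paths = monitoring_paths("my-workload-id", "my-proton-id")
print(paths["replica_status"])
```

Keeping the two surfaces in one lookup makes it obvious which questions leave the Workload API and go to the OTel surface instead.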

## Monitoring capabilities

The following capabilities apply to both draft and locked Workloads, with retention as the only difference.

| Capability | Draft artifact | Locked artifact | Description |
| --- | --- | --- | --- |
| Service health | Yes | Yes | Reports request counts, latency, error rate, and requests per minute. |
| Resource utilization | Yes | Yes | Reports replica count and per-container CPU and memory consumption. |
| OTel logs | Yes | Yes | Application logs that the container emits via OpenTelemetry. |
| OTel traces | Yes | Yes | Distributed traces that the container emits via OpenTelemetry. |
| OTel metrics | Yes | Yes | Application metrics that the container emits via OpenTelemetry. |
| Events | Yes | Yes | Lifecycle audit events, including create, start, stop, replace, scale, and error events. |
| Statistics | Yes | Yes | Aggregate request statistics, including total requests, error rate, response time, and related counters. |
| Retention | 24 hours | 30 days | Matches lifecycle expectations for draft vs. locked Workloads. |

## Console visibility

All Workloads (draft and locked) appear in Console. The draft filter is off by default, so all Workloads are shown.

## Access telemetry

Workload-API surfaces (request statistics, lifecycle events, replacement history, per-replica status) are documented in [REST: Monitor Workloads](https://docs.datarobot.com/en/docs/workload-api/monitor-workloads/monitoring-rest-endpoints.html.md). Application OTel telemetry is exposed through DataRobot's separate observability surface and rendered in the Console—see [Monitor deployed Workloads](https://docs.datarobot.com/en/docs/workload-api/monitor-workloads/monitoring-ui/index.html.md) and [View deployed Workload activity](https://docs.datarobot.com/en/docs/workload-api/monitor-workloads/activity-logs-ui/index.html.md).

## Retention summary

Telemetry retention depends on the artifact's lifecycle status.

| Artifact status | Telemetry retention |
| --- | --- |
| draft | 24 hours |
| locked | 30 days |
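
To make the retention windows concrete, here is a minimal sketch that checks whether a telemetry record would still be retained. It assumes retention is measured from the record's own timestamp and that the boundary is inclusive; neither detail is stated on this page, and the names `RETENTION` and `is_retained` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Retention windows from the table above, keyed by artifact status.
RETENTION = {
    "draft": timedelta(hours=24),
    "locked": timedelta(days=30),
}


def is_retained(status: str, recorded_at: datetime, now: datetime) -> bool:
    """Return True if a record of this age is still inside the retention window.

    Assumption: the window is measured from the record's timestamp, inclusive.
    """
    return now - recorded_at <= RETENTION[status]


now = datetime.now(timezone.utc)
two_days_ago = now - timedelta(days=2)
print(is_retained("draft", two_days_ago, now))   # a 2-day-old record is outside 24 hours
print(is_retained("locked", two_days_ago, now))  # but inside 30 days
```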

## Instrument your container with OpenTelemetry

Container `stdout` and `stderr` are captured automatically at every lifecycle stage (startup, running, errored)—they appear in the Workload's Activity log > Logs tab without any SDK setup.

> [!NOTE] Logs require OTLP push—stdout scraping does not apply
> The conventional OTel pattern for logs is a collector DaemonSet that scrapes pod stdout from the host filesystem. DataRobot's collector does not scrape container stdout. Plain `print()` calls and unconfigured `logging` appear in the Activity log > Logs tab but do not reach the OTel observability surface. To get logs into the OTel surface as structured records, install the OTel logging handler so the application pushes log records via OTLP HTTP, the same transport used by traces and metrics.

Explicit OTel instrumentation is needed when you want traces, application metrics, or structured logs on the OTel observability surface. When your application emits OTel signals, the platform handles transport: the OTel exporters read `OTEL_EXPORTER_OTLP_ENDPOINT` from the environment, which the platform sets when the container starts.
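
As a quick startup diagnostic, the sketch below reads `OTEL_EXPORTER_OTLP_ENDPOINT` and derives the per-signal URLs that OTLP/HTTP exporters conventionally build from it (`v1/traces`, `v1/metrics`, `v1/logs` appended to the base, per the OpenTelemetry exporter specification). The function name `otlp_http_urls` and the example host are hypothetical, and whether the platform's endpoint follows the default path convention is an assumption.

```python
import os

# Default OTLP/HTTP path suffixes from the OpenTelemetry exporter specification.
SIGNAL_PATHS = {"traces": "v1/traces", "metrics": "v1/metrics", "logs": "v1/logs"}


def otlp_http_urls(environ=os.environ):
    """Return per-signal OTLP/HTTP URLs, or None if the endpoint is not set."""
    base = environ.get("OTEL_EXPORTER_OTLP_ENDPOINT")
    if not base:
        return None
    base = base.rstrip("/")
    return {signal: f"{base}/{path}" for signal, path in SIGNAL_PATHS.items()}


urls = otlp_http_urls()
if urls is None:
    print("OTEL_EXPORTER_OTLP_ENDPOINT is not set; OTel signals will not be exported")
else:
    for signal, url in urls.items():
        print(f"{signal}: {url}")
```

Running a check like this outside the platform typically prints the "not set" message, which is a useful reminder that the variable is injected only when the container starts on the platform.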

For copy-ready Python SDK snippets that configure tracing, logging, and metrics against the platform's OTLP endpoint, see [Instrument a Workload with OpenTelemetry](https://docs.datarobot.com/en/docs/workload-api/monitor-workloads/instrument-with-otel.html.md). The same snippets are surfaced in the Console on the Workload's Endpoints > Instrumentation sub-tab.

## Reset Workload stats

`POST /workloads/{workload_id}/promote` resets statistics automatically so production starts from a clean baseline. To reset stats manually—for a specific time window or for a specific proton—use `DELETE /workloads/{workload_id}/stats`:

```
# Reset all stats for the current proton
curl -X DELETE "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}/stats" \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}"

# Reset stats for a time window
curl -X DELETE "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}/stats?startTime=2026-04-01T00:00:00Z&endTime=2026-04-15T00:00:00Z" \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}"
```

`protonId`, `startTime`, and `endTime` are optional query parameters; omit them to clear stats for the current proton across the full retention window.
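
The same call can be sketched in Python with only the standard library, including the `protonId` parameter that the curl examples do not show. The helper name `build_reset_stats_request` and the example IDs are hypothetical; the path and query parameter names come from this section. The function only builds the request, so nothing is deleted until you pass it to `urllib.request.urlopen`.

```python
import urllib.parse
import urllib.request


def build_reset_stats_request(endpoint, token, workload_id,
                              proton_id=None, start_time=None, end_time=None):
    """Build (but do not send) a DELETE /workloads/{workload_id}/stats request."""
    params = {"protonId": proton_id, "startTime": start_time, "endTime": end_time}
    # Omit unset parameters so the server applies its defaults.
    query = urllib.parse.urlencode({k: v for k, v in params.items() if v is not None})
    url = f"{endpoint.rstrip('/')}/workloads/{workload_id}/stats"
    if query:
        url = f"{url}?{query}"
    return urllib.request.Request(
        url, method="DELETE", headers={"Authorization": f"Bearer {token}"}
    )


# Hypothetical values; in practice read these from DATAROBOT_ENDPOINT and DATAROBOT_API_TOKEN.
req = build_reset_stats_request(
    "https://example.invalid/api/v2", "my-token", "my-workload-id",
    proton_id="my-proton-id",
)
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req)
```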
