Monitoring concepts

The Workload API exposes monitoring through two complementary surfaces. The Workload API itself owns request statistics, lifecycle events, per-replica status, and the audit trail—all keyed by Workload ID. Application OpenTelemetry (OTel) telemetry—traces, logs, and metrics emitted by your container—is exposed through a separate observability surface; the Workload API does not proxy it.

Draft and locked Workloads have the same monitoring capabilities. OTel retention is configured at the organization level.

Which surface answers which question

Pick the surface that answers the question. Most production debugging touches both.

If you want to know…	Use
How many requests is the Workload serving? What's the error rate? p50/p95 latency?	Workload API stats—`GET /workloads/{id}/stats`
Did the Workload restart, scale, or have its artifact replaced? When?	Workload API events—`GET /workloads/{id}/events`
Why is a specific replica unhealthy? What does its log tail say?	Workload API per-replica detail—`GET /workloads/{id}/protons/{proton_id}/statusDetails`
How long did this LLM call take? Which tool was invoked? What was the prompt?	OTel traces—`GET /api/v2/otel/workload/{id}/traces/`
How many tokens were consumed this hour? What's the cache hit rate?	OTel metrics
What did my application code log?	OTel logs (pushed via the OTel logging handler—see Instrument a Workload with OpenTelemetry)
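
The Workload API rows in the table above are plain authenticated GET requests. The following is a minimal sketch, assuming the same `DATAROBOT_ENDPOINT` and `DATAROBOT_API_TOKEN` environment variables used by the reset example later on this page. The helper names `workload_url`, `replica_status_url`, and `workload_get` are hypothetical and not part of any DataRobot tooling. The OTel traces path is left out because this page does not say how it combines with `DATAROBOT_ENDPOINT`.

```shell
# Hypothetical helper: print the URL of a Workload API monitoring surface.
# Usage: workload_url <workload_id> <stats|events>
workload_url() {
  # ${DATAROBOT_ENDPOINT%/} drops one trailing slash so the path joins cleanly.
  printf '%s/workloads/%s/%s\n' "${DATAROBOT_ENDPOINT%/}" "$1" "$2"
}

# Hypothetical helper: print the per-replica status URL.
# Usage: replica_status_url <workload_id> <proton_id>
replica_status_url() {
  printf '%s/workloads/%s/protons/%s/statusDetails\n' \
    "${DATAROBOT_ENDPOINT%/}" "$1" "$2"
}

# Hypothetical helper: GET a URL with the bearer token.
workload_get() {
  curl -sS -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" "$1"
}

# Example calls (need network access and valid credentials):
#   workload_get "$(workload_url "$WORKLOAD_ID" stats)"
#   workload_get "$(workload_url "$WORKLOAD_ID" events)"
#   workload_get "$(replica_status_url "$WORKLOAD_ID" "$PROTON_ID")"
```

Keeping URL construction separate from the request makes it easy to print and inspect a URL before sending anything.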

Monitoring capabilities

The following capabilities apply to both draft and locked Workloads. Retention is also the same for both, because it is configured at the organization level rather than per artifact type.

Capability	Draft artifact	Locked artifact	Description
Service health	Yes	Yes	Reports request counts, latency, error rate, and requests per minute.
Resource utilization	Yes	Yes	Reports replica count and per-container CPU and memory consumption.
OTel logs	Yes	Yes	Application logs that the container emits via OpenTelemetry.
OTel traces	Yes	Yes	Distributed traces that the container emits via OpenTelemetry.
OTel metrics	Yes	Yes	Application metrics that the container emits via OpenTelemetry.
Events	Yes	Yes	Lifecycle audit events, including create, start, stop, replace, scale, and error events.
Statistics	Yes	Yes	Aggregate request statistics, including total requests, error rate, response time, and related counters.
Retention	Configured by organization	Configured by organization	See Retention summary.

Console visibility

All Workloads (draft and locked) appear in Console. The draft filter is off by default, so all Workloads are shown.

Access telemetry

Workload-API surfaces (request statistics, lifecycle events, replacement history, per-replica status) are documented in REST: Monitor Workloads. Application OTel telemetry is exposed through DataRobot's separate observability surface and rendered in the Console—see Monitor deployed Workloads and View deployed Workload activity.

Retention summary

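Retention for OTel traces, logs, and metrics is configured at the organization level. The default retention period is 14 days for trial organizations and 30 days for paid organizations. Where the license permits, organization administrators can extend retention per signal type to 60, 90, 180, or 360 days.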

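The effective deletion window is the configured retention period plus up to one rollover interval (see Configure OTel retention for the range at each tier). Retention applies uniformly to all users in the organization.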
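
As a worked example of that rule, the sketch below computes the upper bound on how long an OTel record can persist. The rollover interval is an assumed input here: this page does not state its value, so take the real figure for your tier from the OTel retention settings. The function name `max_effective_retention_days` is hypothetical.

```shell
# Upper bound, in days, on how long an OTel record can persist.
#   $1 = configured retention in days (for example 14, 30, 60, 90, 180, or 360)
#   $2 = rollover interval in days (ASSUMED input; not specified on this page)
max_effective_retention_days() {
  echo $(( $1 + $2 ))
}

# Example with a purely illustrative 1-day rollover interval:
#   max_effective_retention_days 30 1
```

With a configured period of 30 days and an illustrative 1-day rollover interval, a record could persist for up to 31 days before deletion.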

Workload API statistics, lifecycle events, and per-replica status are not subject to the OTel retention policy.

Instrument your container with OpenTelemetry

Container stdout and stderr are captured automatically at every lifecycle stage (startup, running, errored)—they appear in the Workload's Activity log > Logs tab without any SDK setup.

Logs require OTLP push—stdout scraping does not apply

The conventional OTel pattern for logs is a collector DaemonSet that scrapes pod stdout from the host filesystem. DataRobot's collector does not scrape container stdout. Plain print() calls and unconfigured logging appear in the Activity log > Logs tab but do not reach the OTel observability surface. To get logs into the OTel surface as structured records, install the OTel logging handler so the application pushes log records via OTLP HTTP—the same transport used by traces and metrics.

Explicit OTel instrumentation is needed when you want traces, application metrics, or structured logs on the OTel observability surface. When your application emits OTel signals, the platform handles transport: the OTel exporters read OTEL_EXPORTER_OTLP_ENDPOINT from the environment, which the platform sets when the container starts.
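
To see which URLs an exporter would target, the sketch below derives the per-signal OTLP/HTTP URLs from `OTEL_EXPORTER_OTLP_ENDPOINT`. It assumes the platform's endpoint follows the general OpenTelemetry convention, in which the variable holds a base URL and HTTP exporters append `v1/traces`, `v1/metrics`, or `v1/logs`. The helper name `otlp_signal_url` is hypothetical and is not part of any SDK.

```shell
# Hypothetical diagnostic: print the OTLP/HTTP URL for one signal.
# Usage: otlp_signal_url <traces|metrics|logs>
otlp_signal_url() {
  # Fail clearly when the platform-provided variable is missing,
  # for example when running outside the Workload container.
  if [ -z "${OTEL_EXPORTER_OTLP_ENDPOINT:-}" ]; then
    echo "OTEL_EXPORTER_OTLP_ENDPOINT is not set" >&2
    return 1
  fi
  # Drop one trailing slash, then append the conventional signal path.
  printf '%s/v1/%s\n' "${OTEL_EXPORTER_OTLP_ENDPOINT%/}" "$1"
}

# Example, run inside the container:
#   for signal in traces metrics logs; do otlp_signal_url "$signal"; done
```

Running this inside the container is a quick way to confirm that the variable is present before debugging exporter configuration.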

For copy-ready Python SDK snippets that configure tracing, logging, and metrics against the platform's OTLP endpoint, see Instrument a Workload with OpenTelemetry. The same snippets are surfaced in the Console on the Workload's Endpoints > Instrumentation sub-tab.

Reset Workload stats

POST /workloads/{workload_id}/promote resets statistics automatically so production starts from a clean baseline. To reset stats manually—for a specific time window or for a specific proton—use DELETE /workloads/{workload_id}/stats:

# Reset all stats for the current proton
curl -X DELETE "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}/stats" \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}"

# Reset stats for a time window
curl -X DELETE "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}/stats?startTime=2026-04-01T00:00:00Z&endTime=2026-04-15T00:00:00Z" \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}"

protonId, startTime, and endTime are optional query parameters; omit them to clear stats for the current proton across the full retention window.
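
Because all three query parameters are optional, the request URL can be assembled from whichever filters are supplied. The following is a minimal sketch with a hypothetical helper, `stats_reset_url`. It assumes the parameter values need no URL encoding, as in the timestamp example above.

```shell
# Hypothetical helper: build the stats-reset URL from optional filters.
# Usage: stats_reset_url <workload_id> [protonId] [startTime] [endTime]
# Pass an empty string to skip a filter that precedes one you want to set.
stats_reset_url() {
  local query=""
  if [ -n "${2:-}" ]; then query="${query}&protonId=$2"; fi
  if [ -n "${3:-}" ]; then query="${query}&startTime=$3"; fi
  if [ -n "${4:-}" ]; then query="${query}&endTime=$4"; fi
  # Swap the leading "&" for "?" when at least one filter was given.
  if [ -n "$query" ]; then query="?${query#&}"; fi
  printf '%s/workloads/%s/stats%s\n' "${DATAROBOT_ENDPOINT%/}" "$1" "$query"
}

# Example: reset stats for one specific proton (needs valid credentials):
#   curl -X DELETE "$(stats_reset_url "$WORKLOAD_ID" "$PROTON_ID")" \
#     -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}"
```

Printing the URL before issuing the DELETE is a cheap safeguard, since a reset cannot be previewed any other way from this page's endpoints.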