# Instrument a Workload with OpenTelemetry (Python)

> Instrument a Workload with OpenTelemetry (Python) - Emit traces, metrics, and logs from a Python
> container into DataRobot's observability surface.

This Markdown file sits beside the HTML page at the same path (with a `.md` suffix). It summarizes the topic and lists links for tools and LLM context.

Companion generated at `2026-06-22T16:50:38.249774+00:00` (UTC).

## Primary page

- [Instrument a Workload with OpenTelemetry (Python)](https://docs.datarobot.com/en/docs/workload-api/monitor-workloads/instrument-with-otel.html.md): Full documentation for this topic (Markdown sidecar).

## Sections on this page

- [Prerequisites](https://docs.datarobot.com/en/docs/workload-api/monitor-workloads/instrument-with-otel.html.md#prerequisites): In-page section heading.
- [How the three signals reach the platform](https://docs.datarobot.com/en/docs/workload-api/monitor-workloads/instrument-with-otel.html.md#how-signals-reach-platform): In-page section heading.
- [Step 1: Install dependencies](https://docs.datarobot.com/en/docs/workload-api/monitor-workloads/instrument-with-otel.html.md#install-dependencies): In-page section heading.
- [Step 2: Instrument traces](https://docs.datarobot.com/en/docs/workload-api/monitor-workloads/instrument-with-otel.html.md#instrument-traces): In-page section heading.
- [Agent frameworks emit traces for you](https://docs.datarobot.com/en/docs/workload-api/monitor-workloads/instrument-with-otel.html.md#agent-framework-tracing): In-page section heading.
- [Step 3: Instrument metrics](https://docs.datarobot.com/en/docs/workload-api/monitor-workloads/instrument-with-otel.html.md#instrument-metrics): In-page section heading.
- [Step 4: Instrument logs](https://docs.datarobot.com/en/docs/workload-api/monitor-workloads/instrument-with-otel.html.md#instrument-logs): In-page section heading.
- [Step 5: Put it together in a small handler](https://docs.datarobot.com/en/docs/workload-api/monitor-workloads/instrument-with-otel.html.md#put-it-together): In-page section heading.
- [Step 6: Verify data is flowing](https://docs.datarobot.com/en/docs/workload-api/monitor-workloads/instrument-with-otel.html.md#verify-data): In-page section heading.
- [Troubleshooting](https://docs.datarobot.com/en/docs/workload-api/monitor-workloads/instrument-with-otel.html.md#troubleshooting): In-page section heading.
- [Declarative configuration (Pulumi)](https://docs.datarobot.com/en/docs/workload-api/monitor-workloads/instrument-with-otel.html.md#declaratively): In-page section heading.

## Related documentation

- [Workload API](https://docs.datarobot.com/en/docs/workload-api/index.html.md): Linked from this page.
- [Monitor telemetry and health](https://docs.datarobot.com/en/docs/workload-api/monitor-workloads/index.html.md): Linked from this page.
- [Monitoring concepts](https://docs.datarobot.com/en/docs/workload-api/monitor-workloads/monitoring-concepts.html.md): Linked from this page.
- [Application OpenTelemetry telemetry](https://docs.datarobot.com/en/docs/workload-api/monitor-workloads/otel-telemetry.html.md): Linked from this page.
- [Tutorial: Deploy a production-ready container](https://docs.datarobot.com/en/docs/workload-api/create-workloads/tutorial-production-ready-container.html.md): Linked from this page.
- [Tutorial: Replace the artifact behind a running Workload](https://docs.datarobot.com/en/docs/workload-api/update-workloads/tutorial-replace-artifacts.html.md): Linked from this page.
- [Manage Workloads with Pulumi](https://docs.datarobot.com/en/docs/workload-api/workload-pulumi/index.html.md): Linked from this page.

## Documentation content

The Workload API automatically captures request-level metrics—count, error rate, response time, concurrency—from the HTTP traffic your container serves. To see what your code is doing inside each request, instrument the container with OpenTelemetry and emit traces, metrics, and logs.

This guide walks through wiring up all three signals from a Python Workload, then verifying the data flows into the DataRobot observability surface. By the end you'll have a container that emits structured traces, custom metrics, and OTLP-shipped logs that show up in the same place as the platform's built-in Workload stats.

For the reference docs on what the observability surface exposes, see [Monitoring concepts](https://docs.datarobot.com/en/docs/workload-api/monitor-workloads/monitoring-concepts.html.md) and [Application OpenTelemetry telemetry](https://docs.datarobot.com/en/docs/workload-api/monitor-workloads/otel-telemetry.html.md).

## Prerequisites

You need the following before starting.

| Prerequisite | Notes |
| --- | --- |
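| A Python-based Workload | A locked artifact is recommended. See [Tutorial: Deploy a production-ready container](https://docs.datarobot.com/en/docs/workload-api/create-workloads/tutorial-production-ready-container.html.md). |
| Ability to rebuild and redeploy | Either rebuild the container image, or roll out a new artifact version as described in [Tutorial: Replace the artifact behind a running Workload](https://docs.datarobot.com/en/docs/workload-api/update-workloads/tutorial-replace-artifacts.html.md). |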
| API endpoint and token in the shell | Set the environment variables shown next. |

```
export DATAROBOT_ENDPOINT=https://app.datarobot.com/api/v2
export DATAROBOT_API_TOKEN=<your-api-token>
export WORKLOAD_ID=<your-workload-id>
```

## How the three signals reach the platform

All three signals (traces, metrics, logs) ship over OTLP HTTP to an endpoint the platform injects into the container as `OTEL_EXPORTER_OTLP_ENDPOINT`. The application does not need to hardcode the endpoint: the OTLP exporters read the variable automatically and append the signal-specific path (`/v1/traces`, `/v1/metrics`, or `/v1/logs`).
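To confirm at startup where each exporter will send data, you can reproduce the resolution rules in a few lines. The sketch below follows the OTel exporter specification's rules for OTLP HTTP: a signal-specific variable such as `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` is used as-is; otherwise `/v1/<signal>` is appended to the shared endpoint, which defaults to `http://localhost:4318`. `resolve_otlp_http_url` is a hypothetical helper, not part of the OTel SDK.

```python
"""
Sketch of how OTLP HTTP exporters derive per-signal URLs from the
environment (simplified from the OTel exporter spec). Standard library only.
"""
import os
from typing import Mapping

# Spec default for OTLP HTTP when no endpoint is configured.
_DEFAULT_BASE = "http://localhost:4318"


def resolve_otlp_http_url(signal: str, env: Mapping[str, str] = os.environ) -> str:
    """Return the URL an OTLP HTTP exporter would post `signal` data to (hypothetical helper)."""
    # A signal-specific variable, e.g. OTEL_EXPORTER_OTLP_TRACES_ENDPOINT, is used verbatim.
    specific = env.get(f"OTEL_EXPORTER_OTLP_{signal.upper()}_ENDPOINT")
    if specific:
        return specific
    # Otherwise the shared base endpoint gets /v1/<signal> appended.
    base = env.get("OTEL_EXPORTER_OTLP_ENDPOINT", _DEFAULT_BASE)
    return f"{base.rstrip('/')}/v1/{signal}"
```

For example, logging `resolve_otlp_http_url(s)` for `traces`, `metrics`, and `logs` at startup shows immediately whether the platform-injected endpoint is present or the exporters have fallen back to localhost.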

> [!NOTE] Logs require OTLP push—stdout scraping does not apply
> The conventional OTel pattern for logs is that the application writes to stdout and an OTel collector DaemonSet on the cluster scrapes the pod log files. DataRobot's collector does not scrape container stdout for the OTel observability surface. Plain `print()` calls and unconfigured stdlib `logging` still appear in the Workload's Activity log > Logs tab (via automatic stdout capture), but they do not reach the OTel observability stack. To get logs into the observability surface as structured records, install the OTel logging handler as shown later in this guide. The application then pushes log records via OTLP HTTP, the same transport used for traces and metrics.

## Step 1: Install dependencies

Add the OTel SDK and the OTLP exporter to the container's `requirements.txt` (or equivalent), or install them directly:

```
pip install opentelemetry-sdk opentelemetry-exporter-otlp
```

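The `opentelemetry-exporter-otlp` meta-package bundles all three exporters (traces, metrics, logs) for both HTTP and gRPC. The snippets in this guide use the HTTP variants because they're what DataRobot's collector accepts. If you'd rather not pull in the gRPC dependencies, `opentelemetry-exporter-otlp-proto-http` on its own provides the HTTP exporters.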

## Step 2: Instrument traces

Set up a global tracer provider, then wrap the units of work in your code with span calls. Spans become the structured representation of "what this request did"—they nest, carry attributes, and record exceptions.

```
"""
Required: opentelemetry-sdk, opentelemetry-exporter-otlp
"""
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

# Resource describing the service. Pick a stable service.namespace
# so spans can be filtered in the observability UI.
resource = Resource.create({"service.namespace": "my-service"})

def configure_tracer() -> TracerProvider:
    trace_exporter = OTLPSpanExporter()  # picks up OTEL_EXPORTER_OTLP_ENDPOINT
    trace_provider = TracerProvider(resource=resource)
    trace_provider.add_span_processor(BatchSpanProcessor(trace_exporter))
    trace.set_tracer_provider(trace_provider)
    return trace_provider

# Initialize once, at app startup.
trace_provider = configure_tracer()
tracer = trace.get_tracer(__name__)
```

Then use the tracer in request-handling code:

```
with tracer.start_as_current_span("Generate Text") as span:
    span.set_attribute("foo", "bar")
    span.add_event(name="ack", attributes={"john": "doe"})

    # Inner span: spans nest naturally inside their parent.
    with tracer.start_as_current_span("Fake an Error") as inner:
        try:
            raise Exception("This is a fake error for demonstration purposes")
        except Exception as e:
            inner.record_exception(e)
            inner.set_status(trace.StatusCode.ERROR, str(e))
```

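What to span: anything you'd want to time or attribute later. Typical units are model calls, vector-store lookups, downstream HTTP calls, and tool invocations. Set attributes for any value you'd want to filter or group by later (model name, user tier, retrieval strategy). An exception that propagates out of a `start_as_current_span` block is recorded on the span and marks it as an error automatically. When you catch and handle the exception yourself, as in the example, call `record_exception` plus `set_status(StatusCode.ERROR, ...)` in the `except` block so the failure still shows up correctly in the trace UI.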

### Agent frameworks emit traces for you

Several popular agent frameworks are OTel-native—once a `TracerProvider` is configured as shown in Step 2, the framework auto-emits spans for every agent run, tool call, model request, and retrieval step. Custom spans are only needed for logic outside the framework (a data-prep step, a downstream non-LLM HTTP call).

| Framework | OTel support |
| --- | --- |
| Google ADK (Python ≥ 1.17, ADK Go ≥ 1.0) | Native. Plug in a TracerProvider and ADK emits spans for every agent run, tool call, and model request. |
| CrewAI | Emits native OTel-compliant spans. |
| LangChain / LangGraph | Native OTel support, plus auto-instrumentation through OpenInference and OpenLLMetry for older versions. |
| LlamaIndex | OTel through the OpenInference auto-instrumentation package. |
| AutoGen / AG2 | Emits OTel-compliant spans. |
| Semantic Kernel | Provides framework-specific OTel instrumentation. |

The spans these frameworks emit follow the OpenTelemetry GenAI semantic conventions: a standard `gen_ai.*` attribute namespace (model name, token counts, finish reason, tool inputs and outputs) so that traces from different frameworks can be queried uniformly. The conventions are still marked experimental but are supported by most observability vendors. [OpenInference](https://github.com/Arize-ai/openinference) auto-instrumentations emit both the OpenInference attributes and the OTel GenAI attributes for forward compatibility.

> [!NOTE] Auto-instrumentation covers traces only
> Metrics and logs still need explicit wiring. Continue to the next two sections for custom counters, histograms, and application logs.

## Step 3: Instrument metrics

Metrics are best for counts and rates that don't fit cleanly inside a single request span—token consumption, cache hits, queue depth, model-selection distribution. Set up a meter provider with a periodic reader, then create counters, gauges, or histograms as needed.

```
"""
Required: opentelemetry-sdk, opentelemetry-exporter-otlp
"""
from opentelemetry import metrics
from opentelemetry.exporter.otlp.proto.http.metric_exporter import OTLPMetricExporter
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.sdk.resources import Resource

resource = Resource.create({"service.namespace": "my-service"})

def configure_metrics(resource: Resource) -> MeterProvider:
    metric_exporter = OTLPMetricExporter()  # picks up OTEL_EXPORTER_OTLP_ENDPOINT
    reader = PeriodicExportingMetricReader(metric_exporter, export_interval_millis=5000)
    meter_provider = MeterProvider(resource=resource, metric_readers=[reader])
    metrics.set_meter_provider(meter_provider)
    return meter_provider

metric_provider = configure_metrics(resource)
meter = metric_provider.get_meter(__name__)

# Define instruments once, at startup.
my_counter = meter.create_counter(
    name="my.counter",
    description="Example custom counter.",
    unit="1",
)
```

Then record values from request-handling code:

```
my_counter.add(1, {"environment": "demo"})
```

The `PeriodicExportingMetricReader` exports accumulated metrics every `export_interval_millis`: 5 seconds in the preceding example, compared with the SDK default of 60 seconds. For Workloads with many distinct attribute combinations (high cardinality), prefer longer intervals (15–60 seconds) so the exports don't overwhelm the collector.

The OTel SDK offers several instrument types; these three cover most needs:

| Type | OTel method | Example use cases |
| --- | --- | --- |
| Counter | `create_counter` | Monotonically increasing values. Use for request counts, tokens consumed, retry attempts. |
| Histogram | `create_histogram` | Value distributions. Use for latencies, tokens per request, payload sizes. |
| Observable gauge | `create_observable_gauge` | Sampled values, read by a callback at each collection. Use for queue depth, cache size, connection count. |

## Step 4: Instrument logs

This step is required for logs to reach the observability surface. The OTel logging handler bridges Python's stdlib `logging` module into OTLP HTTP exports, so every `logger.info(...)` call becomes a log record the platform can ingest.

```
"""
Required: opentelemetry-sdk, opentelemetry-exporter-otlp
"""
import logging
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler
from opentelemetry.sdk._logs.export import BatchLogRecordProcessor
from opentelemetry.exporter.otlp.proto.http._log_exporter import OTLPLogExporter
from opentelemetry._logs import set_logger_provider

resource = Resource.create({"service.namespace": "my-service"})

def configure_logging() -> LoggerProvider:
    log_exporter = OTLPLogExporter()  # picks up OTEL_EXPORTER_OTLP_ENDPOINT
    log_provider = LoggerProvider(resource=resource)
    log_provider.add_log_record_processor(BatchLogRecordProcessor(log_exporter))
    set_logger_provider(log_provider)

    # Bridge Python's stdlib logging into OpenTelemetry.
    root_logger = logging.getLogger()
    otel_handler = LoggingHandler(level=logging.NOTSET, logger_provider=log_provider)
    root_logger.addHandler(otel_handler)
    root_logger.setLevel(logging.DEBUG)  # capture every level; filter downstream
    return log_provider

log_provider = configure_logging()
logger = logging.getLogger(__name__)
```

From here on, normal stdlib logging calls are automatically OTLP-exported:

```
logger.info("Logging info.", extra={"extra": "INFO details"})
logger.warning("Logging warning.", extra={"extra": "WARNING details"})
logger.error("Logging error.", extra={"extra": "ERROR details"})
logger.debug("Logging debug.", extra={"extra": "DEBUG details"})
```

The `extra=` dict attaches as structured attributes on the log record, which means the observability UI can filter on them without parsing message strings. Use `extra` for everything that should be queryable; reserve the message for the human-readable summary.
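To see why `extra=` matters, the stdlib-only sketch below shows where those keys end up: `logging` sets each one as an attribute on the `LogRecord`, and a handler can separate them from the standard fields. The OTel `LoggingHandler` does something similar when it converts records into OTLP attributes, but the `ExtraCapture` class here is a hypothetical illustration, not the OTel handler's code. It also shows that values passed as positional format arguments end up only in the message string.

```python
"""
Illustration (standard library only): how extra= keys land on a LogRecord.
ExtraCapture is hypothetical; it is not the OTel LoggingHandler.
"""
import logging

# Attribute names every LogRecord has; anything else was added via extra=.
_STANDARD_ATTRS = set(vars(logging.LogRecord("", 0, "", 0, "", None, None))) | {"message", "asctime"}


class ExtraCapture(logging.Handler):
    """Collects (message, extra fields) for each record it handles."""

    def __init__(self) -> None:
        super().__init__(level=logging.NOTSET)
        self.captured: list[tuple[str, dict]] = []

    def emit(self, record: logging.LogRecord) -> None:
        extras = {k: v for k, v in vars(record).items() if k not in _STANDARD_ATTRS}
        self.captured.append((record.getMessage(), extras))


logger = logging.getLogger("extra-demo")
logger.setLevel(logging.INFO)
capture = ExtraCapture()
logger.addHandler(capture)
logger.propagate = False  # keep the demo out of any root handlers

# Queryable: prompt_length and route become record attributes.
logger.info("Handling generate request", extra={"prompt_length": 42, "route": "/generate"})
# Not queryable: 42 is only formatted into the message text.
logger.info("prompt length %d", 42)

for message, extras in capture.captured:
    print(message, extras)
```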

> [!TIP] One initializer for the whole app
> Configure the tracer, meter, and logger providers once at app startup—ideally in a single `observability.py` module that the entrypoint imports before anything else. Re-initializing on every request leaks background threads and drops exports.

## Step 5: Put it together in a small handler

The following minimal FastAPI app configures all three signals and emits a span, a counter increment, and a log record on every `/generate` request:

```
import logging
from fastapi import FastAPI
from opentelemetry import trace, metrics
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler
from opentelemetry.sdk._logs.export import BatchLogRecordProcessor
from opentelemetry._logs import set_logger_provider
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.exporter.otlp.proto.http.metric_exporter import OTLPMetricExporter
from opentelemetry.exporter.otlp.proto.http._log_exporter import OTLPLogExporter

resource = Resource.create({"service.namespace": "my-agent"})

# Traces
tp = TracerProvider(resource=resource)
tp.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
trace.set_tracer_provider(tp)
tracer = trace.get_tracer(__name__)

# Metrics
mp = MeterProvider(
    resource=resource,
    metric_readers=[PeriodicExportingMetricReader(OTLPMetricExporter(), export_interval_millis=5000)],
)
metrics.set_meter_provider(mp)
meter = mp.get_meter(__name__)
request_counter = meter.create_counter("requests.handled", unit="1")

# Logs
lp = LoggerProvider(resource=resource)
lp.add_log_record_processor(BatchLogRecordProcessor(OTLPLogExporter()))
set_logger_provider(lp)
logging.getLogger().addHandler(LoggingHandler(level=logging.NOTSET, logger_provider=lp))
logging.getLogger().setLevel(logging.INFO)
logger = logging.getLogger(__name__)

app = FastAPI()

@app.get("/healthz")
def healthz():
    return {"ok": True}

@app.post("/generate")
def generate(prompt: str):
    with tracer.start_as_current_span("generate") as span:
        span.set_attribute("prompt.length", len(prompt))
        logger.info("Handling generate request", extra={"prompt_length": len(prompt)})
        request_counter.add(1, {"route": "/generate"})
        # ... your model call here ...
        return {"answer": "hello"}
```

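Build this into a container image, deploy it as a Workload, and invoke `/generate` a few times. Because `prompt` is declared as a plain `str` parameter, FastAPI reads it from the query string (for example, `POST /generate?prompt=hello`). The image also needs an ASGI server such as Uvicorn to serve `app`.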

## Step 6: Verify data is flowing

The platform exposes each signal at its own read endpoint. Hit them after sending a few requests to the Workload:

```
curl -s "${DATAROBOT_ENDPOINT}/otel/workload/${WORKLOAD_ID}/traces/" \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" | jq '.'

curl -s "${DATAROBOT_ENDPOINT}/otel/workload/${WORKLOAD_ID}/metrics/" \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" | jq '.'

curl -s "${DATAROBOT_ENDPOINT}/otel/workload/${WORKLOAD_ID}/logs/" \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" | jq '.'
```
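If `jq` isn't available, or you want to script the check, the same three requests can be made from Python with the standard library. This is a sketch: the paths and environment variables are the ones used in the curl commands, `signal_request`, `fetch_signals`, and `fetch_from_env` are hypothetical helper names, and because the response schema isn't described on this page, it returns the parsed JSON rather than interpreting it.

```python
"""
Standard-library version of the three curl checks. Paths and variable
names match the curl commands; helper names are hypothetical.
"""
import json
import os
import urllib.request

SIGNALS = ("traces", "metrics", "logs")


def signal_request(endpoint: str, workload_id: str, token: str, signal: str) -> urllib.request.Request:
    """Build the authenticated GET request for one signal's read endpoint."""
    url = f"{endpoint.rstrip('/')}/otel/workload/{workload_id}/{signal}/"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def fetch_signals(endpoint: str, workload_id: str, token: str, timeout: float = 30.0) -> dict:
    """GET all three endpoints and return {signal: parsed JSON body}."""
    results = {}
    for signal in SIGNALS:
        req = signal_request(endpoint, workload_id, token, signal)
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            results[signal] = json.load(resp)
    return results


def fetch_from_env() -> dict:
    """Use the same DATAROBOT_ENDPOINT, WORKLOAD_ID, and DATAROBOT_API_TOKEN variables as the curl commands."""
    return fetch_signals(
        os.environ["DATAROBOT_ENDPOINT"],
        os.environ["WORKLOAD_ID"],
        os.environ["DATAROBOT_API_TOKEN"],
    )
```

Run `print(json.dumps(fetch_from_env(), indent=2))` after sending a few requests to the Workload. `urlopen` raises `urllib.error.HTTPError` on a non-2xx response, which usually points at the token or the Workload ID.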

Expected results:

- A `generate` span (with nested spans if the real handler creates them), carrying `service.namespace=my-agent`.
- A `requests.handled` counter incrementing on the `/generate` route.
- The `Handling generate request` log line with `prompt_length` as a structured attribute.

If the trace or metric calls return empty payloads but the log call works (or vice versa), the most common cause is forgetting to register the corresponding provider at app startup—recheck Step 5 to make sure all three `set_*_provider` calls happen before the first request.

## Troubleshooting

| Symptom | Likely cause |
| --- | --- |
| No traces, no metrics, no logs | `OTEL_EXPORTER_OTLP_ENDPOINT` is not set in the container's environment, so the exporters fall back to their default of `http://localhost:4318` and every export fails. Confirm with `statusDetails` on the proton (environment variables are visible in the replica detail) or by printing `os.environ.get("OTEL_EXPORTER_OTLP_ENDPOINT")` at startup. |
| Traces and metrics work, logs do not | The OTel `LoggingHandler` is not installed on the root logger. Plain `print()` and unconfigured stdlib `logging` do not reach the observability surface. See Step 4. |
| Logs show up but with no attributes | Fields are passed as positional format arguments instead of `extra={...}`. Use `logger.info("msg", extra={"key": "value"})` for queryable attributes. |
| Metric values stuck or never appear | `PeriodicExportingMetricReader` has not exported yet; its first export happens only after `export_interval_millis` elapses. Wait one cycle, or lower the interval during development. |
| Spans do not nest | Child spans are started with `start_span` instead of `start_as_current_span`. The "as current" variant makes the new span the active context, so spans started inside it become its children. |

## Declarative configuration (Pulumi)

Once a stable set of OTel-related environment variables is in place (resource attributes, sampling overrides), bake them into the artifact's container environment variables (`environment_vars` in the Pulumi spec) so every deployment of that artifact gets the same observability config:

```
import pulumi
import pulumi_datarobot as datarobot

artifact = datarobot.Artifact(
    "my-agent-artifact",
    name="my-agent-artifact",
    type="service",
    spec={"container_groups": [{"containers": [{
        "name": "agent",
        "image_uri": "ghcr.io/myorg/my-agent:v1",
        "port": 8080,
        "primary": True,
        "environment_vars": [
            # OTEL_EXPORTER_OTLP_ENDPOINT is injected by the platform; override it only to route to a custom collector.
            {"name": "OTEL_SERVICE_NAME", "value": "my-agent"},
            {"name": "OTEL_RESOURCE_ATTRIBUTES", "value": "service.namespace=my-service,deployment.environment=prod"},
            # Optional: tune sampling for high-traffic workloads.
            {"name": "OTEL_TRACES_SAMPLER", "value": "parentbased_traceidratio"},
            {"name": "OTEL_TRACES_SAMPLER_ARG", "value": "0.1"},
        ],
        "readiness_probe": {"path": "/healthz", "port": 8080},
    }]}]},
)
```

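The `OTEL_EXPORTER_OTLP_ENDPOINT` variable is injected by the platform; set it explicitly only if telemetry is routed to a custom collector. Everything else (sampling, resource attributes, service name) is yours to control. Attributes passed to `Resource.create()` in code take precedence over `OTEL_RESOURCE_ATTRIBUTES`, so the `service.namespace` hardcoded in the earlier snippets wins over the value set here; remove it from the code if you want the environment to control it. See [Manage Workloads with Pulumi](https://docs.datarobot.com/en/docs/workload-api/workload-pulumi/index.html.md) for the full Pulumi setup.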
