# Container health and readiness

> Container health and readiness - How container probes drive Workload readiness and traffic gating.

This Markdown file sits beside the HTML page at the same path (with a `.md` suffix). It summarizes the topic and lists links for tools and LLM context.

Companion generated at `2026-06-22T16:50:38.249403+00:00` (UTC).

## Primary page

- [Container health and readiness](https://docs.datarobot.com/en/docs/workload-api/monitor-workloads/health-readiness.html.md): Full documentation for this topic (Markdown sidecar).

## Sections on this page

- [Readiness probe](https://docs.datarobot.com/en/docs/workload-api/monitor-workloads/health-readiness.html.md#readiness-probe): In-page section heading.
- [Liveness probe](https://docs.datarobot.com/en/docs/workload-api/monitor-workloads/health-readiness.html.md#liveness-probe): In-page section heading.
- [Startup probe](https://docs.datarobot.com/en/docs/workload-api/monitor-workloads/health-readiness.html.md#startup-probe): In-page section heading.
- [Probe configuration reference](https://docs.datarobot.com/en/docs/workload-api/monitor-workloads/health-readiness.html.md#probe-config): In-page section heading.
- [How readiness drives lifecycle states](https://docs.datarobot.com/en/docs/workload-api/monitor-workloads/health-readiness.html.md#readiness-and-lifecycle): In-page section heading.

## Related documentation

- [Workload API](https://docs.datarobot.com/en/docs/workload-api/index.html.md): Linked from this page.
- [Monitor telemetry and health](https://docs.datarobot.com/en/docs/workload-api/monitor-workloads/index.html.md): Linked from this page.
- [Workload concepts: Lifecycle states](https://docs.datarobot.com/en/docs/workload-api/create-workloads/workload-concepts.html.md#lifecycle-states): Linked from this page.
- [Tutorial: Deploy a production-ready container](https://docs.datarobot.com/en/docs/workload-api/create-workloads/tutorial-production-ready-container.html.md): Linked from this page.
- [Hello, Workload](https://docs.datarobot.com/en/docs/workload-api/get-started-workloads/tutorial-hello-world.html.md): Linked from this page.

## Documentation content

Workload status reflects the health of the containers that back it. Three probe types (readiness, liveness, and startup) are configurable per container. They determine when a Workload is ready for traffic, when the platform restarts an unhealthy container, and how long the platform waits before declaring a slow-starting container failed.

## Readiness probe

Service artifacts must declare a `readinessProbe` so the platform can tell when a container is safe to receive traffic. The platform polls `readinessProbe.path` until it returns a 2xx response; until that happens, the Workload stays in `launching` rather than `running`, and the Workload's invoke endpoint does not route to that container.

```json
{
  "readinessProbe": {
    "path": "/healthz",
    "port": 8080,
    "scheme": "HTTP",
    "initialDelaySeconds": 30,
    "periodSeconds": 30,
    "timeoutSeconds": 30,
    "failureThreshold": 3
  }
}
```

Only `path` is required. The other fields are optional, and the example shows their default values.
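
On the container side, the probe needs an HTTP endpoint that returns 2xx only once the service can take traffic. The following is a minimal sketch using only the Python standard library, not the platform's reference implementation. The names `ready`, `health_response`, `HealthHandler`, and `make_server` are hypothetical, and returning 503 while starting is an assumption (the page only states that a 2xx response marks the container ready).

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple

# Set this once initialization (model load, cache warmup, ...) has finished.
ready = threading.Event()


def health_response(path: str) -> Tuple[int, bytes]:
    """Decide the status code and body for a probe request."""
    if path != "/healthz":
        return 404, b"not found"
    if ready.is_set():
        return 200, b"ok"
    # Assumption: any non-2xx status keeps the readiness probe failing.
    return 503, b"starting"


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status, body = health_response(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # keep periodic probe requests out of the logs


def make_server(port: int = 8080) -> HTTPServer:
    """Build a server on the port that `readinessProbe.port` points to."""
    return HTTPServer(("0.0.0.0", port), HealthHandler)
```

A service would call `make_server().serve_forever()` in a background thread and call `ready.set()` when initialization completes.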

## Liveness probe

`livenessProbe` lets the platform restart a container that has become unresponsive while still running. It uses the same `ProbeConfig` shape as the readiness probe. Use it when your service can wedge in a state that the readiness probe wouldn't catch (deadlock, stuck event loop) but a fresh process would recover from.
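
One common way to expose a wedged process to a liveness probe is a heartbeat: the main work loop records a timestamp each time it makes progress, and the liveness endpoint fails when that timestamp goes stale. The sketch below illustrates the idea only. `beat`, `liveness_status`, and the 60-second threshold are hypothetical, and it assumes that a non-2xx response counts as a liveness failure and that the endpoint is served from a thread other than the one that can wedge.

```python
import time
from typing import Optional

# Hypothetical threshold: how stale the heartbeat may get before
# the endpoint reports the process as unhealthy.
MAX_HEARTBEAT_AGE_SECONDS = 60.0

_last_heartbeat = time.monotonic()


def beat() -> None:
    """Call from the main work loop each time it makes progress."""
    global _last_heartbeat
    _last_heartbeat = time.monotonic()


def liveness_status(now: Optional[float] = None) -> int:
    """Return the HTTP status a liveness endpoint would send."""
    if now is None:
        now = time.monotonic()
    age = now - _last_heartbeat
    # A stale heartbeat suggests a deadlock or stuck loop.
    return 200 if age <= MAX_HEARTBEAT_AGE_SECONDS else 503
```

A readiness check that only confirms the HTTP server answers would keep passing in this situation, which is the gap the liveness probe is meant to close.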

## Startup probe

`startupProbe` gives slow-to-initialize containers more time to come up before the readiness and liveness probes take over. It is useful for large model loads, JIT warmup, or any container whose initial startup takes much longer than its steady-state response time.
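
To size a startup probe, it can help to estimate how long a given configuration allows before the probe is declared failed. The arithmetic below is a rough sketch that assumes Kubernetes-style semantics (an initial delay followed by `failureThreshold` attempts spaced `periodSeconds` apart). This page does not confirm that the platform computes the window this way, and both function names are hypothetical.

```python
import math


def startup_budget_seconds(
    initial_delay: int = 30, period: int = 30, failure_threshold: int = 3
) -> int:
    """Approximate time allowed before the startup probe is declared failed."""
    return initial_delay + period * failure_threshold


def min_failure_threshold(
    expected_startup_seconds: int, initial_delay: int = 30, period: int = 30
) -> int:
    """Smallest failureThreshold whose budget covers the expected startup time."""
    remaining = max(0, expected_startup_seconds - initial_delay)
    return max(1, math.ceil(remaining / period))


# With the documented defaults the budget is 30 + 30 * 3 = 120 seconds.
default_budget = startup_budget_seconds()
# A container that needs about 10 minutes to load would need a threshold of 19.
threshold_for_ten_minutes = min_failure_threshold(600)
```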

## Probe configuration reference

All three probe types use the `ProbeConfig` schema:

| Field | Default | Description |
| --- | --- | --- |
| path | (required) | HTTP path to query for the health check. |
| port | 8080 | Port to access on the container (1–65535). |
| scheme | HTTP | Accepts HTTP or HTTPS. |
| host | (pod IP) | Optional host header override. |
| httpHeaders | none | Additional headers to send with the probe request. |
| initialDelaySeconds | 30 | Seconds to wait before the first probe runs. |
| periodSeconds | 30 | Seconds between probes. |
| timeoutSeconds | 30 | Seconds to wait for a response before a single probe attempt counts as a failure. |
| failureThreshold | 3 | Consecutive failed attempts before the probe as a whole is considered failed. |
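
When building probe payloads in client code, the table can be mirrored as a small typed structure that applies the defaults and checks the documented constraints before anything is sent. This is an illustrative sketch, not an official client class. The validation logic and the `to_dict` helper are hypothetical, and `httpHeaders` is left out because this page does not show its shape.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class ProbeConfig:
    path: str                       # required
    port: int = 8080
    scheme: str = "HTTP"
    host: Optional[str] = None      # omitted from the payload when unset
    initialDelaySeconds: int = 30
    periodSeconds: int = 30
    timeoutSeconds: int = 30
    failureThreshold: int = 3

    def __post_init__(self) -> None:
        if not self.path:
            raise ValueError("path is required")
        if not 1 <= self.port <= 65535:
            raise ValueError("port must be between 1 and 65535")
        if self.scheme not in ("HTTP", "HTTPS"):
            raise ValueError("scheme must be HTTP or HTTPS")

    def to_dict(self) -> Dict[str, Any]:
        """Return a JSON-serializable dict, dropping unset optional fields."""
        return {k: v for k, v in asdict(self).items() if v is not None}


readiness = {"readinessProbe": ProbeConfig(path="/healthz").to_dict()}
```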

## How readiness drives lifecycle states

The platform computes the Workload's status bottom-up: each container's readiness rolls into its replica's phase, replica phases roll into the proton's status, and the proton's status is what the Workload reports. A Workload only reaches `running` when every container in every replica has passed readiness. See [Workload concepts: Lifecycle states](https://docs.datarobot.com/en/docs/workload-api/create-workloads/workload-concepts.html.md#lifecycle-states) for the full aggregation rules. For an end-to-end example, see [Tutorial: Deploy a production-ready container](https://docs.datarobot.com/en/docs/workload-api/create-workloads/tutorial-production-ready-container.html.md), which sets `readinessProbe.path` to `/healthz` on the FastAPI gateway image, or [Hello, Workload](https://docs.datarobot.com/en/docs/workload-api/get-started-workloads/tutorial-hello-world.html.md), which sets it to `/` on a minimal whoami Workload.
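
The bottom-up rollup can be pictured with a deliberately simplified model. The sketch below covers only the two states named on this page (`launching` and `running`) and treats readiness as a boolean per container. The real aggregation rules live in the linked lifecycle documentation and include more states, and both function names here are hypothetical.

```python
from typing import List


def replica_ready(container_ready: List[bool]) -> bool:
    """A replica is ready only when all of its containers passed readiness."""
    return bool(container_ready) and all(container_ready)


def workload_status(replicas: List[List[bool]]) -> str:
    """Roll per-container readiness up to a Workload-level status."""
    if replicas and all(replica_ready(containers) for containers in replicas):
        return "running"
    return "launching"


# Two replicas with two containers each; one container is still starting.
status_while_starting = workload_status([[True, True], [True, False]])
status_when_ready = workload_status([[True, True], [True, True]])
```

In this model a single container that has not passed readiness is enough to hold the whole Workload in `launching`.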

For per-replica diagnostics when readiness gates aren't being met (image pull issues, crash loops, unmet probe conditions), use `GET /workloads/{workload_id}/protons/{proton_id}/statusDetails`.
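
A request to that endpoint could be built as follows. Only the path comes from this page: the base URL, the bearer-token `Authorization` header, and the helper name `status_details_request` are assumptions, so check the Workload API reference for the actual host and authentication scheme.

```python
import urllib.request


def status_details_request(
    base_url: str, workload_id: str, proton_id: str, token: str
) -> urllib.request.Request:
    """Build (but do not send) a GET request for per-replica status details."""
    url = (
        f"{base_url.rstrip('/')}/workloads/{workload_id}"
        f"/protons/{proton_id}/statusDetails"
    )
    # Assumption: the API accepts a bearer token in the Authorization header.
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}, method="GET"
    )


# Placeholder values for illustration only.
req = status_details_request(
    "https://example.invalid/api", "my-workload-id", "my-proton-id", "MY_TOKEN"
)
# To send it:
#   with urllib.request.urlopen(req) as resp:
#       body = resp.read()
```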
