Container health and readiness
Workload status reflects the health of the containers that back it. Three probe types (readiness, liveness, and startup) are configurable per container. Together they determine when a Workload is ready for traffic, when to restart an unhealthy container, and how long the platform waits before declaring a slow-starting container failed.
Readiness probe
Service artifacts must declare a readinessProbe so the platform can tell when a container is safe to receive traffic. The platform polls readinessProbe.path until it returns a 2xx response. Until then, the Workload stays in launching rather than running, and the Workload's invoke endpoint does not route requests to that container.
{
  "readinessProbe": {
    "path": "/healthz",
    "port": 8080,
    "scheme": "HTTP",
    "initialDelaySeconds": 30,
    "periodSeconds": 30,
    "timeoutSeconds": 30,
    "failureThreshold": 3
  }
}
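path is required. Every other field is optional, and the values shown are the defaults; the probe configuration reference lists the remaining optional fields, host and httpHeaders.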
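The readiness gate described above is essentially a poll-until-2xx loop. The following is a minimal client-side sketch of that loop, not the platform's implementation. `wait_until_ready`, `max_polls`, and the injectable `probe` and `sleep` callables are hypothetical names introduced for illustration, and the sketch deliberately ignores timeoutSeconds and failureThreshold. It waits initialDelaySeconds once before the first probe, then retries every periodSeconds.

```python
import time
from typing import Callable, Optional


def wait_until_ready(
    probe: Callable[[], int],
    initial_delay_seconds: float = 30,
    period_seconds: float = 30,
    sleep: Callable[[float], None] = time.sleep,
    max_polls: Optional[int] = None,
) -> int:
    """Poll `probe` until it returns a 2xx status; return how many polls it took.

    Hypothetical helper: `probe` stands in for an HTTP GET on readinessProbe.path
    and returns the response status code.
    """
    sleep(initial_delay_seconds)  # initialDelaySeconds: wait before the first probe
    polls = 0
    while True:
        polls += 1
        status = probe()
        if 200 <= status < 300:  # any 2xx response counts as ready
            return polls
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(f"no 2xx response after {polls} polls")
        sleep(period_seconds)  # periodSeconds: wait between probes
```

Injecting `sleep` lets the loop be exercised without real delays, for example `wait_until_ready(lambda: next(codes), sleep=lambda s: None)` with `codes = iter([503, 503, 200])` returns after the third poll.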
Liveness probe
livenessProbe lets the platform restart a container that has become unresponsive while still running. It uses the same ProbeConfig shape as the readiness probe. Use it when your service can wedge in a state that the readiness probe wouldn't catch (deadlock, stuck event loop) but a fresh process would recover from.
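Because a liveness restart is driven by failureThreshold (consecutive failures), it can help to see which probe results would actually trigger a restart. This is a minimal sketch, assuming that a restart fires on the failureThreshold-th consecutive failure and that both a success and a restart reset the counter; `restart_points` is a hypothetical helper, not a platform API.

```python
from typing import Iterable, List


def restart_points(results: Iterable[bool], failure_threshold: int = 3) -> List[int]:
    """Return the indices of liveness results at which a restart would fire.

    `results` is a sequence of probe outcomes (True = probe passed).
    Assumption: the consecutive-failure count resets after a success and
    after each restart.
    """
    restarts: List[int] = []
    consecutive = 0
    for i, ok in enumerate(results):
        if ok:
            consecutive = 0  # any success clears the failure streak
            continue
        consecutive += 1
        if consecutive >= failure_threshold:
            restarts.append(i)
            consecutive = 0  # assumed: the restarted container starts with a clean count
    return restarts
```

With the default threshold of 3, a single blip such as `[True, False, True]` never restarts the container; only an unbroken run of three failures does.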
Startup probe
startupProbe gives slow-to-initialize containers more time to come up before the readiness and liveness probes take over. It is useful for large model loads, JIT warmup, or any container whose initial startup takes much longer than its steady-state response time.
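Sizing a startup probe comes down to arithmetic on the ProbeConfig fields. This page does not spell out the platform's exact startup timing, so the sketch below assumes Kubernetes-style behavior: the first probe starts after initialDelaySeconds, later probes start every periodSeconds, and the failureThreshold-th consecutive failure can take up to timeoutSeconds to register. Under that assumption the worst-case window is

initialDelaySeconds + (failureThreshold − 1) × periodSeconds + timeoutSeconds

which is 30 + 2 × 30 + 30 = 120 seconds with the defaults. `startup_window_seconds` and `failure_threshold_for` are hypothetical helpers.

```python
import math


def startup_window_seconds(
    initial_delay_seconds: int = 30,
    period_seconds: int = 30,
    timeout_seconds: int = 30,
    failure_threshold: int = 3,
) -> int:
    """Worst-case seconds before a never-ready container is declared failed.

    Assumes Kubernetes-style probe timing (see the lead-in); not confirmed
    for this platform.
    """
    last_probe_start = initial_delay_seconds + (failure_threshold - 1) * period_seconds
    return last_probe_start + timeout_seconds


def failure_threshold_for(
    target_seconds: int,
    initial_delay_seconds: int = 30,
    period_seconds: int = 30,
    timeout_seconds: int = 30,
) -> int:
    """Smallest failureThreshold whose window covers `target_seconds`."""
    extra = target_seconds - initial_delay_seconds - timeout_seconds
    return max(1, math.ceil(extra / period_seconds) + 1)
```

For a model that may take up to 10 minutes to load, `failure_threshold_for(600)` gives 19, and `startup_window_seconds(failure_threshold=19)` is exactly 600 under these assumptions. Raising failureThreshold rather than periodSeconds keeps failure detection responsive once the container is actually up.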
Probe configuration reference
All three probe types use the ProbeConfig schema:
| Field | Default | Description |
|---|---|---|
| path | (required) | HTTP path to query for the health check. |
| port | 8080 | Container port to probe (1–65535). |
| scheme | HTTP | HTTP or HTTPS. |
| host | (pod IP) | Optional host header override. |
| httpHeaders | none | Additional headers to send with the probe request. |
| initialDelaySeconds | 30 | Seconds to wait before the first probe runs. |
| periodSeconds | 30 | Seconds between probes. |
| timeoutSeconds | 30 | Seconds to wait for a response before a probe attempt counts as failed. |
| failureThreshold | 3 | Consecutive failures before the probe is considered failed. |
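One way to keep probe definitions consistent with this table is to build them from a typed object that carries the defaults and the documented constraints (path required, port 1–65535, scheme HTTP or HTTPS). The sketch below is hypothetical tooling, not a platform SDK. It omits httpHeaders because the wire format of that field is not described here, and it emits host only when set, on the assumption that leaving it out means "use the pod IP".

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ProbeConfig:
    """Hypothetical client-side mirror of the ProbeConfig schema."""

    path: str                       # required
    port: int = 8080
    scheme: str = "HTTP"
    host: Optional[str] = None      # None: omitted, so the pod IP is used
    initial_delay_seconds: int = 30
    period_seconds: int = 30
    timeout_seconds: int = 30
    failure_threshold: int = 3

    def __post_init__(self) -> None:
        # Only the constraints documented in the reference table are enforced.
        if not self.path:
            raise ValueError("path is required")
        if not 1 <= self.port <= 65535:
            raise ValueError("port must be in the range 1-65535")
        if self.scheme not in ("HTTP", "HTTPS"):
            raise ValueError("scheme must be HTTP or HTTPS")

    def to_json(self) -> Dict[str, Any]:
        """Serialize using the field names from the reference table."""
        body: Dict[str, Any] = {
            "path": self.path,
            "port": self.port,
            "scheme": self.scheme,
            "initialDelaySeconds": self.initial_delay_seconds,
            "periodSeconds": self.period_seconds,
            "timeoutSeconds": self.timeout_seconds,
            "failureThreshold": self.failure_threshold,
        }
        if self.host is not None:
            body["host"] = self.host
        return body
```

`json.dumps({"readinessProbe": ProbeConfig("/healthz").to_json()}, indent=2)` reproduces the readinessProbe example from the top of this page, and the same class can build livenessProbe and startupProbe entries since all three share the schema.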
How readiness drives lifecycle states
The platform computes the Workload's status bottom-up: each container's readiness rolls up into its replica's phase, replica phases roll up into the proton's status, and the proton's status is what the Workload reports. A Workload reaches running only when every container in every replica has passed its readiness probe. See Workload concepts: Lifecycle states for the full aggregation rules, and Tutorial: Deploy a production-ready container for an end-to-end example that sets readinessProbe.path to /healthz on the FastAPI gateway image. The minimal whoami Workload in Hello, Workload uses / instead.
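The "all containers in all replicas" rule can be expressed as nested `all()` checks. This is a deliberately simplified sketch: it collapses replica phases and proton status into booleans and distinguishes only launching and running, whereas the real aggregation (including failure states) is defined in Workload concepts: Lifecycle states. `replica_ready` and `workload_status` are hypothetical helpers.

```python
from typing import List


def replica_ready(containers: List[bool]) -> bool:
    """A replica is ready only when every one of its containers passed readiness."""
    return all(containers)


def workload_status(replicas: List[List[bool]]) -> str:
    """Simplified roll-up: replicas[i][j] is container j of replica i.

    Assumption: a Workload with no replicas is not treated as running.
    """
    if replicas and all(replica_ready(c) for c in replicas):
        return "running"
    return "launching"
```

A single unready container anywhere is enough to hold the whole Workload in launching: `workload_status([[True, True], [True, False]])` returns `"launching"`.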
For per-replica diagnostics when readiness gates aren't being met (image pull issues, crash loops, unmet probe conditions), use GET /workloads/{workload_id}/protons/{proton_id}/statusDetails.
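A small client for that endpoint could look like the sketch below. Only the path GET /workloads/{workload_id}/protons/{proton_id}/statusDetails comes from this page. The base URL, the bearer-token Authorization header, and the JSON response body are assumptions, and `status_details_url` and `fetch_status_details` are hypothetical names.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Optional


def status_details_url(base_url: str, workload_id: str, proton_id: str) -> str:
    """Build the statusDetails URL, percent-encoding the path parameters."""
    wid = urllib.parse.quote(workload_id, safe="")
    pid = urllib.parse.quote(proton_id, safe="")
    return f"{base_url.rstrip('/')}/workloads/{wid}/protons/{pid}/statusDetails"


def fetch_status_details(
    base_url: str,
    workload_id: str,
    proton_id: str,
    token: Optional[str] = None,
    timeout: float = 10.0,
) -> Any:
    """GET per-replica diagnostics; assumes the response body is JSON."""
    req = urllib.request.Request(
        status_details_url(base_url, workload_id, proton_id), method="GET"
    )
    req.add_header("Accept", "application/json")
    if token is not None:
        # Hypothetical auth scheme; use whatever the platform actually requires.
        req.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Percent-encoding the IDs with `safe=""` keeps an unexpected `/` or space in an identifier from altering the request path.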