# Lifecycle states

> Lifecycle states - Workload states, pod-based computation, protons, and debugging.

This Markdown file sits beside the HTML page at the same path (with a `.md` suffix). It summarizes the topic and lists links for tools and LLM context.

Companion generated at `2026-06-22T16:50:38.253485+00:00` (UTC).

## Primary page

- [Lifecycle states](https://docs.datarobot.com/en/docs/workload-api/operate-workloads/lifecycle-states.html.md): Full documentation for this topic (Markdown sidecar).

## Sections on this page

- [Typical transitions](https://docs.datarobot.com/en/docs/workload-api/operate-workloads/lifecycle-states.html.md#transitions): In-page section heading.
- [Pod-based truth](https://docs.datarobot.com/en/docs/workload-api/operate-workloads/lifecycle-states.html.md#pod-based-truth): In-page section heading.
- [Why errored is sticky](https://docs.datarobot.com/en/docs/workload-api/operate-workloads/lifecycle-states.html.md#why-errored-sticky): In-page section heading.
- [Debug an errored Workload](https://docs.datarobot.com/en/docs/workload-api/operate-workloads/lifecycle-states.html.md#debug-an-errored-workload): In-page section heading.
- [Protons: runtime backing and aggregation](https://docs.datarobot.com/en/docs/workload-api/operate-workloads/lifecycle-states.html.md#protons-lifecycle): In-page section heading.
- [Object hierarchy](https://docs.datarobot.com/en/docs/workload-api/operate-workloads/lifecycle-states.html.md#proton-object-hierarchy): In-page section heading.
- [How proton status is computed](https://docs.datarobot.com/en/docs/workload-api/operate-workloads/lifecycle-states.html.md#proton-status-computation): In-page section heading.
- [Multi-container examples](https://docs.datarobot.com/en/docs/workload-api/operate-workloads/lifecycle-states.html.md#multi-container-examples): In-page section heading.
- [Inspect protons](https://docs.datarobot.com/en/docs/workload-api/operate-workloads/lifecycle-states.html.md#inspect-protons): In-page section heading.

## Related documentation

- [Workload API](https://docs.datarobot.com/en/docs/workload-api/index.html.md): Linked from this page.
- [Operate running Workloads](https://docs.datarobot.com/en/docs/workload-api/operate-workloads/index.html.md): Linked from this page.

## Documentation content

A Workload's lifecycle is the set of states it transitions through from creation to teardown—and what triggers each transition. The platform derives states from the Kubernetes pods backing the Workload's protons.

The platform tracks state at two levels. `ProtonStatus` is the full runtime view computed from the Kubernetes pods backing each proton. `WorkloadStatus` is the user-facing subset surfaced on the Workload itself; it omits internal proton-lifecycle states (`initializing`, `warming`, `draining`, `restarting`) that the Workload-level API never reports.

The following table combines both levels. The "Surfaces on" column indicates whether a state appears on both the Workload and the proton or on the proton only.

| State | Surfaces on | Description |
| --- | --- | --- |
| unknown | Workload + proton | State not yet reported. The platform has not produced a snapshot yet. |
| submitted | Workload + proton | Accepted by the API, not yet scheduled. Pod is Pending with no node assigned. |
| initializing | Proton only | In-place update or restart while containers are being recreated and not yet ready. Workloads collapse this into adjacent Workload-level states (launching or running). |
| provisioning | Workload + proton | Cluster resources are being allocated—resource-bundle scheduling, PVC creation, and secret injection. Pod is Pending with a node assigned. |
| launching | Workload + proton | Image pull and container start in progress, or containers are up but readiness probes have not passed yet. Pod is Running with not-yet-ready containers. |
| running | Workload + proton | Healthy and serving requests. Pod is Running and all containers are ready. |
| suspended | Workload + proton | Intentionally paused. Pods are stopped, but the Workload's identity and configuration are preserved; resume restores them without re-scheduling from scratch. |
| warming | Proton only | Running in the warmup window during a replacement. Reflected from the candidate proton. |
| draining | Proton only | Alive but no longer receiving traffic. Old proton during a replacement. |
| interrupted | Workload + proton | Runtime preempted—node eviction, spot reclaim, or scheduler-driven displacement. The platform reschedules automatically once capacity frees up. |
| restarting | Proton only | Proton being recreated in place after a configuration change that requires a pod restart but preserves proton identity. The Workload-level status stays on the adjacent visible state. |
| stopping | Workload + proton | Graceful shutdown is in progress. Pod is terminating. |
| stopped | Workload + proton | Stopped; can be restarted. No pods are present, or the pod phase is Succeeded for run-to-completion artifacts. |
| errored | Workload + proton | Failed. CrashLoopBackOff, ImagePullBackOff, or pod phase Failed. |
| terminated | Workload + proton | Permanently torn down. Workload was deleted; the proton no longer exists. |

The platform evaluates pod-state predicates in priority order to derive `ProtonStatus`; `WorkloadStatus` is then derived by collapsing the proton-only states.

## Typical transitions

The following table lists the common state sequences for create, update, replace, stop, suspend, failure, and promote operations. The sequences are written at the proton level to show the full transition path; at the Workload level, the proton-only states (`initializing`, `warming`, `draining`, `restarting`) collapse into adjacent Workload-visible states.

| Operation | Transition |
| --- | --- |
| Create | submitted -> provisioning -> launching -> running |
| Update | running -> initializing -> running |
| Replace | running -> running(active) + initializing/warming/running(candidate) -> switch -> draining(active) + running(new active) -> running(single active) |
| Stop | running -> stopping -> stopped |
| Suspend | running -> suspended |
| Failure | launching\|running -> errored |
| Promote | running (draft) -> running (locked) (no restart) |
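
The collapse from proton-level sequences to the Workload-level view can be sketched as follows. This is an illustration, not the platform's implementation: it assumes that "collapsing into adjacent states" can be modeled by dropping the proton-only states and merging consecutive duplicates. The `workload_view` function and `PROTON_ONLY` set are hypothetical names.

```python
# Proton-only states, as listed in the state table on this page.
PROTON_ONLY = {"initializing", "warming", "draining", "restarting"}


def workload_view(proton_sequence):
    """Approximate the Workload-level view of a proton-level state sequence.

    Assumption: proton-only states are dropped and consecutive repeats
    of the same visible state are merged into one entry.
    """
    visible = []
    for state in proton_sequence:
        if state in PROTON_ONLY:
            continue  # never reported by the Workload-level API
        if not visible or visible[-1] != state:
            visible.append(state)
    return visible


# Update: running -> initializing -> running
print(workload_view(["running", "initializing", "running"]))  # ['running']
```

Under this model, an in-place update is invisible at the Workload level, while a create sequence passes through unchanged because it contains no proton-only states.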

## Pod-based truth

The platform computes Workload status from pod predicates evaluated in priority order. The following rules apply:

- For multi-replica Workloads, worst-state-wins aggregation applies.
- Sidecars factor into container readiness and failure evaluation.
- Init container statuses don't factor into the predicates: they affect startup but do not directly trigger a Workload-level `errored` state.
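
Worst-state-wins aggregation across replicas might look like the sketch below. The severity ranking is an assumption made for illustration: it reuses the priority numbers from the predicate table in "How proton status is computed", treating a higher number as a worse state. The names `SEVERITY` and `aggregate_replicas` are hypothetical.

```python
# Assumed severity ranking, borrowed from the priority column of the
# predicate table on this page (higher number = worse state).
SEVERITY = {
    "errored": 7,
    "stopped": 5,
    "submitted": 4,
    "provisioning": 4,
    "launching": 3,
    "running": 2,
}


def aggregate_replicas(replica_states):
    """Return the worst (highest-severity) state among the replica states."""
    if not replica_states:
        raise ValueError("at least one replica state is required")
    # Unranked states get severity 0; max() keeps the first item on ties.
    return max(replica_states, key=lambda state: SEVERITY.get(state, 0))


print(aggregate_replicas(["running", "launching", "running"]))  # launching
```

With this ranking, a single replica that is still `launching` or has `errored` determines the reported state even when every other replica is `running`.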

## Why errored is sticky

Once any container hits `CrashLoopBackOff`, the Workload reports `errored` and won't return to `running` until the offending pod is replaced or the failing container starts succeeding again. A Workload also stays in `launching` until every container's readiness probe passes—a typo in `readinessProbe.path` or a sidecar that never becomes ready keeps the Workload from reaching `running` even when the primary container is up.

## Debug an errored Workload

`statusDetails` on a proton includes the failing container name and reason. The `/events` endpoint gives a lifecycle audit trail; logs give application-level context.

To pinpoint where a Workload went wrong, pull its lifecycle events and per-proton status:

```
# Lifecycle events: what changed and when
curl -s "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}/events" \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}"

# Per-proton status
curl -s "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}/protons" \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" | jq '.data[] | {id, role, status}'
```
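
Once the protons response is parsed, narrowing it to the failing protons is a small filter. The sketch below assumes the response shape implied by the `jq` filter above (a top-level `data` list whose items carry `id`, `role`, and `status`); the sample IDs and role values are made up for illustration, and `errored_protons` is a hypothetical helper.

```python
def errored_protons(response):
    """Return the id and role of every proton whose status is 'errored'.

    Assumes `response` is the parsed JSON body of the protons listing,
    shaped as {"data": [{"id": ..., "role": ..., "status": ...}, ...]}.
    """
    return [
        {"id": proton.get("id"), "role": proton.get("role")}
        for proton in response.get("data", [])
        if proton.get("status") == "errored"
    ]


# Hypothetical sample payload; real IDs and role values may differ.
sample = {
    "data": [
        {"id": "p-1", "role": "active", "status": "running"},
        {"id": "p-2", "role": "candidate", "status": "errored"},
    ]
}
print(errored_protons(sample))  # [{'id': 'p-2', 'role': 'candidate'}]
```

The returned IDs can then be used to look up `statusDetails` for the failing container name and reason.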

## Protons: runtime backing and aggregation

A proton is the runtime primitive behind a Workload—the actual running container instances. Workloads are the governed identity; protons are the execution.

### Object hierarchy

A Workload always has at least one proton. During a replace it can temporarily have two—old and new—with Workload traffic routing deciding which proton receives requests.

A proton consists of one or more Kubernetes pods:

- A single-replica Workload maps to one pod.
- A multi-replica Workload maps to one pod per replica.

A Workload is the stable address and governance wrapper; a proton is the runtime backing that executes the artifact behind that address.

### How proton status is computed

The platform computes a proton's status by evaluating predicates against its pod states in priority order; the highest-priority match wins.

For multi-replica protons, worst-state-wins aggregation applies.

| Pod condition | Proton state | Workload state | Priority |
| --- | --- | --- | --- |
| Any container in CrashLoopBackOff | errored | errored | 7 (highest) |
| Any container in ImagePullBackOff | errored | errored | 7 |
| Pod phase Failed | errored | errored | 6 |
| Pod phase Succeeded | stopped | stopped | 5 |
| Pod phase Pending (no node yet) | submitted | submitted | 4 |
| Pod phase Pending (scheduled to node) | provisioning | provisioning | 4 |
| Pod phase Running, not all containers ready | launching | launching | 3 |
| Pod phase Running, all containers ready | running | running | 2 |
| Node eviction in progress | interrupted | interrupted | — |
| User-issued suspend | suspended | suspended | — |
| Proton recreated after config change | restarting | (proton-only—Workload state unchanged) | — |
| No pods present | stopped (during shutdown) or errored (unexpected) | same | — |

Predicates examine every container in each pod, including sidecars. If a sidecar has a wrong image URI, the proton reports `errored` even if the primary container is healthy. Init container statuses are not evaluated by the predicates—they affect startup but do not directly trigger `errored`.
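
The pod-phase rows of the predicate table can be expressed as a priority-ordered chain of checks. The following is a minimal sketch under stated assumptions, not the platform's code: the `Pod` and `Container` classes are hypothetical simplifications of the Kubernetes pod status, and only the prioritized rows are modeled (eviction, suspend, restart, and the no-pods case are left out).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Container:
    name: str
    ready: bool = False
    waiting_reason: Optional[str] = None  # e.g. "CrashLoopBackOff"


@dataclass
class Pod:
    phase: str  # "Pending", "Running", "Succeeded", or "Failed"
    node_assigned: bool = False
    # App and sidecar containers only; init containers are not evaluated.
    containers: List[Container] = field(default_factory=list)


def derive_pod_state(pod: Pod) -> str:
    """Evaluate the predicates from highest to lowest priority."""
    reasons = {c.waiting_reason for c in pod.containers}
    if reasons & {"CrashLoopBackOff", "ImagePullBackOff"}:
        return "errored"  # priority 7: any container, sidecars included
    if pod.phase == "Failed":
        return "errored"  # priority 6
    if pod.phase == "Succeeded":
        return "stopped"  # priority 5
    if pod.phase == "Pending":  # priority 4
        return "provisioning" if pod.node_assigned else "submitted"
    if pod.phase == "Running":  # priorities 3 and 2
        all_ready = all(c.ready for c in pod.containers)
        return "running" if all_ready else "launching"
    return "unknown"


pod = Pod(
    phase="Running",
    node_assigned=True,
    containers=[
        Container("app", ready=True),
        Container("sidecar", waiting_reason="ImagePullBackOff"),
    ],
)
print(derive_pod_state(pod))  # errored
```

Because the backoff check runs first, a broken sidecar yields `errored` even though the pod phase is Running and the primary container is ready.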

#### Multi-container examples

| Scenario | Pod phase | Container states | Workload state |
| --- | --- | --- | --- |
| App and sidecar both healthy | Running | app ready, sidecar ready | running |
| App running, sidecar in ImagePullBackOff | Running | app ready, sidecar ImagePullBackOff | errored |
| App in CrashLoopBackOff | Running | app CrashLoopBackOff | errored |
| App starting, sidecar running | Running | app ContainerCreating, sidecar ready | launching |
| Pod scheduled, containers waiting | Pending | both Waiting (node assigned) | provisioning |
| Pod pending, no node yet | Pending | both Waiting (no node) | submitted |

### Inspect protons

Most users interact through Workloads. Proton endpoints are useful for inspecting runtime state and testing candidates during replace operations.

To see the active instance and any candidate instance, list the protons backing a Workload:

```
# List protons for a workload
curl -s "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}/protons" \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" | jq '.'
```

The full set of proton endpoints available on the API:

```
GET /workloads/{workload_id}/protons
GET /workloads/{workload_id}/protons/{proton_id}
GET /workloads/{workload_id}/protons/{proton_id}/statusDetails
```

OpenTelemetry traces, logs, and metrics are exposed on the platform observability surface at `/api/v2/otel/workload/{workload_id}/{logs,metrics,traces}` (singular `workload`, keyed by Workload ID—not proton ID). The Workload API itself does not expose `/otel/*` routes.
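
Because the observability routes are keyed by Workload ID and use the singular `workload` segment, a small helper can guard against building the wrong path. This sketch only assembles the path described above; `otel_path` and `SIGNALS` are hypothetical names, and the host prefix, authentication, and query parameters are out of scope.

```python
# The three signal types named in the route pattern on this page.
SIGNALS = ("logs", "metrics", "traces")


def otel_path(workload_id: str, signal: str) -> str:
    """Build the observability path for a Workload (not a proton)."""
    if signal not in SIGNALS:
        raise ValueError(f"signal must be one of {SIGNALS}, got {signal!r}")
    # Singular "workload", keyed by Workload ID.
    return f"/api/v2/otel/workload/{workload_id}/{signal}"


print(otel_path("abc123", "logs"))  # /api/v2/otel/workload/abc123/logs
```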
