Tutorial: Replace the artifact behind a running Workload
This tutorial walks through a rolling replacement end-to-end—swap a running Workload's artifact for a new one while keeping the endpoint URL, governance, and identity. The platform brings up a candidate proton, waits for it to reach running, and tears the old one down. For the conceptual model (replace rules by artifact status, the candidate-running predicates, the state machine, and IaC trade-offs), see Replace and roll out.
Prerequisites
You need the following before starting.
| Prerequisite | Notes |
|---|---|
| A running Workload | $WORKLOAD_ID references a Workload backed by a locked service artifact. |
| A new locked artifact | $NEW_ARTIFACT_ID is the locked service artifact you want to roll out to. |
| Matching artifact status | Both artifacts must be locked (production) or both draft (development). See Replace and roll out for the matrix and why cross-status replace isn't allowed. |
| Terminal with curl and jq | Used for the HTTP calls and JSON parsing in this tutorial. |
export DATAROBOT_ENDPOINT=https://app.datarobot.com/api/v2
export DATAROBOT_API_TOKEN=<your-api-token>
export WORKLOAD_ID=<your-workload-id>
export NEW_ARTIFACT_ID=<your-new-artifact-id>
Step 1: Start the replacement
curl -X POST "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}/replacement" \
-H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
-H "Content-Type: application/json" \
-d '{
"artifactId": "'"${NEW_ARTIFACT_ID}"'",
"strategy": "rolling",
"config": {
"warmupDurationMinutes": 10,
"keepOldVersionMinutes": 60
}
}'
Returns 202 Accepted plus a Replacement body. The platform creates a candidate proton from ${NEW_ARTIFACT_ID}, waits for it to pass readiness probes, then promotes it to active and tears down the old proton.
Common error responses:
| Code | Cause |
|---|---|
| 404 | Workload or artifact not found, or not accessible to the caller. |
| 422 | A replacement is already in progress, the Workload has no active proton, or the new artifact fails compatibility validation. |
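For scripted rollouts, the same request can be issued from Python. The following is a minimal sketch using only the standard library; the function names are illustrative and not part of any DataRobot client, and the error handling simply surfaces the documented 404 and 422 responses rather than interpreting them.

```python
import json
import urllib.error
import urllib.request


def build_replacement_payload(artifact_id, warmup_minutes=10, keep_old_minutes=60):
    """Build the JSON body for POST /workloads/{id}/replacement."""
    return {
        "artifactId": artifact_id,
        "strategy": "rolling",
        "config": {
            "warmupDurationMinutes": warmup_minutes,
            "keepOldVersionMinutes": keep_old_minutes,
        },
    }


def start_replacement(endpoint, token, workload_id, artifact_id):
    """POST the replacement and return the parsed Replacement body."""
    request = urllib.request.Request(
        f"{endpoint}/workloads/{workload_id}/replacement",
        data=json.dumps(build_replacement_payload(artifact_id)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    try:
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        # 404 and 422 are the documented failure codes; include the body for debugging.
        raise RuntimeError(
            f"replacement rejected ({err.code}): {err.read().decode()}"
        ) from err
```

Calling `start_replacement(os.environ["DATAROBOT_ENDPOINT"], os.environ["DATAROBOT_API_TOKEN"], workload_id, new_artifact_id)` would mirror the curl call, assuming the same environment variables are set.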
Optional: configure autoscaling in the same call
Include runtime in the request so the candidate proton starts with autoscaling already configured:
curl -X POST "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}/replacement" \
-H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
-H "Content-Type: application/json" \
-d '{
"artifactId": "'"${NEW_ARTIFACT_ID}"'",
"strategy": "rolling",
"config": {"warmupDurationMinutes": 10, "keepOldVersionMinutes": 60},
"runtime": {
"containerGroups": [{
"name": "default",
"containers": [{"name": "agent", "resourceAllocation": {"cpu": 2, "memory": "4GB"}}],
"autoscaling": {
"enabled": true,
"policies": [{
"scalingMetric": "cpuAverageUtilization",
"target": 70, "minCount": 2, "maxCount": 10
}]
}
}]
}
}'
Scale-to-zero alternative
For bursty or low-traffic Workloads, use httpRequestsConcurrency with minCount: 0 so replicas drop to zero when idle. The trade-off is a cold start when traffic resumes after an idle period. Use minCount: 1 for latency-sensitive Workloads. See Scaling metrics.
If runtime is omitted, the Workload's current runtime is reused on the new artifact.
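When the runtime block is generated rather than hand-written, a small builder keeps the CPU-based and scale-to-zero variants consistent. This sketch reproduces the structure of the request above; the function name and the client-side range check are assumptions added for illustration, not platform behavior.

```python
def build_autoscaling_runtime(min_count, max_count, target=70,
                              metric="cpuAverageUtilization"):
    """Build a runtime block with one container group and one scaling policy."""
    # Assumption: a local sanity check; the platform may validate differently.
    if min_count < 0 or max_count < min_count:
        raise ValueError("expected 0 <= min_count <= max_count")
    return {
        "containerGroups": [{
            "name": "default",
            "containers": [{
                "name": "agent",
                "resourceAllocation": {"cpu": 2, "memory": "4GB"},
            }],
            "autoscaling": {
                "enabled": True,
                "policies": [{
                    "scalingMetric": metric,
                    "target": target,
                    "minCount": min_count,
                    "maxCount": max_count,
                }],
            },
        }]
    }
```

For a scale-to-zero policy, pass `min_count=0` and `metric="httpRequestsConcurrency"`; a suitable `target` for that metric depends on the Workload and is not specified here.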
Step 2: Monitor the replacement
The replacement state is observable in two ways: directly via GET /replacement, or by inspecting the protons.
# Active replacement snapshot
curl -s "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}/replacement" \
-H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" | jq '{status, strategy, candidateArtifactId, candidateProtonIds}'
# Per-proton view: summary state only
curl -s "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}/protons" \
-H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" | jq '.data[] | {id, artifactId, role, status}'
The replacement snapshot (ReplacementStatus) progresses through the following states:
submitted -> initializing -> promoting -> finalizing -> completed
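During the run, the per-proton view lists the active and candidate protons side by side. The candidate's status (ProtonStatus) passes through intermediate states such as provisioning, launching, and warming before it reaches running: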
{"id": "lat_abc123", "artifactId": "art_v1", "role": "active", "status": "running"}
{"id": "lat_def456", "artifactId": "art_v2", "role": "candidate", "status": "warming"}
Wait until the replacement reaches completed. The platform drives every transition automatically—there's no client action between promoting and finalizing. If the candidate hangs at provisioning, launching, or warming, see Lifecycle states for the predicate table and debugging.
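Waiting for completed is usually automated with a polling loop. The sketch below is one possible shape: `fetch_status` is a hypothetical callable you supply (for example, a wrapper around GET /replacement that returns the `status` field), and the timeout and interval defaults are arbitrary illustrative values.

```python
import time


def wait_for_completion(fetch_status, timeout_s=1800, interval_s=15,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until it returns "completed" or the timeout expires.

    Returns the distinct statuses observed, in order. Any status other than
    "completed" is treated as "still in progress" until the deadline passes.
    """
    deadline = clock() + timeout_s
    seen = []
    while True:
        status = fetch_status()
        if not seen or seen[-1] != status:
            seen.append(status)  # record each transition once
        if status == "completed":
            return seen
        if clock() >= deadline:
            raise TimeoutError(
                f"replacement still {status!r} after {timeout_s}s; saw {seen}"
            )
        sleep(interval_s)
```

Injecting `sleep` and `clock` keeps the loop testable without real delays; in normal use the defaults apply.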
Workload-level status stays on the active proton during a replace
The candidate's intermediate states—including warming, which is proton-only and never surfaces at the Workload level—are visible only via /workloads/{id}/protons/. Calling GET /workloads/{id} during a replace still shows the active proton's running status. See Lifecycle states for the Workload/proton projection.
For per-replica detail on the candidate (log tail, per-container readiness, restart counts), call statusDetails on the candidate proton directly—the list and get proton endpoints don't embed it:
CANDIDATE_ID=$(curl -s "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}/protons" \
-H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
| jq -r '.data[] | select(.role == "candidate") | .id')
curl -s "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}/protons/${CANDIDATE_ID}/statusDetails" \
-H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" | jq '.'
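The jq filter above assumes exactly one candidate exists. A Python equivalent can make the empty and multi-candidate cases explicit. This is a sketch over the already-parsed `data` array from the protons list; the helper names are illustrative.

```python
def candidate_proton_ids(protons):
    """Return the ids of all protons whose role is "candidate" (possibly empty)."""
    return [p["id"] for p in protons if p.get("role") == "candidate"]


def status_details_url(endpoint, workload_id, proton_id):
    """Build the per-proton statusDetails URL used in the curl call above."""
    return f"{endpoint}/workloads/{workload_id}/protons/{proton_id}/statusDetails"
```

An empty list typically means no replacement is in progress, or the candidate has already been promoted.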
Watch the lifecycle events
The /events endpoint records the replacement narrative—replacement created, candidate reached running, traffic switched, old proton drained:
curl -s "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}/events" \
-H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" | jq '.data[] | {timestamp, eventType, details}'
Monitor with OpenTelemetry during the replace
OpenTelemetry logs, traces, and metrics for the Workload are exposed through the platform observability surface at /api/v2/otel/workload/{workload_id}/... (note: singular workload—the entityType enum is deployment, use_case, experiment_container, custom_application, or workload):
curl -s "${DATAROBOT_ENDPOINT}/otel/workload/${WORKLOAD_ID}/logs/" \
-H "Authorization: Bearer ${DATAROBOT_API_TOKEN}"
curl -s "${DATAROBOT_ENDPOINT}/otel/workload/${WORKLOAD_ID}/traces/" \
-H "Authorization: Bearer ${DATAROBOT_API_TOKEN}"
curl -s "${DATAROBOT_ENDPOINT}/otel/workload/${WORKLOAD_ID}/metrics/" \
-H "Authorization: Bearer ${DATAROBOT_API_TOKEN}"
The OTel surface aggregates across both active and candidate protons during a replacement. For per-proton or per-replica isolation, use the Workload API's /protons/{proton_id}/statusDetails endpoint described under Monitor the replacement.
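Because the path uses the singular entity type, a small URL builder can catch the common `workloads` typo before a request is sent. The sketch below uses the enum values and path pattern quoted above; applying the same pattern to the other entity types is an assumption.

```python
OTEL_ENTITY_TYPES = {
    "deployment", "use_case", "experiment_container",
    "custom_application", "workload",
}
OTEL_SIGNALS = ("logs", "traces", "metrics")


def otel_urls(endpoint, entity_type, entity_id):
    """Return a dict mapping each signal name to its OTel URL for the entity."""
    if entity_type not in OTEL_ENTITY_TYPES:
        raise ValueError(
            f"unknown entityType {entity_type!r}; expected one of {sorted(OTEL_ENTITY_TYPES)}"
        )
    return {
        signal: f"{endpoint}/otel/{entity_type}/{entity_id}/{signal}/"
        for signal in OTEL_SIGNALS
    }
```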
Step 3: Verify the final state
Once the replacement reaches completed, the candidate has been promoted and the old proton torn down. Run the following commands to confirm the final state:
curl -s "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}/protons" \
-H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" | jq '.data[] | {id, artifactId, role, status}'
You should see a single proton with the new artifact and role active:
{"id": "lat_def456", "artifactId": "art_v2", "role": "active", "status": "running"}
Then check the Workload-level view:
curl -s "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}" \
-H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" | jq '{status, artifactId, importance, endpoint}'
artifactId should be ${NEW_ARTIFACT_ID}, and endpoint should be unchanged from before the replacement—that's the point of the API-driven path.
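In a CI pipeline, the manual inspection above can become an assertion. The following sketch checks the parsed protons list against the expected end state described in this step; the function name is illustrative.

```python
def check_final_state(protons, new_artifact_id):
    """Return a list of problems; an empty list means the rollout looks complete."""
    problems = []
    if len(protons) != 1:
        problems.append(f"expected 1 proton, found {len(protons)}")
    for proton in protons:
        if proton.get("artifactId") != new_artifact_id:
            problems.append(f"{proton.get('id')}: artifact is {proton.get('artifactId')}")
        if proton.get("role") != "active":
            problems.append(f"{proton.get('id')}: role is {proton.get('role')}")
        if proton.get("status") != "running":
            problems.append(f"{proton.get('id')}: status is {proton.get('status')}")
    return problems
```

Returning a list rather than raising on the first mismatch makes it easier to log everything that is off in a single pass.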
Replacement history
Past replacements show up in /workloads/{id}/history (paginated Replacement entries):
curl -s "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}/history" \
-H "Authorization: Bearer ${DATAROBOT_API_TOKEN}"
This is useful for questions like "which artifact was active at 14:30 yesterday?" or for chaining replacements: each entry carries the candidate and active proton IDs and timestamps.
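The "which artifact was active at a given time" question can be answered by scanning the history entries. The sketch below works on entries that have already been fetched (pagination is not handled). The timestamp field name `completedAt` is hypothetical, since the exact schema is not shown here; `status` and `candidateArtifactId` follow the Replacement snapshot fields used earlier.

```python
from datetime import datetime


def artifact_active_at(history, moment, time_field="completedAt",
                       artifact_field="candidateArtifactId"):
    """Return the artifact promoted by the latest replacement completed at or
    before `moment`, or None if no completed replacement precedes it.

    Timestamps are parsed with datetime.fromisoformat, so they must carry an
    explicit offset such as +00:00 if `moment` is timezone-aware.
    """
    latest = None
    for entry in history:
        finished = entry.get(time_field)
        if entry.get("status") != "completed" or not finished:
            continue  # skip cancelled or in-flight replacements
        finished_at = datetime.fromisoformat(finished)
        if finished_at <= moment and (latest is None or finished_at > latest[0]):
            latest = (finished_at, entry[artifact_field])
    return latest[1] if latest else None
```

A `None` result means the Workload was still running whatever artifact it had before the first recorded replacement.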
Cancel a replacement
If the candidate is failing and you don't want to wait for the rolling window to time out, cancel the active replacement:
curl -X DELETE "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}/replacement" \
-H "Authorization: Bearer ${DATAROBOT_API_TOKEN}"
Returns 202 Accepted. The platform sets the replacement's status to deleting, reverts the candidate (the state moves through finalizing), and returns the Workload to its pre-replacement state with the original artifact still active.
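A Python equivalent of the cancel call is sketched below with the standard library. The function names are illustrative, and the sketch only reports the HTTP status rather than waiting for the revert to finish.

```python
import urllib.request


def build_cancel_request(endpoint, token, workload_id):
    """Build the DELETE request that cancels the active replacement."""
    return urllib.request.Request(
        f"{endpoint}/workloads/{workload_id}/replacement",
        headers={"Authorization": f"Bearer {token}"},
        method="DELETE",
    )


def cancel_replacement(endpoint, token, workload_id):
    """Send the cancel request and return the HTTP status code (202 on success)."""
    request = build_cancel_request(endpoint, token, workload_id)
    with urllib.request.urlopen(request) as response:
        return response.status
```

Separating request construction from sending makes the method and URL easy to check in a unit test without touching the network.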
Infrastructure-as-code alternative (Pulumi)
Infrastructure-as-code models one shape of replace—the Workload-level rebuild—by treating artifact_id as a replace-on-change attribute. Bumping it triggers Pulumi or Terraform to bring a new Workload up and tear the old one down, with delete_before_replace = false to preserve uptime.
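import pulumi
import pulumi_datarobot as datarobot

# Requires pulumi-datarobot >= 0.10.38

# Assumption: the new artifact ID is supplied as stack config under an
# illustrative key, e.g. `pulumi config set newArtifactId <id>`.
new_artifact_id = pulumi.Config().require("newArtifactId")

prod = datarobot.Workload(
    "agent-prod",
    name="agent-prod",
    artifact_id=new_artifact_id,  # bump this -> triggers IaC replace
    importance="high",
    runtime={"container_groups": [{"replica_count": 2, "containers": [{
        "name": "agent",
        "resource_allocation": {"cpu": 2, "memory": 4294967296},  # 4 GiB in bytes
    }]}]},
    # Create the new Workload before deleting the old one to preserve uptime.
    opts=pulumi.ResourceOptions(delete_before_replace=False),
)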
Critical distinction: IaC replaces the Workload; the API replaces the protons
IaC replace rebuilds the Workload, which means the endpoint URL changes. POST /workloads/{id}/replacement swaps protons under a stable Workload, which preserves the endpoint. Both are rolling—only one preserves the endpoint. See Manage Workloads with Pulumi for the full IaC comparison and the "when to use each" guidance.