Tutorial: Deploy a production-ready container¶
Deploy a containerized AI service with full governance: locked artifact, importance, sharing, and monitoring. Unlike a draft Workload (see Hello, workload), this one is long-lived and production-grade.
This tutorial deploys a FastAPI-based agent service that exposes:
- OpenAI-compatible `/chat/completions`: proxies requests to a DataRobot-deployed LLM.
- LangGraph `/agent` endpoint: a ReAct agent with ArXiv search.
- `/healthz`, `/readyz`, `/health`: liveness, readiness, detailed status.
Tutorial format¶
When appropriate, steps below show the call using curl, followed by a code cell that runs the same call via requests. The check(resp) helper raises on non-2xx and prints the API's response body so any validation detail is visible alongside the traceback.
Locked-artifact Workloads at a glance¶
| Property | Value |
|---|---|
| Lifetime | Indefinite. Persists until explicitly stopped or deleted. |
| Artifact mutability | Immutable once locked. |
| `importance` | Optional; defaults to `low`. Set explicitly for production (`critical`, `high`, `moderate`, or `low`). |
| Workloads per artifact | Unlimited. One locked artifact can back many Workloads. |
| Replace | Supported. Replace locked with locked only. |
Connect to DataRobot¶
To connect to DataRobot, you need the following:
- DataRobot API endpoint, `DATAROBOT_ENDPOINT`.
- DataRobot API token, `DATAROBOT_API_TOKEN`.
- For the cURL examples, a terminal with `curl` and `jq`.
- For the Python Notebook examples, Python `requests` and `datarobot`.
The connection details are set automatically inside this DataRobot Notebook. To run the cURL commands directly, export DATAROBOT_ENDPOINT and DATAROBOT_API_TOKEN:
export DATAROBOT_ENDPOINT=https://app.datarobot.com/api/v2
export DATAROBOT_API_TOKEN=<your-api-token>
The cell below connects this notebook to DataRobot and sets up the check(resp) helper.
import json
import time
import datarobot as dr
import requests
client = dr.Client()
# client = dr.Client(endpoint="https://app.datarobot.com/api/v2", token="YOUR_API_TOKEN")
API_BASE = client.endpoint.rstrip("/")
HEADERS = {
    "Authorization": f"Bearer {client.token}",
    "Content-Type": "application/json",
}

def check(resp):
    # On a non-2xx response, print the status and body, then raise.
    if not resp.ok:
        print("Status:", resp.status_code, resp.reason)
        try:
            print("Body:", json.dumps(resp.json(), indent=2))
        except ValueError:
            print("Body:", resp.text)
        resp.raise_for_status()
    return resp
print("Connected:", API_BASE)
Configure variables¶
Set the values that vary per user before running the rest of the notebook.
| Variable | Purpose |
|---|---|
| `MODEL` | Model name passed to the container's `MODEL` env var and used in the chat-completions request. |
| `LLM_DEPLOYMENT_ID` | DataRobot LLM deployment the container routes requests to (via the `DEPLOYMENT_ID` env var). |
| `RECIPIENT_USER_ID` | User, group, or organization ID to share the Workload with. Leave blank to skip sharing. |
If running the curl commands from a terminal, export these values as shell variables:
export MODEL="datarobot-deployed-llm"
export LLM_DEPLOYMENT_ID="<your-deployment-id>"
export RECIPIENT_USER_ID="" # Leave blank to skip sharing
If running the Python Notebook, set the variables in the cell below:
MODEL = "datarobot-deployed-llm" # Use "datarobot-deployed-llm" unless otherwise specified by the deployment.
LLM_DEPLOYMENT_ID = "" # e.g. "6543abc..."
RECIPIENT_USER_ID = "" # Leave blank to skip sharing
assert MODEL, "Set MODEL before running the rest of the notebook."
assert LLM_DEPLOYMENT_ID, "Set LLM_DEPLOYMENT_ID before running the rest of the notebook."
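If you prefer to drive the notebook from the same shell variables used by the curl commands, the three values can be read from the environment instead of being typed into the cell. The following is a minimal sketch, not part of the tutorial's notebook: `load_config` is a hypothetical helper name, and falling back to `"datarobot-deployed-llm"` when `MODEL` is unset is an assumption based on the default shown above.

```python
import os


def load_config(env=None):
    """Read MODEL, LLM_DEPLOYMENT_ID, and RECIPIENT_USER_ID from a mapping.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    config = {
        # Assumption: fall back to the tutorial's default model name when unset.
        "MODEL": env.get("MODEL", "").strip() or "datarobot-deployed-llm",
        "LLM_DEPLOYMENT_ID": env.get("LLM_DEPLOYMENT_ID", "").strip(),
        # Blank means "skip sharing", matching the tutorial's convention.
        "RECIPIENT_USER_ID": env.get("RECIPIENT_USER_ID", "").strip(),
    }
    if not config["LLM_DEPLOYMENT_ID"]:
        raise ValueError("Set LLM_DEPLOYMENT_ID before running the rest of the notebook.")
    return config
```

Calling `load_config()` with no arguments reads the process environment; the returned dictionary can then be unpacked into `MODEL`, `LLM_DEPLOYMENT_ID`, and `RECIPIENT_USER_ID`.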
Create the Workload¶
Artifacts are always created as draft, so you create the Workload first with importance set, then lock the artifact. Locking an artifact flips its backing Workload into the locked (production) lifecycle: indefinite lifetime, immutable spec, eligible for locked-to-locked replace.
curl -s -X POST "${DATAROBOT_ENDPOINT}/workloads" \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d "$(jq -n \
    --arg endpoint "$DATAROBOT_ENDPOINT" \
    --arg token "$DATAROBOT_API_TOKEN" \
    --arg model "$MODEL" \
    --arg deployment "$LLM_DEPLOYMENT_ID" \
    '{
      "name": "agent-service",
      "importance": "high",
      "artifact": {
        "name": "agent-service-artifact",
        "type": "service",
        "spec": {
          "containerGroups": [{
            "name": "default",
            "containers": [{
              "name": "agent",
              "imageUri": "otkachnlp/fastapi-server-example:latest",
              "port": 8080,
              "primary": true,
              "entrypoint": ["python", "server.py"],
              "environmentVars": [
                {"name": "MODEL", "value": $model},
                {"name": "DEPLOYMENT_ID", "value": $deployment},
                {"name": "OPENAI_API_KEY", "value": $token},
                {"name": "DATAROBOT_API_TOKEN", "value": $token},
                {"name": "DATAROBOT_ENDPOINT", "value": $endpoint}
              ],
              "readinessProbe": {"path": "/healthz", "port": 8080}
            }]
          }]
        }
      },
      "runtime": {
        "containerGroups": [{
          "name": "default",
          "replicaCount": 1,
          "containers": [{
            "name": "agent",
            "resourceAllocation": {"cpu": 1, "memory": "512MB"}
          }]
        }]
      }
    }'
  )" | tee /tmp/workload.json

# Capture the IDs that the later curl commands reference.
export WORKLOAD_ID=$(jq -r '.id' /tmp/workload.json)
export ARTIFACT_ID=$(jq -r '.artifactId // .artifact.id' /tmp/workload.json)
When following this tutorial, consider the following:
- **Artifact vs. runtime.** The artifact's `spec` defines the container topology (image, port, entrypoint, env vars, probes), that is, anything that travels with the artifact across deployments. Replica count, CPU/memory, autoscaling, and resource bundles are deployment-time concerns and live in `runtime.containerGroups[]`. Entries are looked up by group `name` (the default group is `"default"`), and per-container allocations by container `name`. Both objects need names when you're customizing resources.
- **OTel logs.** The artifact image supports OTel logs. To enable them, add `OTEL_EXPORTER_OTLP_ENDPOINT` to `environmentVars`. The container validates the required environment variables at startup and exits with an explicit error if any are missing, and the `/readyz` endpoint surfaces startup errors rather than masking them.
- **Readiness probe.** `readinessProbe.path` is the gate to `running`. The platform polls that path on the container's port and transitions the Workload to `running` only once it returns 2xx. This tutorial points the probe at `/healthz`, which returns 2xx as soon as the FastAPI process is up. The container also exposes a `/readyz` endpoint that exercises the LLM connection, but keep the readiness probe pointed at `/healthz`: gating `running` on an external dependency causes the Workload's status to flap whenever that dependency has issues. Keep the deep checks reachable as explicit endpoints that your monitoring and runbooks can hit; don't make them block startup.
- **API token in env vars.** The FastAPI server authenticates against the DataRobot LLM deployment on every request, so the container needs an API token at runtime. The artifact spec passes `OPENAI_API_KEY` and `DATAROBOT_API_TOKEN` (the same value, under both common names) so whichever the server reads is populated. Baking a literal token into the artifact is fine for staging exploration; for production, switch to a `dr-credential`-sourced environment variable so the token isn't stored in the artifact spec.
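Because runtime entries are matched to the artifact spec by group and container `name`, a typo in either name is easy to miss until the API rejects or ignores the allocation. The following is a minimal pre-flight sketch that compares the two halves of a request payload before it is sent. `find_unmatched_runtime_names` is a hypothetical helper, not a platform API, and treating a missing runtime group name as `"default"` is an assumption drawn from the note above.

```python
def find_unmatched_runtime_names(payload):
    """Return messages for runtime groups/containers that the artifact spec does not define."""
    # Map each spec group name to the set of container names it declares.
    spec_groups = {
        group["name"]: {c["name"] for c in group.get("containers", [])}
        for group in payload["artifact"]["spec"]["containerGroups"]
    }
    problems = []
    for group in payload.get("runtime", {}).get("containerGroups", []):
        # Assumption: an unnamed runtime group refers to the "default" group.
        group_name = group.get("name", "default")
        if group_name not in spec_groups:
            problems.append(f"group '{group_name}' is not defined in the artifact spec")
            continue
        for container in group.get("containers", []):
            name = container.get("name")
            if name not in spec_groups[group_name]:
                problems.append(f"container '{name}' is not defined in group '{group_name}'")
    return problems
```

An empty list means every runtime entry has a counterpart in the spec; anything else is worth fixing before calling `POST /workloads`.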
payload = {
    "name": f"agent-service-{int(time.time())}",
    "importance": "high",
    "artifact": {
        "name": f"agent-service-artifact-{int(time.time())}",
        "type": "service",
        "spec": {
            "containerGroups": [{
                "name": "default",
                "containers": [{
                    "name": "agent",
                    "imageUri": "abdodatarobot/fastapi-server-example:latest",
                    "port": 8080,
                    "primary": True,
                    "entrypoint": ["python", "server.py"],
                    "environmentVars": [
                        {"name": "MODEL", "value": MODEL},
                        {"name": "DEPLOYMENT_ID", "value": LLM_DEPLOYMENT_ID},
                        {"name": "OPENAI_API_KEY", "value": client.token},
                        {"name": "DATAROBOT_API_TOKEN", "value": client.token},
                        {"name": "DATAROBOT_ENDPOINT", "value": API_BASE},
                    ],
                    "readinessProbe": {"path": "/healthz", "port": 8080},
                }]
            }]
        },
    },
    "runtime": {
        "containerGroups": [{
            "name": "default",
            "replicaCount": 1,
            "containers": [{
                "name": "agent",
                "resourceAllocation": {"cpu": 1, "memory": "512MB"},
            }],
        }]
    },
}
resp = requests.post(f"{API_BASE}/workloads", headers=HEADERS, json=payload, timeout=120)
check(resp)
body = resp.json()
workload_id = body["id"]
artifact_id = body.get("artifactId") or body.get("artifact", {}).get("id")
print("Workload ID:", workload_id)
print("Artifact ID:", artifact_id)
Lock the artifact¶
Transition the artifact from draft to locked. Because this Workload is the only one backing the draft artifact, the Workload's lifecycle transitions to locked alongside it.
curl -X PATCH "${DATAROBOT_ENDPOINT}/artifacts/${ARTIFACT_ID}" \
-H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
-H "Content-Type: application/json" \
-d '{"status": "locked"}'
`PATCH /artifacts/{artifact_id}` accepts `name`, `description`, `spec`, and `status`. Locking is one-way: locked artifacts cannot return to draft.
resp = requests.patch(
    f"{API_BASE}/artifacts/{artifact_id}",
    headers=HEADERS,
    json={"status": "locked"},
    timeout=120,
)
check(resp)
print("Artifact status:", resp.json().get("status"))
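Since locking is one-way, re-running the lock cell against an artifact that is already locked is at best a no-op. The following is a minimal sketch of a guard that decides whether a lock request is needed at all. `lock_request_body` is a hypothetical helper, and it assumes you already know the artifact's current status (for example, from a previous response body); how the API responds to a repeated lock request is not covered here.

```python
def lock_request_body(current_status):
    """Return the PATCH body that locks a draft artifact, or None if no request is needed."""
    if current_status == "locked":
        # Already locked: locking is one-way, so there is nothing left to do.
        return None
    if current_status == "draft":
        return {"status": "locked"}
    # Only the two statuses described in this tutorial are handled.
    raise ValueError(f"Unexpected artifact status: {current_status!r}")
```

A caller would send the PATCH only when the helper returns a body, which keeps the notebook cell safe to re-run.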
Wait for running¶
Poll the Workload's .status until it reaches running. Expected happy-path progression: submitted → provisioning → launching → running. The status may briefly read errored during startup; keep polling.
curl -s "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}" \
-H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" | jq -r '.status'
deadline = time.time() + 600
while time.time() < deadline:
    r = requests.get(f"{API_BASE}/workloads/{workload_id}", headers=HEADERS, timeout=60)
    check(r)
    status = r.json().get("status")
    print("status:", status)
    if status == "running":
        break
    time.sleep(5)
else:
    raise TimeoutError("Timed out waiting for running")
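The polling loop above can also be written as a reusable function that records each status change, which makes a transient `errored` reading visible in the history without aborting the wait. This is a sketch under assumptions: `wait_for_status` and its parameters are hypothetical names, and the status source is injected as a callable so the logic can be exercised without a live Workload.

```python
import time


def wait_for_status(fetch_status, target="running", timeout=600, interval=5,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll fetch_status() until it returns `target`; return the distinct statuses seen, in order."""
    deadline = clock() + timeout
    seen = []
    while True:
        status = fetch_status()
        # Record a status only when it differs from the previous one.
        if not seen or seen[-1] != status:
            seen.append(status)
        if status == target:
            return seen
        # Any other status, including a transient "errored", keeps the loop polling.
        if clock() >= deadline:
            raise TimeoutError(f"Timed out waiting for {target}; last status: {status}")
        sleep(interval)
```

In the notebook, `fetch_status` could be a small lambda that wraps the same `requests.get` call and `check` helper used above and returns the response's `status` field.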
Invoke the service¶
Read the invoke URL from the Workload's .endpoint field, then call your application routes against it.
ENDPOINT=$(curl -s "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}" \
-H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" | jq -r '.endpoint')
curl -X POST "${ENDPOINT}/chat/completions" \
-H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
-H "Content-Type: application/json" \
-d '{"model": "'"${MODEL}"'", "messages": [{"role": "user", "content": "Hello!"}]}'
r = requests.get(f"{API_BASE}/workloads/{workload_id}", headers=HEADERS, timeout=60)
check(r)
invoke_url = r.json()["endpoint"]
if invoke_url.startswith("http://"):
    invoke_url = "https://" + invoke_url.removeprefix("http://")
print("Invoke URL:", invoke_url)
resp = requests.post(
    invoke_url.rstrip("/") + "/chat/completions",
    headers={
        "Authorization": f"Bearer {client.token}",
        "Content-Type": "application/json",
    },
    json={"model": MODEL, "messages": [{"role": "user", "content": "Hello!"}]},
    timeout=120,
)
check(resp)
print(json.dumps(resp.json(), indent=2)[:2000])
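The URL handling in the cell above (upgrade `http://` to `https://`, trim the trailing slash, append the route) is reused for every application route, so it can be factored into a helper. The sketch below also pulls the reply text out of the response, assuming the service returns the usual OpenAI-style shape (`choices[0].message.content`); that shape is an assumption based on the endpoint being described as OpenAI-compatible. Both function names are hypothetical.

```python
def build_invoke_url(endpoint, route):
    """Join a Workload endpoint with an application route, upgrading http:// to https://."""
    if endpoint.startswith("http://"):
        endpoint = "https://" + endpoint[len("http://"):]
    return endpoint.rstrip("/") + "/" + route.lstrip("/")


def first_reply_text(completion):
    """Return the first choice's message content from an OpenAI-style body, or None if absent."""
    choices = completion.get("choices") or []
    if not choices:
        return None
    return (choices[0].get("message") or {}).get("content")
```

With these in place, `build_invoke_url(invoke_url, "/chat/completions")` gives the URL to post to, and `first_reply_text(resp.json())` returns the assistant's text when the response follows that shape.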
Govern the Workload¶
Now that it's a production Workload, wire up importance, sharing, and access controls.
Raise importance to critical:
curl -X PATCH "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}" \
-H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
-H "Content-Type: application/json" \
-d '{"importance": "critical"}'
Share with another user, group, or organization:
curl -X PATCH "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}/sharedRoles" \
-H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
-H "Content-Type: application/json" \
-d '{
"operation": "updateRoles",
"roles": [{"id": "<recipient-id>", "role": "USER", "shareRecipientType": "user"}]
}'
`PATCH /workloads/{id}` accepts `name`, `description`, and `importance`. For runtime changes (replicas, resources) use `PATCH /workloads/{id}/settings`, which triggers a rolling replacement.
resp = requests.patch(
    f"{API_BASE}/workloads/{workload_id}",
    headers=HEADERS,
    json={"importance": "critical"},
    timeout=60,
)
check(resp)
print("Importance:", resp.json().get("importance"))
if not RECIPIENT_USER_ID:
    print("RECIPIENT_USER_ID is blank; skipping share.")
else:
    share = requests.patch(
        f"{API_BASE}/workloads/{workload_id}/sharedRoles",
        headers=HEADERS,
        json={
            "operation": "updateRoles",
            "roles": [{"id": RECIPIENT_USER_ID, "role": "USER", "shareRecipientType": "user"}],
        },
        timeout=60,
    )
    check(share)
    print("Share status:", share.status_code)
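The share request body has a fixed shape, so building it in one place avoids retyping it when sharing with several recipients. The sketch below mirrors the body used above. `build_share_payload` is a hypothetical helper, and the `"group"` and `"organization"` values for `shareRecipientType` are assumptions: only `"user"` appears in this tutorial's requests.

```python
# Assumption: only "user" is confirmed by the tutorial; the other spellings are guesses.
RECIPIENT_TYPES = {"user", "group", "organization"}


def build_share_payload(recipient_id, role="USER", recipient_type="user"):
    """Return the sharedRoles PATCH body, or None when recipient_id is blank (skip sharing)."""
    if not recipient_id:
        return None
    if recipient_type not in RECIPIENT_TYPES:
        raise ValueError(f"Unsupported recipient type: {recipient_type!r}")
    return {
        "operation": "updateRoles",
        "roles": [{"id": recipient_id, "role": role, "shareRecipientType": recipient_type}],
    }
```

Returning `None` for a blank ID preserves the tutorial's "leave blank to skip sharing" convention: the caller sends the PATCH only when a payload comes back.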
The server also includes an agent that calls an ArXiv search tool. To test the agent, use the `/agent` route:
curl -X POST "${ENDPOINT}/agent" \
-H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
-H "Content-Type: application/json" \
-d '{"query": "Find latest research on solar power"}'
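The same agent call can be made from Python. The sketch below only assembles the request pieces (URL, headers, JSON body) so it can be checked without a running Workload. `build_agent_request` is a hypothetical helper, and the `{"query": ...}` body is copied from the curl command above rather than from a documented schema.

```python
import json


def build_agent_request(invoke_url, token, query):
    """Return (url, headers, body) for a POST to the service's /agent route."""
    url = invoke_url.rstrip("/") + "/agent"
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    # The body is serialized here so it can be passed as raw data to any HTTP client.
    body = json.dumps({"query": query})
    return url, headers, body
```

In the notebook, the three values can be passed to `requests.post(url, headers=headers, data=body, timeout=120)` and the result handed to `check`, matching the pattern used for `/chat/completions`.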
Observe¶
Locked Workloads expose the full monitoring surface with 30-day retention.
| Capability | Endpoint |
|---|---|
| Service health, latency, error rate | GET /workloads/{id} (computed fields on the Workload) |
| Lifecycle events (audit trail) | GET /workloads/{id}/events |
| Aggregate request statistics | GET /workloads/{id}/stats |
| Per-metric time series | GET /workloads/{id}/stats/{metric_name} |
| Per-replica status | GET /workloads/{id}/protons/{proton_id}/statusDetails |
curl -s "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}/events" \
-H "Authorization: Bearer ${DATAROBOT_API_TOKEN}"
curl -s "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}/stats" \
-H "Authorization: Bearer ${DATAROBOT_API_TOKEN}"
ev = requests.get(f"{API_BASE}/workloads/{workload_id}/events", headers=HEADERS, timeout=60)
check(ev)
print("Events:", json.dumps(ev.json(), indent=2)[:2000])
stats = requests.get(f"{API_BASE}/workloads/{workload_id}/stats", headers=HEADERS, timeout=60)
check(stats)
print("\nStats:", json.dumps(stats.json(), indent=2)[:2000])
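The table above lists five monitoring endpoints, but the cells only exercise two of them. The following sketch builds all of the URLs from the table's path templates so the remaining ones can be called the same way. `monitoring_urls` is a hypothetical helper; the metric name and proton ID must come from your own Workload, since this tutorial does not list valid values.

```python
from urllib.parse import quote


def monitoring_urls(api_base, workload_id, metric_name=None, proton_id=None):
    """Build the monitoring endpoint URLs for one Workload from the table's path templates."""
    base = f"{api_base.rstrip('/')}/workloads/{quote(workload_id, safe='')}"
    urls = {
        "workload": base,           # service health, latency, error rate
        "events": f"{base}/events",  # lifecycle events (audit trail)
        "stats": f"{base}/stats",    # aggregate request statistics
    }
    if metric_name:
        # Per-metric time series; the name is percent-encoded to be path-safe.
        urls["metric"] = f"{base}/stats/{quote(metric_name, safe='')}"
    if proton_id:
        # Per-replica status.
        urls["replica"] = f"{base}/protons/{quote(proton_id, safe='')}/statusDetails"
    return urls
```

Each URL can be fetched with the same `requests.get(..., headers=HEADERS, timeout=60)` call and `check` helper used in the cells above.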