Tutorial: Deploy a production-ready container

Deploy a containerized AI service with full governance: locked artifact, importance, sharing, and monitoring. Unlike a draft Workload (see Hello, workload), this one is long-lived and production-grade.

This tutorial deploys a FastAPI-based agent service that exposes:

OpenAI-compatible /chat/completions: connects to the DataRobot LLM Gateway, so no separate LLM deployment is required.
LangGraph /agent endpoint: a ReAct agent with ArXiv search.
/healthz, /readyz, /health: liveness, readiness, detailed status.

Tutorial format

When appropriate, steps below show the call using curl, followed by a code cell that runs the same call via requests. The check(resp) helper raises on non-2xx and prints the API's response body so any validation detail is visible alongside the traceback.

Locked-artifact Workloads at a glance

Property	Value
Lifetime	Indefinite. Persists until explicitly stopped or deleted.
Artifact mutability	Immutable once locked.
`importance`	Optional; defaults to `low`. Set explicitly for production (`critical`, `high`, `moderate`, or `low`).
Workloads per artifact	Unlimited. One locked artifact can back many Workloads.
Replace	Supported. Replace locked with locked only.
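
The importance values in the table can be checked client-side before a request is sent. The following is a minimal sketch, not part of the Workload API: the helper name resolve_importance is hypothetical, and the lowercase normalization is an assumption about what the API accepts.

```python
# Allowed values and default taken from the table above.
ALLOWED_IMPORTANCE = ("critical", "high", "moderate", "low")
DEFAULT_IMPORTANCE = "low"


def resolve_importance(value=None):
    """Return a valid importance level, falling back to the documented default.

    Hypothetical helper: normalizing case is an assumption, not API behavior.
    """
    if value is None or value.strip() == "":
        return DEFAULT_IMPORTANCE
    normalized = value.strip().lower()
    if normalized not in ALLOWED_IMPORTANCE:
        raise ValueError(
            f"importance must be one of {ALLOWED_IMPORTANCE}, got {value!r}"
        )
    return normalized
```

Validating locally surfaces a typo such as "hihg" before the API rejects the payload.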

Connect to DataRobot

To connect to DataRobot, you need the following:

DataRobot API endpoint, DATAROBOT_ENDPOINT.
DataRobot API token, DATAROBOT_API_TOKEN.
For the cURL examples, a terminal with curl and jq.
For the Python Notebook examples, the Python packages requests and datarobot.

The connection details are set automatically inside this DataRobot Notebook. To run the cURL commands directly, export DATAROBOT_ENDPOINT and DATAROBOT_API_TOKEN:

export DATAROBOT_ENDPOINT=https://app.datarobot.com/api/v2
export DATAROBOT_API_TOKEN=<your-api-token>
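
When the Python examples run outside a DataRobot Notebook, the same two variables have to be present in the environment. A minimal sketch of a fail-fast check follows; load_connection is a hypothetical helper name, not part of the datarobot package.

```python
import os

REQUIRED_VARS = ("DATAROBOT_ENDPOINT", "DATAROBOT_API_TOKEN")


def load_connection(env=None):
    """Return (endpoint, token) from the environment, or raise if either is unset.

    Hypothetical helper. `env` defaults to os.environ and can be replaced
    with a plain dict for testing.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    # Strip a trailing slash so paths can be appended with a single "/".
    return env["DATAROBOT_ENDPOINT"].rstrip("/"), env["DATAROBOT_API_TOKEN"]
```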

The cell below connects this notebook to DataRobot and sets up the check(resp) helper.

import json
import time

import datarobot as dr
import requests

client = dr.Client()
# client = dr.Client(endpoint="https://app.datarobot.com/api/v2", token="YOUR_API_TOKEN")

API_BASE = client.endpoint.rstrip("/")
HEADERS = {
    "Authorization": f"Bearer {client.token}",
    "Content-Type": "application/json",
}


def check(resp):
    if not resp.ok:
        print("Status:", resp.status_code, resp.reason)
        try:
            print("Body:", json.dumps(resp.json(), indent=2))
        except ValueError:
            print("Body:", resp.text)
        resp.raise_for_status()
    return resp


print("Connected:", API_BASE)

Configure variables

Set the values that vary per user before running the rest of the notebook.

Variable	Purpose
`MODEL`	Model name passed to the container's `MODEL` env var and sent in the chat-completions request. The container routes to the DataRobot LLM Gateway using `DATAROBOT_ENDPOINT` and `DATAROBOT_API_TOKEN`.
`RECIPIENT_USER_ID`	User, group, or organization ID to share the Workload with. Leave blank to skip sharing.

If running the curl commands from a terminal, export these values as shell variables:

export MODEL="azure/gpt-5-nano-2025-08-07"
export RECIPIENT_USER_ID=""  # Leave blank to skip sharing

If running the Python Notebook, set the variables in the cell below:

MODEL = "azure/gpt-5-nano-2025-08-07"
RECIPIENT_USER_ID = ""   # Leave blank to skip sharing

assert MODEL, "Set MODEL before running the rest of the notebook."

Create the Workload

Artifacts are always created as draft, so you create the Workload first with importance set, then lock the artifact. Locking an artifact flips its backing Workload into the locked (production) lifecycle: indefinite lifetime, immutable spec, eligible for locked-to-locked replace.

curl -s -X POST "${DATAROBOT_ENDPOINT}/workloads" \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d "$(jq -n \
    --arg endpoint "$DATAROBOT_ENDPOINT" \
    --arg token "$DATAROBOT_API_TOKEN" \
    --arg model "$MODEL" \
    '{
      "name": "agent-service",
      "importance": "high",
      "artifact": {
        "name": "agent-service-artifact",
        "type": "service",
        "spec": {
          "containerGroups": [{
            "name": "default",
            "containers": [{
              "name": "agent",
              "imageUri": "otkachnlp/fastapi-server-example:latest",
              "port": 8080,
              "primary": true,
              "entrypoint": ["python", "server.py"],
              "environmentVars": [
                {"name": "MODEL", "value": $model},
                {"name": "DATAROBOT_ENDPOINT", "value": $endpoint},
                {"name": "DATAROBOT_API_TOKEN", "value": $token}
              ],
              "readinessProbe": {"path": "/healthz", "port": 8080}
            }]
          }]
        }
      },
      "runtime": {
        "containerGroups": [{
          "name": "default",
          "replicaCount": 1,
          "containers": [{
            "name": "agent",
            "resourceAllocation": {"cpu": 1, "memory": "512MB"}
          }]
        }]
      }
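    }'
  )" | tee /tmp/workload.json

# Capture the IDs that the later curl commands reference.
export WORKLOAD_ID=$(jq -r '.id' /tmp/workload.json)
export ARTIFACT_ID=$(jq -r '.artifactId // .artifact.id' /tmp/workload.json)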

When following this tutorial, consider the following:

Artifact vs. runtime. The artifact's spec defines the container topology (image, port, entrypoint, env vars, probes): everything that travels with the artifact across deployments. Replica count, CPU/memory, autoscaling, and resource bundles are deployment-time concerns and live in runtime.containerGroups[]. Runtime entries are matched to the spec by group name (the default group is "default"), and per-container allocations are matched by container name, so both the group and the container need names when you customize resources.
OTel logs. The artifact image supports OTel logs. To enable them, add OTEL_EXPORTER_OTLP_ENDPOINT to environmentVars. The container validates the required environment variables at startup and exits with an explicit error if any are missing; the /readyz endpoint surfaces startup errors rather than masking them.
readinessProbe.path is the gate to running. The platform polls that path on the container's port and only transitions the Workload to running once it returns 2xx. This tutorial points the probe at /healthz, which returns 2xx as soon as the FastAPI process is up. The container also exposes a /readyz endpoint that exercises the LLM connection, but keep the readiness probe pointed at /healthz. Gating running on an external dependency causes the Workload's status to flap whenever that dependency has issues. Keep the deep checks reachable as explicit endpoints your monitoring and runbooks can hit; don't make them block startup.
API token in env vars. The container authenticates against the DataRobot LLM Gateway on every request using DATAROBOT_API_TOKEN. Baking a literal token into the artifact is fine for staging exploration; for production, switch to a dr-credential-sourced environment variable so the token isn't stored in the artifact spec.
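
Because runtime entries are matched to the spec by name, a misspelled group or container name is easy to miss. The sketch below checks a create-Workload payload for runtime names that have no counterpart in the artifact spec. The function name unmatched_runtime_names is hypothetical, and the check reflects only the matching rule described above, not server-side validation.

```python
def unmatched_runtime_names(payload):
    """List runtime group/container names that do not appear in the artifact spec.

    Hypothetical client-side check; an empty list means every runtime entry
    has a same-named entry in artifact.spec.containerGroups.
    """
    spec_groups = {
        group["name"]: {container["name"] for container in group.get("containers", [])}
        for group in payload["artifact"]["spec"]["containerGroups"]
    }
    problems = []
    for group in payload.get("runtime", {}).get("containerGroups", []):
        # The default group name is "default" when none is given.
        group_name = group.get("name", "default")
        if group_name not in spec_groups:
            problems.append(f"group:{group_name}")
            continue
        for container in group.get("containers", []):
            if container.get("name") not in spec_groups[group_name]:
                problems.append(f"container:{group_name}/{container.get('name')}")
    return problems
```

Running this on the payload before posting it returns an empty list when the names line up.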

payload = {
    "name": f"agent-service-{int(time.time())}",
    "importance": "high",
    "artifact": {
        "name": f"agent-service-artifact-{int(time.time())}",
        "type": "service",
        "spec": {
            "containerGroups": [{
                "name": "default",
                "containers": [{
                    "name": "agent",
                    "imageUri": "otkachnlp/fastapi-server-example:latest",
                    "port": 8080,
                    "primary": True,
                    "entrypoint": ["python", "server.py"],
                    "environmentVars": [
                        {"name": "MODEL", "value": MODEL},
                        {"name": "DATAROBOT_ENDPOINT", "value": API_BASE},
                        {"name": "DATAROBOT_API_TOKEN", "value": client.token},
                    ],
                    "readinessProbe": {"path": "/healthz", "port": 8080},
                }]
            }]
        },
    },
    "runtime": {
        "containerGroups": [{
            "name": "default",
            "replicaCount": 1,
            "containers": [{
                "name": "agent",
                "resourceAllocation": {"cpu": 1, "memory": "512MB"},
            }],
        }]
    },
}

resp = requests.post(f"{API_BASE}/workloads", headers=HEADERS, json=payload, timeout=120)
check(resp)
body = resp.json()
workload_id = body["id"]
artifact_id = body.get("artifactId") or body.get("artifact", {}).get("id")
print("Workload ID:", workload_id)
print("Artifact ID:", artifact_id)
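
The payload above embeds client.token in environmentVars, so printing or logging it as-is would expose the token. A minimal sketch of a redaction step follows; redact_payload and the "***" placeholder are illustrative choices, not part of the API.

```python
import copy

# Env var names whose values should never be printed.
SENSITIVE_NAMES = frozenset({"DATAROBOT_API_TOKEN"})


def redact_payload(payload, sensitive=SENSITIVE_NAMES):
    """Return a deep copy of the payload with sensitive env var values masked.

    Hypothetical helper; the original payload is left untouched so it can
    still be sent to the API.
    """
    redacted = copy.deepcopy(payload)
    for group in redacted["artifact"]["spec"]["containerGroups"]:
        for container in group.get("containers", []):
            for var in container.get("environmentVars", []):
                if var.get("name") in sensitive:
                    var["value"] = "***"
    return redacted
```

For example, print(json.dumps(redact_payload(payload), indent=2)) shows the request body without the token.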

Lock the artifact

Transition the artifact from draft to locked. Because this Workload is the only one backing the draft artifact, the Workload's lifecycle transitions to locked alongside it.

curl -X PATCH "${DATAROBOT_ENDPOINT}/artifacts/${ARTIFACT_ID}" \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"status": "locked"}'

PATCH /artifacts/{artifact_id} accepts name, description, spec, and status. Locking is one-way: locked artifacts cannot return to draft.

resp = requests.patch(
    f"{API_BASE}/artifacts/{artifact_id}",
    headers=HEADERS,
    json={"status": "locked"},
    timeout=120,
)
check(resp)
print("Artifact status:", resp.json().get("status"))
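
Since locking is one-way, a script may want to check the current status before issuing the PATCH. The sketch below encodes only the draft-to-locked rule stated in this section; the function names are hypothetical, and the API may define other statuses or transitions not covered here.

```python
# The only transition this tutorial documents: draft -> locked.
ALLOWED_TRANSITIONS = frozenset({("draft", "locked")})


def can_transition(current, target):
    """Return True if moving an artifact from `current` to `target` is documented."""
    return (current, target) in ALLOWED_TRANSITIONS


def require_transition(current, target):
    """Raise ValueError for any transition outside the documented one-way rule."""
    if not can_transition(current, target):
        raise ValueError(f"Cannot move artifact from {current!r} to {target!r}")
```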

Wait for running

Poll the Workload's .status until it reaches running. Expected happy-path progression: submitted → provisioning → launching → running. The status may briefly read errored during startup; keep polling.

curl -s "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}" \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" | jq -r '.status'

deadline = time.time() + 600
while time.time() < deadline:
    r = requests.get(f"{API_BASE}/workloads/{workload_id}", headers=HEADERS, timeout=60)
    check(r)
    status = r.json().get("status")
    print("status:", status)
    if status == "running":
        break
    time.sleep(5)
else:
    raise TimeoutError("Timed out waiting for running")
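
The loop above keeps polling through a brief errored status but would also wait the full ten minutes if the Workload stays errored. One possible refinement, sketched below, gives up after a run of consecutive errored readings. The function wait_for_running and the max_errored threshold are hypothetical, and fetch_status stands in for the GET call shown above.

```python
import time


def wait_for_running(fetch_status, timeout=600, interval=5, max_errored=12,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll fetch_status() until it returns "running".

    A short errored streak is tolerated; `max_errored` consecutive errored
    readings (a hypothetical threshold) raise RuntimeError. `clock` and
    `sleep` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    errored_streak = 0
    while clock() < deadline:
        status = fetch_status()
        if status == "running":
            return status
        if status == "errored":
            errored_streak += 1
            if errored_streak >= max_errored:
                raise RuntimeError(f"Workload stayed errored for {errored_streak} polls")
        else:
            # Any other status resets the streak: the error was transient.
            errored_streak = 0
        sleep(interval)
    raise TimeoutError("Timed out waiting for running")
```

In the notebook, fetch_status could be a small function that calls requests.get on the Workload URL, passes the response through check, and returns its status field.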

Invoke the service

Read the invoke URL from the Workload's .endpoint field, then call your application routes against it.

ENDPOINT=$(curl -s "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}" \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" | jq -r '.endpoint')

curl -X POST "${ENDPOINT}/chat/completions" \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"model": "'"${MODEL}"'", "messages": [{"role": "user", "content": "Hello!"}]}'

r = requests.get(f"{API_BASE}/workloads/{workload_id}", headers=HEADERS, timeout=60)
check(r)
invoke_url = r.json()["endpoint"]
if invoke_url.startswith("http://"):
    invoke_url = "https://" + invoke_url.removeprefix("http://")
print("Invoke URL:", invoke_url)

resp = requests.post(
    invoke_url.rstrip("/") + "/chat/completions",
    headers={
        "Authorization": f"Bearer {client.token}",
        "Content-Type": "application/json",
    },
    json={"model": MODEL, "messages": [{"role": "user", "content": "Hello!"}]},
    timeout=120,
)
check(resp)
print(json.dumps(resp.json(), indent=2)[:2000])
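
The URL handling in the cell above, and the step of pulling the reply text out of the response, can be factored into two small helpers. This is a sketch: the helper names are hypothetical, and extract_reply assumes the service returns the usual OpenAI chat-completions shape (choices[0].message.content), which this tutorial implies but does not show.

```python
def build_invoke_url(endpoint, route):
    """Upgrade an http:// endpoint to https:// and join it with a route."""
    if endpoint.startswith("http://"):
        endpoint = "https://" + endpoint[len("http://"):]
    return endpoint.rstrip("/") + "/" + route.lstrip("/")


def extract_reply(completion):
    """Return the first choice's message content, or None if it is absent.

    Assumes an OpenAI-style chat-completions response body.
    """
    choices = completion.get("choices") or []
    if not choices:
        return None
    return (choices[0].get("message") or {}).get("content")
```

With these, the call target becomes build_invoke_url(r.json()["endpoint"], "/chat/completions") and the printed output can be reduced to extract_reply(resp.json()).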

Govern the Workload

Now that it's a production Workload, wire up importance, sharing, and access controls.

Raise importance to critical:

curl -X PATCH "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}" \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"importance": "critical"}'

Share with another user, group, or organization:

curl -X PATCH "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}/sharedRoles" \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "updateRoles",
    "roles": [{"id": "<recipient-id>", "role": "USER", "shareRecipientType": "user"}]
  }'

PATCH /workloads/{id} accepts name, description, and importance. For runtime changes (replicas, resources) use PATCH /workloads/{id}/settings, which triggers a rolling replacement.

resp = requests.patch(
    f"{API_BASE}/workloads/{workload_id}",
    headers=HEADERS,
    json={"importance": "critical"},
    timeout=60,
)
check(resp)
print("Importance:", resp.json().get("importance"))

if not RECIPIENT_USER_ID:
    print("RECIPIENT_USER_ID is blank; skipping share.")
else:
    share = requests.patch(
        f"{API_BASE}/workloads/{workload_id}/sharedRoles",
        headers=HEADERS,
        json={
            "operation": "updateRoles",
            "roles": [{"id": RECIPIENT_USER_ID, "role": "USER", "shareRecipientType": "user"}],
        },
        timeout=60,
    )
    check(share)
    print("Share status:", share.status_code)
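
The cell above hardcodes shareRecipientType as "user", although the recipient can also be a group or organization. A sketch of a payload builder that makes the recipient type explicit follows. The function name is hypothetical, and the strings "group" and "organization" are assumed values for shareRecipientType; only "user" appears in this tutorial.

```python
# "user" is shown in this tutorial; "group" and "organization" are assumed values.
RECIPIENT_TYPES = ("user", "group", "organization")


def build_share_payload(recipient_id, recipient_type="user", role="USER"):
    """Build a sharedRoles PATCH body, or return None when there is no recipient."""
    if not recipient_id:
        return None
    if recipient_type not in RECIPIENT_TYPES:
        raise ValueError(f"recipient_type must be one of {RECIPIENT_TYPES}")
    return {
        "operation": "updateRoles",
        "roles": [
            {"id": recipient_id, "role": role, "shareRecipientType": recipient_type}
        ],
    }
```

A None result corresponds to the "skipping share" branch in the cell above.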

The server also includes an agent that can call an ArXiv search tool. To test the agent, call the /agent route:

curl -X POST "${ENDPOINT}/agent" \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"query": "Find latest research on solar power"}'
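
The same request can be issued from Python. The sketch below only assembles the arguments, mirroring the curl command above (same route, headers, and query body); agent_request_kwargs is a hypothetical helper name.

```python
def agent_request_kwargs(invoke_url, token, query, timeout=120):
    """Return keyword arguments for requests.post that mirror the curl call."""
    return {
        "url": invoke_url.rstrip("/") + "/agent",
        "headers": {
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        "json": {"query": query},
        "timeout": timeout,
    }
```

In the notebook, resp = requests.post(**agent_request_kwargs(invoke_url, client.token, "Find latest research on solar power")) followed by check(resp) would send it.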

Observe

Locked Workloads expose the full monitoring surface with 30-day retention.

Capability	Endpoint
Service health, latency, error rate	`GET /workloads/{id}` (computed fields on the Workload)
Lifecycle events (audit trail)	`GET /workloads/{id}/events`
Aggregate request statistics	`GET /workloads/{id}/stats`
Per-metric time series	`GET /workloads/{id}/stats/{metric_name}`
Per-replica status	`GET /workloads/{id}/protons/{proton_id}/statusDetails`

curl -s "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}/events" \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}"

curl -s "${DATAROBOT_ENDPOINT}/workloads/${WORKLOAD_ID}/stats" \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}"

ev = requests.get(f"{API_BASE}/workloads/{workload_id}/events", headers=HEADERS, timeout=60)
check(ev)
print("Events:", json.dumps(ev.json(), indent=2)[:2000])

stats = requests.get(f"{API_BASE}/workloads/{workload_id}/stats", headers=HEADERS, timeout=60)
check(stats)
print("\nStats:", json.dumps(stats.json(), indent=2)[:2000])
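
The endpoints in the table share one base path, so a small helper can build them in a single place. This is a sketch: monitoring_urls is a hypothetical name, and no metric names are listed in this tutorial, so the metric_name argument is left to the caller.

```python
from urllib.parse import quote


def monitoring_urls(api_base, workload_id, metric_name=None):
    """Build the monitoring URLs from the table above for one Workload."""
    base = f"{api_base.rstrip('/')}/workloads/{quote(workload_id, safe='')}"
    urls = {
        "workload": base,
        "events": base + "/events",
        "stats": base + "/stats",
    }
    if metric_name:
        # Per-metric time series: GET /workloads/{id}/stats/{metric_name}
        urls["metric"] = f"{base}/stats/{quote(metric_name, safe='')}"
    return urls
```

Each URL can then be fetched with requests.get(url, headers=HEADERS, timeout=60) and passed through check.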