Get started: Workload API
Preview
The Workload API is a preview feature and is on by default.
Feature flag: Enable access to Workload API experimental features
Run any AI Workload, get a production-grade endpoint. Build anything that listens on HTTP.
The Workload API runs containerized AI services on DataRobot. Bring a container image to get a stable URL with autoscaling, monitoring, sharing, importance-based prioritization, and an API-driven end-to-end lifecycle—from early development to production rollouts.
What you can build
The Workload API is intentionally container-shaped—anything that listens on HTTP is a viable Workload.
| Pattern | Examples |
|---|---|
| Agent services | LangGraph, CrewAI, AutoGen, and Google ADK orchestrations on top of any LLM, or your own framework. |
| Model inference servers | NVIDIA NIM, vLLM, TGI, and custom GPU servers with autoscaling and bundle-based GPU selection. |
| RAG pipelines | Retrievers, rerankers, and embedding services with shared backends. |
| MCP servers | Model Context Protocol servers that expose tools to agents. |
| Vector databases | Qdrant, Weaviate, Milvus, or anything else with an HTTP API. |
| Frontends and dashboards | Streamlit, Gradio, FastAPI apps, and custom UIs. |
New here?
Deploy your first Workload in five minutes with Tutorial: Hello, Workload—a runnable notebook with paired curl and Python cells: create a draft Workload, wait for running, and invoke the endpoint.
The model in 30 seconds
The Workload API consists of three objects, flowing in one direction:
flowchart LR
A["<b>Artifact</b><br/><i>What to run</i>"] --> W["<b>Workload</b><br/><i>Governed identity</i>"]
W --> P["<b>Protons / Pods</b><br/><i>Runtime backing / Kubernetes execution</i>"]
| Entity | Role | What it carries |
|---|---|---|
| Artifact | What to run | A container specification (image, ports, entrypoint, environment, resources, probes) and its lifecycle (draft or locked). |
| Workload | Governed identity | The identity you hand to consumers: a stable invoke endpoint plus sharing settings, importance, monitoring, and a lifetime policy that moves from ephemeral draft (8-hour TTL) to persistent production on promote. |
| Proton | Runtime backing | The running container instances (and their pods) that execute the artifact behind the Workload URL. |
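The 8-hour draft TTL in the table above comes down to simple date arithmetic. The sketch below is illustrative only: the function name `draft_expiry` and the idea that the clock starts at creation time are assumptions, not documented API behavior.

```python
from datetime import datetime, timedelta, timezone

# Draft Workloads are ephemeral with an 8-hour TTL (from the table above).
DRAFT_TTL = timedelta(hours=8)


def draft_expiry(created_at: datetime) -> datetime:
    """Return when a draft Workload would expire.

    Assumption: the TTL is counted from the creation time.
    """
    return created_at + DRAFT_TTL


created = datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc)
print(draft_expiry(created).isoformat())
```

A promoted Workload is persistent, so no expiry of this kind would apply to it.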
The artifact describes what. The Workload is the governed identity you hand to consumers. Protons are the execution layer—you'll work at the Workload level most of the time and only drop down to protons when you're inspecting status, debugging a failure, or validating a candidate during a replacement.
Lifecycle state flows up: from pods to protons to the Workload. A Workload always has at least one proton; during a replacement it can temporarily have two (active and candidate), and high-availability Workloads can have multiple protons permanently. A replacement adds a candidate proton alongside the active one and shifts traffic between them; Workload identity stays put. A promotion flips the artifact from draft to locked and the Workload from ephemeral (8-hour TTL) to persistent—identity preserved.
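As a rough illustration of state flowing up from pods to a proton, the sketch below applies a worst-state-wins rule to a proton's replica pods. The state names other than `running`, their severity order, and the function `proton_state` are assumptions made for this example; the platform computes the real status, and its vocabulary may differ.

```python
# Hypothetical state names, ordered from healthiest to worst.
# The real API's state vocabulary may differ.
SEVERITY = {"running": 0, "starting": 1, "failed": 2}


def proton_state(pod_states):
    """Aggregate replica pod states into one proton state (worst state wins)."""
    if not pod_states:
        raise ValueError("a proton needs at least one pod state")
    return max(pod_states, key=lambda state: SEVERITY[state])


print(proton_state(["running", "running"]))
print(proton_state(["running", "starting"]))
```

How the states of several protons (for example, active and candidate during a replacement) combine into the Workload status is not covered by this sketch.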
Pick your path
| Goal | Go here |
|---|---|
| Deploy your first container and see traffic flow end-to-end | Tutorial: Hello, Workload |
| Take a service to production with sharing, importance, and monitoring | Tutorial: Deploy a production-ready container |
| See what your code is doing inside each request—traces, metrics, and logs | Instrument a Workload with OpenTelemetry (Python) |
| Ship a new container version without dropping the endpoint | Tutorial: Replace the artifact behind a running Workload |
Two ways to create a Workload
POST /workloads/ accepts either an inline artifact spec (artifact) or a reference to an existing artifact (artifactId). Exactly one of the two is required.
| Mode | Purpose |
|---|---|
| Inline (artifact: { ... }) | Quick experiments and hello-world Workloads. One request creates the artifact and the Workload together; the artifact is created in draft status. |
| By reference (artifactId: "...") | The artifact is governed separately: created via POST /artifacts, locked from a previous draft, or used as a shared versioned baseline. Required when the same locked artifact serves more than one Workload (for example, multi-region or A/B deployments). |
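The "exactly one of the two" rule can be enforced on the client before any request is sent. The following is a minimal sketch, not the official client: the helper `build_create_workload_body` is hypothetical, the `image` key and the ID value are placeholders, and only the field names `artifact` and `artifactId` come from this page.

```python
import json


def build_create_workload_body(artifact=None, artifact_id=None, **workload_fields):
    """Build a JSON-serializable body for POST /workloads/.

    Exactly one of `artifact` (inline spec) or `artifact_id` (reference)
    must be provided, mirroring the rule stated above.
    """
    if (artifact is None) == (artifact_id is None):
        raise ValueError("Provide exactly one of artifact or artifact_id")
    body = dict(workload_fields)
    if artifact is not None:
        body["artifact"] = artifact
    else:
        body["artifactId"] = artifact_id
    return body


# Inline mode. The keys inside the artifact spec are placeholders, not the real schema.
inline = build_create_workload_body(artifact={"image": "example/whoami:latest"})
# By-reference mode. The ID is a made-up value.
by_ref = build_create_workload_body(artifact_id="hypothetical-artifact-id")
print(json.dumps(inline))
print(json.dumps(by_ref))
```

Passing both arguments, or neither, raises `ValueError` locally instead of relying on the server to reject the request.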
Core concepts at a glance
Use this section as a compass. Each row points to deeper pages; details live in the linked sections.
| Topic | Description | Reference |
|---|---|---|
| Artifact | Container spec plus draft/locked lifecycle, configuration layering, and repositories for version history. | Artifact concepts, REST: artifacts. |
| Workload | Stable identity, invoke URL, importance, sharing, monitoring, and lifetime rules tied to artifact status. | Workload concepts, Choose draft vs. locked. |
| Runtime execution | Pods back protons; Workload status aggregates pod and proton state (including worst-state-wins for replicas). | Lifecycle states, Replace and roll out. |
| Day zero to production | Draft Workloads for iteration; lock (or promote) for indefinite lifetime; rolling replacement for zero-downtime upgrades. | Hello, Workload, Production-ready tutorial, Promote. |
| Observability | OTel logs, metrics, traces; Workload stats, events, and history; retention differs for draft vs. locked. | Monitoring concepts, Health and readiness. |
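Configuration layering, listed in the Artifact row above, can be pictured as a baseline environment on the artifact with per-Workload runtime overrides on top. The sketch below assumes that overrides replace baseline values key by key; the function `effective_env` and the variable names are hypothetical, and the platform's actual merge rules may differ.

```python
def effective_env(artifact_env, workload_overrides):
    """Layer Workload-level overrides on top of the artifact's baseline env.

    Assumption: overrides replace baseline values key by key.
    """
    return {**artifact_env, **workload_overrides}


baseline = {"LOG_LEVEL": "info", "MODEL_NAME": "demo"}
overrides = {"LOG_LEVEL": "debug"}
print(effective_env(baseline, overrides))
```

Neither input dictionary is modified, which matches the idea that one artifact can back several Workloads with different overrides.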
Design principles
These principles explain why the API is shaped the way it is—useful when you're deciding where a piece of configuration belongs or whether a workflow fits the model.
| Principle | Description |
|---|---|
| Separation of concerns | Artifacts define what the container is (image, entrypoint, probes, resource requests, baseline env). Workloads define the governed identity and per-deployment runtime (importance, sharing, autoscaling, runtime parameter overrides). The same artifact can back many Workloads across environments. |
| Immutability for production | Locked artifacts are immutable and versioned within an artifact repository. The same locked artifact can back staging, production, and per-region Workloads with confidence that the binary hasn't changed underneath them. |
| Progressive governance | Draft artifacts and draft Workloads exist for fast iteration: 8-hour TTL, no required importance. Governance applies when you lock—either directly with PATCH /artifacts/{id} or in place via POST /workloads/{id}/promote. |
| Infrastructure abstraction | Workload status is computed from pod predicates by the platform, not surfaced from any specific underlying operator. The API surface is uniform across Workload types so the same lifecycle semantics apply whether your Workload runs as a Kubernetes Deployment, a NIM custom resource, or another execution shape. |
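The two locking routes named under Progressive governance can be told apart by which ID you hold. The sketch below only builds the method and path for each route. The helper `lock_request` and the ID values are hypothetical, and the PATCH request body is left out because this page does not specify it.

```python
def lock_request(workload_id=None, artifact_id=None):
    """Return (method, path) for one of the two locking routes named above.

    Pass `artifact_id` to lock the artifact directly, or `workload_id`
    to promote a Workload in place.
    """
    if (workload_id is None) == (artifact_id is None):
        raise ValueError("Provide exactly one of workload_id or artifact_id")
    if artifact_id is not None:
        # The PATCH body is not specified on this page, so none is built here.
        return ("PATCH", f"/artifacts/{artifact_id}")
    return ("POST", f"/workloads/{workload_id}/promote")


# Both IDs below are made-up placeholder values.
print(lock_request(artifact_id="hypothetical-artifact-id"))
print(lock_request(workload_id="hypothetical-workload-id"))
```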
Next steps
Next, learn about Workload API endpoints and run the hello-world tutorial to deploy a real container in five minutes.
| Resource | Description |
|---|---|
| Hello, Workload | Deploy whoami as a draft Workload, following the shortest path from zero to a running container on DataRobot. |
| API quick reference | Authentication, endpoint groups, and links into generated API documentation. |
| Best practices and troubleshooting | Container design, production hardening, security, and recovery steps for common failures. |