Get started: Workload API

Preview

The Workload API is a preview feature and is on by default.

Feature flag: Enable access to Workload API experimental features

Run any AI Workload, get a production-grade endpoint. Build anything that listens on HTTP.

The Workload API runs containerized AI services on DataRobot. Bring a container image to get a stable URL with autoscaling, monitoring, sharing, importance-based prioritization, and an API-driven end-to-end lifecycle—from early development to production rollouts.

What you can build

The Workload API is intentionally container-shaped—anything that listens on HTTP is a viable Workload.

| Pattern | Examples |
| --- | --- |
| Agent services | LangGraph, CrewAI, AutoGen, and Google ADK orchestrations on top of any LLM, or your own framework. |
| Model inference servers | NVIDIA NIM, vLLM, TGI, and custom GPU servers with autoscaling and bundle-based GPU selection. |
| RAG pipelines | Retrievers, rerankers, and embedding services with shared backends. |
| MCP servers | Model Context Protocol servers that expose tools to agents. |
| Vector databases | Qdrant, Weaviate, Milvus, and anything else with an HTTP API. |
| Frontends and dashboards | Streamlit, Gradio, FastAPI apps, and custom UIs. |

New here?

Deploy your first Workload in five minutes with Tutorial: Hello, Workload. It is a runnable notebook with paired curl and Python cells that create a draft Workload, wait for it to reach the running state, and invoke the endpoint.
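The create, wait, invoke sequence depends on polling until the Workload reports running. The following is a minimal sketch of that waiting step, not code from the tutorial. The function name `wait_for_running`, its parameters, and the idea that status arrives as a plain string are illustrative assumptions; `get_status` stands in for whatever call reads the Workload's status in your environment.

```python
import time
from typing import Callable


def wait_for_running(
    get_status: Callable[[], str],
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll get_status() until it returns "running".

    Returns the number of polls made. Raises TimeoutError if the
    deadline passes first. The status string "running" comes from the
    text above; any other value is simply treated as "not ready yet".
    """
    deadline = time.monotonic() + timeout_s
    polls = 0
    while True:
        polls += 1
        if get_status() == "running":
            return polls
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Workload not running after {timeout_s} seconds")
        sleep(interval_s)
```

In real use, `get_status` would wrap your HTTP client's status lookup for the Workload. Passing `sleep` as a parameter keeps the loop easy to exercise without actually waiting.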

The model in 30 seconds

The Workload API consists of three objects, flowing in one direction:

flowchart LR
    A["<b>Artifact</b><br/><i>What to run</i>"] --> W["<b>Workload</b><br/><i>Governed identity</i>"]
    W --> P["<b>Protons / Pods</b><br/><i>Runtime backing / Kubernetes execution</i>"]

| Entity | Role | What it carries |
| --- | --- | --- |
| Artifact | What to run | A container specification (image, ports, entrypoint, environment, resources, probes) and its lifecycle (draft or locked). |
| Workload | Governed identity | The identity you hand to consumers: a stable invoke endpoint plus sharing settings, importance, monitoring, and a lifetime policy that moves from ephemeral draft (8-hour TTL) to persistent production on promotion. |
| Proton | Runtime backing | The running container instances (and their pods) that execute the artifact behind the Workload URL. |

The artifact describes what. The Workload is the governed identity you hand to consumers. Protons are the execution layer—you'll work at the Workload level most of the time and only drop down to protons when you're inspecting status, debugging a failure, or validating a candidate during a replacement.

Lifecycle state flows up: from pods to protons to the Workload. A Workload always has at least one proton; during a replacement it can temporarily have two (active and candidate), and high-availability Workloads can have multiple protons permanently. A replacement adds a candidate proton alongside the active one and shifts traffic between them; Workload identity stays put. A promotion flips the artifact from draft to locked and the Workload from ephemeral (8-hour TTL) to persistent—identity preserved.
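To make the promotion rule concrete, here is a toy in-memory model of it. This is a sketch under stated assumptions, not the platform's implementation: the class names, field names, and the `promote` helper are hypothetical, and only the draft/locked statuses, the 8-hour TTL, and the preserved identity come from the description above.

```python
from dataclasses import dataclass, replace
from datetime import timedelta
from typing import Optional

# From the text: draft Workloads are ephemeral with an 8-hour TTL.
DRAFT_TTL = timedelta(hours=8)


@dataclass(frozen=True)
class Artifact:
    artifact_id: str          # hypothetical field name
    image: str                # hypothetical field name
    status: str = "draft"     # "draft" or "locked"


@dataclass(frozen=True)
class Workload:
    workload_id: str                       # the stable identity
    artifact: Artifact
    ttl: Optional[timedelta] = DRAFT_TTL   # None models "persistent"


def promote(workload: Workload) -> Workload:
    """Model a promotion: lock the artifact and drop the TTL.

    The Workload ID is carried over unchanged, which is the
    "identity preserved" part of the rule.
    """
    locked = replace(workload.artifact, status="locked")
    return replace(workload, artifact=locked, ttl=None)


draft = Workload("wl-1", Artifact("art-1", "example.registry/whoami:latest"))
promoted = promote(draft)
```

The dataclasses are frozen so that `promote` returns a new value instead of mutating the draft, which makes the before and after states easy to compare.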

Pick your path

| Goal | Go here |
| --- | --- |
| Deploy your first container and see traffic flow end-to-end | Tutorial: Hello, Workload |
| Take a service to production with sharing, importance, and monitoring | Tutorial: Deploy a production-ready container |
| See what your code is doing inside each request: traces, metrics, and logs | Instrument a Workload with OpenTelemetry (Python) |
| Ship a new container version without dropping the endpoint | Tutorial: Replace the artifact behind a running Workload |

Two ways to create a Workload

POST /workloads/ accepts either an inline artifact spec (artifact) or a reference to an existing artifact (artifactId). Exactly one of the two is required.

| Mode | Purpose |
| --- | --- |
| Inline (artifact: { ... }) | Quick experiments and hello-world Workloads. One request creates the artifact and the Workload together; the artifact is created in draft status. |
| By reference (artifactId: "...") | The artifact is governed separately: created via POST /artifacts, locked from a previous draft, or used as a shared versioned baseline. Required when the same locked artifact serves more than one Workload (for example, multi-region or A/B deployments). |
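The "exactly one of the two" rule can be enforced on the client before a request is sent. The sketch below builds a body for either mode. The helper `build_create_workload_body` is hypothetical, the keys inside the inline spec (`image`, `ports`) are guesses at the schema used only for illustration, and a real request may need fields not shown here; only the top-level `artifact` and `artifactId` names come from the text above.

```python
from typing import Any, Dict, Optional


def build_create_workload_body(
    artifact: Optional[Dict[str, Any]] = None,
    artifact_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Return a JSON-serializable body for POST /workloads/.

    Exactly one of `artifact` (inline spec) or `artifact_id`
    (reference to an existing artifact) must be given.
    """
    if (artifact is None) == (artifact_id is None):
        raise ValueError("provide exactly one of artifact or artifact_id")
    if artifact is not None:
        return {"artifact": artifact}
    return {"artifactId": artifact_id}


# Inline mode: the spec keys are assumed, not taken from the API schema.
inline_body = build_create_workload_body(
    artifact={"image": "example.registry/whoami:latest", "ports": [8080]}
)

# Reference mode: the ID is a placeholder, not a real artifact.
reference_body = build_create_workload_body(artifact_id="my-locked-artifact-id")
```

Failing fast on the client gives a clearer error than a rejected request, but the server remains the authority on what a valid body looks like.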

Core concepts at a glance

Use this section as a compass. Each row points to deeper pages; details live in the linked sections.

| Topic | Description | Reference |
| --- | --- | --- |
| Artifacts | Container spec plus draft/locked lifecycle, configuration layering, and repositories for version history. | Artifact concepts, REST: artifacts. |
| Workloads | Stable identity, invoke URL, importance, sharing, monitoring, and lifetime rules tied to artifact status. | Workload concepts, Choose draft vs. locked. |
| Runtime execution | Pods back protons; Workload status aggregates pod and proton state (including worst-state-wins for replicas). | Lifecycle states, Replace and roll out. |
| Day zero to production | Draft Workloads for iteration; lock (or promote) for indefinite lifetime; rolling replacement for zero-downtime upgrades. | Hello, Workload, Production-ready tutorial, Promote. |
| Observability | OTel logs, metrics, traces; Workload stats, events, and history; retention differs for draft vs. locked. | Monitoring concepts, Health and readiness. |
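The "worst-state-wins" rule for replicas mentioned in the table can be pictured as taking the most severe state across a set of pods. The sketch below is illustrative only: the state names other than "running" and their severity ordering are assumptions, not the platform's actual lifecycle states, and `aggregate_replicas` is a hypothetical helper.

```python
from typing import Iterable

# Hypothetical states ranked from best (0) to worst. Only "running"
# appears in the text above; see the Lifecycle states page for real values.
SEVERITY = {"running": 0, "initializing": 1, "errored": 2}


def aggregate_replicas(pod_states: Iterable[str]) -> str:
    """Return the single state reported for a set of replicas.

    Worst-state-wins: one unhealthy replica is enough to pull the
    aggregate down, so the result is the most severe state present.
    """
    states = list(pod_states)
    if not states:
        raise ValueError("at least one replica state is required")
    return max(states, key=lambda state: SEVERITY[state])
```

Under this ordering, three replicas with two running and one initializing aggregate to initializing, which matches the intuition that a Workload is not fully ready until every replica is.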

Design principles

These principles explain why the API is shaped the way it is—useful when you're deciding where a piece of configuration belongs or whether a workflow fits the model.

| Principle | Description |
| --- | --- |
| Separation of concerns | Artifacts define what the container is (image, entrypoint, probes, resource requests, baseline env). Workloads define the governed identity and per-deployment runtime (importance, sharing, autoscaling, runtime parameter overrides). The same artifact can back many Workloads across environments. |
| Immutability for production | Locked artifacts are immutable and versioned within an artifact repository. The same locked artifact can back staging, production, and per-region Workloads with confidence that the binary hasn't changed underneath them. |
| Progressive governance | Draft artifacts and draft Workloads exist for fast iteration: 8-hour TTL, no required importance. Governance applies when you lock, either directly with PATCH /artifacts/{id} or in place via POST /workloads/{id}/promote. |
| Infrastructure abstraction | Workload status is computed from pod predicates by the platform, not surfaced from any specific underlying operator. The API surface is uniform across Workload types so the same lifecycle semantics apply whether your Workload runs as a Kubernetes Deployment, a NIM custom resource, or another execution shape. |
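One way to picture the separation of concerns is as configuration layering: the artifact supplies a baseline environment and each Workload applies its own runtime overrides on top. The sketch below assumes that override values win on key conflicts, which is a guess at the layering semantics and not documented behavior; `effective_env` and all variable names are hypothetical.

```python
from typing import Dict, Mapping


def effective_env(
    baseline: Mapping[str, str],
    overrides: Mapping[str, str],
) -> Dict[str, str]:
    """Layer Workload-level overrides on top of an artifact's baseline env.

    Assumption: on a key conflict the Workload-level value wins.
    The baseline mapping is never modified, so one artifact can back
    many Workloads with different overrides.
    """
    merged = dict(baseline)
    merged.update(overrides)
    return merged


# One artifact baseline shared by two Workloads (all names illustrative).
artifact_env = {"LOG_LEVEL": "info", "MODEL_NAME": "example-model"}
staging_env = effective_env(artifact_env, {"LOG_LEVEL": "debug"})
production_env = effective_env(artifact_env, {})
```

Keeping the baseline untouched mirrors the immutability principle: the locked artifact stays the same while each Workload carries only its own differences.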

Next steps

Next, learn about Workload API endpoints and run the hello-world tutorial to deploy a real container in five minutes.

| Resource | Description |
| --- | --- |
| Hello, Workload | Deploy whoami as a draft Workload, following the shortest path from zero to a running container on DataRobot. |
| API quick reference | Authentication, endpoint groups, and links into generated API documentation. |
| Best practices and troubleshooting | Container design, production hardening, security, and recovery steps for common failures. |