Configure LLM provider fallback
You can configure a primary LLM provider with one or more fallback providers for automatic failover. When the primary provider is unavailable or returns an error, a `litellm.Router` retries the request with the next fallback provider in the list. The `num_retries` option controls how many times the router retries a provider before it moves on to the next one. Fallback works with any DataRobot-supported LLM provider, including the LLM gateway, hosted deployments, NIM deployments, and external APIs.
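The failover order can be illustrated with a small, framework-free sketch. It does not call `litellm` or `datarobot_genai`. `call_with_fallback` and `AllProvidersFailed` are hypothetical names. The sketch assumes, following the description above, that each provider gets one initial attempt plus `num_retries` retries before the router tries the next provider. The real router's policy (backoff, and which errors trigger a fallback) may differ.

```python
from __future__ import annotations

from typing import Callable, Sequence


class AllProvidersFailed(Exception):
    """Raised when the primary and every fallback have failed (hypothetical)."""


def call_with_fallback(
    providers: Sequence[Callable[[str], str]],
    prompt: str,
    num_retries: int = 1,
) -> tuple[str, list[str]]:
    """Try providers in order (primary first), retrying each num_retries times.

    Returns the first successful response and a log of every attempt.
    """
    attempts: list[str] = []
    for index, provider in enumerate(providers):
        # One initial attempt plus num_retries retries for this provider
        for attempt in range(num_retries + 1):
            try:
                response = provider(prompt)
            except Exception as exc:  # any error counts as a failure in this sketch
                attempts.append(f"provider {index} attempt {attempt}: {exc}")
                continue
            attempts.append(f"provider {index} attempt {attempt}: ok")
            return response, attempts
        # All attempts on this provider failed; fall through to the next one
    raise AllProvidersFailed("; ".join(attempts))


def flaky_primary(prompt: str) -> str:
    raise ConnectionError("primary unavailable")


def fallback(prompt: str) -> str:
    return f"fallback answer to: {prompt}"


response, log = call_with_fallback([flaky_primary, fallback], "Hello", num_retries=1)
print(response)  # fallback answer to: Hello
print(len(log))  # 3
```

Here the primary fails twice (one attempt plus one retry) before the fallback answers on its first attempt.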
Prerequisites
- `datarobot-genai>=0.15.20` must be available in the execution environment. See Add Python packages for instructions.
- A working agent template (CrewAI, LangGraph, LlamaIndex, or DRAgent/NAT).
- At least two LLM providers or models configured (one primary, one or more fallbacks).
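To confirm the first prerequisite from inside the execution environment, you can use a check along these lines. It is a sketch. `parse_version` and `meets_minimum` are hypothetical helpers that compare only the leading numeric part of a version, so `0.16.0rc1` is treated as `0.16.0`. For exact PEP 440 ordering, use `packaging.version.Version` instead.

```python
from __future__ import annotations

import re
from importlib import metadata

MINIMUM_VERSION = (0, 15, 20)  # datarobot-genai>=0.15.20


def parse_version(version: str) -> tuple[int, ...]:
    """Parse the leading dotted-numeric part of a version string (hypothetical helper)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version: str, minimum: tuple[int, ...] = MINIMUM_VERSION) -> bool:
    """Return True if version is at least the minimum (tuple comparison)."""
    return parse_version(version) >= minimum


def installed_datarobot_genai_ok() -> bool:
    """Check the installed datarobot-genai distribution; False if it is missing."""
    try:
        installed = metadata.version("datarobot-genai")
    except metadata.PackageNotFoundError:
        return False
    return meets_minimum(installed)


if __name__ == "__main__":
    print("datarobot-genai>=0.15.20 available:", installed_datarobot_genai_ok())
```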
Configure fallback in code
For DRUM-based templates (CrewAI, LangGraph, LlamaIndex), replace the call to `get_llm()` with `get_router_llm()` in your `myagent.py` file.
LLMConfig fields
The primary LLM and each entry in `fallbacks` are defined as `LLMConfig` objects. Set only the fields relevant to each provider type (see Mixing provider types for an example that combines several):
| Field | Type | Description |
|---|---|---|
| `use_datarobot_llm_gateway` | `bool` | Use the DataRobot LLM gateway as the provider. |
| `llm_default_model` | `str` | The model identifier (e.g., `azure/gpt-4o-mini`). |
| `llm_deployment_id` | `str` | DataRobot deployment ID for hosted LLM deployments. |
| `nim_deployment_id` | `str` | DataRobot NIM deployment ID. |
| `datarobot_endpoint` | `str` | DataRobot API endpoint URL. |
| `datarobot_api_token` | `str` | DataRobot API token. |
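The rule of setting only the fields for one provider type can be pictured with the sketch below. `ProviderFields` and `provider_kind` are hypothetical stand-ins, not part of `datarobot_genai`. The sketch assumes that a config selects at most one of the gateway, a hosted deployment, or a NIM deployment. It also assumes that a config with only `llm_default_model` targets an external API. How `LLMConfig` itself resolves these fields may differ.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class ProviderFields:
    """Hypothetical stand-in mirroring the LLMConfig fields in the table above."""

    use_datarobot_llm_gateway: bool = False
    llm_default_model: Optional[str] = None
    llm_deployment_id: Optional[str] = None
    nim_deployment_id: Optional[str] = None
    datarobot_endpoint: Optional[str] = None
    datarobot_api_token: Optional[str] = None


def provider_kind(fields: ProviderFields) -> str:
    """Classify which provider type a config describes (illustrative only)."""
    selected = []
    if fields.use_datarobot_llm_gateway:
        selected.append("llm_gateway")
    if fields.llm_deployment_id:
        selected.append("deployment")
    if fields.nim_deployment_id:
        selected.append("nim_deployment")
    if len(selected) > 1:
        # Mixing provider-type fields in one config is rejected in this sketch
        raise ValueError(f"set only one provider type, got {selected}")
    if not selected:
        # Assumption: a bare model identifier means an external API
        return "external" if fields.llm_default_model else "unconfigured"
    return selected[0]


print(provider_kind(ProviderFields(llm_deployment_id="YOUR_DEPLOYMENT_ID")))  # deployment
```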
Framework examples
Each DRUM-based template follows the same pattern. Only the module that `get_router_llm` is imported from changes:
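CrewAI:

```python
from datarobot_genai.core.config import LLMConfig
from datarobot_genai.crewai.llm import get_router_llm

# Primary provider: a model served through the DataRobot LLM gateway
primary = LLMConfig(
    use_datarobot_llm_gateway=True,
    llm_default_model="{LLM_DEFAULT_MODEL}",
)

# Fallback providers, tried in order if the primary fails
fallbacks = [
    LLMConfig(
        use_datarobot_llm_gateway=True,
        llm_default_model="anthropic/claude-opus-4-20250514",
    )
]

# Retry each provider once before moving to the next one
llm = get_router_llm(primary, fallbacks, {"num_retries": 1})
```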
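LangGraph:

```python
from datarobot_genai.core.config import LLMConfig
from datarobot_genai.langgraph.llm import get_router_llm

# Primary provider: a model served through the DataRobot LLM gateway
primary = LLMConfig(
    use_datarobot_llm_gateway=True,
    llm_default_model="{LLM_DEFAULT_MODEL}",
)

# Fallback providers, tried in order if the primary fails
fallbacks = [
    LLMConfig(
        use_datarobot_llm_gateway=True,
        llm_default_model="anthropic/claude-opus-4-20250514",
    )
]

# Retry each provider once before moving to the next one
llm = get_router_llm(primary, fallbacks, {"num_retries": 1})
```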
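LlamaIndex:

```python
from datarobot_genai.core.config import LLMConfig
from datarobot_genai.llamaindex.llm import get_router_llm

# Primary provider: a model served through the DataRobot LLM gateway
primary = LLMConfig(
    use_datarobot_llm_gateway=True,
    llm_default_model="{LLM_DEFAULT_MODEL}",
)

# Fallback providers, tried in order if the primary fails
fallbacks = [
    LLMConfig(
        use_datarobot_llm_gateway=True,
        llm_default_model="anthropic/claude-opus-4-20250514",
    )
]

# Retry each provider once before moving to the next one
llm = get_router_llm(primary, fallbacks, {"num_retries": 1})
```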
Multiple fallbacks
You can specify multiple fallback providers in the `fallbacks` list. If the primary fails, the router tries them in the order listed.
Configure fallback in workflow.yaml
For DRAgent/NAT templates, set `_type: datarobot-llm-router` on the LLM entry in `workflow.yaml` and define `primary` and `fallbacks` blocks:
```yaml
llms:
  datarobot_llm:
    _type: datarobot-llm-router
    primary:
      use_datarobot_llm_gateway: true
      llm_default_model: "{LLM_DEFAULT_MODEL}"
    fallbacks:
      - use_datarobot_llm_gateway: true
        llm_default_model: anthropic/claude-opus-4-20250514
    num_retries: 1
```
LLMConfig fields in YAML
The `primary` block and each item in `fallbacks` accept the same fields as `LLMConfig`: `use_datarobot_llm_gateway`, `llm_default_model`, `llm_deployment_id`, `nim_deployment_id`, `datarobot_endpoint`, and `datarobot_api_token`.
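Because a misspelled key in `workflow.yaml` is easy to miss, you might check the router block before deploying. The sketch below is hypothetical: `find_unknown_fields` is not a DataRobot API. It assumes the YAML has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`, not shown). It also assumes that `num_retries` sits at the router level next to `primary` and `fallbacks`.

```python
from __future__ import annotations

# Field names accepted by primary and each fallback entry, per the list above
ALLOWED_FIELDS = frozenset({
    "use_datarobot_llm_gateway",
    "llm_default_model",
    "llm_deployment_id",
    "nim_deployment_id",
    "datarobot_endpoint",
    "datarobot_api_token",
})


def find_unknown_fields(router_block: dict) -> list[str]:
    """Return 'location: key' strings for keys that are not LLMConfig fields."""
    entries = [("primary", router_block.get("primary", {}))]
    entries += [
        (f"fallbacks[{i}]", entry)
        for i, entry in enumerate(router_block.get("fallbacks", []))
    ]
    problems = []
    for location, entry in entries:
        for key in sorted(entry):
            if key not in ALLOWED_FIELDS:
                problems.append(f"{location}: {key}")
    return problems


# The router block from the workflow.yaml example above, as a parsed dict
router_block = {
    "_type": "datarobot-llm-router",
    "primary": {"use_datarobot_llm_gateway": True, "llm_default_model": "{LLM_DEFAULT_MODEL}"},
    "fallbacks": [
        {"use_datarobot_llm_gateway": True, "llm_default_model": "anthropic/claude-opus-4-20250514"},
    ],
    "num_retries": 1,
}
print(find_unknown_fields(router_block))  # []
```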
Mixing provider types
The primary and fallback providers can use different provider types. For example, the following configuration uses the LLM gateway as the primary, a hosted deployment as the first fallback, and a second gateway model as the last fallback:
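In Python (import `get_router_llm` from the module for your framework):

```python
from datarobot_genai.core.config import LLMConfig
from datarobot_genai.langgraph.llm import get_router_llm  # or the crewai / llamaindex module

# Primary: a model served through the DataRobot LLM gateway
primary = LLMConfig(
    use_datarobot_llm_gateway=True,
    llm_default_model="azure/gpt-4o-mini",
)

fallbacks = [
    # First fallback: a hosted DataRobot LLM deployment
    LLMConfig(
        llm_deployment_id="YOUR_DEPLOYMENT_ID",
    ),
    # Second fallback: a different model through the LLM gateway
    LLMConfig(
        use_datarobot_llm_gateway=True,
        llm_default_model="anthropic/claude-opus-4-20250514",
    ),
]

llm = get_router_llm(primary, fallbacks, {"num_retries": 1})
```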
In `workflow.yaml`:

```yaml
llms:
  datarobot_llm:
    _type: datarobot-llm-router
    primary:
      use_datarobot_llm_gateway: true
      llm_default_model: azure/gpt-4o-mini
    fallbacks:
      - llm_deployment_id: YOUR_DEPLOYMENT_ID
      - use_datarobot_llm_gateway: true
        llm_default_model: anthropic/claude-opus-4-20250514
    num_retries: 1
```
Retry and latency
Each retry adds latency to the response, and a request that fails on every provider incurs the retries for each one in turn. Set `num_retries` to a small value (for example, 1) to balance reliability against response time.
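To size `num_retries`, it can help to estimate the worst case, where every provider fails. The sketch below assumes, as described in the overview, one initial attempt plus `num_retries` retries per provider. `max_attempts`, `worst_case_latency_s`, and the per-call timeout are hypothetical. The estimate ignores retry backoff delays and errors that return quickly.

```python
def max_attempts(num_fallbacks: int, num_retries: int) -> int:
    """Worst-case number of LLM calls for one request that fails everywhere.

    Assumes each provider (primary plus fallbacks) gets one initial attempt
    plus num_retries retries.
    """
    providers = 1 + num_fallbacks
    return providers * (1 + num_retries)


def worst_case_latency_s(num_fallbacks: int, num_retries: int, per_call_timeout_s: float) -> float:
    """Upper bound on latency if every attempt runs until per_call_timeout_s."""
    return max_attempts(num_fallbacks, num_retries) * per_call_timeout_s


print(max_attempts(2, 1))  # 6
print(worst_case_latency_s(2, 1, 30.0))  # 180.0
```

With two fallbacks, raising `num_retries` from 1 to 3 doubles the worst case from 6 to 12 calls.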