Configure LLM provider fallback

You can configure a primary LLM provider with one or more fallback providers for automatic failover. When the primary provider is unavailable or returns an error, a litellm.Router retries the request against the next fallback provider in the list. The num_retries option sets how many retries are attempted per provider before the router moves on to the next one. This works with any DataRobot-supported LLM provider, including the LLM gateway, hosted deployments, NIM deployments, and external APIs.

Prerequisites

  • datarobot-genai>=0.15.20 must be available in the execution environment. See Add Python packages for instructions.
  • A working agent template (CrewAI, LangGraph, LlamaIndex, or DRAgent/NAT).
  • At least two LLM providers or models configured (one primary, one or more fallbacks).

Configure fallback in code

For DRUM-based templates (CrewAI, LangGraph, LlamaIndex), replace get_llm() with get_router_llm() in your myagent.py file.

LLMConfig fields

The primary LLM and each entry in fallbacks are defined as LLMConfig objects. Set only the fields relevant to your provider type (see Mixing provider types for a multi-fallback example):

Field                      Type  Description
use_datarobot_llm_gateway  bool  Use the DataRobot LLM gateway as the provider.
llm_default_model          str   The model identifier (e.g., azure/gpt-4o-mini).
llm_deployment_id          str   DataRobot deployment ID for hosted LLM deployments.
nim_deployment_id          str   DataRobot NIM deployment ID.
datarobot_endpoint         str   DataRobot API endpoint URL.
datarobot_api_token        str   DataRobot API token.

Framework examples

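The LLM fallback configuration follows the same pattern for each DRUM-based template (CrewAI, LangGraph, and LlamaIndex, in that order below); only the import path for get_router_llm differs: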

CrewAI
from datarobot_genai.core.config import LLMConfig
from datarobot_genai.crewai.llm import get_router_llm

primary = LLMConfig(
    use_datarobot_llm_gateway=True,
    llm_default_model="{LLM_DEFAULT_MODEL}",
)
fallbacks = [
    LLMConfig(
        use_datarobot_llm_gateway=True,
        llm_default_model="anthropic/claude-opus-4-20250514",
    )
]

llm = get_router_llm(primary, fallbacks, {"num_retries": 1})

LangGraph
from datarobot_genai.core.config import LLMConfig
from datarobot_genai.langgraph.llm import get_router_llm

primary = LLMConfig(
    use_datarobot_llm_gateway=True,
    llm_default_model="{LLM_DEFAULT_MODEL}",
)
fallbacks = [
    LLMConfig(
        use_datarobot_llm_gateway=True,
        llm_default_model="anthropic/claude-opus-4-20250514",
    )
]

llm = get_router_llm(primary, fallbacks, {"num_retries": 1})

LlamaIndex
from datarobot_genai.core.config import LLMConfig
from datarobot_genai.llamaindex.llm import get_router_llm

primary = LLMConfig(
    use_datarobot_llm_gateway=True,
    llm_default_model="{LLM_DEFAULT_MODEL}",
)
fallbacks = [
    LLMConfig(
        use_datarobot_llm_gateway=True,
        llm_default_model="anthropic/claude-opus-4-20250514",
    )
]

llm = get_router_llm(primary, fallbacks, {"num_retries": 1}) 

Multiple fallbacks

You can specify multiple fallback providers in the fallbacks list. The router tries them in order if the primary fails.
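
The ordering behavior described above can be sketched in plain Python. This is an illustrative simulation only, not the litellm.Router implementation: call_with_fallback and the two provider functions are hypothetical names, and the sketch assumes each provider gets one initial attempt plus num_retries retries before the next provider is tried.

```python
from typing import Callable, List, Optional, Sequence


def call_with_fallback(
    providers: Sequence[Callable[[str], str]],
    prompt: str,
    num_retries: int = 1,
) -> str:
    """Try providers in order; move to the next one when attempts are exhausted."""
    last_error: Optional[Exception] = None
    for provider in providers:
        # Assumption: one initial attempt plus num_retries retries per provider.
        for _ in range(1 + num_retries):
            try:
                return provider(prompt)
            except Exception as exc:  # any provider error triggers a retry or failover
                last_error = exc
    raise RuntimeError("all providers failed") from last_error


calls: List[str] = []  # records which provider handled each attempt


def flaky_primary(prompt: str) -> str:
    calls.append("primary")
    raise ConnectionError("primary unavailable")


def healthy_fallback(prompt: str) -> str:
    calls.append("fallback")
    return f"fallback answered: {prompt}"


result = call_with_fallback([flaky_primary, healthy_fallback], "hello", num_retries=1)
print(result)
print(calls)
```

In this sketch the primary is attempted twice (the initial call plus one retry) before the first fallback answers, which is why the position of each entry in the fallbacks list matters.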

Configure fallback in workflow.yaml

For DRAgent/NAT templates, use _type: datarobot-llm-router with primary and fallbacks blocks in workflow.yaml:

workflow.yaml
llms:
  datarobot_llm:
    _type: datarobot-llm-router
    primary:
      use_datarobot_llm_gateway: true
      llm_default_model: "{LLM_DEFAULT_MODEL}"
    fallbacks:
      - use_datarobot_llm_gateway: true
        llm_default_model: anthropic/claude-opus-4-20250514
    num_retries: 1 

LLMConfig fields in YAML

The primary and each item in fallbacks accept the same fields as LLMConfig: use_datarobot_llm_gateway, llm_default_model, llm_deployment_id, nim_deployment_id, datarobot_endpoint, and datarobot_api_token.
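
Because the YAML blocks take the same field names as LLMConfig, a misspelled key is an easy mistake to make. The following is a minimal sketch of a pre-flight check over an already-parsed router block (a plain dict); check_router_config is a hypothetical helper, not part of datarobot-genai, and it assumes the six fields listed above are the complete set accepted for primary and fallbacks entries.

```python
# Field names taken from the list above; assumed to be the complete set.
ALLOWED_FIELDS = {
    "use_datarobot_llm_gateway",
    "llm_default_model",
    "llm_deployment_id",
    "nim_deployment_id",
    "datarobot_endpoint",
    "datarobot_api_token",
}


def check_router_config(router: dict) -> list:
    """Return a list of messages for unrecognized keys in primary/fallbacks."""
    entries = [("primary", router.get("primary", {}))]
    entries += [
        (f"fallbacks[{i}]", fallback)
        for i, fallback in enumerate(router.get("fallbacks", []))
    ]
    problems = []
    for label, entry in entries:
        for field in sorted(set(entry) - ALLOWED_FIELDS):
            problems.append(f"{label}: unknown field {field!r}")
    return problems


# Parsed equivalent of a router block with a typo in the first fallback.
router_block = {
    "_type": "datarobot-llm-router",
    "primary": {"use_datarobot_llm_gateway": True, "llm_default_model": "azure/gpt-4o-mini"},
    "fallbacks": [{"llm_deployement_id": "YOUR_DEPLOYMENT_ID"}],
    "num_retries": 1,
}
problems = check_router_config(router_block)
print(problems)
```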

Mixing provider types

The primary and fallback providers can use different provider types. For example, you can use the LLM gateway as the primary, a hosted deployment as the first fallback, and a second LLM gateway model as the next fallback:

# Import LLMConfig and get_router_llm as shown in the framework examples.
primary = LLMConfig(
    use_datarobot_llm_gateway=True,
    llm_default_model="azure/gpt-4o-mini",
)
fallbacks = [
    LLMConfig(
        llm_deployment_id="YOUR_DEPLOYMENT_ID",
    ),
    LLMConfig(
        use_datarobot_llm_gateway=True,
        llm_default_model="anthropic/claude-opus-4-20250514",
    ),
]

llm = get_router_llm(primary, fallbacks, {"num_retries": 1})

workflow.yaml
llms:
  datarobot_llm:
    _type: datarobot-llm-router
    primary:
      use_datarobot_llm_gateway: true
      llm_default_model: azure/gpt-4o-mini
    fallbacks:
      - llm_deployment_id: YOUR_DEPLOYMENT_ID
      - use_datarobot_llm_gateway: true
        llm_default_model: anthropic/claude-opus-4-20250514
    num_retries: 1 

Retry and latency

Each retry adds latency to the response. Set num_retries conservatively (e.g., 1) to balance reliability and response time.
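
To make the trade-off concrete, the worst case can be estimated with simple arithmetic. This is a rough sketch under stated assumptions: each provider receives one initial attempt plus num_retries retries, every failed attempt takes up to a per-request timeout, and any backoff between attempts is ignored. The timeout_seconds parameter and both function names are hypothetical and are not router options documented here.

```python
def worst_case_attempts(num_fallbacks: int, num_retries: int) -> int:
    """Total attempts if every provider fails on every try."""
    providers = 1 + num_fallbacks  # the primary plus each fallback
    return providers * (1 + num_retries)


def worst_case_latency_seconds(
    num_fallbacks: int, num_retries: int, timeout_seconds: float
) -> float:
    """Upper bound on wait time, assuming every attempt runs to its timeout."""
    return worst_case_attempts(num_fallbacks, num_retries) * timeout_seconds


# Two fallbacks with num_retries=1 means up to 3 * 2 = 6 attempts.
attempts = worst_case_attempts(num_fallbacks=2, num_retries=1)
latency = worst_case_latency_seconds(num_fallbacks=2, num_retries=1, timeout_seconds=30.0)
print(attempts, latency)
```

Under these assumptions, raising num_retries from 1 to 3 with two fallbacks doubles the worst-case attempt count from 6 to 12, which is the main reason to keep the value small.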