Moderations guardrails¶

Guards evaluate prompts (prescore) and/or responses (postscore) and can block, report, or replace content based on configurable conditions.

File structure¶

The YAML file holds the guard configuration, which the library loads when the pipeline is created (for example, with ModerationPipeline.from_yaml).

timeout_sec: 10
timeout_action: score
nemo_evaluator_deployment_id: "<your-nemo-evaluator-id>"

guards:
  - name: My Guard
    type: ootb
    stage: prompt
    # ...

Top-level options¶

Field	Type	Default	Description
`timeout_sec`	integer	`10`	Seconds to wait per guard
`timeout_action`	string	`score`	`score` (allow) or `block` on timeout
`nemo_evaluator_deployment_id`	string	(none)	DataRobot deployment ID of the NeMo Evaluator microservice; required when any guard uses `type: nemo_evaluator`
`enable_deepeval_telemetry`	boolean	`false`	Opt in to deepeval usage telemetry and local `.deepeval/` artifacts. See Environment variables.
`prompt_column_name`	string	`"promptText"`	Name of the DataFrame column that holds the input text. Used in standalone Python when no DRUM deployment is active. Ignored when a DRUM deployment context is active.
`response_column_name`	string	`"completion"`	Name of the DataFrame column that holds the LLM response text. Used in standalone Python as a fallback when `TARGET_NAME` is not set. Lower priority than `TARGET_NAME`: if both are provided, `TARGET_NAME` wins. Ignored when a DRUM deployment context is active.
`guards`	list	required	List of guard definitions
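
The resolution order for the response column in standalone Python can be sketched as follows. This is an illustration only, not the library's code: `resolve_response_column` is a hypothetical helper, and the order (the `TARGET_NAME` environment variable, then `response_column_name`, then the default `"completion"`) is taken from the table above.

```python
import os

DEFAULT_RESPONSE_COLUMN = "completion"  # documented default


def resolve_response_column(config: dict, environ=None) -> str:
    """Hypothetical helper: pick the response column name.

    Order: TARGET_NAME env var, then `response_column_name` from the
    config, then the documented default.
    """
    environ = os.environ if environ is None else environ
    return (
        environ.get("TARGET_NAME")
        or config.get("response_column_name")
        or DEFAULT_RESPONSE_COLUMN
    )
```

Passing `environ` explicitly keeps the sketch easy to test; in real use it would read `os.environ`.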

Common guard fields¶

Field	Required	Description
`name`	Yes	Unique label; used as the key in `result.metrics` and as the DataRobot custom metric name
`type`	Yes	`ootb` · `model` · `nemo_guardrails` · `nemo_evaluator`
`stage`	Yes	`prompt` · `response` · `[prompt, response]` (list runs the guard at both stages)
`description`	No	Free-text label, ignored by the library
`intervention`	No	What to do when the condition fires (see Intervention block). Omit entirely to measure only; nothing is ever blocked
`copy_citations`	No	Boolean (`true`/`false`, default `false`). Passes retrieved RAG context to this guard. Required for `rouge_1` and `faithfulness` to produce meaningful scores
`is_agentic`	No	Marks an agentic-workflow guard (default `false`). Required by `agent_goal_accuracy`

# stage as a list — guard runs independently at both prompt and response stages
- name: Token Count Both
  type: ootb
  ootb_type: token_count
  stage: [prompt, response]
  intervention:
    action: block
    message: "Input or output exceeds the token limit."
    conditions:
      - comparator: greaterThan
        comparand: 100

Intervention block¶

intervention:
  action: block               # "block" | "report" | "replace"
  message: "Blocked."         # returned to caller
  send_notification: false
  conditions:
    - comparand: 0.5
      comparator: greaterThan

One condition per intervention

The conditions list accepts exactly one entry for block and replace; zero entries (conditions: []) is valid for report. To combine conditions (e.g. block if score < 0.2 or > 0.9), use two separate guards.
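
As a rough sketch of the "two separate guards" pattern, the snippet below expands one out-of-range rule into two guard dictionaries, each with a single condition. `split_range_guard` is a hypothetical helper written for this example, not part of the library; the guard fields are copied from the Toxicity model guard shown later on this page, and the placeholder deployment ID must be replaced before use.

```python
def split_range_guard(base: dict, low: float, high: float, message: str) -> list:
    """Hypothetical helper: turn 'block if score < low or > high' into two guards."""

    def with_condition(suffix: str, comparator: str, comparand: float) -> dict:
        guard = dict(base)  # shallow copy of the shared guard fields
        guard["name"] = f"{base['name']} ({suffix})"  # guard names must stay unique
        guard["intervention"] = {
            "action": "block",
            "message": message,
            # exactly one condition per intervention
            "conditions": [{"comparator": comparator, "comparand": comparand}],
        }
        return guard

    return [
        with_condition("too low", "lessThan", low),
        with_condition("too high", "greaterThan", high),
    ]


guards = split_range_guard(
    {
        "name": "Toxicity",
        "type": "model",
        "stage": "prompt",
        "deployment_id": "<your-deployment-id>",  # placeholder
        "model_info": {
            "input_column_name": "text",
            "target_name": "toxicity_toxic_PREDICTION",
            "target_type": "Binary",
            "class_names": [],
        },
    },
    low=0.2,
    high=0.9,
    message="Score outside the accepted range.",
)
```

The resulting list can be placed under `guards` in a configuration dictionary that follows the same schema as the YAML file.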

Actions¶

Action	Effect
`block`	Reject and return `message` to the caller. `message` is optional in the schema but omitting it returns an empty string — always set it.
`report`	Record the metric and allow content through unchanged. Behaviorally identical to omitting the `intervention` block entirely; useful when you want the metric tracked but never want to block.
`replace`	Swap the text with the sanitized version returned by the deployment. Only valid for `type: model` guards. The deployment must return the replacement text in the field specified by `model_info.replacement_text_column_name`; if that field is absent a `ValueError` is raised.

Comparators¶

Comparator	Comparand type	Description
`greaterThan` / `lessThan`	number	Numeric threshold
`equals` / `notEquals`	number | string	Exact equality / inequality
`is` / `isNot`	boolean	Boolean equality
`matches` / `doesNotMatch`	list of strings	Class membership. `matches` fires if the prediction is in the list; `doesNotMatch` fires if it is not.
`contains` / `doesNotContain`	list of strings	Substring check against a list. `contains` fires if all items in the list are found as substrings of the prediction; `doesNotContain` fires if not all items are found.
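
The comparator semantics in the table can be read as the following sketch. It is an illustration of the documented behavior, not the library's implementation; `condition_fires` is a hypothetical function, and the treatment of `equals` and `is` as plain `==` is an assumption.

```python
def condition_fires(comparator: str, prediction, comparand) -> bool:
    """Illustrative only: evaluate one condition as the table describes it."""
    if comparator == "greaterThan":
        return prediction > comparand
    if comparator == "lessThan":
        return prediction < comparand
    if comparator in ("equals", "is"):
        return prediction == comparand
    if comparator in ("notEquals", "isNot"):
        return prediction != comparand
    if comparator == "matches":
        # class membership: the prediction is one of the listed classes
        return prediction in comparand
    if comparator == "doesNotMatch":
        return prediction not in comparand
    if comparator == "contains":
        # every listed item must appear as a substring of the prediction
        return all(item in prediction for item in comparand)
    if comparator == "doesNotContain":
        # fires as soon as at least one listed item is missing
        return not all(item in prediction for item in comparand)
    raise ValueError(f"Unknown comparator: {comparator}")
```

Note that, as described, `contains` needs every listed item to be present, so `condition_fires("contains", "free money", ["free", "offer"])` does not fire while the matching `doesNotContain` condition does.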

Guard types¶

Out-of-the-Box (`ootb`)¶

Set type: ootb and ootb_type. Install the required libraries for your use case:

pip install datarobot-moderations                          # base — token_count, rouge_1, cost, custom_metric
pip install 'datarobot-moderations[llm-eval]'              # + faithfulness, task_adherence, agent_guideline_adherence, agent_goal_accuracy
pip install 'datarobot-moderations[llm-eval,vertex]'       # + Google Vertex AI as LLM judge
pip install 'datarobot-moderations[llm-eval,bedrock]'      # + AWS Bedrock as LLM judge
pip install 'datarobot-moderations[llm-eval,nvidia]'       # + NVIDIA NIM as LLM judge
pip install 'datarobot-moderations[nemo]'                  # + NeMo Guardrails colang flow guard (type: nemo_guardrails)
pip install 'datarobot-moderations[nemo-evaluator]'        # + NeMo Evaluator microservice guard (type: nemo_evaluator)
pip install 'datarobot-moderations[datarobot-sdk]'         # required for type: model and llm_type: datarobot
pip install 'datarobot-moderations[all]'                   # everything

`ootb_type`	Stage	Install extra	Description
`token_count`	prompt / response	(base)	Token count
`rouge_1`	response	(base)	ROUGE-1 overlap with citations
`faithfulness`	response	`llm-eval`	LLM-judged hallucination detection
`task_adherence`	response	`llm-eval`	Task-completion score
`agent_guideline_adherence`	response	`llm-eval`	Guideline adherence
`agent_goal_accuracy`	response	`llm-eval`	Agentic goal-accuracy
`cost`	response	(base)	Estimated cost. Counts both prompt tokens (`input_price`/`input_unit`) and response tokens (`output_price`/`output_unit`). Must be at the response stage because both token counts are only available after the LLM responds. Currently only `currency: USD` is supported.
`custom_metric`	prompt / response	(base)	User-defined numeric metric

# Token count — report only
- name: Prompt Token Count
  type: ootb
  ootb_type: token_count
  stage: prompt

# Token count — block on length
- name: Response Token Count
  type: ootb
  ootb_type: token_count
  stage: response
  intervention:
    action: block
    message: "Response too long."
    conditions:
      - comparand: 1000
        comparator: greaterThan

# ROUGE-1 (requires citations)
- name: Rouge 1
  type: ootb
  ootb_type: rouge_1
  stage: response
  copy_citations: true
  intervention:
    action: report
    conditions: []

# Faithfulness
- name: Faithfulness
  type: ootb
  ootb_type: faithfulness
  stage: response
  copy_citations: true
  llm_type: datarobot
  deployment_id: "<your-llm-id>"   # 24-char DataRobot deployment ID
  intervention:
    action: block
    message: "Hallucination detected."
    conditions:
      - comparand: 0.0
        comparator: equals

# Task Adherence
- name: Task Adherence
  type: ootb
  ootb_type: task_adherence
  stage: response
  llm_type: datarobot
  deployment_id: "<your-llm-id>"
  intervention:
    action: block
    message: "LLM did not complete the requested task."
    conditions:
      - comparator: lessThan
        comparand: 0.5

# Guideline Adherence
- name: Guideline Adherence
  type: ootb
  ootb_type: agent_guideline_adherence
  stage: response
  llm_type: datarobot
  deployment_id: "<your-llm-id>"
  additional_guard_config:
    agent_guideline: "Response must be polite and on-topic."   # free-text criterion for the LLM judge
  intervention:
    action: block
    message: "Response violates guidelines."
    conditions:
      - comparand: 0.0
        comparator: equals

# Agent Goal Accuracy
- name: Agent Goal Accuracy
  type: ootb
  ootb_type: agent_goal_accuracy
  stage: response
  is_agentic: true
  llm_type: datarobot
  deployment_id: "<your-llm-id>"
  intervention:
    action: report
    conditions: []

# Cost tracking
- name: Cost
  type: ootb
  ootb_type: cost
  stage: response
  additional_guard_config:
    cost:
      currency: USD
      input_price: 0.01
      input_unit: 1000
      output_price: 0.03
      output_unit: 1000
  intervention:
    action: report
    conditions: []
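
To make the cost settings concrete, here is a minimal sketch of how such an estimate could be computed. The formula (tokens divided by the unit, multiplied by the price, summed over input and output) is an assumption based on the field names, not the library's confirmed calculation, and `estimate_cost` is a hypothetical function.

```python
def estimate_cost(prompt_tokens: int, response_tokens: int, cost_config: dict) -> float:
    """Assumed formula: tokens / unit * price, summed over input and output."""
    input_cost = prompt_tokens / cost_config["input_unit"] * cost_config["input_price"]
    output_cost = response_tokens / cost_config["output_unit"] * cost_config["output_price"]
    return input_cost + output_cost


# Same values as the Cost tracking guard above
cost_config = {
    "currency": "USD",
    "input_price": 0.01,
    "input_unit": 1000,
    "output_price": 0.03,
    "output_unit": 1000,
}
cost = estimate_cost(prompt_tokens=500, response_tokens=200, cost_config=cost_config)
```

Under this assumption, 500 prompt tokens and 200 response tokens come to about 0.011 USD.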

Model guard¶

Wraps any DataRobot deployment you have already created (binary classifier, regression, multiclass, or text-generation). The library sends the text to that deployment and uses the prediction it returns to decide whether to block, report, or replace content.

# Binary classifier (e.g. toxicity, prompt injection)
# Works with any DataRobot binary classification deployment.
- name: Toxicity
  type: model
  stage: prompt
  deployment_id: "<your-deployment-id>"   # 24-char DataRobot deployment ID
  model_info:
    input_column_name: text               # field your deployment reads as input
    target_name: toxicity_toxic_PREDICTION  # prediction field returned by the deployment
    target_type: Binary        # Binary | Regression | Multiclass | TextGeneration
    class_names: []            # leave empty for Binary/Regression
  intervention:
    action: block
    message: "Toxic content blocked."
    conditions:
      - comparand: 0.5
        comparator: greaterThan

# PII detection with text replacement
# The deployment must return BOTH the score field (`target_name`)
# AND a sanitized-text field (`replacement_text_column_name`).
- name: PII Detector
  type: model
  stage: prompt
  deployment_id: "<your-pii-deployment-id>"
  model_info:
    input_column_name: text
    target_name: contains_pii_true_PREDICTION
    target_type: TextGeneration
    replacement_text_column_name: anonymized_text_OUTPUT
    class_names: []
  intervention:
    action: replace
    message: "PII removed from prompt."
    conditions:
      - comparand: 0.5
        comparator: greaterThan

# Multi-label / emotion classifier
- name: Emotion Classifier
  type: model
  stage: prompt
  deployment_id: "<your-emotion-deployment-id>"
  model_info:
    input_column_name: text
    target_name: target_PREDICTION
    target_type: TextGeneration
    class_names: [anger, fear, sadness, disgust, joy, neutral]
  intervention:
    action: block
    message: "Negative emotion detected."
    conditions:
      - comparand: [anger, fear, sadness, disgust]
        comparator: matches

NeMo Guardrails¶

Flow-based content filtering. Requires pip install 'datarobot-moderations[nemo]'. Supported llm_type values include openAi, azureOpenAi, nim, and llmGateway.

Colang flow files must live in stage-specific subdirectories of nemo_guardrails/:

nemo_guardrails/
  prompt/      # config.yml + *.co files for stage: prompt
  response/    # config.yml + *.co files for stage: response

- name: Stay on topic
  type: nemo_guardrails
  stage: prompt
  llm_type: azureOpenAi
  openai_api_base: "https://<resource>.openai.azure.com/"
  openai_deployment_id: gpt-4o-mini
  intervention:
    action: block
    message: "This topic is outside the allowed scope."
    conditions:
      - comparand: "TRUE"
        comparator: equals

NeMo Evaluator¶

Calls a DataRobot-hosted NeMo Evaluator microservice. Requires pip install 'datarobot-moderations[nemo-evaluator]'.

Two deployment IDs — what's the difference?

Field	What it points to
`nemo_evaluator_deployment_id` (top-level)	Your NeMo Evaluator microservice deployment in DataRobot
`deployment_id` (per-guard)	The LLM deployment the evaluator uses to do the judging

Both values must be valid 24-character DataRobot deployment IDs. Using a placeholder longer than 24 characters (e.g. "<your-nemo-evaluator-id>") causes a load-time validation error: String is longer than 24 characters.
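
A quick pre-flight check can catch leftover placeholders before the library's own validation does. The sketch below is not part of the library: `find_bad_deployment_ids` is a hypothetical helper, and it assumes that valid IDs are exactly 24 hexadecimal characters, as stated in the full annotated example on this page.

```python
import re

DEPLOYMENT_ID_RE = re.compile(r"[0-9a-fA-F]{24}")
ID_KEYS = {"deployment_id", "nemo_evaluator_deployment_id"}


def find_bad_deployment_ids(node, path=""):
    """Return (path, value) pairs for ID fields that are not 24 hex characters."""
    bad = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            is_valid = isinstance(value, str) and DEPLOYMENT_ID_RE.fullmatch(value)
            if key in ID_KEYS and not is_valid:
                bad.append((child, value))
            else:
                bad.extend(find_bad_deployment_ids(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            bad.extend(find_bad_deployment_ids(item, f"{path}[{index}]"))
    return bad
```

Run it on the dictionary form of the configuration; an empty list means every `deployment_id` and `nemo_evaluator_deployment_id` has the expected shape.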

llm_type must be datarobot for all nemo_evaluator guards.

`nemo_evaluator_type`	Stage	Description
`llm_judge`	prompt / response	Custom LLM-as-judge with your own prompts. `score_parsing_regex` is a regular expression applied to the LLM's raw text reply to extract a single numeric score, e.g. `"([1-5])"` picks the first digit 1–5 from any surrounding text.
`context_relevance`	response	Relevance of retrieved context to the question
`response_groundedness`	response	Groundedness in retrieved context
`topic_adherence`	response	Adherence to allowed topics
`response_relevancy`	response	Relevance of response to question
`faithfulness`	response	NeMo microservice faithfulness score
`agent_goal_accuracy`	response	Agentic goal-accuracy via NeMo

nemo_evaluator_deployment_id: "<your-nemo-evaluator-id>"

guards:
  - name: Safety Judge
    type: nemo_evaluator
    stage: response
    nemo_evaluator_type: llm_judge
    llm_type: datarobot
    deployment_id: "<your-llm-id>"
    nemo_llm_judge_config:
      system_prompt: "Rate safety 1-5. Output ONLY the integer."
      user_prompt: "Response: {response}"
      score_parsing_regex: "([1-5])"   # regex to extract the numeric score from the LLM's text output
      custom_metric_directionality: higherIsBetter   # "higherIsBetter" | "lowerIsBetter"
    intervention:
      action: block
      message: "Response failed safety evaluation."
      conditions:
        - comparand: 2
          comparator: lessThan

  - name: Topic Adherence
    type: nemo_evaluator
    stage: response
    nemo_evaluator_type: topic_adherence
    llm_type: datarobot
    deployment_id: "<your-llm-id>"
    nemo_topic_adherence_config:
      metric_mode: f1          # "f1" | "precision" | "recall"
      reference_topics: [DataRobot, machine learning, AI platforms]
    intervention:
      action: report
      conditions: []

  - name: Response Relevancy
    type: nemo_evaluator
    stage: response
    nemo_evaluator_type: response_relevancy
    llm_type: datarobot
    deployment_id: "<your-llm-id>"
    nemo_response_relevancy_config:
      embedding_deployment_id: "<your-embedding-id>"
    intervention:
      action: report
      conditions: []
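
The effect of `score_parsing_regex` in the `llm_judge` guard above can be tried out with Python's `re` module. This is a sketch under the assumption that the first match of the pattern is used and its first group is converted to a number; `parse_score` is a hypothetical function, not the evaluator's code.

```python
import re


def parse_score(reply: str, pattern: str = "([1-5])"):
    """Assumed behavior: take the first match and return group 1 as a float."""
    match = re.search(pattern, reply)
    return float(match.group(1)) if match else None


clean = parse_score("4")                              # judge followed the prompt
wordy = parse_score("Safety rating: 4 out of 5")      # first digit in 1-5 is '4'
misread = parse_score("10/10, completely safe")       # first digit in 1-5 is '1'
missing = parse_score("I cannot rate this response")  # no digit in 1-5 at all
```

A loose pattern can misread replies such as "10/10", which is why the example system prompt asks for the integer only. A stricter pattern such as `^\s*([1-5])\s*$` is one option if the judge reliably returns a bare number.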

LLM back-end options¶

Some ootb guards (e.g. faithfulness, task_adherence) call an LLM to judge the text. You choose which LLM provider to use via llm_type.

DataRobot credentials (DATAROBOT_ENDPOINT + DATAROBOT_API_TOKEN) are always required

Supported `llm_type` values¶

`llm_type`	LLM provider	Extra YAML fields	Extra install
`datarobot`	DataRobot-hosted LLM deployment	`deployment_id`	`datarobot-sdk`
`openAi`	OpenAI API	(none)	`llm-eval`
`azureOpenAi`	Azure OpenAI	`openai_api_base`, `openai_deployment_id`	`llm-eval`
`google`	Google Vertex AI	`google_region`, `google_model`	`llm-eval,vertex`
`amazon`	AWS Bedrock	`aws_region`, `aws_model`	`llm-eval,bedrock`
`nim`	NVIDIA NIM	`openai_api_base`	`llm-eval,nvidia`
`llmGateway`	DataRobot LLM Gateway	`llm_gateway_model_id`	`datarobot-sdk`

nemo_guardrails supports openAi, azureOpenAi, nim, and llmGateway only.

nemo_evaluator supports datarobot only.
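
These restrictions are easy to check before loading a configuration. The following sketch encodes only the two rules stated above; `unsupported_llm_types` is a hypothetical helper, and guards of other types are deliberately left unchecked.

```python
# Allowed llm_type values per guard type, as stated in the notes above
SUPPORTED_LLM_TYPES = {
    "nemo_guardrails": {"openAi", "azureOpenAi", "nim", "llmGateway"},
    "nemo_evaluator": {"datarobot"},
}


def unsupported_llm_types(guards: list) -> list:
    """Return (guard name, llm_type) pairs that the rules above rule out."""
    problems = []
    for guard in guards:
        allowed = SUPPORTED_LLM_TYPES.get(guard.get("type"))
        llm_type = guard.get("llm_type")
        if allowed is not None and llm_type is not None and llm_type not in allowed:
            problems.append((guard["name"], llm_type))
    return problems
```

An empty result means no guard combines `nemo_guardrails` or `nemo_evaluator` with a back end outside its supported set.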

Available models (Google / AWS)¶

The library maps a fixed set of model names to their provider API identifiers. Models not in this list are not supported.

Provider	`llm_type`	`google_model` / `aws_model`
Google Vertex AI	`google`	`google-gemini-1.5-flash`, `google-gemini-1.5-pro`, `chat-bison`
AWS Bedrock	`amazon`	`amazon-titan`, `anthropic-claude-2`, `anthropic-claude-3-haiku`, `anthropic-claude-3-sonnet`, `anthropic-claude-3-opus`, `anthropic-claude-3.5-sonnet-v1`, `anthropic-claude-3.5-sonnet-v2`, `amazon-nova-lite`, `amazon-nova-micro`, `amazon-nova-pro`

Full annotated example¶

Replace every <...> placeholder with a real value before use.

DataRobot deployment IDs are exactly 24 hexadecimal characters.

timeout_sec: 15
timeout_action: score

guards:
  # -- Prescore (prompt) --------------------------------------------------

  - name: Prompt Injection
    type: model
    stage: prompt
    deployment_id: "<prompt-injection-id>"
    model_info:
      input_column_name: text
      target_name: injection_injection_PREDICTION
      target_type: Binary
      class_names: []
    intervention:
      action: block
      message: "Prompt injection attempt detected and blocked."
      conditions:
        - comparand: 0.80
          comparator: greaterThan

  - name: Toxicity
    type: model
    stage: prompt
    deployment_id: "<toxicity-id>"
    model_info:
      input_column_name: text
      target_name: toxicity_toxic_PREDICTION
      target_type: Binary
      class_names: []
    intervention:
      action: block
      message: "Toxic content is not allowed."
      conditions:
        - comparand: 0.5
          comparator: greaterThan

  - name: PII Detector
    type: model
    stage: prompt
    deployment_id: "<pii-id>"
    model_info:
      input_column_name: text
      target_name: contains_pii_true_PREDICTION
      target_type: TextGeneration
      replacement_text_column_name: anonymized_text_OUTPUT
      class_names: []
    intervention:
      action: replace
      message: "PII detected and removed."
      conditions:
        - comparand: 0.5
          comparator: greaterThan

  - name: Topic Guardrail
    type: nemo_guardrails
    stage: prompt
    llm_type: azureOpenAi
    openai_api_base: "https://<resource>.openai.azure.com/"
    openai_deployment_id: gpt-4o-mini
    intervention:
      action: block
      message: "This topic is outside the allowed scope."
      conditions:
        - comparand: "TRUE"
          comparator: equals

  # -- Postscore (response) -----------------------------------------------

  - name: Response Token Count
    type: ootb
    ootb_type: token_count
    stage: response

  - name: Faithfulness
    type: ootb
    ootb_type: faithfulness
    stage: response
    copy_citations: true
    llm_type: datarobot
    deployment_id: "<llm-id>"
    intervention:
      action: block
      message: "The response appears to be hallucinated."
      conditions:
        - comparand: 0.0
          comparator: equals

  - name: Task Adherence
    type: ootb
    ootb_type: task_adherence
    stage: response
    llm_type: datarobot
    deployment_id: "<llm-id>"
    intervention:
      action: block
      message: "LLM did not complete the requested task."
      conditions:
        - comparator: lessThan
          comparand: 0.5

  - name: Cost
    type: ootb
    ootb_type: cost
    stage: response
    additional_guard_config:
      cost:
        currency: USD
        input_price: 0.01
        input_unit: 1000
        output_price: 0.03
        output_unit: 1000
    intervention:
      action: report
      conditions: []

Using the config in Python¶

Guards can be configured from a YAML file, a plain Python dict, or a Pydantic object built entirely in Python. All approaches are fully equivalent — choose whichever fits your workflow.

From a YAML file¶

Return types¶

Method	Returns
`evaluate_prompt(prompt)`	`(EvaluationResult, latency_seconds, prescore_df)`
`evaluate_response(response, prompt=None)`	`(EvaluationResult, latency_seconds, postscore_df)`
`evaluate_full_pipeline(prompt, llm_callable)`	`(PipelineResult, prescore_df, postscore_df)` — `postscore_df` is `None` when the prompt was blocked; per-stage latency is not returned — use `evaluate_prompt` / `evaluate_response` directly when you need it
`evaluate_prompt_async(prompt)`	same as `evaluate_prompt` but non-blocking
`evaluate_response_async(response, prompt=None)`	same as `evaluate_response` but non-blocking
`evaluate_full_pipeline_async(prompt, llm_callable)`	same as `evaluate_full_pipeline` but non-blocking; `llm_callable` must be an `async` coroutine
`evaluate_full_pipeline_stream_async(prompt, llm_callable)`	`AsyncGenerator[ChatCompletionChunk, None]` — see Streaming pipeline
`stream_response_async(completion, *, prompt, prescore_df, prescore_latency)`	`AsyncGenerator[ChatCompletionChunk, None]` — lower-level; see Streaming pipeline

EvaluationResult.metrics holds the guard scores keyed by guard name.

`evaluate_prompt` / `evaluate_prompt_async` parameters¶

Parameter	Type	Required	Description
`prompt`	`str`	Yes	The user prompt text to evaluate against prescore guards

`evaluate_response` / `evaluate_response_async` parameters¶

Parameter	Type	Required	Description
`response`	`str`	Yes	The LLM response text to evaluate against postscore guards
`prompt`	`str | None`	No	The original user prompt. Required for guards that compare prompt and response (e.g. `faithfulness`, `task_adherence`, `rouge_1`). Omit only when no such guards are configured
`pipeline_interactions`	`str | None`	No	JSON-serialized `MultiTurnSample` dict from the DataRobot agentic pipeline. Enables `agent_goal_accuracy` to evaluate the full interaction trace instead of just the final response.

`evaluate_full_pipeline` / `evaluate_full_pipeline_async` parameters¶

Parameter	Type	Required	Description
`prompt`	`str`	Yes	The user prompt to evaluate
`llm_callable`	`Callable[[str], str]` (sync) or `Callable[[str], Awaitable[str]]` (async)	Yes	Callable that receives the (possibly sanitized) effective prompt and returns the LLM response. For the async variant this must be an `async` coroutine

`EvaluationResult` fields¶

Field	Type	Description
`blocked`	`bool`	`True` if any guard blocked the text
`blocked_message`	`str | None`	The block message configured on the guard
`replaced`	`bool`	`True` if a `replace`-action guard fired
`replacement`	`str | None`	The sanitized replacement text (PII-scrubbed prompt, etc.)
`metrics`	`dict[str, Any]`	Guard scores keyed by guard name (e.g. `{"Toxicity": 0.87}`)

`PipelineResult` fields¶

Field	Type	Description
`prompt_evaluation`	`EvaluationResult`	Prescore evaluation result
`response`	`str | None`	Final (possibly replaced) LLM response; `None` when blocked
`response_evaluation`	`EvaluationResult | None`	Postscore evaluation result; `None` when prompt was blocked
`blocked` (computed)	`bool`	`True` if either stage was blocked
`replaced` (computed)	`bool`	`True` if either stage was replaced
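
Because `response_evaluation` is `None` when the prompt is blocked, code that reads a `PipelineResult` has to guard against it. The stand-in classes below mirror the fields in the two tables above purely for illustration; they are not the library's classes, and the way the computed fields are derived here is an assumption based on the table descriptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EvaluationResultSketch:
    """Stand-in with the EvaluationResult fields listed above."""
    blocked: bool = False
    blocked_message: Optional[str] = None
    replaced: bool = False
    replacement: Optional[str] = None
    metrics: dict = field(default_factory=dict)


@dataclass
class PipelineResultSketch:
    """Stand-in with the PipelineResult fields listed above."""
    prompt_evaluation: EvaluationResultSketch
    response: Optional[str] = None
    response_evaluation: Optional[EvaluationResultSketch] = None

    @property
    def blocked(self) -> bool:
        # True if either stage blocked; the response stage may not have run
        post = self.response_evaluation
        return self.prompt_evaluation.blocked or (post is not None and post.blocked)

    @property
    def replaced(self) -> bool:
        post = self.response_evaluation
        return self.prompt_evaluation.replaced or (post is not None and post.replaced)


def block_message(result: PipelineResultSketch) -> Optional[str]:
    """Return the message from whichever stage blocked, or None."""
    if result.prompt_evaluation.blocked:
        return result.prompt_evaluation.blocked_message
    post = result.response_evaluation
    if post is not None and post.blocked:
        return post.blocked_message
    return None
```

The same `None` check applies when reading `result.response_evaluation.metrics` from a real pipeline result.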

What `prescore_df` contains¶

prescore_df is the raw pandas DataFrame produced by running all prescore (prompt-stage) guards on the input. It starts as a copy of the input and gains one set of columns per guard after execution.

Column	Description
`{prompt_column_name}`	Original prompt text
`{guard.metric_column_name}`	Guard score (one column per guard, e.g. `Toxicity_toxicity_toxic_PREDICTION`)
`{guard_name}_latency`	Wall-clock seconds this guard took
`blocked_{prompt_col}`	`True` if any guard blocked the prompt
`blocked_message_{prompt_col}`	Block reason / message returned to the caller
`replaced_{prompt_col}`	`True` if a replace-action guard fired
`replaced_message_{prompt_col}`	Replacement text (sanitized prompt from PII guard, etc.)
`reported_{prompt_col}`	`True` when a report-action guard fired
`Noneed_{prompt_col}`	Internal sentinel for no-action guards
`action_{prompt_col}`	Comma-joined string of actions taken (e.g. `"block"`, `"report,block"`)
(per-guard enforced column)	Internal per-guard enforcement flag used by `format_result_df`

What `postscore_df` contains¶

postscore_df is the raw pandas DataFrame produced by running all postscore (response-stage) guards on the LLM output. It starts with the predictions DataFrame (which includes the LLM response plus any pass-through columns) and gains guard result columns after execution.

Column	Description
`{response_column_name}`	LLM's response text
`{prompt_column_name}`	User prompt (forwarded for faithfulness / task-adherence calculation)
`CITATION_CONTENT_{N}`	Retrieved RAG context chunks (when citations are enabled)
`PROMPT_TOKEN_COUNT_from_usage`	Prompt token count (when `usage` is provided by the LLM)
`RESPONSE_TOKEN_COUNT_from_usage`	Response token count (when `usage` is provided by the LLM)
`agentic_pipeline_interactions`	Agentic workflow interaction trace (for `agent_goal_accuracy` / `task_adherence`)
`{association_id_column_name}`	Association ID (if the deployment has one configured)
`{guard.metric_column_name}`	Guard score (one column per postscore guard, e.g. `Response_Faithfulness_score`)
`{guard_name}_latency`	Wall-clock seconds this guard took
`blocked_{response_col}`	`True` if any guard blocked the response
`blocked_message_{response_col}`	Block message returned to the caller
`replaced_{response_col}`	`True` if a replace-action guard fired on the response
`replaced_message_{response_col}`	Replacement text
`reported_{response_col}`	`True` when a report-action guard fired
`Noneed_{response_col}`	Internal sentinel for no-action guards
`action_{response_col}`	Comma-joined string of actions taken
(per-guard enforced column)	Internal per-guard enforcement flag

Note: prescore_df and postscore_df are the raw executor outputs. In the DRUM pipeline, format_result_df merges them into a single result_df that also adds unmoderated_{response_col}, moderated_{prompt_col}, datarobot_latency, datarobot_token_count, and datarobot_confidence_score. Those derived columns are not present in the DataFrames returned directly by evaluate_prompt / evaluate_response / evaluate_full_pipeline.

import os
from datarobot_dome.api import ModerationPipeline

os.environ["DATAROBOT_ENDPOINT"]  = "<your-endpoint>"
os.environ["DATAROBOT_API_TOKEN"] = "<your-token>"
# TARGET_NAME is optional — sets the response column name used by postscore guards.
# Resolution order: TARGET_NAME env var → response_column_name in config → default "completion".
# os.environ["TARGET_NAME"] = "resultText"

pipeline = ModerationPipeline.from_yaml("moderation_config.yaml")

# ── Prompt evaluation (prescore guards) ───────────────────────────────────────
# sync
result, latency, prescore_df = pipeline.evaluate_prompt("What is DataRobot?")
# async (inside an async function / FastAPI route / agent)
result, latency, prescore_df = await pipeline.evaluate_prompt_async("What is DataRobot?")

if result.blocked:
    print(f"Blocked: {result.blocked_message}")
elif result.replaced:
    print(f"Prompt sanitized to: {result.replacement}")

# ── Response evaluation (postscore guards) ────────────────────────────────────
# sync
result, latency, postscore_df = pipeline.evaluate_response(
    "DataRobot is an AI platform.",
    prompt="What is DataRobot?",   # required for faithfulness / task-adherence guards
)
# async
result, latency, postscore_df = await pipeline.evaluate_response_async(
    "DataRobot is an AI platform.",
    prompt="What is DataRobot?",
)
print(f"Latency: {latency:.3f}s  Blocked: {result.blocked}  Metrics: {result.metrics}")

# ── Full pipeline: prescore → LLM → postscore ─────────────────────────────────
# sync
def my_llm(prompt: str) -> str:
    return "DataRobot is an AI platform."   # replace with your LLM call

result, prescore_df, postscore_df = pipeline.evaluate_full_pipeline("What is DataRobot?", my_llm)

# async (llm_callable must be an async coroutine)
async def my_async_llm(prompt: str) -> str:
    return "DataRobot is an AI platform."   # replace with your async LLM call

result, prescore_df, postscore_df = await pipeline.evaluate_full_pipeline_async(
    "What is DataRobot?", my_async_llm
)

if result.blocked:
    stage = "prompt" if result.prompt_evaluation.blocked else "response"
    blocked_eval = (
        result.prompt_evaluation if result.prompt_evaluation.blocked
        else result.response_evaluation
    )
    print(f"Blocked at {stage}: {blocked_eval.blocked_message}")
elif result.replaced:
    print(f"Text replaced. Response: {result.response}")
else:
    print(f"Response: {result.response}")
    print(f"Metrics: {result.response_evaluation.metrics}")

Agentic workflow example¶

For agents, the library can evaluate the full interaction trace — every tool call, intermediate message, and final response — not just the last reply. This gives the agent_goal_accuracy guard accurate context to judge whether the agent actually achieved the user's goal.

The interaction trace (pipeline_interactions) is a JSON-serialized ragas.MultiTurnSample produced by the DataRobot agent after each task run. Pass it directly to evaluate_response.

Config (docs/examples/agent_goal_accuracy_config.yaml):

targets:
  - target: _default
    guards:
      - name: Agent Goal Accuracy
        type: ootb
        ootb_type: agent_goal_accuracy
        stage: response
        is_agentic: true
        llm_type: llmGateway
        llm_gateway_model_id: "azure/gpt-4o-mini"
        intervention:
          action: report  # measure-only: block/replace are ignored by the library
          conditions: []

Measure-only guard: agent_goal_accuracy (like cost and guideline_adherence) always forces intervene=False internally regardless of the action configured. The score is only available in result.metrics["agent_goal_accuracy"]; use it to make blocking decisions in your own code when needed.

Python — with full interaction trace (recommended for agentic pipelines):

import json
from datarobot_dome.api import ModerationPipeline

pipeline = ModerationPipeline.from_yaml("docs/examples/agent_goal_accuracy_config.yaml")

task = "Book a flight from NYC to London"

# chat_completion is the object returned by the DataRobot agent SDK.
# `pipeline_interactions` is attached when the agent has tool calls / multi-turn
# history; it is None for a plain single-turn response.
chat_completion = my_agent.run(task=task)
agent_response = chat_completion.choices[0].message.content
interactions_json = getattr(chat_completion, "pipeline_interactions", None)

result, latency, postscore_df = pipeline.evaluate_response(
    response=agent_response,
    prompt=task,
    pipeline_interactions=interactions_json,  # JSON str, or None
)

score = result.metrics.get("agent_goal_accuracy")
passed = score is not None and score >= 0.5
print(f"score={score}  passed={passed}")

Python — building the interaction trace manually (when not using the DataRobot agent SDK):

import json
from ragas import MultiTurnSample
from ragas.messages import AIMessage, HumanMessage, ToolCall, ToolMessage

# Reconstruct the trace from your agent's execution log.
sample = MultiTurnSample(
    user_input=[
        HumanMessage(content="Book a flight from NYC to London"),
        AIMessage(
            content="Searching for available flights…",
            tool_calls=[ToolCall(name="search_flights", args={"origin": "NYC", "dest": "LON"})],
        ),
        ToolMessage(content='[{"flight": "BA178", "price": 620}]'),
        AIMessage(content="I found BA178 departing tomorrow for $620. Shall I book it?"),
    ]
)
interactions_json = json.dumps(sample.to_dict())

result, latency, _ = pipeline.evaluate_response(
    response="I found BA178 departing tomorrow for $620. Shall I book it?",
    prompt="Book a flight from NYC to London",
    pipeline_interactions=interactions_json,
)
print(result.blocked, result.metrics)

Without pipeline_interactions the guard falls back gracefully to evaluating the single prompt/response pair, which is useful during development before you have a live agent.

From a plain Python dict¶

Use ModerationPipeline.from_dict when your configuration is already in dict form (e.g. loaded from JSON, fetched from an API, or assembled programmatically). The dict must follow the same schema as the YAML file.

Parameters¶

Parameter	Type	Required	Description
`config`	`dict`	Yes	Guard configuration dictionary following the YAML schema
`model_dir`	`str | None`	No	Base directory used to resolve relative asset paths (e.g. NeMo guardrails `.co` flow files). Defaults to `os.getcwd()`

import os
from datarobot_dome.api import ModerationPipeline

os.environ["DATAROBOT_ENDPOINT"]  = "<your-endpoint>"
os.environ["DATAROBOT_API_TOKEN"] = "<your-token>"
# os.environ["TARGET_NAME"] = "resultText"  # optional; see Environment variables for resolution order

pipeline = ModerationPipeline.from_dict(
    {
        "targets": [
            {
                "target": "_default",
                "guards": [
                    {
                        "name": "Token Count",
                        "type": "ootb",
                        "ootb_type": "token_count",
                        "stage": "prompt",
                    }
                ],
            }
        ]
    },
    model_dir="/path/to/nemo_guardrails_dir",  # optional; only needed for NeMo guards
)

result, latency, prescore_df = pipeline.evaluate_prompt("Hello")
print(result.metrics)

From a Pydantic config object¶

Use ModerationPipeline.from_config to build the configuration entirely in Python — no YAML file required. This is useful for dynamic configurations, programmatic guard registration, or when embedding moderation in a larger application.

Parameters¶

Parameter	Type	Required	Description
`config`	`ModerationConfig`	Yes	A fully-constructed `ModerationConfig` Pydantic object
`model_dir`	`str | None`	No	Base directory used to resolve relative asset paths (e.g. NeMo guardrails `.co` flow files). Defaults to `os.getcwd()`

All schema types are importable from datarobot_dome.schema:

from datarobot_dome.schema import (
    ModerationConfig,
    TargetBlock,
    # Guard subtypes — pick the matching one per guard
    OOTBGuardSchema,
    ModelGuardSchema,
    NemoGuardrailsSchema,
    NemoEvaluatorSchema,
    # Nested schemas used inside guards
    AdditionalGuardConfigSchema,
    InterventionSchema,
    InterventionConditionSchema,
    ModelInfoSchema,
)

Schema type → guard type mapping¶

Guard YAML `type`	Pydantic class
`ootb`	`OOTBGuardSchema`
`model`	`ModelGuardSchema`
`nemo_guardrails`	`NemoGuardrailsSchema`
`nemo_evaluator`	`NemoEvaluatorSchema`

LLM Gateway example — hate speech / guideline adherence¶

import os
from datarobot_dome.api import ModerationPipeline
from datarobot_dome.schema import (
    AdditionalGuardConfigSchema,
    InterventionSchema,
    ModerationConfig,
    OOTBGuardSchema,
    TargetBlock,
)

os.environ["DATAROBOT_ENDPOINT"]  = "https://app.datarobot.com/api/v2"
os.environ["DATAROBOT_API_TOKEN"] = "<your-dr-token>"
# os.environ["TARGET_NAME"] = "resultText"  # optional; see the Environment variables section for the resolution order

config = ModerationConfig(
    targets=[
        TargetBlock(
            target="_default",
            guards=[
                OOTBGuardSchema(
                    type="ootb",
                    name="Hate Speech",
                    stage="response",
                    ootb_type="agent_guideline_adherence",
                    llm_type="llmGateway",
                    llm_gateway_model_id="azure/gpt-4o-2024-11-20",
                    additional_guard_config=AdditionalGuardConfigSchema(
                        agent_guideline=(
                            "The response must not contain hate speech, slurs, or content "
                            "that demeans people based on race, religion, gender, nationality, "
                            "or any other protected characteristic."
                        )
                    ),
                    intervention=InterventionSchema(
                        action="report",
                        conditions=[],
                    ),
                )
            ],
        )
    ]
)

# Build the pipeline from the config object.
pipeline = ModerationPipeline.from_config(config)

# Pass model_dir when your config references NeMo guardrails flow files:
# pipeline = ModerationPipeline.from_config(config, model_dir="/path/to/nemo_guardrails_dir")

text = "People from that group are living in France."
result, latency, postscore_df = pipeline.evaluate_response(response=text, prompt="Describe this text.")
score = result.metrics.get("agent_guideline_adherence_score")
print(f"score={score}  latency={latency:.3f}s")

Model guard example¶

import os
from datarobot_dome.api import ModerationPipeline
from datarobot_dome.schema import (
    InterventionConditionSchema,
    InterventionSchema,
    ModerationConfig,
    ModelGuardSchema,
    ModelInfoSchema,
    TargetBlock,
)

os.environ["DATAROBOT_ENDPOINT"]  = "<your-endpoint>"
os.environ["DATAROBOT_API_TOKEN"] = "<your-token>"
# os.environ["TARGET_NAME"] = "resultText"  # optional; see the Environment variables section for the resolution order

config = ModerationConfig(
    targets=[
        TargetBlock(
            target="_default",
            guards=[
                ModelGuardSchema(
                    type="model",
                    name="Toxicity",
                    stage="prompt",
                    deployment_id="<your-toxicity-deployment-id>",
                    model_info=ModelInfoSchema(
                        input_column_name="text",
                        target_name="toxicity_toxic_PREDICTION",
                        target_type="Binary",
                        class_names=[],
                    ),
                    intervention=InterventionSchema(
                        action="block",
                        message="Toxic content blocked.",
                        conditions=[
                            InterventionConditionSchema(comparand=0.5, comparator="greaterThan")
                        ],
                    ),
                )
            ],
        )
    ]
)

pipeline = ModerationPipeline.from_config(config)

Streaming pipeline¶

evaluate_full_pipeline_stream_async is the primary high-level API for streaming. It encapsulates prescore evaluation, the thread/queue bridge to ModerationIterator, and postscore guard execution — callers supply only a prompt and a streaming LLM callable.

Method signatures¶

Method	When to use
`evaluate_full_pipeline_stream_async(prompt, llm_callable)`	Preferred. Hides all internal state — no `prescore_df` required.
`stream_response_async(completion, *, prompt, prescore_df, prescore_latency)`	Advanced: when you need to inspect the `EvaluationResult` from prescore before starting the LLM stream (e.g. to act on a REPLACE result).

`evaluate_full_pipeline_stream_async` parameters¶

Parameter	Type	Required	Description
`prompt`	`str`	Yes	The user prompt
`llm_callable`	`Callable[[str], AsyncIterator[ChatCompletionChunk]]`	Yes	Sync callable that receives the (possibly sanitized) effective prompt and returns an async iterator of chunks. Called only when the prompt is not blocked.

Chunk signals¶

`finish_reason`	Meaning
`None` or `"stop"`	Normal chunk — content is in `chunk.choices[0].delta.content`
`"content_filter"`	A guard intervened. `delta.content` holds the block message. The LLM was never called if this is the first (and only) chunk.
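The two signals in the table can be handled with a small consumer loop. The sketch below is illustrative only: `Delta`, `Choice`, `Chunk`, and `fake_stream` are hypothetical stand-ins for the real chunk objects so the loop can run without a pipeline or an LLM, and `collect` is a helper name chosen for this example.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, List, Optional, Tuple


# Hypothetical stand-ins that mimic the shape chunk.choices[0].delta.content.
@dataclass
class Delta:
    content: Optional[str]


@dataclass
class Choice:
    delta: Delta
    finish_reason: Optional[str] = None


@dataclass
class Chunk:
    choices: List[Choice]


async def collect(stream: AsyncIterator[Chunk]) -> Tuple[bool, str]:
    """Return (blocked, text): the block message if a guard intervened,
    otherwise the concatenated content of all normal chunks."""
    parts: List[str] = []
    async for chunk in stream:
        choice = chunk.choices[0]
        text = choice.delta.content or ""
        if choice.finish_reason == "content_filter":
            # A guard intervened; delta.content carries the block message.
            return True, text
        # finish_reason is None or "stop": a normal content chunk.
        parts.append(text)
    return False, "".join(parts)


async def fake_stream(items) -> AsyncIterator[Chunk]:
    """Yield stand-in chunks from (content, finish_reason) pairs."""
    for content, reason in items:
        yield Chunk([Choice(Delta(content), reason)])


normal = asyncio.run(collect(fake_stream([("Hel", None), ("lo", "stop")])))
blocked = asyncio.run(collect(fake_stream([("Prompt too long.", "content_filter")])))
print(normal, blocked)
```

With a real pipeline, the stream passed to `collect` would be the async iterator returned by `evaluate_full_pipeline_stream_async`.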

Example¶

import asyncio
import os
from datarobot_dome.api import ModerationPipeline
from datarobot_dome.schema import (
    InterventionSchema, ModerationConfig, OOTBGuardSchema, TargetBlock,
)

os.environ["DATAROBOT_ENDPOINT"]  = "<your-endpoint>"
os.environ["DATAROBOT_API_TOKEN"] = "<your-token>"

pipeline = ModerationPipeline.from_config(
    ModerationConfig(
        targets=[
            TargetBlock(
                target="_default",
                guards=[
                    OOTBGuardSchema(
                        name="Prompt Token Limit",
                        type="ootb",
                        ootb_type="token_count",
                        stage="prompt",
                        intervention=InterventionSchema(
                            action="block",
                            conditions=[{"comparator": "greaterThan", "comparand": 200}],
                            message="Prompt too long.",
                        ),
                    ),
                ],
            )
        ]
    )
)

async def my_llm_stream(prompt: str):
    """Wrap a sync OpenAI stream as an async iterator."""
    import openai
    client = openai.OpenAI(
        api_key=os.environ["DATAROBOT_API_TOKEN"],
        base_url=f"{os.environ['DATAROBOT_ENDPOINT']}/genai/llmgw",
    )
    for chunk in client.chat.completions.create(
        model="azure/gpt-4o-2024-11-20",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    ):
        yield chunk

async def run(prompt: str) -> None:
    print(f"Prompt: {prompt!r}")
    async for chunk in pipeline.evaluate_full_pipeline_stream_async(prompt, my_llm_stream):
        finish_reason = chunk.choices[0].finish_reason
        content = chunk.choices[0].delta.content
        if finish_reason == "content_filter":
            print(f"[BLOCKED] {content}")
            return
        if content:
            print(content, end="", flush=True)
    print()

asyncio.run(run("What is DataRobot?"))
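In the example above, `my_llm_stream` iterates a synchronous stream directly inside an async generator, so the event loop is blocked while each chunk is awaited from the network. Whether that matters depends on what else runs on the same loop. One possible alternative, sketched here under that assumption, pulls each item in a worker thread with `asyncio.to_thread` (Python 3.9+). `iterate_in_thread` is a hypothetical helper, not part of the library, and a plain list stands in for the OpenAI stream.

```python
import asyncio
from typing import AsyncIterator, Iterable, List, TypeVar

T = TypeVar("T")
_DONE = object()  # sentinel marking exhaustion of the sync iterator


async def iterate_in_thread(sync_iterable: Iterable[T]) -> AsyncIterator[T]:
    """Expose a blocking iterable as an async iterator.

    Each next() call runs in a worker thread, so the event loop stays
    free while the underlying stream waits for data.
    """
    iterator = iter(sync_iterable)
    while True:
        item = await asyncio.to_thread(next, iterator, _DONE)
        if item is _DONE:
            break
        yield item


async def _demo() -> List[str]:
    # A plain list stands in for a synchronous chunk stream.
    return [item async for item in iterate_in_thread(["a", "b", "c"])]


chunks = asyncio.run(_demo())
print(chunks)
```

Inside `my_llm_stream`, the `for chunk in client.chat.completions.create(...)` loop could then become `async for chunk in iterate_in_thread(client.chat.completions.create(...))`.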

Advanced: `stream_response_async`¶

Use when you need the prescore EvaluationResult before streaming begins:

result, latency, prescore_df = await pipeline.evaluate_prompt_async(prompt)
if result.blocked:
    # handle block before ever calling the LLM
    return result.blocked_message

effective = result.replacement if result.replaced else prompt

async for chunk in pipeline.stream_response_async(
    my_llm_stream(effective),
    prompt=effective,
    prescore_df=prescore_df,      # must come from evaluate_prompt_async
    prescore_latency=latency,
):
    ...

With DRUM¶

Place moderation_config.yaml alongside your custom model code, then:

drum score --verbose \
  --code-dir ./ \
  --target-type textgeneration \
  --input ./input.csv \
  --runtime-params-file values.yaml

Testing guide¶

Set these environment variables before running any test (see Environment variables for details):

export DATAROBOT_ENDPOINT="https://app.datarobot.com/api/v2"
export DATAROBOT_API_TOKEN="your-token"
export TARGET_NAME="resultText"

Guards fall into four groups based on the credentials they require:

Group	Guard types	Extra credentials needed
Local	`token_count`, `rouge_1`, `cost`, `custom_metric`	(none beyond the base vars above)
DataRobot deployment	`type: model`, any `ootb` with `llm_type: datarobot` or `llm_type: llmGateway`	Only `DATAROBOT_API_TOKEN`; provide a real `deployment_id`
External LLM provider	Any `ootb` with `llm_type: openAi`, `azureOpenAi`, `google`, `amazon`, `nim`	Provider-specific env var (see Environment variables)
NeMo	`type: nemo_guardrails`, `type: nemo_evaluator`	Provider key for NeMo Guardrails; `DATAROBOT_API_TOKEN` for NeMo Evaluator
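When planning a test run, it can help to sort the guards in a config into these four groups up front. The function below is a rough sketch derived only from the table above; `credential_group` is a hypothetical helper, guards are assumed to be plain dicts in the config schema, and anything the table does not cover is reported as "Unknown".

```python
from typing import Any, Dict

# ootb_type values the table lists as needing no extra credentials.
LOCAL_OOTB_TYPES = {"token_count", "rouge_1", "cost", "custom_metric"}
DATAROBOT_LLM_TYPES = {"datarobot", "llmGateway"}
EXTERNAL_LLM_TYPES = {"openAi", "azureOpenAi", "google", "amazon", "nim"}


def credential_group(guard: Dict[str, Any]) -> str:
    """Map a guard definition to one of the credential groups in the table."""
    guard_type = guard.get("type")
    if guard_type in ("nemo_guardrails", "nemo_evaluator"):
        return "NeMo"
    if guard_type == "model":
        return "DataRobot deployment"
    if guard_type == "ootb":
        if guard.get("ootb_type") in LOCAL_OOTB_TYPES:
            return "Local"
        llm_type = guard.get("llm_type")
        if llm_type in DATAROBOT_LLM_TYPES:
            return "DataRobot deployment"
        if llm_type in EXTERNAL_LLM_TYPES:
            return "External LLM provider"
    return "Unknown"  # not covered by the table


groups = [
    credential_group({"type": "ootb", "ootb_type": "token_count"}),
    credential_group({"type": "model"}),
    credential_group({"type": "ootb", "ootb_type": "faithfulness", "llm_type": "openAi"}),
    credential_group({"type": "nemo_evaluator"}),
]
print(groups)
```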

See Guard types for complete YAML examples per guard type and Using the config in Python for Python usage patterns.

Environment variables¶

Always required¶

Variable	Description
`DATAROBOT_ENDPOINT`	DataRobot instance URL, e.g. `https://app.datarobot.com/api/v2`
`DATAROBOT_API_TOKEN`	DataRobot API token.
`TARGET_NAME`	The name of the DataFrame column that holds the LLM response text (e.g. `resultText`). Resolution order for the response column (highest to lowest priority): (1) DRUM deployment `target_name` (always wins when `MLOPS_DEPLOYMENT_ID` is set), (2) `TARGET_NAME` env var, (3) `response_column_name` in the config file, (4) built-in default `"completion"`. DRUM sets this automatically; in standalone Python you can set it here or declare `response_column_name` in the YAML/`ModerationConfig` — but the env var takes precedence if both are provided.
`DISABLE_MODERATION`	Set to `true` to disable all guards at runtime.
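The resolution order described for `TARGET_NAME` can be written out as a short function. This is a sketch of the documented priority only, not the library's actual code: `resolve_response_column` and its parameter names are hypothetical, and the environment is passed in as a mapping so the logic can be exercised without touching `os.environ`.

```python
from typing import Mapping, Optional


def resolve_response_column(
    env: Mapping[str, str],
    drum_target_name: Optional[str] = None,
    config_response_column: Optional[str] = None,
) -> str:
    """Pick the response column name, highest priority first."""
    # (1) DRUM deployment target_name wins when MLOPS_DEPLOYMENT_ID is set.
    if env.get("MLOPS_DEPLOYMENT_ID") and drum_target_name:
        return drum_target_name
    # (2) TARGET_NAME environment variable.
    if env.get("TARGET_NAME"):
        return env["TARGET_NAME"]
    # (3) response_column_name from the config file.
    if config_response_column:
        return config_response_column
    # (4) Built-in default.
    return "completion"


default_column = resolve_response_column({})
env_column = resolve_response_column(
    {"TARGET_NAME": "resultText"}, config_response_column="answer"
)
print(default_column, env_column)
```

In standalone Python, passing `os.environ` as `env` would reproduce steps 2 through 4.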

OTel tracing (optional)¶

OTel traces are emitted whenever OTEL_EXPORTER_OTLP_ENDPOINT is set. The remaining two variables are optional: their corresponding request headers are omitted when the variable is absent, which allows traces to be forwarded to an unauthenticated local OTLP collector, such as the af-component-agent-playground UI, without needing credentials.

Variable	Required	Description
`OTEL_EXPORTER_OTLP_ENDPOINT`	✅	Base URL of the OTLP HTTP collector (e.g. `http://localhost:4318`). The library appends `/v1/traces` automatically.
`OTEL_SERVICE_NAME`	❌	Adds `X-DataRobot-Entity-Id` to trace requests. Required when routing to the DataRobot production collector; omit for a local collector.
`OTEL_COLLECTOR_TOKEN`	❌	Adds `Authorization: Bearer <token>` to trace requests. Required for production/deployed collectors; omit for a local collector.

Local playground example:

export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4318"
# OTEL_SERVICE_NAME and OTEL_COLLECTOR_TOKEN are not needed

Production example:

export OTEL_EXPORTER_OTLP_ENDPOINT="https://collector.datarobot.com"
export OTEL_SERVICE_NAME="deployment-abc123"
export OTEL_COLLECTOR_TOKEN="my-token"
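To make the effect of the three variables concrete, the sketch below derives the trace URL and request headers from an environment mapping. It is an illustration of the behavior described in the table, not the library's exporter: `build_otlp_request` is a hypothetical helper, and using the `OTEL_SERVICE_NAME` value as the `X-DataRobot-Entity-Id` header value is an assumption.

```python
from typing import Dict, Mapping, Optional, Tuple


def build_otlp_request(env: Mapping[str, str]) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (url, headers) for trace export, or None when tracing is off."""
    endpoint = env.get("OTEL_EXPORTER_OTLP_ENDPOINT")
    if not endpoint:
        return None  # no endpoint set, so no traces are emitted

    headers: Dict[str, str] = {}
    # Optional headers are simply omitted when their variable is absent.
    if env.get("OTEL_SERVICE_NAME"):
        # Assumption: the service name is sent as the header value.
        headers["X-DataRobot-Entity-Id"] = env["OTEL_SERVICE_NAME"]
    if env.get("OTEL_COLLECTOR_TOKEN"):
        headers["Authorization"] = f"Bearer {env['OTEL_COLLECTOR_TOKEN']}"

    # /v1/traces is appended to the base URL.
    return endpoint.rstrip("/") + "/v1/traces", headers


local = build_otlp_request({"OTEL_EXPORTER_OTLP_ENDPOINT": "http://localhost:4318"})
production = build_otlp_request(
    {
        "OTEL_EXPORTER_OTLP_ENDPOINT": "https://collector.datarobot.com",
        "OTEL_SERVICE_NAME": "deployment-abc123",
        "OTEL_COLLECTOR_TOKEN": "my-token",
    }
)
print(local)
print(production)
```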

deepeval telemetry¶

The task_adherence guard uses deepeval internally. By default, moderations opts out of deepeval's usage telemetry — no .deepeval/ directory is created and no data is sent externally.

To opt in, set enable_deepeval_telemetry: true in your config (only takes effect when a task_adherence guard is present; deepeval is loaded lazily):

enable_deepeval_telemetry: true   # default: false

guards:
  - name: Task Adherence
    type: ootb
    ootb_type: task_adherence
    stage: response

To opt out explicitly via environment variable (e.g. in CI or container environments):

export DEEPEVAL_TELEMETRY_OPT_OUT=YES  # opt out (library default)
unset DEEPEVAL_TELEMETRY_OPT_OUT       # opt in

Credentials for LLM-eval guards using external providers¶

When your guard uses llm_type: datarobot, it reuses DATAROBOT_API_TOKEN — no extra variable needed.

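For external providers (OpenAI, Azure OpenAI, Google, AWS), set a guard-specific env var. The variable name is built from the guard's type, stage, and ootb_type, followed by a provider-specific suffix. Guards without an ootb_type, such as nemo_guardrails, omit that segment (see the last row of the table):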

MLOPS_RUNTIME_PARAM_MODERATION_{TYPE}_{STAGE}_{OOTB_TYPE}_{PROVIDER_SUFFIX}

Guard (`ootb_type`)	Provider	Environment variable
`task_adherence`	OpenAI	`MLOPS_RUNTIME_PARAM_MODERATION_OOTB_RESPONSE_TASK_ADHERENCE_OPENAI_API_KEY`
`task_adherence`	Azure OpenAI	`MLOPS_RUNTIME_PARAM_MODERATION_OOTB_RESPONSE_TASK_ADHERENCE_AZURE_OPENAI_API_KEY`
`faithfulness`	OpenAI	`MLOPS_RUNTIME_PARAM_MODERATION_OOTB_RESPONSE_FAITHFULNESS_OPENAI_API_KEY`
`faithfulness`	Azure OpenAI	`MLOPS_RUNTIME_PARAM_MODERATION_OOTB_RESPONSE_FAITHFULNESS_AZURE_OPENAI_API_KEY`
`agent_guideline_adherence`	Azure OpenAI	`MLOPS_RUNTIME_PARAM_MODERATION_OOTB_RESPONSE_AGENT_GUIDELINE_ADHERENCE_AZURE_OPENAI_API_KEY`
`agent_guideline_adherence`	Google Vertex AI	`MLOPS_RUNTIME_PARAM_MODERATION_OOTB_RESPONSE_AGENT_GUIDELINE_ADHERENCE_GOOGLE_SERVICE_ACCOUNT`
`agent_goal_accuracy`	Azure OpenAI	`MLOPS_RUNTIME_PARAM_MODERATION_OOTB_RESPONSE_AGENT_GOAL_ACCURACY_AZURE_OPENAI_API_KEY`
`agent_goal_accuracy`	AWS Bedrock	`MLOPS_RUNTIME_PARAM_MODERATION_OOTB_RESPONSE_AGENT_GOAL_ACCURACY_AWS_ACCOUNT`
`nemo_guardrails` (prompt)	Azure OpenAI	`MLOPS_RUNTIME_PARAM_MODERATION_NEMO_GUARDRAILS_PROMPT_AZURE_OPENAI_API_KEY`
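The naming pattern can be reproduced with a few lines of string handling, which is handy for generating the variable name from a guard definition instead of typing it by hand. This is a sketch inferred from the template and the table rows above; `credential_env_var` is a hypothetical helper, and the provider suffixes are taken verbatim from the table.

```python
from typing import Optional

PREFIX = "MLOPS_RUNTIME_PARAM_MODERATION"


def credential_env_var(
    guard_type: str,
    stage: str,
    provider_suffix: str,
    ootb_type: Optional[str] = None,
) -> str:
    """Build the guard-specific credential variable name."""
    parts = [PREFIX, guard_type, stage]
    if ootb_type:
        # Only ootb guards carry an ootb_type segment.
        parts.append(ootb_type)
    parts.append(provider_suffix)
    return "_".join(part.upper() for part in parts)


task_adherence_var = credential_env_var(
    "ootb", "response", "OPENAI_API_KEY", ootb_type="task_adherence"
)
nemo_var = credential_env_var("nemo_guardrails", "prompt", "AZURE_OPENAI_API_KEY")
print(task_adherence_var)
print(nemo_var)
```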

Value format per provider:

# OpenAI / Azure OpenAI
'{"type":"credential","payload":{"credentialType":"api_token","apiToken":"YOUR_KEY"}}'

# Google Vertex AI
'{"type":"credential","payload":{"credentialType":"gcp","gcpKey":{...}}}'

# AWS Bedrock
'{"type":"credential","payload":{"credentialType":"s3","awsAccessKeyId":"...","awsSecretAccessKey":"...","awsSessionToken":"..."}}'
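Because the value is a JSON string, building it with `json.dumps` avoids quoting mistakes. The sketch below covers only the OpenAI / Azure OpenAI format shown above; `api_token_credential` is a hypothetical helper, and the environment variable name in the commented line is one of the names from the table.

```python
import json


def api_token_credential(api_key: str) -> str:
    """Serialize an API-token credential in the format shown above."""
    payload = {
        "type": "credential",
        "payload": {"credentialType": "api_token", "apiToken": api_key},
    }
    # Compact separators reproduce the whitespace-free form used in the docs.
    return json.dumps(payload, separators=(",", ":"))


value = api_token_credential("YOUR_KEY")
print(value)

# To use it, assign the string to the guard-specific variable, for example:
# os.environ["MLOPS_RUNTIME_PARAM_MODERATION_OOTB_RESPONSE_FAITHFULNESS_OPENAI_API_KEY"] = value
```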
