Generative AI moderation

Moderation configurations help ensure your generative AI applications produce safe and appropriate content by filtering prompts and responses. Moderation can block or warn on problematic content at different stages of the generation process.

List moderation templates

To retrieve all available moderation templates:

import datarobot as dr
templates = dr.ModerationTemplate.list()
for template in templates:
    print(f"Template: {template.name}")
    print(f"  Description: {template.description}")
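
Once the templates are listed, you will usually want one specific template rather than whichever happens to come first. The following is a minimal sketch of a name-based lookup. The helper `find_template_by_name` and the sample template names are hypothetical and not part of the DataRobot client; the sketch only assumes that each template exposes the `id` and `name` attributes.

```python
from types import SimpleNamespace


def find_template_by_name(templates, name):
    """Return the first template whose name matches, ignoring case."""
    wanted = name.strip().lower()
    for template in templates:
        if template.name.strip().lower() == wanted:
            return template
    raise LookupError(f"No moderation template named {name!r}")


# Stand-in objects so the sketch runs without a DataRobot connection.
# In real use, pass the result of dr.ModerationTemplate.list() instead.
# The names below are made up for illustration.
sample_templates = [
    SimpleNamespace(id="t1", name="Toxicity", description="Example entry"),
    SimpleNamespace(id="t2", name="Prompt Injection", description="Example entry"),
]

selected = find_template_by_name(sample_templates, "prompt injection")
print(selected.id)
```

Raising `LookupError` instead of returning `None` makes a misspelled template name fail early, before any configuration is created.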

Create a moderation configuration

Create a moderation configuration from a template by specifying the following:

  • template_id: The ID of the template to base this configuration on.
  • name: A user-friendly name for the configuration.
  • description: A description of the configuration.
  • stages: The stages of moderation where this guard is active (e.g., PROMPT, RESPONSE).
  • entity_id: The ID of the custom model version or playground this configuration applies to.
  • entity_type: The type of the associated entity (CUSTOM_MODEL_VERSION or PLAYGROUND).
  • intervention: The action to take if moderation fails (BLOCK or WARN).
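  • llm_type: The backing LLM this guard uses.

# version_id is a placeholder for the ID of an existing custom model version.
templates = dr.ModerationTemplate.list()
template = templates[0]  # first available template, used here for brevity
custom_model_version = dr.CustomModelVersion.get(version_id)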
moderation_config = dr.ModerationConfiguration.create(
    template_id=template.id,
    name="Content Safety Guard",
    description="Filters inappropriate content",
    stages=[dr.ModerationGuardStage.PROMPT, dr.ModerationGuardStage.RESPONSE],
    entity_id=custom_model_version.id,
    entity_type=dr.ModerationGuardEntityType.CUSTOM_MODEL_VERSION,
    intervention=dr.ModerationIntervention.BLOCK,
    llm_type=dr.ModerationGuardLlmType.DATAROBOT_LLM
)
moderation_config

You can also create a moderation configuration for a playground:

# playground_id is a placeholder for the ID of an existing playground.
playground = dr.genai.Playground.get(playground_id)
moderation_config = dr.ModerationConfiguration.create(
    template_id=template.id,
    name="Playground Safety Guard",
    description="Filters content in playground",
    stages=[dr.ModerationGuardStage.PROMPT, dr.ModerationGuardStage.RESPONSE],
    entity_id=playground.id,
    entity_type=dr.ModerationGuardEntityType.PLAYGROUND,
    intervention=dr.ModerationIntervention.WARN,
    llm_type=dr.ModerationGuardLlmType.DATAROBOT_LLM
)
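
Because the create call takes eight settings, it can help to check them before sending a request. The sketch below is one possible pre-flight check, not a feature of the DataRobot client. The helper `check_moderation_settings` is hypothetical, and plain strings stand in for the enum members used in the examples above. Only `entity_type` and `intervention` are checked against fixed values, since those are the only fields for which this page lists a closed set of options.

```python
# Field names taken from the parameter list above.
REQUIRED_FIELDS = (
    "template_id", "name", "description", "stages",
    "entity_id", "entity_type", "intervention", "llm_type",
)
# Plain-string stand-ins for the enum values named on this page.
ENTITY_TYPES = {"CUSTOM_MODEL_VERSION", "PLAYGROUND"}
INTERVENTIONS = {"BLOCK", "WARN"}


def check_moderation_settings(settings):
    """Return a list of problems found in a settings dict (empty if none)."""
    problems = [
        f"missing field: {field}"
        for field in REQUIRED_FIELDS
        if settings.get(field) in (None, "", [])
    ]
    entity_type = settings.get("entity_type")
    if entity_type and entity_type not in ENTITY_TYPES:
        problems.append(f"unknown entity_type: {entity_type}")
    intervention = settings.get("intervention")
    if intervention and intervention not in INTERVENTIONS:
        problems.append(f"unknown intervention: {intervention}")
    return problems


# Hypothetical settings with two deliberate mistakes: no stages, bad intervention.
draft = {
    "template_id": "t1",
    "name": "Content Safety Guard",
    "description": "Filters inappropriate content",
    "stages": [],
    "entity_id": "abc123",
    "entity_type": "PLAYGROUND",
    "intervention": "ALLOW",
    "llm_type": "DATAROBOT_LLM",
}
print(check_moderation_settings(draft))
```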

List moderation configurations

To retrieve configurations for an entity:

custom_model_version = dr.CustomModelVersion.get(version_id)
configs = dr.ModerationConfiguration.list(
    entity_id=custom_model_version.id,
    entity_type=dr.ModerationGuardEntityType.CUSTOM_MODEL_VERSION
)
for config in configs:
    print(f"Config: {config.name}")
    print(f"  Stages: {config.stages}")
    print(f"  Intervention: {config.intervention}")
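
A listing like this is often used to answer a narrower question, such as which guards block content at a given stage. The following sketch filters a list of configurations by stage and intervention. The helper `configs_for` is hypothetical; it assumes only the `name`, `stages`, and `intervention` attributes shown above, and it takes the values to match as arguments so that you can pass enum members such as `dr.ModerationGuardStage.PROMPT` in real use.

```python
from types import SimpleNamespace


def configs_for(configs, stage, intervention):
    """Return configurations active at `stage` that use `intervention`."""
    return [
        config
        for config in configs
        if stage in config.stages and config.intervention == intervention
    ]


# Stand-in objects with string values; real configurations come from
# dr.ModerationConfiguration.list(...) and carry enum members instead.
sample_configs = [
    SimpleNamespace(name="Guard A", stages=["PROMPT"], intervention="BLOCK"),
    SimpleNamespace(name="Guard B", stages=["PROMPT", "RESPONSE"], intervention="WARN"),
    SimpleNamespace(name="Guard C", stages=["RESPONSE"], intervention="BLOCK"),
]

prompt_blockers = configs_for(sample_configs, "PROMPT", "BLOCK")
print([config.name for config in prompt_blockers])
```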

Get a moderation configuration

To retrieve a specific configuration:

config = dr.ModerationConfiguration.get(config_id)
print(f"Name: {config.name}")
print(f"Description: {config.description}")
print(f"Stages: {config.stages}")

Update a moderation configuration

To update the settings of an existing configuration:

config = dr.ModerationConfiguration.get(config_id)
config.update(
    name="Updated Safety Guard",
    description="Enhanced content filtering",
    intervention=dr.ModerationIntervention.WARN
)

Get the overall moderation configuration

Retrieve the overall moderation configuration for an entity:

custom_model_version = dr.CustomModelVersion.get(version_id)
overall_config = dr.OverallModerationConfig.get(
    entity_id=custom_model_version.id,
    entity_type=dr.ModerationGuardEntityType.CUSTOM_MODEL_VERSION
)
if overall_config:
    print(f"Moderation enabled: {overall_config.is_enabled}")
    print(f"Configurations: {len(overall_config.configurations)}")

List the overall moderation configurations

To get all of the overall moderation configurations:

overall_configs = dr.OverallModerationConfig.list()
for config in overall_configs:
    print(f"Entity: {config.entity_id}")
    print(f"  Enabled: {config.is_enabled}")

Update the overall moderation configuration

To modify the overall moderation settings:

custom_model_version = dr.CustomModelVersion.get(version_id)
overall_config = dr.OverallModerationConfig.get(
    entity_id=custom_model_version.id,
    entity_type=dr.ModerationGuardEntityType.CUSTOM_MODEL_VERSION
)
overall_config.update(is_enabled=True)
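
If this update runs as part of a script that may be executed repeatedly, you may prefer to skip the call when moderation is already on. The sketch below shows one way to do that. The helper `ensure_moderation_enabled` and the `FakeOverallConfig` class are hypothetical; the sketch assumes only the `is_enabled` attribute and the `update(is_enabled=...)` call shown above, and it makes no claim about whether the real `update` refreshes the local object.

```python
class FakeOverallConfig:
    """Stand-in that records update() calls; not part of the DataRobot client."""

    def __init__(self, is_enabled):
        self.is_enabled = is_enabled
        self.update_calls = 0

    def update(self, is_enabled):
        self.is_enabled = is_enabled
        self.update_calls += 1


def ensure_moderation_enabled(overall_config):
    """Enable moderation only when it is currently off; return True if changed."""
    if overall_config.is_enabled:
        return False
    overall_config.update(is_enabled=True)
    return True


disabled = FakeOverallConfig(is_enabled=False)
changed_first = ensure_moderation_enabled(disabled)
changed_second = ensure_moderation_enabled(disabled)
print(changed_first, changed_second, disabled.update_calls)
```

In real use, pass the object returned by `dr.OverallModerationConfig.get(...)` in place of the stand-in.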