Generative AI moderation
Moderation configurations help ensure your generative AI applications produce safe and appropriate content by filtering prompts and responses. Moderation can block or warn on problematic content at different stages of the generation process.
List moderation templates
To retrieve all available moderation templates:
import datarobot as dr

templates = dr.ModerationTemplate.list()
for template in templates:
    print(f"Template: {template.name}")
    print(f" Description: {template.description}")
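When an application needs one particular template rather than the whole list, a small lookup helper can make the selection explicit. The following is a minimal sketch and not part of the DataRobot client: `find_template_by_name` is a hypothetical helper, and it assumes only that each template exposes a `name` attribute, as the loop above does. Stand-in objects with made-up names replace the API call so the snippet runs without credentials.

```python
from types import SimpleNamespace


def find_template_by_name(templates, name):
    """Return the first template whose name matches, ignoring case, or None."""
    wanted = name.strip().lower()
    for template in templates:
        if template.name.strip().lower() == wanted:
            return template
    return None


# Stand-ins for the objects returned by dr.ModerationTemplate.list();
# the names and descriptions below are invented for illustration.
templates = [
    SimpleNamespace(name="Toxicity", description="Flags toxic language"),
    SimpleNamespace(name="Prompt Injection", description="Flags injection attempts"),
]

match = find_template_by_name(templates, "prompt injection")
print(match.description if match else "No matching template")
```

Selecting by name avoids depending on the order in which templates are returned, which indexing such as `templates[0]` does.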
Create a moderation configuration
Create a moderation configuration from a template. When creating a moderation configuration, you should specify the following:
- template_id: The ID of the template to base this configuration on.
- name: A user-friendly name for the configuration.
- description: A description of the configuration.
- stages: The stages of moderation where this guard is active (e.g., PROMPT, RESPONSE).
- entity_id: The ID of the custom model version or playground this configuration applies to.
- entity_type: The type of the associated entity (CUSTOM_MODEL_VERSION or PLAYGROUND).
- intervention: The action to take if moderation fails (BLOCK or WARN).
- llm_type: The backing LLM this guard uses.
templates = dr.ModerationTemplate.list()
template = templates[0]  # use the first available template
# version_id is the ID of an existing custom model version
custom_model_version = dr.CustomModelVersion.get(version_id)

moderation_config = dr.ModerationConfiguration.create(
    template_id=template.id,
    name="Content Safety Guard",
    description="Filters inappropriate content",
    stages=[dr.ModerationGuardStage.PROMPT, dr.ModerationGuardStage.RESPONSE],
    entity_id=custom_model_version.id,
    entity_type=dr.ModerationGuardEntityType.CUSTOM_MODEL_VERSION,
    intervention=dr.ModerationIntervention.BLOCK,
    llm_type=dr.ModerationGuardLlmType.DATAROBOT_LLM
)
moderation_config
You can also create a moderation configuration for a playground:
# playground_id is the ID of an existing playground
playground = dr.genai.Playground.get(playground_id)

moderation_config = dr.ModerationConfiguration.create(
    template_id=template.id,
    name="Playground Safety Guard",
    description="Filters content in playground",
    stages=[dr.ModerationGuardStage.PROMPT, dr.ModerationGuardStage.RESPONSE],
    entity_id=playground.id,
    entity_type=dr.ModerationGuardEntityType.PLAYGROUND,
    intervention=dr.ModerationIntervention.WARN,
    llm_type=dr.ModerationGuardLlmType.DATAROBOT_LLM
)
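If the values for a configuration come from user input or a settings file, it can help to check them locally before calling the API. The sketch below is one possible approach, not client functionality: `check_moderation_args` is a hypothetical helper that works on plain strings, the allowed entity types and interventions are the ones named in the parameter list above, and the rule that `stages` must be non-empty and free of duplicates is an assumption. Converting the validated strings to the client's enum members is left out.

```python
# Values taken from the parameter descriptions in this section.
ALLOWED_ENTITY_TYPES = {"CUSTOM_MODEL_VERSION", "PLAYGROUND"}
ALLOWED_INTERVENTIONS = {"BLOCK", "WARN"}


def check_moderation_args(stages, entity_type, intervention):
    """Return a list of problems; an empty list means no problem was found."""
    problems = []
    # Assumption: a guard needs at least one stage, each listed once.
    if not stages:
        problems.append("stages must not be empty")
    elif len(set(stages)) != len(stages):
        problems.append("stages contains duplicates")
    if entity_type not in ALLOWED_ENTITY_TYPES:
        problems.append(f"unsupported entity_type: {entity_type}")
    if intervention not in ALLOWED_INTERVENTIONS:
        problems.append(f"unsupported intervention: {intervention}")
    return problems


print(check_moderation_args(["PROMPT", "RESPONSE"], "PLAYGROUND", "WARN"))
print(check_moderation_args([], "DEPLOYMENT", "LOG"))
```

Returning a list of problems rather than raising on the first one lets a caller report every invalid field at once.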
List moderation configurations
To retrieve configurations for an entity:
custom_model_version = dr.CustomModelVersion.get(version_id)
configs = dr.ModerationConfiguration.list(
    entity_id=custom_model_version.id,
    entity_type=dr.ModerationGuardEntityType.CUSTOM_MODEL_VERSION
)
for config in configs:
    print(f"Config: {config.name}")
    print(f" Stages: {config.stages}")
    print(f" Intervention: {config.intervention}")
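For an entity with many guards, grouping the listed configurations by intervention gives a quick view of which guards block and which only warn. This is a minimal sketch under stated assumptions: `names_by_intervention` is a hypothetical helper, it relies only on the `name` and `intervention` attributes printed above, and the stand-in objects use plain strings and invented guard names in place of real API results.

```python
from collections import defaultdict
from types import SimpleNamespace


def names_by_intervention(configs):
    """Map each intervention value to the names of the configurations using it."""
    grouped = defaultdict(list)
    for config in configs:
        grouped[config.intervention].append(config.name)
    return dict(grouped)


# Stand-ins for dr.ModerationConfiguration.list() results; plain strings
# replace the client's intervention values so the sketch runs offline.
configs = [
    SimpleNamespace(name="Content Safety Guard", intervention="BLOCK"),
    SimpleNamespace(name="Tone Guard", intervention="WARN"),
    SimpleNamespace(name="PII Guard", intervention="BLOCK"),
]

for intervention, names in names_by_intervention(configs).items():
    print(f"{intervention}: {', '.join(names)}")
```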
Get a moderation configuration
To retrieve a specific configuration:
config = dr.ModerationConfiguration.get(config_id)
print(f"Name: {config.name}")
print(f"Description: {config.description}")
print(f"Stages: {config.stages}")
Update moderation configuration
To update moderation settings:
config = dr.ModerationConfiguration.get(config_id)
config.update(
    name="Updated Safety Guard",
    description="Enhanced content filtering",
    intervention=dr.ModerationIntervention.WARN
)
Get the overall moderation configuration
Retrieve the overall moderation configuration for an entity:
custom_model_version = dr.CustomModelVersion.get(version_id)
overall_config = dr.OverallModerationConfig.get(
    entity_id=custom_model_version.id,
    entity_type=dr.ModerationGuardEntityType.CUSTOM_MODEL_VERSION
)
if overall_config:
    print(f"Moderation enabled: {overall_config.is_enabled}")
    print(f"Configurations: {len(overall_config.configurations)}")
List the overall moderation configurations
To get all of the overall moderation configurations:
overall_configs = dr.OverallModerationConfig.list()
for config in overall_configs:
    print(f"Entity: {config.entity_id}")
    print(f" Enabled: {config.is_enabled}")
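A common follow-up to listing the overall configurations is to find the entities where moderation is switched off. The sketch below assumes only the `entity_id` and `is_enabled` attributes used in the loop above; `entities_without_moderation` is a hypothetical helper, and the stand-in objects and their IDs are invented so that the snippet runs without an API connection.

```python
from types import SimpleNamespace


def entities_without_moderation(overall_configs):
    """Return the entity IDs whose overall moderation is switched off."""
    return [c.entity_id for c in overall_configs if not c.is_enabled]


# Stand-ins for dr.OverallModerationConfig.list() results; IDs are made up.
overall_configs = [
    SimpleNamespace(entity_id="version-a", is_enabled=True),
    SimpleNamespace(entity_id="version-b", is_enabled=False),
    SimpleNamespace(entity_id="playground-c", is_enabled=False),
]

print(entities_without_moderation(overall_configs))
```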
Update the overall moderation configuration
To modify the overall moderation settings:
custom_model_version = dr.CustomModelVersion.get(version_id)
overall_config = dr.OverallModerationConfig.get(
    entity_id=custom_model_version.id,
    entity_type=dr.ModerationGuardEntityType.CUSTOM_MODEL_VERSION
)
# Turn moderation on for this entity
overall_config.update(is_enabled=True)