
AI robustness testing

AI robustness tests validate that your generative AI models perform well under a variety of conditions and meet quality standards. Robustness testing is configured through insights, which measure different aspects of model performance, such as toxicity, relevance, and coherence.

Configure insights for robustness testing

Set up insights to test AI model robustness. When creating an insight configuration, specify the following parameters:

  • custom_model_version_id: The ID of the custom model version to test.
  • insight_name: A user-friendly name for the insight.
  • insight_type: The type of insight (for example, OOTB_METRIC for an out-of-the-box metric).
  • ootb_metric_name: The name of the DataRobot-provided metric (for the OOTB_METRIC type).
  • stage: The stage when the metric is calculated (PROMPT or RESPONSE).
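
For example, to configure the out-of-the-box Toxicity metric on model responses:
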
import datarobot as dr

# Retrieve the custom model version to test (version_id is a placeholder)
custom_model_version = dr.CustomModelVersion.get(version_id)

# Configure the out-of-the-box Toxicity metric, calculated on model responses
insight_config = dr.InsightsConfiguration.create(
    custom_model_version_id=custom_model_version.id,
    insight_name="Toxicity Check",
    insight_type=dr.InsightTypes.OOTB_METRIC,
    ootb_metric_name="Toxicity",
    stage=dr.InsightStage.RESPONSE,
)
insight_config
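
Metrics can also be calculated at the PROMPT stage. A minimal sketch, assuming the Toxicity metric is also available for prompts:

prompt_insight = dr.InsightsConfiguration.create(
    custom_model_version_id=custom_model_version.id,
    insight_name="Prompt Toxicity Check",
    insight_type=dr.InsightTypes.OOTB_METRIC,
    ootb_metric_name="Toxicity",  # assumption: this metric also supports the PROMPT stage
    stage=dr.InsightStage.PROMPT,
)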

Set up test configurations

To attach an evaluation dataset and a cost configuration to the insight:

# Reference the evaluation dataset (dataset is an existing dr.Dataset)
eval_dataset = dr.EvaluationDatasetConfiguration.create(
    name="Robustness Test Dataset",
    dataset_id=dataset.id,
)

# Define how token usage is translated into cost
cost_config = dr.CostConfiguration.create(
    name="Test Cost Config",
    cost_per_token=0.0001,
)

# Attach both configurations to the insight
insight_config.update(
    evaluation_dataset_configuration_id=eval_dataset.id,
    cost_configuration_id=cost_config.id,
)
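
With cost_per_token=0.0001, a test run that consumes 10,000 tokens would be attributed a cost of 1.0 (10,000 tokens at 0.0001 per token), assuming cost scales linearly with token usage.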

Run robustness tests

After the tests run, check the execution status and review the results:

# Re-fetch the configuration to get the latest execution status
insight_config = dr.InsightsConfiguration.get(insight_config_id)
if insight_config.execution_status == "COMPLETED":
    results = insight_config.get_results()
    print(f"Test results: {results}")
elif insight_config.execution_status == "ERROR":
    print(f"Test failed: {insight_config.error_message}")
    print(f"Resolution: {insight_config.error_resolution}")

Configure multiple insights

To set up multiple tests for comprehensive validation:

insights = [
    {
        "insight_name": "Toxicity Check",
        "ootb_metric_name": "Toxicity",
        "stage": dr.InsightStage.RESPONSE,
    },
    {
        "insight_name": "Relevance Check",
        "ootb_metric_name": "Relevance",
        "stage": dr.InsightStage.RESPONSE,
    },
    {
        "insight_name": "Coherence Check",
        "ootb_metric_name": "Coherence",
        "stage": dr.InsightStage.RESPONSE,
    },
]

# Create one out-of-the-box metric configuration per entry
for config in insights:
    dr.InsightsConfiguration.create(
        custom_model_version_id=custom_model_version.id,
        insight_type=dr.InsightTypes.OOTB_METRIC,
        **config,
    )
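
Each create call returns the new configuration object; capture the return values if you also need to attach an evaluation dataset or cost configuration to each insight, as shown earlier.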

Get an insight configuration

To retrieve a specific insight configuration:

insight_config = dr.InsightsConfiguration.get(insight_config_id)
print(f"Insight name: {insight_config.insight_name}")
print(f"Insight type: {insight_config.insight_type}")
print(f"Execution status: {insight_config.execution_status}")

List insight configurations

Get all insights for a custom model version:

custom_model_version = dr.CustomModelVersion.get(version_id)
insights = dr.InsightsConfiguration.list(custom_model_version_id=custom_model_version.id)
for insight in insights:
    print(f"{insight.insight_name}: {insight.execution_status}")