Use the Bolt-on Governance API¶
This notebook outlines how to use the Bolt-on Governance API with deployed LLM blueprints. LLM blueprints deployed from the playground implement the chat() hook in the custom model's custom.py file by default.
You can use the official Python library for the OpenAI API to make chat completion requests to DataRobot LLM blueprint deployments:
!pip install openai
from openai import OpenAI
Specify the ID of the LLM blueprint deployment and your DataRobot API token:
DEPLOYMENT_ID = "<SPECIFY_DEPLOYMENT_ID_HERE>"
DATAROBOT_API_TOKEN = "<SPECIFY_TOKEN_HERE>"
DEPLOYMENT_URL = f"https://app.datarobot.com/api/v2/deployments/{DEPLOYMENT_ID}"
Use the code below to create an OpenAI client:
client = OpenAI(base_url=DEPLOYMENT_URL, api_key=DATAROBOT_API_TOKEN)
Use the code below to request a chat completion. See the considerations below for more information on specifying the model parameter. Specifying a system message in the request overrides the system prompt configured in the LLM blueprint, and specifying other settings in the request, such as max_completion_tokens, overrides the corresponding settings of the LLM blueprint.
completion = client.chat.completions.create(
    model="datarobot-deployed-llm",
    messages=[
        {"role": "system", "content": "Answer with just a number."},
        {"role": "user", "content": "What is 2+3?"},
        {"role": "assistant", "content": "5"},
        {"role": "user", "content": "Now multiply the result by 4."},
        {"role": "assistant", "content": "20"},
        {"role": "user", "content": "Now divide the result by 2."},
    ],
)
print(completion)
This returns a ChatCompletion object if streaming is disabled and an Iterator[ChatCompletionChunk] if streaming is enabled. Use the following cell to request a chat completion with a streaming response.
streaming_response = client.chat.completions.create(
    model="datarobot-deployed-llm",
    messages=[
        {"role": "system", "content": "Explain your thoughts using at least 100 words."},
        {"role": "user", "content": "What would it take to colonize Mars?"},
    ],
    stream=True,
)

for chunk in streaming_response:
    content = chunk.choices[0].delta.content
    if content is not None:
        print(content, end="")
To return citations, the deployed LLM must have a vector database associated with it. When it does, the completion response includes citation-related keys, which are also accessible to custom models.
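Citation keys are deployment-specific, so the following cell is a minimal sketch for locating them: rather than assuming a particular key name, it prints every field the deployment returns beyond the standard OpenAI response schema.

# A minimal sketch: list the extra (non-OpenAI) fields on the response, which
# is where citation-related keys appear when a vector database is attached.
# model_extra is a pydantic v2 property exposed by the OpenAI client's objects.
extra_fields = completion.model_extra or {}
for key, value in extra_fields.items():
    print(f"{key}: {value}")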
Specify association ID and custom metrics¶
When making a chat request to a DataRobot-deployed text generation or agentic workflow custom model, you can optionally specify a custom association ID in place of the auto-generated ID by setting datarobot_association_id in the extra_body field of the chat request. You can also report values for arbitrary custom metrics defined for the deployment by setting datarobot_metrics in the same field. The extra_body field is a standard way to add parameters to an OpenAI chat request, allowing the chat client to pass model-specific parameters to an LLM.
If the field datarobot_association_id is found in extra_body, DataRobot uses that value instead of the automatically generated one. If the field datarobot_metrics is found in extra_body, DataRobot reports a custom metric for all the name:value pairs found inside. A matching custom metric for each name must already be defined for the deployment. If the reported value is a string, the custom metric must be the multiclass type, with the reported value matching one of the classes.
The deployed custom model must have an association ID column defined for DataRobot to process custom metrics from chat requests, regardless of whether extra_body is specified. Moderation must be configured for the custom model for the metrics to be processed.
extra_body = {
    # These values pass through to the LLM
    "llm_id": "azure-gpt-6",
    # If set here, replaces the auto-generated association ID
    "datarobot_association_id": "my_association_id_0001",
    # DataRobot captures these for custom metrics
    "datarobot_metrics": {
        "field1": 24,
        "field2": "example",
    },
}
completion = client.chat.completions.create(
    model="datarobot-deployed-llm",
    messages=[
        {"role": "system", "content": "Explain your thoughts using at least 100 words."},
        {"role": "user", "content": "What would it take to colonize Mars?"},
    ],
    max_tokens=512,
    extra_body=extra_body,
)
print(completion.choices[0].message.content)
Moderation and guardrails¶
Moderation guardrails help your organization block prompt injection and hateful, toxic, or inappropriate prompts and responses. To return datarobot_moderations, the deployed LLM must be running in an execution environment that has the moderation library installed, and the custom model code directory must contain a moderation_config.yaml file to configure the moderations. When using the Bolt-on Governance API with moderations configured, consider the following:
- If there are no guards configured for the response stage, the moderation library returns the stream obtained from the LLM unmodified.
- Not all response guards are applied to each chunk. The faithfulness, rouge-1, and NeMo guards are applied to the whole response once it is available, rather than to individual chunks, because they require the complete response to evaluate.
- If moderation is enabled and a streaming response is requested, the first chunk always contains the results of the prompt guards (if configured) and the response guards (excluding faithfulness, rouge-1, and NeMo). Access these results via chunk.datarobot_moderations, as shown in the sketch after this list.
- For every subsequent chunk that is not the last chunk, the response guards (excluding faithfulness, rouge-1, and NeMo) are applied and their results can be accessed from chunk.datarobot_moderations.
- The last chunk has all response guards applied: the per-chunk guards (excluding faithfulness, rouge-1, and NeMo) are applied to the chunk itself, while faithfulness, rouge-1, and NeMo are applied to the whole response.
- When streaming, guard custom metrics are aggregated across chunks and reported as follows:
    - latency: The sum of the guard's latency across all chunks.
    - score: The score of the blocked chunk if an intermediate chunk is blocked; otherwise, the score of the assembled response.
    - enforced: The logical OR of the enforced metric across all chunks.
Considerations¶
When using the Bolt-on Governance API, consider the following:
- If you implement the chat completion hook without modification, the chat() hook behaves differently than the score() hook. Specifically, the unmodified chat() hook passes the model parameter through the completion_create_params argument, while the score() hook specifies the model in the custom model code.
- If you add a deployed LLM to the playground, validation uses the value entered in the "Chat model ID" field as the model parameter value. Ensure the LLM deployment accepts this value as the model parameter. Alternatively, you can modify the implementation of the chat() hook to override the value of the model parameter, defining the intended model (for example, using a runtime parameter), as sketched after this list. For more information, see GenAI troubleshooting.
- The Bolt-on Governance API is also available in GPU environments for custom models running on datarobot-drum>=1.14.3.
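As a sketch of the override mentioned above, the custom.py below replaces the incoming model value inside the chat() hook. It is illustrative, not a drop-in implementation: it assumes load_model() returns an OpenAI client pointed at your LLM provider, and the runtime parameter name LLM_MODEL_NAME is hypothetical.

# custom.py (sketch): override the model parameter inside the chat() hook.
from datarobot_drum import RuntimeParameters


def chat(completion_create_params, model):
    # Ignore the model name the client sent and substitute the model this
    # deployment actually serves, read from a (hypothetical) runtime parameter.
    completion_create_params["model"] = RuntimeParameters.get("LLM_MODEL_NAME")
    # "model" is whatever load_model() returned; assumed here to be an OpenAI client.
    return model.chat.completions.create(**completion_create_params)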