Bolt-on Governance API¶
This notebook outlines how to use the Bolt-on Governance API with deployed LLM blueprints.
You can use the official Python library for the OpenAI API to make chat completion requests to DataRobot LLM blueprint deployments:
!pip install "openai==1.51.2"
from openai import OpenAI
Specify the ID of the LLM blueprint deployment and your DataRobot API token:
DEPLOYMENT_ID = "<SPECIFY_DEPLOYMENT_ID_HERE>"
DATAROBOT_API_TOKEN = "<SPECIFY_TOKEN_HERE>"
DEPLOYMENT_URL = f"https://app.datarobot.com/api/v2/deployments/{DEPLOYMENT_ID}"
Use the code below to create an OpenAI client:
client = OpenAI(base_url=DEPLOYMENT_URL, api_key=DATAROBOT_API_TOKEN)
Use the code below to request a chat completion.
Note that specifying a system message in the request overrides the system prompt set in the LLM blueprint. Similarly, specifying other settings in the request, such as max_completion_tokens, overrides the corresponding settings of the LLM blueprint.
completion = client.chat.completions.create(
    model="llm-blueprint",
    messages=[
        {"role": "system", "content": "Answer with just a number."},
        {"role": "user", "content": "What is 2+3?"},
        {"role": "assistant", "content": "5"},
        {"role": "user", "content": "Now multiply the result by 4."},
        {"role": "assistant", "content": "20"},
        {"role": "user", "content": "Now divide the result by 2."},
    ],
)
print(completion)
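The assistant's reply text is at `choices[0].message.content` on the returned completion. A minimal sketch, using a stand-in object so it runs without a live deployment:

```python
from types import SimpleNamespace

# Stand-in for the object returned by client.chat.completions.create(...);
# a real completion exposes the same choices[0].message.content shape.
completion = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="10"))]
)

# Read the assistant's reply text from the first choice.
reply = completion.choices[0].message.content
print(reply)
```

Per-request settings such as max_completion_tokens can be passed as keyword arguments to create() and, as noted above, take precedence over the LLM blueprint's settings.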
Accessing DataRobot citations, moderations, and evaluations¶
If your deployed DataRobot model uses a vector database or has evaluation metrics configured, those details are available in the model_extra object, along with other key details about how DataRobot handled the prompt, retrieval, and response steps.
The code below assumes you have toxicity and latency evaluations configured on your deployed model.
print(completion.model_extra.keys())
datarobot_details = completion.model_extra["datarobot_moderations"]
TEMPLATE = """
For this sample chat completion, the latency was {latency:0.2f}.
The prompt had a toxicity rating of {prompt_toxicity:0.2f}.
The DataRobot association ID for tracking is {association_id}.
"""
print(
    TEMPLATE.format(
        latency=datarobot_details["datarobot_latency"],
        prompt_toxicity=datarobot_details["Prompts_toxicity_toxic_PREDICTION"],
        association_id=datarobot_details["association_id"],
    )
)
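If you are not sure which evaluations are configured on a given deployment, it is safer to read the moderation keys defensively. A sketch using a stand-in dictionary (the actual keys depend on the evaluations configured on your deployment):

```python
# Stand-in for completion.model_extra["datarobot_moderations"]; the real
# keys vary with the evaluations configured on the deployment.
datarobot_details = {
    "datarobot_latency": 0.42,
    "association_id": "abc123",
}

# dict.get avoids a KeyError when an evaluation (here, toxicity) is absent.
latency = datarobot_details.get("datarobot_latency")
toxicity = datarobot_details.get("Prompts_toxicity_toxic_PREDICTION", "not configured")
print(latency, toxicity)
```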
Streaming¶
Use the following cell to request a chat completion with a streaming response.
streaming_response = client.chat.completions.create(
    model="llm-blueprint",
    messages=[
        {"role": "system", "content": "Explain your thoughts using at least 100 words."},
        {"role": "user", "content": "What would it take to colonize Mars?"},
    ],
    stream=True,
)
for chunk in streaming_response:
    content = chunk.choices[0].delta.content
    if content is not None:
        print(content, end="")
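The loop above prints the response as it arrives; to keep the full text, accumulate the chunk contents instead. A sketch with stand-in chunks that mimic the shape of the streamed objects:

```python
from types import SimpleNamespace

# Stand-in generator mimicking the shape of streamed chat completion chunks;
# a real stream comes from client.chat.completions.create(..., stream=True).
def fake_stream():
    for text in ["Colonizing ", "Mars ", "requires..."]:
        yield SimpleNamespace(
            choices=[SimpleNamespace(delta=SimpleNamespace(content=text))]
        )
    # The final chunk's delta content is None.
    yield SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=None))])

# Collect each non-empty delta, then join into the complete response.
parts = []
for chunk in fake_stream():
    content = chunk.choices[0].delta.content
    if content is not None:
        parts.append(content)

full_response = "".join(parts)
print(full_response)
```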