Deploy a governed custom LLM¶
This notebook outlines how to create a deployment endpoint that interfaces with a large language model (LLM), allowing you to call the model with a DataRobot API key. In exchange for wrapping the LLM in a deployment, you get the benefits of DataRobot MLOps governance, such as text drift monitoring, request history, and usage statistics.
Before proceeding, download the necessary assets to execute the tasks in this notebook.
Initialize the environment¶
Start the environment, import the DataRobot library, and make sure your LLM credentials work.
import os
import datarobot as dr
from openai import AzureOpenAI
os.environ['PULUMI_CONFIG_PASSPHRASE'] = 'default'
assert 'DATAROBOT_API_TOKEN' in os.environ, 'Please set the DATAROBOT_API_TOKEN environment variable'
assert 'DATAROBOT_ENDPOINT' in os.environ, 'Please set the DATAROBOT_ENDPOINT environment variable'
assert 'OPENAI_API_BASE' in os.environ, 'Please set the OPENAI_API_BASE environment variable'
assert 'OPENAI_API_KEY' in os.environ, 'Please set the OPENAI_API_KEY environment variable'
assert 'OPENAI_API_VERSION' in os.environ, 'Please set the OPENAI_API_VERSION environment variable'
assert 'OPENAI_API_DEPLOYMENT_ID' in os.environ, 'Please set the OPENAI_API_DEPLOYMENT_ID environment variable'
dr_client = dr.Client()
def test_azure_openai_credentials():
    """Test the provided OpenAI credentials."""
    model_name = os.getenv("OPENAI_API_DEPLOYMENT_ID")
    try:
        client = AzureOpenAI(
            api_key=os.getenv("OPENAI_API_KEY"),
            azure_endpoint=os.getenv("OPENAI_API_BASE"),
            api_version=os.getenv("OPENAI_API_VERSION"),
        )
        client.chat.completions.create(
            messages=[{"role": "user", "content": "hello"}],
            model=model_name,  # type: ignore[arg-type]
        )
    except Exception as e:
        raise ValueError(
            f"Unable to run a successful test completion against model '{model_name}' "
            "with provided Azure OpenAI credentials. Please validate your credentials."
        ) from e

test_azure_openai_credentials()
Set up a project¶
Set up functions to create and build or destroy your Pulumi stack.
from pulumi import automation as auto
def stack_up(project_name: str, stack_name: str, program: callable) -> auto.Stack:
    # Create (or select, if one already exists) a stack that uses our inline program
    stack = auto.create_or_select_stack(
        stack_name=stack_name, project_name=project_name, program=program
    )
    stack.refresh(on_output=print)
    stack.up(on_output=print)
    return stack

def destroy_project(stack: auto.Stack):
    """Destroy the Pulumi project."""
    stack_name = stack.name
    stack.destroy(on_output=print)
    stack.workspace.remove_stack(stack_name)
    print(f"stack {stack_name} in project removed")
Create a declarative LLM deployment¶
Deploying a governed LLM isn't complicated, but it is more involved than creating a custom model deployment for a standard classification model, for three reasons:
- You want to set runtime parameters on the deployment that specify your LLM endpoint and other metadata.
- You must create and apply a credential for model metadata that should stay hidden, such as the API key. The hidden credential you create becomes one of the runtime parameters.
- You need a special environment called a serverless prediction environment, which works well for routing API calls through a deployment; you set one up specifically for this model.
After configuring the credentials and runtime parameters, upload the source code to DataRobot, register the model, and initialize the deployment.
import pulumi_datarobot as datarobot
import pulumi
def setup_runtime_parameters(
    credential: datarobot.ApiTokenCredential,
) -> list[datarobot.CustomModelRuntimeParameterValueArgs]:
    """Set up runtime parameters for the bolt-on governance deployment.

    Each runtime parameter is a (key, type, value) triple.

    Args:
        credential (datarobot.ApiTokenCredential):
            The DataRobot credential representing the LLM API token.
    """
    return [
        datarobot.CustomModelRuntimeParameterValueArgs(
            key=key,
            type=type_,
            value=value,  # type: ignore[arg-type]
        )
        for key, type_, value in [
            ("OPENAI_API_KEY", "credential", credential.id),
            ("OPENAI_API_BASE", "string", os.getenv("OPENAI_API_BASE")),
            ("OPENAI_API_VERSION", "string", os.getenv("OPENAI_API_VERSION")),
            (
                "OPENAI_API_DEPLOYMENT_ID",
                "string",
                os.getenv("OPENAI_API_DEPLOYMENT_ID"),
            ),
        ]
    ]
def make_bolt_on_governance_deployment():
    """
    Deploy the LLM proxy model onto DataRobot's prediction environment.

    Upload source code to create a custom model version,
    then create a registered model and deploy it to a prediction environment.
    """
    # ID for the Python 3.11 Moderations environment
    python_environment_id = "65f9b27eab986d30d4c64268"
    custom_model_name = "App Template Minis - OpenAI LLM"
    registered_model_name = "App Template Minis - OpenAI Registered Model"
    deployment_name = "App Template Minis - Bolt on Governance Deployment"

    prediction_environment = datarobot.PredictionEnvironment(
        resource_name="App Template Minis - Serverless Environment",
        platform=dr.enums.PredictionEnvironmentPlatform.DATAROBOT_SERVERLESS,
    )
    llm_credential = datarobot.ApiTokenCredential(
        resource_name="App Template Minis - OpenAI LLM Credentials",
        api_token=os.getenv("OPENAI_API_KEY"),
    )
    runtime_parameters = setup_runtime_parameters(llm_credential)

    deployment_files = [
        ("./model_package/requirements.txt", "requirements.txt"),
        ("./model_package/custom.py", "custom.py"),
        ("./model_package/model-metadata.yaml", "model-metadata.yaml"),
    ]
    custom_model = datarobot.CustomModel(
        resource_name=custom_model_name,
        runtime_parameter_values=runtime_parameters,
        files=deployment_files,
        base_environment_id=python_environment_id,
        target_type=dr.enums.TARGET_TYPE.TEXT_GENERATION,
        target_name="content",
        language="python",
        replicas=2,
    )
    registered_model = datarobot.RegisteredModel(
        resource_name=registered_model_name,
        custom_model_version_id=custom_model.version_id,
    )
    deployment = datarobot.Deployment(
        resource_name=deployment_name,
        label=deployment_name,
        registered_model_version_id=registered_model.version_id,
        prediction_environment_id=prediction_environment.id,
    )

    pulumi.export("serverless_environment_id", prediction_environment.id)
    pulumi.export("custom_model_id", custom_model.id)
    pulumi.export("registered_model_id", registered_model.id)
    pulumi.export("deployment_id", deployment.id)
Putting it together¶
Now it's time to run the stack. Doing so takes the files in the model_package
directory, uploads them to DataRobot as a custom model, registers that model, and deploys the result.
project_name = "AppTemplateMinis-BoltOnGovernance"
stack_name = "MarshallsExtraSpecialLargeLanguageModel"
stack = stack_up(project_name, stack_name, program=make_bolt_on_governance_deployment)
Interact with outputs¶
Now that you have the governance deployment, you can interact with it directly through the OpenAI SDK. The only difference is that you pass your DataRobot API key instead of the LLM credentials.
from pprint import pprint
from openai import OpenAI
deployment_id = stack.outputs().get("deployment_id").value
deployment_chat_base_url = dr_client.endpoint + f"/deployments/{deployment_id}/"
client = OpenAI(api_key=dr_client.token, base_url=deployment_chat_base_url)
messages = [
{"role": "user", "content": "Why are ducks called ducks?"},
]
response = client.chat.completions.create(messages=messages, model="gpt-4o")
pprint(response.choices[0].message.content)
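Because every deployment exposes the same OpenAI-compatible route, the URL construction above generalizes. A small helper like the following (hypothetical, not part of the DataRobot SDK) captures the pattern for any deployment ID:

```python
def deployment_chat_url(endpoint: str, deployment_id: str) -> str:
    """Build the OpenAI-compatible base URL for a DataRobot deployment.

    `endpoint` is the DataRobot API root (e.g. dr_client.endpoint);
    trailing slashes are normalized so the path never doubles up.
    """
    return f"{endpoint.rstrip('/')}/deployments/{deployment_id}/"
```

Any OpenAI client pointed at this URL, with your DataRobot API key, will route chat completions through the governed deployment.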
Clear your work¶
You may not want to keep the deployment. Use this cell to tear down the stack, deleting any assets created in DataRobot.
destroy_project(stack)
Appendix¶
How does scoring code work?¶
The code below shows what you upload so that DataRobot knows how to interact with the custom model. Since you are deploying a custom inference model with minimal transformations, it only defines two hooks to interact with the model, but you could add others too. Unlike a standard Scikit-learn binary classifier, where DataRobot can figure out how to score the artifact without any custom hooks, an LLM proxy has no model artifact at all, so the custom.py
file supplies all of the scoring logic.
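If you haven't downloaded the assets, the hook file typically looks something like the sketch below. The `load_model` and `chat` hook names follow DataRobot's custom-model conventions and the environment variable names match the runtime parameters configured earlier, but treat the exact contents as an illustrative assumption rather than the shipped file.

```python
import os


def load_model(code_dir):
    """Called once at startup; returns the client the other hooks will use."""
    # Imported here so the file can be read without the OpenAI SDK installed.
    from openai import AzureOpenAI

    return AzureOpenAI(
        api_key=os.environ["OPENAI_API_KEY"],
        azure_endpoint=os.environ["OPENAI_API_BASE"],
        api_version=os.environ["OPENAI_API_VERSION"],
    )


def chat(completion_create_params, model):
    """Forward an OpenAI-style chat completion request to the upstream LLM."""
    # Replace whatever model the caller asked for with the configured
    # Azure deployment, so all traffic goes to the governed endpoint.
    completion_create_params["model"] = os.environ["OPENAI_API_DEPLOYMENT_ID"]
    return model.chat.completions.create(**completion_create_params)
```
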
from IPython.display import Code
Code(filename="./model_package/custom.py", language="python")