

Generative model monitoring

Availability information

Monitoring support for generative models is a premium feature. Contact your DataRobot representative or administrator for information on enabling this feature.

The text generation target type for custom and external models, a premium LLMOps feature, lets you deploy generative large language models (LLMs) to make predictions, monitor service health, usage, and data drift statistics, and create custom metrics. DataRobot supports LLMs through two deployment methods:

Create and deploy a generative custom model

Custom inference models are user-created, pretrained models that you can upload to DataRobot (as a collection of files) via the Custom Model Workshop. After uploading the model artifact, you can assemble, test, and deploy custom inference models to DataRobot's centralized deployment hub.

Add a generative custom model

To add a generative model to the Custom Model Workshop:

  1. Click Model Registry > Custom Model Workshop and, on the Models tab, click + Add new model.

  2. In the Add Custom Inference Model dialog box, under Target type, click Text Generation.

  3. Enter a Model name and Target name. In addition, you can click Show Optional Fields to define the language used to build the model and provide a description.

  4. Click Add Custom Model. The new custom model opens to the Assemble tab.

Assemble and deploy a generative custom model

To assemble, test, and deploy a generative model from the Custom Model Workshop:

  1. On the right side of the Assemble tab, under Model Environment, select a model environment from the Base Environment list. The model environment is used for testing and deploying the custom model.

    Note

    The Base Environment pulldown menu includes drop-in model environments, if any exist, as well as custom environments that you can create.

  2. On the left side of the Assemble tab, under Model, drag and drop files or click Browse local files to upload your LLM's custom model artifacts. Alternatively, you can import model files from a remote repository.

    Important

    If you click Browse local files, you have the option of adding a Local Folder. The local folder should contain dependent files and additional assets required by your model, not the model itself. If the model file is included in the folder, it will not be accessible to DataRobot. Instead, the model file must exist at the root level, where it can reference the dependencies in the folder.

    A basic LLM assembled in the Custom Model Workshop should include the following files:

    | File | Contents |
    |------|----------|
    | custom.py | The custom model code, calling the LLM service's API through public network access for custom models. |
    | model-metadata.yaml | The runtime parameters required by the generative model. |
    | requirements.txt | The libraries (and versions) required by the generative model. |

    The dependencies from requirements.txt appear under Model Environment in the Model Dependencies box. A minimal sketch of these files appears after the steps below.

  3. After you add the required model files, add training data. To provide a training baseline for drift monitoring, you should upload a dataset containing at least 20 rows of prompts and responses relevant to the topic your generative model is intended to answer questions about. These prompts and responses can be taken from documentation, manually created, or generated.

  4. Next, click the Test tab, click + New test, and then click Start test to run the Startup and Prediction error tests, the only tests supported for the Text Generation target type.

  5. Click Register to deploy, provide the model information, and click Add to registry.

    The model opens on the Registered Models tab.

  6. In the registered model version header, click Deploy, and then configure the deployment settings.

    You can now make predictions as you would with any other DataRobot model.
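
The following is a minimal sketch of the files listed in the table above, assuming the prompt column is named question, the target is named answer, and the model calls the OpenAI API; the load_model and score hook names follow DataRobot's standard custom model conventions, but adapt the column names, runtime parameters, and LLM client to your own model.

```python
# custom.py -- minimal text generation custom model (sketch).
# Assumes the prompt column is "question", the target name is "answer",
# and an OPENAI_API_KEY runtime parameter defined in model-metadata.yaml.
import pandas as pd
from openai import OpenAI
from datarobot_drum import RuntimeParameters


def load_model(code_dir):
    # Called once at startup; build the LLM client from the runtime parameter.
    return OpenAI(api_key=RuntimeParameters.get("OPENAI_API_KEY"))


def score(data, model, **kwargs):
    # DataRobot passes prompts as a pandas DataFrame; return one completion
    # per row in a column named after the target.
    completions = []
    for prompt in data["question"]:
        result = model.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": str(prompt)}],
        )
        completions.append(result.choices[0].message.content)
    return pd.DataFrame({"answer": completions})
```

The accompanying model-metadata.yaml declares the runtime parameter that custom.py reads, and requirements.txt lists the packages the code imports (for example, openai and pandas) so DataRobot can resolve the model's dependencies.

```yaml
# model-metadata.yaml -- declares the runtime parameter used by custom.py.
# Verify the exact targetType value and schema against your DataRobot version.
name: generative-llm-example
type: inference
targetType: textgeneration
runtimeParameterDefinitions:
  - fieldName: OPENAI_API_KEY
    type: string  # use a credential-type parameter for real keys
    description: API key custom.py uses to call the LLM service
```

Once deployed, you can score the model over the prediction API. The sketch below uses placeholder values; copy the real scoring URL, API token, and DataRobot key from your deployment's Predictions tab.

```python
# Sketch: scoring the deployed text generation model over the prediction API.
# All values below are placeholders.
import requests

url = "https://<PREDICTION_SERVER>/predApi/v1.0/deployments/<DEPLOYMENT_ID>/predictions"
response = requests.post(
    url,
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer <API_TOKEN>",
        "DataRobot-Key": "<DATAROBOT_KEY>",
    },
    json=[{"question": "How do I reset my password?"}],
)
print(response.json())
```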

Create and deploy an external generative model

External model packages allow you to register and deploy generative models running outside of DataRobot. You can use the monitoring agent to access MLOps monitoring capabilities with these model types (see the reporting sketch after the steps below).

To create and deploy a model package for an external generative model:

  1. Click Model Registry and on the Registered Models tab, click Add new package and select New external model package.

  2. In the Register new external model dialog box, from the Prediction type list, click Text generation and add the required information about the agent-monitored generative model. To provide a training baseline for drift monitoring, in the Training data field, you should upload a dataset containing at least 20 rows of prompts and responses relevant to the topic your generative model is intended to answer questions about. These prompts and responses can be taken from documentation, manually created, or generated.

  3. After you define all fields for the model package, click Register. The package is registered in the Model Registry and is available for use.

  4. From the Model Registry > Registered Models tab, locate and deploy the generative model.

  5. Add deployment information and complete the deployment.
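
With the external deployment in place, your own scoring code reports prompts, responses, and service statistics so the monitoring agent can forward them to DataRobot. The sketch below is based on the datarobot-mlops Python library with a filesystem spooler; the IDs and spooler directory are placeholders, and method names can vary between MLOps library versions, so check the monitoring agent documentation for your release.

```python
# Sketch: reporting an external LLM's prompts and responses to DataRobot via
# the MLOps library and a filesystem spooler watched by the monitoring agent.
# Deployment ID, model ID, and the spooler directory are placeholders.
import time
import pandas as pd
from datarobot.mlops.mlops import MLOps

mlops = (
    MLOps()
    .set_deployment_id("<EXTERNAL_DEPLOYMENT_ID>")
    .set_model_id("<MODEL_PACKAGE_ID>")
    .set_filesystem_spooler("/tmp/ta")  # the agent reads from this directory
    .init()
)

prompts = pd.DataFrame({"question": ["How do I reset my password?"]})
start = time.time()
# Replace with your actual LLM call; the response text is what gets monitored.
answers = ["Use the 'Forgot password' link on the sign-in page."]
elapsed_ms = (time.time() - start) * 1000

# Report service health plus the prompt/response data used for drift tracking.
mlops.report_deployment_stats(num_predictions=len(answers), execution_time_ms=elapsed_ms)
mlops.report_predictions_data(features_df=prompts, predictions=answers)
mlops.shutdown()
```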

Monitor a deployed generative model

To monitor a generative model in production, you can view service health and usage statistics, export deployment data, create custom metrics, and identify data drift.

Data drift for generative models

To monitor drift in a generative model's prediction data, DataRobot compares new prompts and responses to the prompts and responses in the training data you uploaded during model creation. To provide an adequate training baseline for comparison, the uploaded training dataset should contain at least 20 rows of prompts and responses relevant to the topic your model is intended to answer questions about. These prompts and responses can be taken from documentation, manually created, or generated.

On the Data Drift tab for a generative model, you can view the Feature Drift vs. Feature Importance, Feature Details, and Drift Over Time charts.

To learn how to adjust the Data Drift dashboard to focus on the model, time period, or feature you're interested in, see the Configure the Data Drift dashboard documentation.

The Feature Details chart includes new functionality for text generation models, providing a word cloud visualizing differences in the data distribution for each token in the dataset between the training and scoring periods. By default, the Feature Details chart includes information about the question (or prompt) and answer (or model completion/output):

| Feature | Description |
|---------|-------------|
| question | A word cloud visualizing the difference in data distribution for each user prompt token between the training and scoring periods and revealing how much each token contributes to data drift in the user prompt data. |
| answer | A word cloud visualizing the difference in data distribution for each model output token between the training and scoring periods and revealing how much each token contributes to data drift in the model output data. |

Note

The feature names for the generative model's input and output depend on the feature names in your model's data; therefore, the question and answer features in the example above will be replaced by the names of the input and output columns in your model's data.

You can also designate other features for data drift tracking; for example, you could decide to track the model's temperature, monitoring the level of creativity in the generative model's responses from high creativity (1) to low (0).

To interpret the feature drift word cloud for a text feature like question or answer, hover over a user prompt or model output token to view the following details:

| Chart element | Description |
|---------------|-------------|
| Token | The tokenized text represented by the word in the word cloud. Text size represents the token's drift contribution and text color represents the dataset prevalence. Stop words are hidden from this chart. |
| Drift contribution | How much this particular token contributes to the feature's drift value, as reported in the Feature Drift vs. Feature Importance chart. |
| Data distribution | How much more often this particular token appears in the training data or the predictions data. Blue: this token appears X% more often in training data. Red: this token appears X% more often in predictions data. |

Tip

When your pointer is over the word cloud, you can scroll up to zoom in and view the text of smaller tokens.
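
For intuition only, the following sketch approximates the idea behind the data distribution values in the word cloud: it compares how often each token appears in a set of training prompts versus recently scored prompts. It uses naive whitespace tokenization and is not DataRobot's actual drift or tokenization logic.

```python
# Illustration only: a rough approximation of the per-token comparison that
# the word cloud visualizes. DataRobot's actual tokenization, stop-word
# handling, and drift calculation differ.
from collections import Counter

def token_frequencies(texts):
    # Naive whitespace tokenization; real tokenizers also strip stop words.
    tokens = [tok.lower() for text in texts for tok in text.split()]
    counts = Counter(tokens)
    total = sum(counts.values())
    return {tok: count / total for tok, count in counts.items()}

training_prompts = ["How do I reset my password?", "How do I change my email?"]
scored_prompts = ["How do I delete my account?", "How do I reset my password?"]

train_freq = token_frequencies(training_prompts)
score_freq = token_frequencies(scored_prompts)

# Positive values: the token is more prevalent in training data (blue);
# negative values: the token is more prevalent in predictions data (red).
for token in sorted(set(train_freq) | set(score_freq)):
    diff = train_freq.get(token, 0.0) - score_freq.get(token, 0.0)
    print(f"{token:12s} {diff:+.1%}")
```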


Updated April 18, 2024