Generative model monitoring

Availability information

Monitoring support for generative models is a premium feature. Contact your DataRobot representative or administrator for information on enabling this feature.

Using the text generation target type for custom and external models, a premium LLMOps feature, you can deploy generative Large Language Models (LLMs) to make predictions, monitor service, usage, and data drift statistics, and create custom metrics. DataRobot supports LLMs through two deployment methods, each described below: creating and deploying a generative custom model, and creating and deploying an external generative model.

Custom metrics for evaluation and moderation require an association ID

To view data on the Custom metrics tab for the metrics added when you configure evaluations and moderations, set an association ID and enable prediction storage before you start making predictions through the deployed LLM. If you don't set an association ID and provide association IDs alongside the LLM's predictions, the moderation metrics aren't calculated on the Custom metrics tab. After you define the association ID, you can enable automatic association ID generation to ensure these metrics appear on the Custom metrics tab. You can enable this setting during or after deployment.
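
As a rough illustration of providing association IDs alongside prompts, the sketch below attaches a unique ID to each prompt row before the rows are sent for scoring. The helper `add_association_ids` and the column names `association_id` and `promptText` are hypothetical; substitute the association ID column configured for your deployment and your model's actual prompt column.

```python
import uuid


def add_association_ids(prompts, id_column="association_id", prompt_column="promptText"):
    """Return one dict per prompt, each carrying a unique association ID.

    Both column names are illustrative assumptions, not DataRobot defaults.
    """
    return [
        {id_column: str(uuid.uuid4()), prompt_column: prompt}
        for prompt in prompts
    ]


rows = add_association_ids(["What is data drift?", "How do I deploy a custom model?"])
for row in rows:
    print(row)
```

If you enable automatic association ID generation for the deployment, a step like this may be unnecessary.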

Create and deploy a generative custom model

Custom inference models are user-created, pretrained models that you can upload to DataRobot (as a collection of files) via the Custom Model Workshop. You can then upload a model artifact to create, test, and deploy custom inference models to DataRobot's centralized deployment hub.

Add a generative custom model

To add a generative model to the Custom Model Workshop:

  1. Click Model Registry > Custom Model Workshop and, on the Models tab, click + Add new model.

  2. In the Add Custom Inference Model dialog box, under Target type, click Text Generation.

  3. Enter a Model name and Target name. In addition, you can click Show Optional Fields to define the language used to build the model and provide a description.

  4. Click Add Custom Model. The new custom model opens to the Assemble tab.

Assemble and deploy a generative custom model

To assemble, test, and deploy a generative model from the Custom Model Workshop:

  1. On the right side of the Assemble tab, under Model Environment, select a model environment from the Base Environment list. The model environment is used for testing and deploying the custom model.

    Note

    The Base Environment pulldown menu includes drop-in model environments, if any exist, as well as custom environments that you can create.

  2. On the left side of the Assemble tab, under Model, drag and drop files or click Browse local files to upload your LLM's custom model artifacts. Alternatively, you can import model files from a remote repository.

    Important

    If you click Browse local files, you have the option of adding a Local Folder. The local folder should contain dependent files and additional assets required by your model, not the model itself. If the model file is included in the folder, it will not be accessible to DataRobot. Instead, the model file must exist at the root level. The root file can then point to the dependencies in the folder.

    A basic LLM assembled in the Custom Model Workshop should include the following files:

    File | Contents
    custom.py | The custom model code, calling the LLM service's API through public network access for custom models.
    model-metadata.yaml | The runtime parameters required by the generative model.
    requirements.txt | The libraries (and versions) required by the generative model.

    The dependencies from requirements.txt appear under Model Environment in the Model Dependencies box.

  3. After you add the required model files, add training data. To provide a training baseline for drift monitoring, you should upload a dataset containing at least 20 rows of prompts and responses relevant to the topic your generative model is intended to answer questions about. These prompts and responses can be taken from documentation, manually created, or generated.

  4. Next, click the Test tab, click + New test, and then click Start test to run the Startup and Prediction error tests, the only tests supported for the Text Generation target type.

  5. Click Register to deploy, provide the model information, and click Add to registry.

    The model opens on the Registered Models tab.

  6. In the registered model version header, click Deploy, and then configure the deployment settings.

    You can now make predictions as you would with any other DataRobot model.
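
To make the role of custom.py in the assembly steps above more concrete, here is a minimal sketch of what such a file might look like. It assumes the `load_model` and `score` hook names used by DataRobot's custom model templates; confirm them against the template for your base environment. The column name `promptText`, the target name `resultText`, and the function `call_llm` are hypothetical placeholders, and the placeholder does not contact any real LLM service.

```python
import pandas as pd

# Assumptions (not taken from the documentation above): prompts arrive in a
# column named "promptText", and the Target name entered when adding the
# model is "resultText". Replace both with your own names.
PROMPT_COLUMN = "promptText"
TARGET_NAME = "resultText"


def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a request to your LLM service's API."""
    return f"(placeholder completion for: {prompt})"


def load_model(code_dir: str):
    """Run once at startup; return whatever object score() needs."""
    return call_llm


def score(data: pd.DataFrame, model, **kwargs) -> pd.DataFrame:
    """Return one completion per input row in a column named after the target."""
    completions = [model(str(prompt)) for prompt in data[PROMPT_COLUMN]]
    return pd.DataFrame({TARGET_NAME: completions})
```

In a real model, `call_llm` would be replaced by a client call to the LLM service, with credentials and endpoints supplied through the runtime parameters declared in model-metadata.yaml.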

Create and deploy an external generative model

External model packages allow you to register and deploy external generative models. You can use the monitoring agent to access MLOps monitoring capabilities with these model types.

To create and deploy a model package for an external generative model:

  1. Click Model Registry and, on the Registered Models tab, click Add new package and select New external model package.

  2. In the Register new external model dialog box, from the Prediction type list, click Text generation and add the required information about the agent-monitored generative model. To provide a training baseline for drift monitoring, in the Training data field, you should upload a dataset containing at least 20 rows of prompts and responses relevant to the topic your generative model is intended to answer questions about. These prompts and responses can be taken from documentation, manually created, or generated.

  3. After you define all fields for the model package, click Register. The package is registered in the Model Registry and is available for use.

  4. From the Model Registry > Registered Models tab, locate and deploy the generative model.

  5. Add deployment information and complete the deployment.

Monitor a deployed generative model

To monitor a generative model in production, you can view service health and usage statistics, export deployment data, create custom metrics, and identify data drift.

Data drift for generative models

To monitor drift in a generative model's prediction data, DataRobot compares new prompts and responses to the prompts and responses in the training data you uploaded during model creation. To provide an adequate training baseline for comparison, the uploaded training dataset should contain at least 20 rows of prompts and responses relevant to the topic your model is intended to answer questions about. These prompts and responses can be taken from documentation, manually created, or generated.
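
One way to prepare such a baseline is sketched below: it writes prompt and response pairs to CSV and refuses to write fewer than 20 rows. The helper `write_baseline_csv` and the column names `question` and `answer` are illustrative assumptions; use the input and output column names from your own model's data.

```python
import csv
import io

MIN_BASELINE_ROWS = 20  # minimum row count recommended in the text above


def write_baseline_csv(pairs, out, prompt_column="question", response_column="answer"):
    """Write (prompt, response) pairs as CSV and return the number of rows written."""
    pairs = list(pairs)
    if len(pairs) < MIN_BASELINE_ROWS:
        raise ValueError(
            f"Baseline has {len(pairs)} rows; at least {MIN_BASELINE_ROWS} are recommended."
        )
    writer = csv.writer(out)
    writer.writerow([prompt_column, response_column])  # header row
    writer.writerows(pairs)
    return len(pairs)


# Placeholder pairs for illustration; a real baseline should use topical content.
example_pairs = [(f"Example question {i}?", f"Example answer {i}.") for i in range(1, 21)]
buffer = io.StringIO()
rows_written = write_baseline_csv(example_pairs, buffer)
print(rows_written)
```

To write to disk instead of an in-memory buffer, pass a file opened with `open(path, "w", newline="")` as `out`.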

On the Data Drift tab for a generative model, you can view the Feature Drift vs. Feature Importance, Feature Details, and Drift Over Time charts.

To learn how to adjust the Data Drift dashboard to focus on the model, time period, or feature you're interested in, see the Configure the Data Drift dashboard documentation.

The Feature Details chart includes new functionality for text generation models, providing a word cloud visualizing differences in the data distribution for each token in the dataset between the training and scoring periods. By default, the Feature Details chart includes information about the question (or prompt) and answer (or model completion/output):

Feature | Description
question | A word cloud visualizing the difference in data distribution for each user prompt token between the training and scoring periods and revealing how much each token contributes to data drift in the user prompt data.
answer | A word cloud visualizing the difference in data distribution for each model output token between the training and scoring periods and revealing how much each token contributes to data drift in the model output data.

Note

The feature names for the generative model's input and output depend on the feature names in your model's data; therefore, the question and answer features in the example above will be replaced by the names of the input and output columns in your model's data.

You can also designate other features for data drift tracking; for example, you could decide to track the model's temperature, monitoring the level of creativity in the generative model's responses from high creativity (1) to low (0).

To interpret the feature drift word cloud for a text feature like question or answer, hover over a user prompt or model output token to view the following details:

Chart element | Description
Token | The tokenized text represented by the word in the word cloud. Text size represents the token's drift contribution and text color represents the dataset prevalence. Stop words are hidden from this chart.
Drift contribution | How much this particular token contributes to the feature's drift value, as reported in the Feature Drift vs. Feature Importance chart.
Data distribution | How much more often this particular token appears in the training data or the predictions data.
  • Blue: This token appears X% more often in training data.
  • Red: This token appears X% more often in predictions data.
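
To build intuition for the Data distribution element, the sketch below compares each token's share of the training text with its share of the prediction text. This is an illustration only and is not DataRobot's drift calculation: the tokenizer, the stop-word list, and the helper names `token_shares` and `prevalence_difference` are all assumptions made for the example.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; the list DataRobot uses is not documented here.
STOP_WORDS = {"a", "an", "the", "is", "of", "to", "and", "in"}


def token_shares(texts):
    """Return each non-stop-word token's share of all such tokens in `texts`."""
    tokens = [
        tok
        for text in texts
        for tok in re.findall(r"[a-z0-9']+", text.lower())
        if tok not in STOP_WORDS
    ]
    counts = Counter(tokens)
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()} if total else {}


def prevalence_difference(training_texts, prediction_texts):
    """Positive values: more prevalent in training data; negative: in predictions data."""
    train = token_shares(training_texts)
    pred = token_shares(prediction_texts)
    return {tok: train.get(tok, 0.0) - pred.get(tok, 0.0) for tok in set(train) | set(pred)}


diff = prevalence_difference(
    ["deploy the model", "deploy a model"],
    ["monitor drift", "deploy drift"],
)
for token, delta in sorted(diff.items()):
    side = "training" if delta > 0 else "predictions"
    print(f"{token}: {abs(delta):.0%} more prevalent in {side} data")
```

In this toy comparison, a positive difference corresponds to the blue case in the table and a negative difference to the red case.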

Tip

When your pointer is over the word cloud, you can scroll up to zoom in and view the text of smaller tokens.


Updated August 16, 2024