

Custom metrics

On a deployment's Monitoring > Custom metrics tab, you can use the data you collect from the Data exploration tab (or data calculated through other custom metrics) to compute and monitor custom business or performance metrics. These metrics are recorded on the configurable Custom metrics summary dashboard, where you monitor, visualize, and export each metric's change over time. This feature allows you to implement your organization's specialized metrics, expanding on the insights provided by DataRobot's built-in service health, data drift, and accuracy metrics.

Custom metrics limits

You can have up to 50 custom metrics per deployment, and of those 50, 5 can be hosted custom metrics.

To view and add custom metrics, in the Console, open the deployment for which you want to create custom metrics and click the Monitoring > Custom metrics tab:

What types of custom metrics are supported?

Three types of custom metrics are available for use:

Custom metric type Description
External custom metrics
  • What: A custom metric where the calculations of the metric are not directly hosted by DataRobot. An external metric is a simple API used to submit a metric value for DataRobot to save and visualize. The metric calculation is handled externally, by the user. External metrics can be combined with other tools in DataRobot like Notebooks, Jobs, or Custom Models, or external tools like Airflow or cloud providers to provide the hosting and calculation needed for a particular metric.
  • Where: On the Custom Metrics tab of any deployment, click Add custom metric > New external metric (or Create new external metric in Classic).
  • Why: Provides a simple option to save a value from your AI solution for tracking and visualization in DataRobot. For example, you could track the change in LLM cost, calculated by your LLM provider, over time.
Hosted custom metrics
  • What: A custom metric where the metric calculations are hosted in a custom job within DataRobot. For hosted metrics, DataRobot orchestrates pulling the data, computing the metric values, saving the values to storage, and visualizing the data. No outside tools or infrastructure are required.
  • Where: On the Custom Metrics tab of any deployment, click Add custom metric > New hosted metric (or Create new hosted metric in Classic).
  • Why: Provides a complete end-to-end workflow for building business-specific metrics and dashboards in DataRobot.
Hosted custom metric templates
  • What: A ready-to-use example of a hosted custom metric, where DataRobot provides the code and automates the creation process. With metric templates, you get a hosted metric without starting from scratch. Templates are provided by DataRobot and can be used as-is or modified to calculate new metrics.
  • Where: On the Custom Metrics tab of any deployment in NextGen only, click Add custom metric > Create new from template.
  • Why: Provides the simplest way to get started with custom metrics, where DataRobot provides an example implementation and a complete end-to-end workflow. They are ready to use in just a few clicks.

Add custom metrics

To add a metric, in a predictive or generative modeling deployment, click the Monitoring > Custom metrics tab. Then, on the Custom metrics tab, click + Add custom metric, select one of the following custom metric types, and proceed to the configuration steps linked in the table:

Custom metric type Description
New external metric Add a custom metric where the calculations of the metric are not directly hosted by DataRobot. An external metric is a simple API used to submit a metric value for DataRobot to save and visualize. The metric calculation is handled externally, by the user. External metrics can be combined with other tools in DataRobot like notebooks, jobs, or custom models, or external tools like Airflow or cloud providers to provide the hosting and calculation needed for a particular metric.

External custom metrics provide a simple option to save a value from your AI solution for tracking and visualization in DataRobot. For example, you could track the change in LLM cost, calculated by your LLM provider, over time.
New hosted metric Add a custom metric where the metric calculations are hosted in a custom job within DataRobot. For hosted metrics, DataRobot orchestrates pulling the data, computing the metric values, saving the values to storage, and visualizing the data. No outside tools or infrastructure are required.

Hosted custom metrics provide a complete end-to-end workflow for building business-specific metrics and dashboards in DataRobot.
Create new from template Add a custom metric from a template: a ready-to-use example of a hosted custom metric, where DataRobot provides the code and automates the creation process. With metric templates, you get a hosted metric without starting from scratch. Templates are provided by DataRobot and can be used as-is or modified to calculate new metrics.

Hosted custom metric templates provide the simplest way to get started with custom metrics, where DataRobot provides an example implementation and a complete end-to-end workflow. They are ready to use in just a few clicks.

Add external custom metrics

External custom metrics allow you to create metrics with calculations occurring outside of DataRobot. With an external metric, you can submit a metric value for DataRobot to save and visualize. External metrics can be combined with other tools in DataRobot like notebooks, jobs, or custom models, or external tools like Airflow or cloud providers to provide the hosting and calculation needed for a particular metric.

To add an external custom metric, in the Add custom metric dialog box, configure the metric settings, and then click + Add custom metric:

Preview

Categorical custom metrics are on by default.

Feature flag: Enable Categorical Custom Metrics

Field Description
Name A descriptive name for the metric. This name appears on the Custom metrics summary dashboard.
Description (Optional) A description of the custom metric; for example, you could describe the purpose, calculation method, and more.
Name of Y-axis (label) A descriptive name for the dependent variable. This name appears on the custom metric's chart on the Custom metrics summary dashboard.
Default interval The default interval used by the selected Aggregation type. Only HOUR is supported.
Metric type The type of metric to create, Numeric or Categorical. The available metric settings change based on this selection.
Numeric metric settings
Baseline (Optional) The value used as a basis for comparison when calculating the x% better or x% worse values.
Aggregation type The type of metric calculation. Select from Sum, Average, or Gauge—a metric with a distinct value measured at a single point in time.
Metric direction The directionality of the metric, controlling how changes to the metric are visualized. You can select Higher is better or Lower is better. For example, if you choose Lower is better, a 10% decrease in the calculated value of your custom metric will be considered 10% better, and displayed in green.
Categorical metric settings
Class name For each class added, a descriptive name (maximum of 200 characters).
Baseline (Optional) For each class added, the value used as a basis for comparison when calculating the x% better or x% worse values.
Class direction For each class added, the directionality of the metric, controlling how changes to the metric are visualized. You can select Higher is better or Lower is better. For example, if you choose Lower is better, a 10% decrease in the calculated value of your custom metric will be considered 10% better, and displayed in green.
+ Add class To define each class needed for the categorical metric, click + Add class and configure the required class settings listed above. You can add up to ten classes. To remove a class, click Delete class.
Model specific aggregation setting
Is model-specific When enabled, links the metric to the model with the Model Package ID (the Registered Model Version ID) provided in the dataset. This setting influences when values are aggregated (or uploaded). For example:
  • Model-specific (enabled): Model accuracy metrics are model-specific, so the values are aggregated separately. When you replace a model, the chart for your custom accuracy metric only shows data for the days after the replacement.
  • Not model-specific (disabled): Revenue metrics aren't model-specific, so the values are aggregated together. When you replace a model, the chart for your custom revenue metric doesn't change.
This field can't be edited after you create the metric.
Column name definitions
Timestamp column The column in the dataset containing a timestamp.
Value column The column in the dataset containing the values used for custom metric calculation.
Date format (Optional) The date format used by the timestamp column.

Note

You can override the Column name definitions settings when you upload data to a custom metric, as described below.
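For reference, this configuration can also be scripted against the DataRobot REST API. The sketch below is illustrative only: the endpoint path and the payload field names (for example, directionality, timeStep, and the column definitions) are assumptions that mirror the settings table above, so confirm them against the API snippet DataRobot generates for your deployment (see Use API under Upload data to custom metrics).

```python
# Hedged sketch: create an external custom metric over the REST API.
# The endpoint path and payload field names are assumptions that mirror
# the settings table above; verify them against the snippet DataRobot
# generates for your deployment.
import requests

API_TOKEN = "YOUR_API_TOKEN"          # placeholder
ENDPOINT = "https://app.datarobot.com/api/v2"
DEPLOYMENT_ID = "YOUR_DEPLOYMENT_ID"  # placeholder

payload = {
    "name": "LLM cost",                    # Name
    "description": "Monthly spend reported by the LLM provider",
    "units": "USD",                        # Name of Y-axis (label)
    "timeStep": "hour",                    # Default interval (only HOUR is supported)
    "type": "sum",                         # Aggregation type: sum, average, or gauge
    "directionality": "lowerIsBetter",     # Metric direction
    "isModelSpecific": False,              # Is model-specific
    "baselineValues": [{"value": 100.0}],  # Optional baseline
    # Column name definitions (can be overridden when you upload data)
    "timestamp": {"columnName": "timestamp", "timeFormat": "%Y-%m-%d %H:%M:%S"},
    "value": {"columnName": "cost"},
}

response = requests.post(
    f"{ENDPOINT}/deployments/{DEPLOYMENT_ID}/customMetrics/",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json=payload,
)
response.raise_for_status()
print("Created custom metric:", response.json()["id"])
```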

Add hosted custom metrics

Hosted custom metrics allow you to implement up to 5 of your organization's specialized metrics in a deployment, uploading the custom metric code using DataRobot Notebooks and hosting the metric calculation on custom jobs infrastructure. After creation, these custom metrics can be reused for other deployments.

Custom metrics limits

You can have up to 50 custom metrics per deployment, and of those 50, 5 can be hosted custom metrics.

To add a hosted custom metric, in the Add Custom Metric dialog box, configure the metric settings, and then click Add custom metric from notebook:

Field Description
Name (Required) A descriptive name for the metric. This name appears on the Custom metrics summary dashboard.
Description A description of the custom metric; for example, you could describe the purpose, calculation method, and more.
Name of y-axis (label) (Required) A descriptive name for the dependent variable. This name appears on the custom metric's chart on the Custom metrics summary dashboard.
Default interval Determines the default interval used by the selected Aggregation type. Only HOUR is supported.
Baseline Determines the value used as a basis for comparison when calculating the x% better or x% worse values.
Aggregation type Determines if the metric is calculated as a Sum, Average, or Gauge—a metric with a distinct value measured at a single point in time.
Metric direction Determines the directionality of the metric, which controls how changes to the metric are visualized. You can select Higher is better or Lower is better. For example, if you choose Lower is better, a 10% decrease in the calculated value of your custom metric is considered 10% better and displayed in green.
Is Model Specific When enabled, this setting links the metric to the model with the Model Package ID (Registered Model Version ID) provided in the dataset. This setting influences when values are aggregated (or uploaded). For example:
  • Model specific (enabled): Model accuracy metrics are model specific, so the values are aggregated completely separately. When you replace a model, the chart for your custom accuracy metric only shows data for the days after the replacement.
  • Not model specific (disabled): Revenue metrics aren't model specific, so the values are aggregated together. When you replace a model, the chart for your custom revenue metric doesn't change.
This field can't be edited after you create the metric.
Schedule Defines when the custom metrics are populated. Select a frequency (hourly, daily, monthly, etc.) and a time. Select Use advanced scheduler for more precise scheduling options.

After configuring a custom metric, DataRobot loads the notebook that contains the metric's code. The notebook contains one custom metric cell. A custom metric cell is a unique notebook cell, containing Python code defining how the metric is exported and calculated, code for scoring, and code to populate the metric. Modify the code in the custom metric cell as needed. Then, test the code by clicking Test custom metric code at the bottom of the cell. The test creates a custom job. If the test runs successfully, click Deploy custom metric code to add the custom metric to your deployment.
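The generated cell differs by metric, but it generally reduces to a scoring function plus code that reports the computed values back to the metric. The sketch below is a simplified illustration of that shape only; it is not the scaffolding DataRobot generates, and the column names (actual_revenue, predicted_revenue) are hypothetical.

```python
# Hedged sketch of the logic a custom metric cell typically implements:
# load exported prediction data, compute one value per hourly bucket, and
# hand the buckets to the code that populates the metric. This is NOT the
# exact scaffolding DataRobot generates in the custom metric cell.
import pandas as pd


def score(batch: pd.DataFrame) -> float:
    """Compute the metric for one batch of exported prediction data.

    Assumes hypothetical columns `actual_revenue` and `predicted_revenue`.
    """
    return float((batch["actual_revenue"] - batch["predicted_revenue"]).abs().mean())


def run(exported: pd.DataFrame) -> list[dict]:
    """Aggregate per hour, matching the HOUR default interval."""
    exported["timestamp"] = pd.to_datetime(exported["timestamp"])
    buckets = []
    for hour, group in exported.groupby(exported["timestamp"].dt.floor("h")):
        buckets.append({"timestamp": hour.isoformat(), "value": score(group)})
    return buckets  # the deployed job would submit these buckets to the metric
```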

Availability information

Notebooks for hosted custom metrics are off by default. Contact your DataRobot representative or administrator for information on enabling this feature.

Feature flag: Enable Notebooks Custom Environments

If the code does not run properly, you will receive the Testing custom metric code failed warning after testing completes. Click Open custom metric job to access the job and check the logs to troubleshoot the issue:

To troubleshoot a custom metric's code, navigate to the job's Runs tab, containing a log of the failed test. In the failed run, click View log.

Add hosted custom metric templates

The custom metrics gallery provides a centralized library containing pre-made, reusable, and shareable code implementing a variety of hosted custom metrics for predictive and generative models. These metrics are recorded on the configurable Custom metrics summary dashboard, alongside any external custom metrics. From this dashboard, you can monitor, visualize, and export each metric's change over time.

To add a pre-made custom metric to a deployment:

  1. In the Add custom metric panel, select a custom metric template applicable to your use case.

    The metrics available for your deployment appear at the top of the list with the metric icon highlighted. For example, in a generative modeling deployment:

    DataRobot provides three categories of metric templates: classification, regression, and text generation (shown in the tables below, in that order). A sketch of the cl100k_base token counting used by several of the text generation templates follows this procedure.

    Custom metric template Description
    Recall for top x% Measures model performance limited to a certain top fraction of the sorted predicted probabilities. Recall is a measure of a model's performance that calculates the proportion of actual positives that are correctly identified by the model.
    Precision for top x% Measures model performance limited to a certain top fraction of the sorted predicted probabilities. Precision is a measure of a model's performance that calculates the proportion of correctly predicted positive observations from the total predicted positive.
    F1 for top x% Measures model performance limited to a certain top fraction of the sorted predicted probabilities. F1 score is a measure of a model's performance which considers both precision and recall.
    AUC (Area Under the ROC Curve) for top x% Measures model performance limited to a certain top fraction of the sorted predicted probabilities.
    Custom metric template Description
    Mean Squared Logarithmic Error (MSLE) Calculates the mean of the squared differences between logarithms of the predicted and actual values. It is a loss function used in regression problems when the target values are expected to have exponential growth, like population counts, average sales of a commodity over a time period, and so on.
    Median Absolute Error (MedAE) Calculates the median of the absolute differences between the target and the predicted values. It is a robust metric used in regression problems to measure the accuracy of predictions.
    Custom metric template Description
    Completion Reading Time Estimates the average time it takes a person to read text generated by the LLM.
    Completion Tokens Mean Calculates the mean number of tokens in completions for the time period requested. The cl100k_base encoding used only supports the OpenAI models gpt-4, gpt-3.5-turbo, and text-embedding-ada-002; if you use a different model, change the encoding.
    Cosine Similarity Average Calculates the mean cosine similarity between each prompt vector and corresponding context vectors.
    Cosine Similarity Maximum Calculates the maximum cosine similarity between each prompt vector and corresponding context vectors.
    Cosine Similarity Minimum Calculates the minimum cosine similarity between each prompt vector and corresponding context vectors.
    Cost Estimates the financial cost of using the LLM by calculating the number of tokens in the input, output, and retrieved text, and then applying token pricing. The cl100k_base encoding used only supports the OpenAI models gpt-4, gpt-3.5-turbo, and text-embedding-ada-002; if you use a different model, change the encoding.
    Dale Chall Readability Measures the U.S. grade level required to understand a text based on the percentage of difficult words and average sentence length.
    Euclidean Average Calculates the mean Euclidean distance between each prompt vector and corresponding context vectors.
    Euclidean Maximum Calculates the maximum Euclidean distance between each prompt vector and corresponding context vectors.
    Euclidean Minimum Calculates the minimum Euclidean distance between each prompt vector and corresponding context vectors.
    Flesch Reading Ease Measures the readability of text based on the average sentence length and average number of syllables per word.
    Prompt Injection [sidecar metric] Detects input manipulations, such as overwriting or altering system prompts, that are intended to modify the model's output. This metric requires an additional deployment of the Prompt Injection Classifier global model.
    Prompt Tokens Mean Calculates the mean number of tokens in prompts for the time period requested. The cl100k_base encoding used only supports the OpenAI models gpt-4, gpt-3.5-turbo, and text-embedding-ada-002; if you use a different model, change the encoding.
    Sentence Count Calculates the total number of sentences in user prompts and text generated by the LLM.
    Sentiment Classifies text sentiment as positive or negative.
    Sentiment [sidecar metric] Classifies text sentiment as positive or negative using a pre-trained sentiment classification model. This metric requires an additional deployment of the Sentiment Classifier global model.
    Syllable Count Calculates the total number of syllables in the words in user prompts and text generated by the LLM.
    Tokens Mean Calculates the mean number of tokens in prompts and completions. The cl100k_base encoding used only supports the OpenAI models gpt-4, gpt-3.5-turbo, and text-embedding-ada-002; if you use a different model, change the encoding.
    Toxicity [sidecar metric] Measures the toxicity of text using a pre-trained hate speech classification model to safeguard against harmful content. This metric requires an additional deployment of the Toxicity Classifier global model.
    Word Count Calculates the total number of words in user prompts and text generated by the LLM.
    Japanese text metrics
    [JP] Character Count Calculates the total number of characters generated while working with the LLM.
    [JP] PII occurrence count Calculates the total number of PII occurrences while working with the LLM.
  2. After you select a metric from the list, in the Custom metric configuration sidebar, configure a metric calculation schedule or run the metric calculation immediately, and, optionally, set a metric baseline value.

  3. Click Create metric. The new metric appears on the Custom metrics dashboard.

  4. After you create a custom metric, you can view the custom job associated with the metric. This job runs on the metric's defined schedule, in the same way as hosted custom metrics (those not from the gallery). To access and manage the associated custom job, click the Actions menu and then click Open Custom Job:

    Sidecar metrics

    If you selected a [sidecar metric], when you open the Assemble tab, navigate to the Runtime Parameters section to set the SIDECAR_DEPLOYMENT_ID, associating the sidecar metric with the connected deployment required to calculate that metric. If you haven't deployed a model to calculate the metric, you can find pre-defined models for these metrics as global models.
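Several of the text generation templates above (for example, Prompt Tokens Mean, Completion Tokens Mean, Tokens Mean, and Cost) count tokens with the cl100k_base encoding. The sketch below shows only that token-counting step using the tiktoken library, with hypothetical prompt data; the template code DataRobot provides wraps this step in the full hosted metric workflow.

```python
# Hedged sketch: count tokens with the cl100k_base encoding (tiktoken),
# as used by several text generation metric templates. The prompts here
# are hypothetical; swap the encoding if your LLM uses a different one.
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")

prompts = [
    "Summarize the quarterly revenue report.",
    "List three risks in the attached contract.",
]

token_counts = [len(encoding.encode(prompt)) for prompt in prompts]
prompt_tokens_mean = sum(token_counts) / len(token_counts)
print(f"Mean prompt tokens: {prompt_tokens_mean:.1f}")
```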

Upload data to custom metrics

After you create a custom metric, you can provide data to calculate the metric:

  1. On the Custom metrics tab, locate the custom metric for which you want to upload data and click the Upload Data icon.

  2. In the Upload data dialog box, select an upload method and click Next:

    Upload method Description
    Use Data Registry In the Select a dataset panel, upload a dataset or select a dataset from the list, and then click Confirm. The Data Registry includes datasets from the Data exploration tab.
    Use API In the Use API Client panel, click Copy to clipboard, and then modify and use the API snippet to upload a dataset (an illustrative example appears after this procedure). You can upload up to 10,000 values in one API call.
  3. In the Select dataset columns dialog box, configure the following:

    Field Description
    Timestamp column (Required) The column in the dataset containing a timestamp.
    Value column (Required) The column in the dataset containing the values used for custom metric calculation.
    Association ID The column in the dataset containing the association ID required by the custom metric to link predicted values to actuals.
    Date format The date format used by the timestamp column.
  4. Click Upload data.
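If you choose Use API, the copied snippet is authoritative for your metric. The sketch below only approximates that pattern: the fromJSON endpoint path and the buckets payload shape are assumptions, so prefer the generated snippet, which includes the correct IDs and fields. Keep each call to 10,000 values or fewer.

```python
# Hedged sketch: submit custom metric values over the REST API. The
# endpoint path and payload shape are assumptions; the snippet copied
# from the Use API option in the Upload data dialog box is authoritative.
from datetime import datetime, timezone

import requests

API_TOKEN = "YOUR_API_TOKEN"          # placeholder
ENDPOINT = "https://app.datarobot.com/api/v2"
DEPLOYMENT_ID = "YOUR_DEPLOYMENT_ID"  # placeholder
CUSTOM_METRIC_ID = "YOUR_METRIC_ID"   # placeholder

payload = {
    "buckets": [  # up to 10,000 values per call
        {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "value": 42.5,
            # "associationId": "row-123",  # only if the metric links predictions to actuals
        }
    ]
}

response = requests.post(
    f"{ENDPOINT}/deployments/{DEPLOYMENT_ID}/customMetrics/{CUSTOM_METRIC_ID}/fromJSON/",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json=payload,
)
response.raise_for_status()
```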

Manage custom metrics

On the Custom metrics dashboard, after you've added your custom metrics, you can edit or delete them:

On the Custom metrics tab, locate the custom metric you want to manage, and then click the Actions menu:

  • To edit a metric, click Edit, update any configurable settings, and then click Update custom metric.

  • To delete a metric, click Delete.

Configure the custom metric dashboard display settings

Configure the following settings to specify the custom metric calculations you want to view on the dashboard:

Custom metrics for evaluation and moderation require an association ID

For the metrics added when you configure evaluations and moderations, to view data on the Custom metrics tab, ensure that you set an association ID and enable prediction storage before you start making predictions through the deployed LLM. If you don't set an association ID and provide association IDs alongside the LLM's predictions, the moderation metrics aren't calculated on the Custom metrics tab. After you define the association ID, you can enable automatic association ID generation to ensure these metrics appear on the Custom metrics tab. You can enable this setting during or after deployment.
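One way to satisfy this requirement programmatically is through the DataRobot Python client. The method and parameter names below are assumptions to verify against the client documentation for your version, and the association ID column name is a placeholder.

```python
# Hedged sketch: enable an association ID and prediction storage so that
# evaluation and moderation metrics can populate the Custom metrics tab.
# Method names are assumptions; verify against the Python client docs.
import datarobot as dr

dr.Client(token="YOUR_API_TOKEN", endpoint="https://app.datarobot.com/api/v2")

deployment = dr.Deployment.get("YOUR_DEPLOYMENT_ID")
deployment.update_association_id_settings(
    column_names=["promptId"],  # hypothetical association ID column
    required_in_prediction_requests=True,
)
deployment.update_predictions_data_collection_settings(enabled=True)
```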

Setting Description
1 Model Select the deployment's model, current or previous, to show custom metrics for.
2 Range (UTC) / Date Slider Select the start and end dates of the period from which you want to view custom metrics.
3 Resolution Select the granularity of the date slider. Select from hourly, daily, weekly, and monthly granularity based on the time range selected. If the time range is longer than 7 days, hourly granularity is not available.
4 Refresh Refresh the custom metric dashboard.
5 Reset Reset the custom metric dashboard's display settings to the default.

Arrange or hide metrics on the dashboard

To arrange or hide metrics on the Custom metrics summary dashboard, locate the custom metric you want to move or hide:

  • To move a metric, click the grid icon on the left side of the metric tile and then drag the metric to a new location.

  • To hide a metric chart, clear the checkbox next to the metric name.

Select chart type for categorical metrics

Preview

Categorical custom metrics are on by default.

Feature flag: Enable Categorical Custom Metrics

If you added a categorical external custom metric, the metric chart on the Custom metrics summary dashboard is viewable as a line chart or a bar chart. To change the chart view, click the settings icon in the upper-right corner of the plot area, and then select (or clear) the View as line chart checkbox:

Explore metric data

Availability information

Tracing is a premium feature. Contact your DataRobot representative or administrator for information on enabling this feature.

Feature flags: Enable Data Quality Table for Text Generation Target Types (Premium feature), Enable Actuals Storage for Generative Models (Premium feature)

On the Custom metrics tab of a Generative AI deployment, in a custom metric chart's header, click Show tracing to view prompts, responses, user ratings, and other custom metric scores per row, matched by association ID. This view provides insight into the quality of the Generative AI model's responses, as rated by users and based on any Generative AI custom metrics you implement. You can also click the download icon to download the data for the custom metric:

Prompt and response matching

To use the tracing table, define an association ID to match prompts with responses in the same row. Tracing analysis is only available for prompts and responses matched in the same row by association ID; aggregate custom metric data is excluded.

After you open the tracing table for data exploration, you can view prompts, responses, user ratings, and other custom metric scores per row, matched by association ID. The custom metric currently presented in the table is highlighted. To locate specific rows in the tracing table, click Search by, select Prompt values, Response, or Actual values, and then click Search. Click the open icon to expand the details panel, displaying a row's full Prompt and the Response matched with that prompt by association ID:

Additionally, to configure the columns displayed in the tracing table, click Settings to open the Column management panel, where columns can be selected, hidden, or rearranged. To export data for external use, click Export all in selected range to export every row in the time range currently selected at the top of the Custom metrics tab, or click Export selected rows if you've selected one or more rows in the table.


Updated October 15, 2024