# Define custom metrics

> Define custom metrics - The MetricBase class, along with four additional default classes, provides
> an interface to define custom metrics.

This Markdown file sits beside the HTML page at the same path (with a `.md` suffix). It summarizes the topic and lists links for tools and LLM context.

Companion generated at `2026-04-24T16:03:56.254441+00:00` (UTC).

## Primary page

- [Define custom metrics](https://docs.datarobot.com/en/docs/api/code-first-tools/dr-model-metrics/dmm-custom-metrics.html): Full documentation for this topic (HTML).

## Sections on this page

- [Create a metric base](https://docs.datarobot.com/en/docs/api/code-first-tools/dr-model-metrics/dmm-custom-metrics.html#create-a-metric-base): In-page section heading.
- [Create metrics calculated with predictions and actuals](https://docs.datarobot.com/en/docs/api/code-first-tools/dr-model-metrics/dmm-custom-metrics.html#create-metrics-calculated-with-predictions-and-actuals): In-page section heading.
- [Create metrics calculated with scoring data](https://docs.datarobot.com/en/docs/api/code-first-tools/dr-model-metrics/dmm-custom-metrics.html#create-metrics-calculated-with-scoring-data): In-page section heading.
- [Create LLM metrics](https://docs.datarobot.com/en/docs/api/code-first-tools/dr-model-metrics/dmm-custom-metrics.html#create-llm-metrics): In-page section heading.
- [Create Sklearn metrics](https://docs.datarobot.com/en/docs/api/code-first-tools/dr-model-metrics/dmm-custom-metrics.html#create-sklearn-metrics): In-page section heading.
- [PromptSimilarityMetricBase](https://docs.datarobot.com/en/docs/api/code-first-tools/dr-model-metrics/dmm-custom-metrics.html#promptsimilaritymetricbase): In-page section heading.
- [Report custom metric values](https://docs.datarobot.com/en/docs/api/code-first-tools/dr-model-metrics/dmm-custom-metrics.html#report-custom-metric-values): In-page section heading.

## Related documentation

- [Developer documentation](https://docs.datarobot.com/en/docs/api/index.html): Linked from this page.
- [Code-first tools](https://docs.datarobot.com/en/docs/api/code-first-tools/index.html): Linked from this page.
- [Model Metrics](https://docs.datarobot.com/en/docs/api/code-first-tools/dr-model-metrics/index.html): Linked from this page.

## Documentation content

# Define custom metrics

The `MetricBase` class provides an interface to define custom metrics. Four additional default classes can help you create custom metrics: `ModelMetricBase`, `DataMetricBase`, `LLMMetricBase`, and `SklearnMetric`.

## Create a metric base

In `MetricBase`, define the type of data a metric requires; the custom metric inherits that definition:

```
class MetricBase(object):
    def __init__(
        self,
        name: str,
        description: str = None,
        need_predictions: bool = False,
        need_actuals: bool = False,
        need_scoring_data: bool = False,
        need_training_data: bool = False,
    ):
        self.name = name
        self.description = description
        self._need_predictions = need_predictions
        self._need_actuals = need_actuals
        self._need_scoring_data = need_scoring_data
        self._need_training_data = need_training_data
```

In addition, you must implement the scoring and reduction methods in `MetricBase`:

- Scoring (`score`): Uses the initialized data types to calculate a metric.
- Reduction (`reduce_func`): Reduces multiple values in the same time bucket to one value.

```
    def score(
        self,
        scoring_data: pd.DataFrame,
        predictions: np.ndarray,
        actuals: np.ndarray,
        fit_ctx=None,
        metadata=None,
    ) -> float:
        raise NotImplementedError

    def reduce_func(self) -> callable:
        return np.mean
```
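As a sketch of how these pieces fit together, the hypothetical metric below subclasses `MetricBase`, scores the median of each chunk of predictions, and keeps the default mean reduction. A minimal stand-in for `MetricBase` (mirroring the definition above) is included so the snippet runs standalone; in practice you would subclass the class from the `dmm` package.

```python
import numpy as np

# Minimal stand-in mirroring the MetricBase definition above, so this sketch
# runs standalone; in a real project, subclass dmm's MetricBase instead.
class MetricBase:
    def __init__(self, name, description=None, need_predictions=False,
                 need_actuals=False, need_scoring_data=False,
                 need_training_data=False):
        self.name = name
        self.description = description
        self._need_predictions = need_predictions
        self._need_actuals = need_actuals
        self._need_scoring_data = need_scoring_data
        self._need_training_data = need_training_data


class MedianPrediction(MetricBase):
    """Hypothetical metric: the median of the predictions in a chunk."""

    def __init__(self):
        super().__init__(name="Median Prediction", need_predictions=True)

    def score(self, scoring_data=None, predictions=None, actuals=None,
              fit_ctx=None, metadata=None):
        return float(np.median(predictions))

    def reduce_func(self):
        # Per-chunk scores in the same time bucket are averaged
        return np.mean
```

Because `reduce_func` returns `np.mean`, multiple per-chunk scores that fall in the same time bucket are averaged into one reported value.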

## Create metrics calculated with predictions and actuals

`ModelMetricBase` is the base class for metrics that require actuals and predictions for metric calculation.

```
class ModelMetricBase(MetricBase):
    def __init__(
        self, name: str, description: str = None, need_training_data: bool = False
    ):
        super().__init__(
            name=name,
            description=description,
            need_scoring_data=False,
            need_predictions=True,
            need_actuals=True,
            need_training_data=need_training_data,
        )

    def score(
        self,
        prediction: np.ndarray,
        actuals: np.ndarray,
        fit_context=None,
        metadata=None,
        scoring_data=None,
    ) -> float:
        raise NotImplementedError
```
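For example, a hypothetical mean absolute error metric built on this base class could look like the sketch below. A minimal stand-in for `ModelMetricBase` is included so the snippet runs standalone; in practice you would subclass `ModelMetricBase` from the `dmm` package.

```python
import numpy as np

# Minimal stand-in for dmm's ModelMetricBase so this sketch runs standalone;
# in a real project, subclass the class from the dmm package instead.
class ModelMetricBase:
    def __init__(self, name, description=None, need_training_data=False):
        self.name = name
        self.description = description


class MeanAbsoluteError(ModelMetricBase):
    """Hypothetical metric: mean absolute error of predictions vs. actuals."""

    def __init__(self):
        super().__init__(name="MAE", description="Mean absolute error")

    def score(self, predictions, actuals, fit_context=None, metadata=None,
              scoring_data=None):
        return float(np.mean(np.abs(predictions - actuals)))
```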

## Create metrics calculated with scoring data

`DataMetricBase` is the base class for metrics that require scoring data for metric calculation.

```
class DataMetricBase(MetricBase):
    def __init__(
        self, name: str, description: str = None, need_training_data: bool = False
    ):
        super().__init__(
            name=name,
            description=description,
            need_scoring_data=True,
            need_predictions=False,
            need_actuals=False,
            need_training_data=need_training_data,
        )

    def score(
        self,
        scoring_data: pd.DataFrame,
        fit_ctx=None,
        metadata=None,
        predictions=None,
        actuals=None,
    ) -> float:
        raise NotImplementedError
```
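A scoring-data metric only ever looks at the incoming feature DataFrame. As an illustration, the hypothetical metric below reports the fraction of missing cells in each chunk of scoring data; a minimal stand-in for `DataMetricBase` is included so the snippet runs standalone, whereas in practice you would subclass the `dmm` class.

```python
import pandas as pd

# Minimal stand-in for dmm's DataMetricBase so this sketch runs standalone;
# in a real project, subclass the class from the dmm package instead.
class DataMetricBase:
    def __init__(self, name, description=None, need_training_data=False):
        self.name = name
        self.description = description


class MissingValuesRate(DataMetricBase):
    """Hypothetical metric: fraction of missing cells in the scoring data."""

    def __init__(self):
        super().__init__(
            name="Missing Values Rate",
            description="Fraction of missing cells in the scoring data",
        )

    def score(self, scoring_data, fit_ctx=None, metadata=None,
              predictions=None, actuals=None):
        # Mean over a boolean mask of the whole frame = fraction of NaN cells
        return float(scoring_data.isna().to_numpy().mean())
```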

## Create LLM metrics

`LLMMetricBase` is the base class for LLM metrics that require scoring data and predictions for metric calculation, otherwise known as prompts (the user input) and completions (the LLM response).

```
class LLMMetricBase(MetricBase):
    def __init__(
        self, name: str, description: str = None, need_training_data: bool = False
    ):
        super().__init__(
            name=name,
            description=description,
            need_scoring_data=True,
            need_predictions=True,
            need_actuals=False,
            need_training_data=need_training_data,
        )

    def score(
        self,
        scoring_data: pd.DataFrame,
        predictions: np.ndarray,
        fit_ctx=None,
        metadata=None,
        actuals=None,
    ) -> float:
        raise NotImplementedError
```
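In this base class, `scoring_data` carries the prompts and `predictions` carries the completions. The hypothetical metric below averages the character length of the completions; a minimal stand-in for `LLMMetricBase` is included so the snippet runs standalone, whereas in practice you would subclass the `dmm` class.

```python
import numpy as np
import pandas as pd

# Minimal stand-in for dmm's LLMMetricBase so this sketch runs standalone;
# in a real project, subclass the class from the dmm package instead.
class LLMMetricBase:
    def __init__(self, name, description=None, need_training_data=False):
        self.name = name
        self.description = description


class MeanCompletionLength(LLMMetricBase):
    """Hypothetical metric: average character length of the completions."""

    def __init__(self):
        super().__init__(
            name="Mean Completion Length",
            description="Average character length of LLM completions",
        )

    def score(self, scoring_data, predictions, fit_ctx=None, metadata=None,
              actuals=None):
        # predictions holds the completions (the LLM responses)
        return float(np.mean([len(str(c)) for c in predictions]))
```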

## Create Sklearn metrics

To accelerate the implementation of custom metrics, you can use ready-made, proven metrics from [Sklearn](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics). To create a custom metric, use the `SklearnMetric` class as the base class and provide the name of a scikit-learn metric. For example:

```
from dmm.metric.sklearn_metric import SklearnMetric


class MedianAbsoluteError(SklearnMetric):
    """
    Metric that calculates the median absolute error of the difference between predictions and actuals
    """

    def __init__(self):
        super().__init__(
            metric="median_absolute_error",
        )
```

## PromptSimilarityMetricBase

The `PromptSimilarityMetricBase` class compares the LLM prompt and context vectors. This class is generally used with Text Generation models where the prompt and context vectors are populated as described below.

The base class pulls the vectors from the `scoring_data` and iterates over each entry:

- The prompt vector is pulled from the `prompt_column` (which defaults to `_LLM_PROMPT_VECTOR`) of the `scoring_data`.
- The context vectors are pulled from the `context_column` (which defaults to `_LLM_CONTEXT`) of the `scoring_data`. The context column contains a list of context dictionaries, and each context must have a `vector` element.

> [!NOTE] Note
> Both the `prompt_column` and `context_column` are expected to be JSON-encoded data.

A derived class must implement `calculate_distance()`. For this class, `score()` is already implemented.

The `calculate_distance` function returns a single floating point value based on a single `prompt_vector` and a list of `context_vectors`.

For an example using the `PromptSimilarityMetricBase`, review the code below calculating the minimum Euclidean distance:

```
from typing import List

import numpy as np

from dmm.metric import PromptSimilarityMetricBase

class EuclideanMinMetric(PromptSimilarityMetricBase):
    """Calculate the minimum Euclidean distance between a prompt vector and a list of context vectors"""
    def calculate_distance(self, prompt_vector: np.ndarray, context_vectors: List[np.ndarray]) -> float:
        distances = [
            np.linalg.norm(prompt_vector - context_vector)
            for context_vector in context_vectors
        ]
        return min(distances)

# Instantiation could look like this
scorer = EuclideanMinMetric(name=custom_metric.name, description="Euclidean minimum distance between prompt and context vectors")
```

## Report custom metric values

The classes described above provide the custom metric definitions. Use the `CustomMetric` interface to retrieve the metadata of an existing custom metric in DataRobot and to report data to that custom metric. Initialize the metric by providing the parameters explicitly (`metric_id`, `deployment_id`, `model_id`, `DataRobotClient()`):

```
from dmm import CustomMetric


cm = CustomMetric.from_id(metric_id=METRIC_ID, deployment_id=DEPLOYMENT_ID, model_id=MODEL_ID, client=CLIENT)
```

You can also define these parameters as environment variables:

| Parameter | Environment variable |
| --- | --- |
| metric_id | os.environ["CUSTOM_METRIC_ID"] |
| deployment_id | os.environ["DEPLOYMENT_ID"] |
| model_id | os.environ["MODEL_ID"] |
| DataRobotClient() | os.environ["BASE_URL"] and os.environ["DATAROBOT_ENDPOINT"] |

```
from dmm import CustomMetric


cm = CustomMetric.from_id()
```

Optionally, specify batch mode (`is_batch=True`):

```
from dmm import CustomMetric


cm = CustomMetric.from_id(is_batch=True)
```

The `report` method submits custom metric values to a custom metric defined in DataRobot. To use this method, report a DataFrame in the shape of the output from the metric evaluator.

```
print(aggregated_metric_per_time_bucket.to_string())

                    timestamp  samples  median_absolute_error
1  01/06/2005 14:00:00.000000        2                  0.001

response = cm.report(df=aggregated_metric_per_time_bucket)
print(response.status_code)
202
```
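For reference, a single-bucket DataFrame matching the shape above could be built as in the sketch below. The column names follow the example output; the value column name depends on your metric.

```python
import pandas as pd

# A minimal DataFrame in the shape shown above: one row per time bucket with
# a timestamp, the number of aggregated samples, and the metric value.
# The value column name here matches the example; yours depends on your metric.
aggregated_metric_per_time_bucket = pd.DataFrame(
    {
        "timestamp": ["01/06/2005 14:00:00.000000"],
        "samples": [2],
        "median_absolute_error": [0.001],
    }
)
```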

The `dry_run` parameter determines whether the custom metric values transfer is a dry run (the values aren't saved in the database) or a production data transfer. This parameter is `False` by default (the values are saved).

```
response = cm.report(df=aggregated_metric_per_time_bucket, dry_run=True)
print(response.status_code)
202
```
