
Define custom metrics

The MetricBase class provides an interface to define custom metrics. Four additional base classes can help you create custom metrics: ModelMetricBase, DataMetricBase, LLMMetricBase, and SklearnMetric.

Create a metric base

In MetricBase, define the type of data a metric requires; the custom metric inherits that definition:

class MetricBase(object):
    def __init__(
        self,
        name: str,
        description: str = None,
        need_predictions: bool = False,
        need_actuals: bool = False,
        need_scoring_data: bool = False,
        need_training_data: bool = False,
    ):
        self.name = name
        self.description = description
        self._need_predictions = need_predictions
        self._need_actuals = need_actuals
        self._need_scoring_data = need_scoring_data
        self._need_training_data = need_training_data

In addition, you must implement the scoring and reduction methods in MetricBase:

  • Scoring (score): Uses initialized data types to calculate a metric.

  • Reduction (reduce_func): Reduces multiple values in the same TimeBucket to one value.

    def score(
        self,
        scoring_data: pd.DataFrame,
        predictions: np.ndarray,
        actuals: np.ndarray,
        fit_ctx=None,
        metadata=None,
    ) -> float:
        raise NotImplementedError

    def reduce_func(self) -> callable:
        return np.mean
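
For example, a minimal sketch of a metric built directly on MetricBase might look like the following; the class name, import path, and scoring logic are illustrative assumptions, not part of the interface:

import numpy as np
import pandas as pd

from dmm.metric import MetricBase  # import path assumed; adjust to your dmm version


class PredictionMean(MetricBase):
    """Illustrative metric that averages the predictions in each chunk of data."""

    def __init__(self):
        super().__init__(
            name="prediction_mean",
            description="Mean of the predictions",
            need_predictions=True,
        )

    def score(
        self,
        scoring_data: pd.DataFrame = None,
        predictions: np.ndarray = None,
        actuals: np.ndarray = None,
        fit_ctx=None,
        metadata=None,
    ) -> float:
        return float(np.mean(predictions))

    def reduce_func(self) -> callable:
        # Average per-chunk scores that fall into the same TimeBucket
        return np.mean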

Create metrics calculated with predictions and actuals

ModelMetricBase is the base class for metrics that require actuals and predictions for metric calculation.

class ModelMetricBase(MetricBase):
    def __init__(
        self, name: str, description: str = None, need_training_data: bool = False
    ):
        super().__init__(
            name=name,
            description=description,
            need_scoring_data=False,
            need_predictions=True,
            need_actuals=True,
            need_training_data=need_training_data,
        )

    def score(
        self,
        predictions: np.ndarray,
        actuals: np.ndarray,
        fit_ctx=None,
        metadata=None,
        scoring_data=None,
    ) -> float:
        raise NotImplementedError
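
For example, a minimal sketch of a ModelMetricBase subclass might compute the mean absolute error; the class name and import path are illustrative assumptions:

import numpy as np

from dmm.metric import ModelMetricBase  # import path assumed; adjust to your dmm version


class MeanAbsoluteError(ModelMetricBase):
    """Illustrative metric: mean absolute error between predictions and actuals."""

    def __init__(self):
        super().__init__(
            name="mean_absolute_error",
            description="Mean absolute error between predictions and actuals",
        )

    def score(
        self,
        predictions: np.ndarray,
        actuals: np.ndarray,
        fit_ctx=None,
        metadata=None,
        scoring_data=None,
    ) -> float:
        return float(np.mean(np.abs(predictions - actuals)))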

Create metrics calculated with scoring data

DataMetricBase is the base class for metrics that require scoring data for metric calculation.

class DataMetricBase(MetricBase):
    def __init__(
        self, name: str, description: str = None, need_training_data: bool = False
    ):
        super().__init__(
            name=name,
            description=description,
            need_scoring_data=True,
            need_predictions=False,
            need_actuals=False,
            need_training_data=need_training_data,
        )

    def score(
        self,
        scoring_data: pd.DataFrame,
        fit_ctx=None,
        metadata=None,
        predictions=None,
        actuals=None,
    ) -> float:
        raise NotImplementedError
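
For example, a DataMetricBase subclass might track the fraction of missing values in the scoring data; again, the class name and import path are illustrative assumptions:

import pandas as pd

from dmm.metric import DataMetricBase  # import path assumed; adjust to your dmm version


class MissingValuesRate(DataMetricBase):
    """Illustrative metric: fraction of missing cells in the scoring data."""

    def __init__(self):
        super().__init__(
            name="missing_values_rate",
            description="Fraction of missing values in the scoring data",
        )

    def score(
        self,
        scoring_data: pd.DataFrame,
        fit_ctx=None,
        metadata=None,
        predictions=None,
        actuals=None,
    ) -> float:
        return float(scoring_data.isna().to_numpy().mean())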

Create LLM metrics

LLMMetricBase is the base class for LLM metrics that require scoring data and predictions for metric calculation; in the LLM case, these are the prompts (the user input) and completions (the LLM responses).

class LLMMetricBase(MetricBase):
    def __init__(
        self, name: str, description: str = None, need_training_data: bool = False
    ):
        super().__init__(
            name=name,
            description=description,
            need_scoring_data=True,
            need_predictions=True,
            need_actuals=False,
            need_training_data=need_training_data,
        )

    def score(
        self,
        scoring_data: pd.DataFrame,
        predictions: np.ndarray,
        fit_ctx=None,
        metadata=None,
        actuals=None,
    ) -> float:
        raise NotImplementedError
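
For example, an LLMMetricBase subclass might measure the average length of the completions; the class name and import path are illustrative assumptions:

import numpy as np
import pandas as pd

from dmm.metric import LLMMetricBase  # import path assumed; adjust to your dmm version


class CompletionWordCount(LLMMetricBase):
    """Illustrative metric: average word count of the LLM completions."""

    def __init__(self):
        super().__init__(
            name="completion_word_count",
            description="Average number of words in each completion",
        )

    def score(
        self,
        scoring_data: pd.DataFrame,
        predictions: np.ndarray,
        fit_ctx=None,
        metadata=None,
        actuals=None,
    ) -> float:
        # Completions arrive as the predictions; prompts are in scoring_data
        return float(np.mean([len(str(completion).split()) for completion in predictions]))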

Create Sklearn metrics

To accelerate the implementation of custom metrics, you can use ready-made, proven metrics from sklearn. To create a custom metric, use the SklearnMetric class as the base class and provide the name of a sklearn metric. For example:

from dmm.metric.sklearn_metric import SklearnMetric


class MedianAbsoluteError(SklearnMetric):
    """
    Metric that calculates the median absolute error of the difference between predictions and actuals
    """

    def __init__(self):
        super().__init__(
            metric="median_absolute_error",
        )
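
Assuming SklearnMetric exposes the same score() signature as ModelMetricBase, scoring such a metric directly might look like this sketch (the input arrays are made up):

import numpy as np

# Illustrative inputs; in production, these come from the deployment's data
predictions = np.array([1.2, 0.7, 3.4])
actuals = np.array([1.0, 0.9, 3.0])

scorer = MedianAbsoluteError()
value = scorer.score(predictions=predictions, actuals=actuals)
print(value)  # median of |predictions - actuals| -> 0.2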

PromptSimilarityMetricBase

The PromptSimilarityMetricBase class compares LLM prompt and context vectors. This class is generally used with Text Generation models, where the prompt and context vectors are populated as described below.

The base class pulls the vectors from the scoring_data and iterates over each entry:

  • The prompt vector is pulled from the prompt_column (which defaults to _LLM_PROMPT_VECTOR) of the scoring_data.

  • The context vectors are pulled from the context_column (which defaults to _LLM_CONTEXT) of the scoring_data. The context column contains a list of context dictionaries, and each context needs to have a vector element.

Note

Both the prompt_column and context_column are expected to be JSON-encoded data.
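
For illustration, a single row of scoring_data with the default column names could be constructed like this; the vector values are made up, and in production they are populated by the LLM pipeline:

import json

import pandas as pd

# Both columns hold JSON-encoded values, as noted above
scoring_data = pd.DataFrame(
    {
        "_LLM_PROMPT_VECTOR": [json.dumps([0.12, 0.48, 0.73])],
        "_LLM_CONTEXT": [
            json.dumps(
                [
                    {"vector": [0.10, 0.52, 0.70]},
                    {"vector": [0.33, 0.01, 0.95]},
                ]
            )
        ],
    }
)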

A derived class must implement calculate_distance(); score() is already implemented by this class.

The calculate_distance function returns a single floating-point value computed from a single prompt_vector and a list of context_vectors.

For an example using the PromptSimilarityMetricBase, review the code below calculating the minimum Euclidean distance:

from typing import List

import numpy as np

from dmm.metric import PromptSimilarityMetricBase


class EuclideanMinMetric(PromptSimilarityMetricBase):
    """Calculate the minimum Euclidean distance between a prompt vector and a list of context vectors"""

    def calculate_distance(
        self, prompt_vector: np.ndarray, context_vectors: List[np.ndarray]
    ) -> float:
        distances = [
            np.linalg.norm(prompt_vector - context_vector)
            for context_vector in context_vectors
        ]
        return min(distances)

# Instantiation could look like this
scorer = EuclideanMinMetric(
    name=custom_metric.name,
    description="Euclidean minimum distance between prompt and context vectors",
)

Report custom metric values

The classes described above provide the source of the custom metric definitions. Use the CustomMetric interface to retrieve the metadata of an existing custom metric in DataRobot and to report data to that custom metric. Initialize the metric by providing the parameters explicitly (metric_id, deployment_id, model_id, DataRobotClient()):

from dmm import CustomMetric


cm = CustomMetric.from_id(metric_id=METRIC_ID, deployment_id=DEPLOYMENT_ID, model_id=MODEL_ID, client=CLIENT)

You can also define these parameters as environment variables:

Parameter          Environment variable
metric_id          os.environ["CUSTOM_METRIC_ID"]
deployment_id      os.environ["DEPLOYMENT_ID"]
model_id           os.environ["MODEL_ID"]
DataRobotClient()  os.environ["BASE_URL"] and os.environ["DATAROBOT_ENDPOINT"]

from dmm import CustomMetric


cm = CustomMetric.from_id()

Optionally, specify batch mode (is_batch=True).

from dmm import CustomMetric


cm = CustomMetric.from_id(is_batch=True)

The report method submits custom metric values to a custom metric defined in DataRobot. To use this method, pass a DataFrame in the shape of the output from the metric evaluator.
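
For illustration, a DataFrame in that shape could be assembled as follows; the column names and values mirror the example output below:

import pandas as pd

# One aggregated value per time bucket; columns mirror the evaluator output
aggregated_metric_per_time_bucket = pd.DataFrame(
    {
        "timestamp": ["01/06/2005 14:00:00.000000"],
        "samples": [2],
        "median_absolute_error": [0.001],
    },
    index=[1],
)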

print(aggregated_metric_per_time_bucket.to_string())

                    timestamp  samples  median_absolute_error
1  01/06/2005 14:00:00.000000        2                  0.001

response = cm.report(df=aggregated_metric_per_time_bucket)
print(response.status_code)
202

The dry_run parameter determines whether the transfer of custom metric values is a dry run (the values are not saved in the database) or a production data transfer. This parameter defaults to False (the values are saved):

response = cm.report(df=aggregated_metric_per_time_bucket, dry_run=True)
print(response.status_code)
202

Updated December 3, 2024