Define custom metrics

The MetricBase class provides an interface to define custom metrics. Four additional built-in base classes can help you create custom metrics: ModelMetricBase, DataMetricBase, LLMMetricBase, and SklearnMetric.

Create a metric base

In MetricBase, define the type of data a metric requires; the custom metric inherits that definition:

class MetricBase(object):
    def __init__(
        self,
        name: str,
        description: str = None,
        need_predictions: bool = False,
        need_actuals: bool = False,
        need_scoring_data: bool = False,
        need_training_data: bool = False,
    ):
        self.name = name
        self.description = description
        self._need_predictions = need_predictions
        self._need_actuals = need_actuals
        self._need_scoring_data = need_scoring_data
        self._need_training_data = need_training_data

In addition, you must implement the scoring and reduction methods in MetricBase:

  • Scoring (score): Uses initialized data types to calculate a metric.

  • Reduction (reduce_func): Reduces multiple values in the same TimeBucket to one value.

    def score(
        self,
        scoring_data: pd.DataFrame,
        predictions: np.array,
        actuals: np.array,
        fit_ctx=None,
        metadata=None,
    ) -> float:
        raise NotImplementedError

    def reduce_func(self) -> callable:
        return np.mean
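
For example, a metric that tracks the median prediction per time bucket can subclass MetricBase directly. The following is a minimal sketch; the import path, metric name, and description are assumptions for illustration:

import numpy as np
import pandas as pd

from dmm.metric import MetricBase  # assumed import path


class MedianPrediction(MetricBase):
    """Tracks the median of the predictions in each time bucket."""

    def __init__(self):
        super().__init__(
            name="Median Prediction",
            description="Median of the predictions per time bucket",
            need_predictions=True,
        )

    def score(
        self,
        scoring_data: pd.DataFrame = None,
        predictions: np.array = None,
        actuals: np.array = None,
        fit_ctx=None,
        metadata=None,
    ) -> float:
        # Called once per chunk of data; only predictions are needed here.
        return float(np.median(predictions))

    def reduce_func(self) -> callable:
        # Combines per-chunk scores that fall into the same TimeBucket.
        return np.median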

Create metrics calculated with predictions and actuals

ModelMetricBase is the base class for metrics that require actuals and predictions for metric calculation.

class ModelMetricBase(MetricBase):
    def __init__(
        self, name: str, description: str = None, need_training_data: bool = False
    ):
        super().__init__(
            name=name,
            description=description,
            need_scoring_data=False,
            need_predictions=True,
            need_actuals=True,
            need_training_data=need_training_data,
        )

    def score(
        self,
        predictions: np.array,
        actuals: np.array,
        fit_context=None,
        metadata=None,
        scoring_data=None,
    ) -> float:
        raise NotImplementedError
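
As a sketch of the pattern, a mean absolute error metric only needs to implement score; ModelMetricBase already sets the constructor flags. The import path, name, and description below are assumptions for illustration:

import numpy as np

from dmm.metric import ModelMetricBase  # assumed import path


class MeanAbsoluteError(ModelMetricBase):
    """Mean absolute error between predictions and actuals."""

    def __init__(self):
        super().__init__(
            name="MAE",
            description="Mean absolute error per time bucket",
        )

    def score(
        self,
        predictions: np.array,
        actuals: np.array,
        fit_context=None,
        metadata=None,
        scoring_data=None,
    ) -> float:
        # Average absolute difference over the chunk; per-bucket values
        # are then combined by reduce_func (np.mean by default).
        return float(np.mean(np.abs(predictions - actuals)))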

Create metrics calculated with scoring data

DataMetricBase is the base class for metrics that require scoring data for metric calculation.

class DataMetricBase(MetricBase):
    def __init__(
        self, name: str, description: str = None, need_training_data: bool = False
    ):
        super().__init__(
            name=name,
            description=description,
            need_scoring_data=True,
            need_predictions=False,
            need_actuals=False,
            need_training_data=need_training_data,
        )

    def score(
        self,
        scoring_data: pd.DataFrame,
        fit_ctx=None,
        metadata=None,
        predictions=None,
        actuals=None,
    ) -> float:
        raise NotImplementedError
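
For example, a data quality metric that reports the fraction of missing cells in each chunk of scoring data could look like the following sketch (import path, name, and description assumed for illustration):

import pandas as pd

from dmm.metric import DataMetricBase  # assumed import path


class MissingValuesRate(DataMetricBase):
    """Fraction of missing cells in the scoring data."""

    def __init__(self):
        super().__init__(
            name="Missing Values Rate",
            description="Fraction of missing cells per chunk of scoring data",
        )

    def score(
        self,
        scoring_data: pd.DataFrame,
        fit_ctx=None,
        metadata=None,
        predictions=None,
        actuals=None,
    ) -> float:
        # isna() flags every missing cell; the mean over all cells is
        # the fraction of missing values in this chunk.
        return float(scoring_data.isna().to_numpy().mean())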

Create LLM metrics

LLMMetricBase is the base class for LLM metrics that require scoring data and predictions for metric calculation; in an LLM context, these are the prompts (the user input) and completions (the LLM response).

class LLMMetricBase(MetricBase):
    def __init__(
        self, name: str, description: str = None, need_training_data: bool = False
    ):
        super().__init__(
            name=name,
            description=description,
            need_scoring_data=True,
            need_predictions=True,
            need_actuals=False,
            need_training_data=need_training_data,
        )

    def score(
        self,
        scoring_data: pd.DataFrame,
        predictions: np.array,
        fit_ctx=None,
        metadata=None,
        actuals=None,
    ) -> float:
        raise NotImplementedError
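
For instance, a metric that tracks the average character length of completions only needs the predictions array; the prompts remain available in scoring_data. A minimal sketch, with the import path, name, and description assumed for illustration:

import numpy as np
import pandas as pd

from dmm.metric import LLMMetricBase  # assumed import path


class MeanCompletionLength(LLMMetricBase):
    """Average character length of the LLM completions."""

    def __init__(self):
        super().__init__(
            name="Mean Completion Length",
            description="Average number of characters in the completions",
        )

    def score(
        self,
        scoring_data: pd.DataFrame,
        predictions: np.array,
        fit_ctx=None,
        metadata=None,
        actuals=None,
    ) -> float:
        # Completions arrive as the predictions array; prompts are the
        # user input columns in scoring_data.
        return float(np.mean([len(str(c)) for c in predictions]))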

Create Sklearn metrics

To accelerate the implementation of custom metrics, you can use ready-made, proven metrics from scikit-learn. To create a custom metric, use the SklearnMetric class as the base class and provide the name of a scikit-learn metric. For example:

from dmm.metric.sklearn_metric import SklearnMetric


class MedianAbsoluteError(SklearnMetric):
    """
    Metric that calculates the median absolute error of the difference between predictions and actuals
    """

    def __init__(self):
        super().__init__(
            metric="median_absolute_error",
        )
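
Once defined, the metric scores chunks of data like any other custom metric. A small usage sketch, assuming SklearnMetric exposes the same positional score signature as ModelMetricBase above (the values are illustrative):

import numpy as np

metric = MedianAbsoluteError()

# Median of |prediction - actual| over one chunk:
# |2.5-3|=0.5, |0-(-0.5)|=0.5, |2-2|=0, |8-7|=1 -> median 0.5
value = metric.score(np.array([2.5, 0.0, 2.0, 8.0]), np.array([3.0, -0.5, 2.0, 7.0]))
print(value)  # 0.5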

Report custom metric values

The classes described above define how custom metric values are calculated. Use the CustomMetric interface to retrieve the metadata of an existing custom metric in DataRobot and to report data to that metric. Initialize the metric by providing the parameters explicitly (metric_id, deployment_id, model_id, dr.Client()):

from dmm.custom_metric import CustomMetric


cm = CustomMetric.from_id(metric_id=METRIC_ID, deployment_id=DEPLOYMENT_ID, model_id=MODEL_ID, client=CLIENT)

You can also define these parameters as environment variables:

Parameter       Environment variable
metric_id       os.environ["CUSTOM_METRIC_ID"]
deployment_id   os.environ["DEPLOYMENT_ID"]
model_id        os.environ["MODEL_ID"]
dr.Client()     os.environ["BASE_URL"] and os.environ["DATAROBOT_ENDPOINT"]

from dmm.custom_metric import CustomMetric


cm = CustomMetric.from_id()

Optionally, specify batch mode (is_batch=True).

from dmm.custom_metric import CustomMetric


cm = CustomMetric.from_id(is_batch=True)

The report method submits custom metric values to a custom metric defined in DataRobot. To use this method, pass a DataFrame in the shape of the output of the metric evaluator:

print(aggregated_metric_per_time_bucket.to_string())

                    timestamp  samples  median_absolute_error
1  01/06/2005 14:00:00.000000        2                  0.001

response = cm.report(df=aggregated_metric_per_time_bucket)
print(response.status_code)
202

The dry_run parameter determines whether the transfer of custom metric values is a dry run (the values aren't saved in the database) or a production data transfer. This parameter is False by default (the values are saved).

response = cm.report(df=aggregated_metric_per_time_bucket, dry_run=True)
print(response.status_code)
202

Updated July 24, 2024