Define custom metrics¶
The `MetricBase` class provides an interface to define custom metrics. Four additional default classes can help you create custom metrics: `ModelMetricBase`, `DataMetricBase`, `LLMMetricBase`, and `SklearnMetric`.
Create a metric base¶
In `MetricBase`, define the type of data a metric requires; the custom metric inherits that definition:
```python
import numpy as np
import pandas as pd


class MetricBase(object):
    def __init__(
        self,
        name: str,
        description: str = None,
        need_predictions: bool = False,
        need_actuals: bool = False,
        need_scoring_data: bool = False,
        need_training_data: bool = False,
    ):
        self.name = name
        self.description = description
        self._need_predictions = need_predictions
        self._need_actuals = need_actuals
        self._need_scoring_data = need_scoring_data
        self._need_training_data = need_training_data
```
In addition, you must implement the scoring and reduction methods in `MetricBase`:

- Scoring (`score`): Uses initialized data types to calculate a metric.
- Reduction (`reduce_func`): Reduces multiple values in the same `TimeBucket` to one value.
```python
    def score(
        self,
        scoring_data: pd.DataFrame,
        predictions: np.ndarray,
        actuals: np.ndarray,
        fit_ctx=None,
        metadata=None,
    ) -> float:
        raise NotImplementedError

    def reduce_func(self) -> callable:
        return np.mean
```
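For example, a minimal sketch of a metric that inherits `MetricBase` directly might look like this. The `MaxPrediction` class, its name, and the `from dmm.metric import MetricBase` import path are illustrative assumptions, not part of the library:

```python
import numpy as np
import pandas as pd

from dmm.metric import MetricBase  # import path assumed for this sketch


class MaxPrediction(MetricBase):
    """Illustrative metric: the largest prediction in a batch."""

    def __init__(self):
        super().__init__(name="max_prediction", need_predictions=True)

    def score(
        self,
        scoring_data: pd.DataFrame,
        predictions: np.ndarray,
        actuals: np.ndarray,
        fit_ctx=None,
        metadata=None,
    ) -> float:
        return float(np.max(predictions))

    def reduce_func(self) -> callable:
        # Keep the largest score when several values fall into the same TimeBucket.
        return np.max
```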
Create metrics calculated with predictions and actuals¶
`ModelMetricBase` is the base class for metrics that require actuals and predictions for metric calculation.
```python
class ModelMetricBase(MetricBase):
    def __init__(
        self, name: str, description: str = None, need_training_data: bool = False
    ):
        super().__init__(
            name=name,
            description=description,
            need_scoring_data=False,
            need_predictions=True,
            need_actuals=True,
            need_training_data=need_training_data,
        )

    def score(
        self,
        prediction: np.ndarray,
        actuals: np.ndarray,
        fit_context=None,
        metadata=None,
        scoring_data=None,
    ) -> float:
        raise NotImplementedError
```
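As a sketch, a metric built on `ModelMetricBase` only has to implement `score()`; the `MeanAbsoluteErrorMetric` class below is illustrative, not part of the library:

```python
import numpy as np


class MeanAbsoluteErrorMetric(ModelMetricBase):
    """Illustrative metric: mean absolute error between predictions and actuals."""

    def __init__(self):
        super().__init__(
            name="mean_absolute_error",
            description="Mean absolute error between predictions and actuals",
        )

    def score(
        self,
        prediction: np.ndarray,
        actuals: np.ndarray,
        fit_context=None,
        metadata=None,
        scoring_data=None,
    ) -> float:
        return float(np.mean(np.abs(prediction - actuals)))
```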
Create metrics calculated with scoring data¶
`DataMetricBase` is the base class for metrics that require scoring data for metric calculation.
```python
class DataMetricBase(MetricBase):
    def __init__(
        self, name: str, description: str = None, need_training_data: bool = False
    ):
        super().__init__(
            name=name,
            description=description,
            need_scoring_data=True,
            need_predictions=False,
            need_actuals=False,
            need_training_data=need_training_data,
        )

    def score(
        self,
        scoring_data: pd.DataFrame,
        fit_ctx=None,
        metadata=None,
        predictions=None,
        actuals=None,
    ) -> float:
        raise NotImplementedError
```
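For instance, a data-quality style metric can be computed from the scoring data alone; the `MissingValuesRate` class below is an illustrative sketch, not part of the library:

```python
import pandas as pd


class MissingValuesRate(DataMetricBase):
    """Illustrative metric: average fraction of missing cells in the scoring data."""

    def __init__(self):
        super().__init__(name="missing_values_rate")

    def score(
        self,
        scoring_data: pd.DataFrame,
        fit_ctx=None,
        metadata=None,
        predictions=None,
        actuals=None,
    ) -> float:
        # Mean of per-column missing fractions, collapsed to a single value.
        return float(scoring_data.isna().mean().mean())
```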
Create LLM metrics¶
`LLMMetricBase` is the base class for LLM metrics that require scoring data and predictions for metric calculation, otherwise known as prompts (the user input) and completions (the LLM response).
```python
class LLMMetricBase(MetricBase):
    def __init__(
        self, name: str, description: str = None, need_training_data: bool = False
    ):
        super().__init__(
            name=name,
            description=description,
            need_scoring_data=True,
            need_predictions=True,
            need_actuals=False,
            need_training_data=need_training_data,
        )

    def score(
        self,
        scoring_data: pd.DataFrame,
        predictions: np.ndarray,
        fit_ctx=None,
        metadata=None,
        actuals=None,
    ) -> float:
        raise NotImplementedError
```
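As an illustrative sketch (not part of the library), an LLM metric can treat the predictions as completions; `CompletionLength` below reports their average length in characters:

```python
import numpy as np
import pandas as pd


class CompletionLength(LLMMetricBase):
    """Illustrative metric: average completion length, in characters."""

    def __init__(self):
        super().__init__(name="completion_length")

    def score(
        self,
        scoring_data: pd.DataFrame,
        predictions: np.ndarray,
        fit_ctx=None,
        metadata=None,
        actuals=None,
    ) -> float:
        return float(np.mean([len(str(completion)) for completion in predictions]))
```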
Create Sklearn metrics¶
To accelerate the implementation of custom metrics, you can use ready-made, proven metrics from Sklearn. To create a custom metric, provide the name of an Sklearn metric and use the `SklearnMetric` class as the base class. For example:
```python
from dmm.metric.sklearn_metric import SklearnMetric


class MedianAbsoluteError(SklearnMetric):
    """
    Metric that calculates the median absolute error of the difference between predictions and actuals.
    """

    def __init__(self):
        super().__init__(
            metric="median_absolute_error",
        )
```
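A usage sketch, assuming `SklearnMetric` keeps the `ModelMetricBase` scoring signature shown above (the sample arrays are made up for illustration):

```python
import numpy as np

metric = MedianAbsoluteError()
# Assumes the ModelMetricBase-style signature; verify against your dmm version.
value = metric.score(
    prediction=np.array([0.1, 0.5, 0.9]),
    actuals=np.array([0.0, 0.5, 1.0]),
)
print(value)
```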
PromptSimilarityMetricBase¶
The `PromptSimilarityMetricBase` class compares the LLM prompt and context vectors. This class is generally used with Text Generation models where the prompt and context vectors are populated as described below.

The base class pulls the vectors from the `scoring_data` and iterates over each entry:
- The prompt vector is pulled from the `prompt_column` (which defaults to `_LLM_PROMPT_VECTOR`) of the `scoring_data`.
- The context vectors are pulled from the `context_column` (which defaults to `_LLM_CONTEXT`) of the `scoring_data`. The context column contains a list of context dictionaries, and each context needs to have a `vector` element.
Note
Both the `prompt_column` and `context_column` are expected to be JSON-encoded data.
A derived class must implement `calculate_distance()`; for this class, `score()` is already implemented. The `calculate_distance` function returns a single floating point value based on a single `prompt_vector` and a list of `context_vectors`.
For an example using the `PromptSimilarityMetricBase`, review the code below calculating the minimum Euclidean distance:
```python
from typing import List

import numpy as np

from dmm.metric import PromptSimilarityMetricBase


class EuclideanMinMetric(PromptSimilarityMetricBase):
    """Calculate the minimum Euclidean distance between a prompt vector and a list of context vectors"""

    def calculate_distance(self, prompt_vector: np.ndarray, context_vectors: List[np.ndarray]) -> float:
        distances = [
            np.linalg.norm(prompt_vector - context_vector)
            for context_vector in context_vectors
        ]
        return min(distances)


# Instantiation could look like this
scorer = EuclideanMinMetric(
    name=custom_metric.name,
    description="Euclidean minimum distance between prompt and context vectors",
)
```
Report custom metric values¶
The metrics described above provide the source of the custom metric definitions. Use the `CustomMetric` interface to retrieve the metadata of an existing custom metric in DataRobot and to report data to that custom metric. Initialize the metric by providing the parameters explicitly (`metric_id`, `deployment_id`, `model_id`, `DataRobotClient()`):
```python
from dmm import CustomMetric

cm = CustomMetric.from_id(
    metric_id=METRIC_ID,
    deployment_id=DEPLOYMENT_ID,
    model_id=MODEL_ID,
    client=CLIENT,
)
```
You can also define these parameters as environment variables:
| Parameter | Environment variable |
|---|---|
| `metric_id` | `os.environ["CUSTOM_METRIC_ID"]` |
| `deployment_id` | `os.environ["DEPLOYMENT_ID"]` |
| `model_id` | `os.environ["MODEL_ID"]` |
| `DataRobotClient()` | `os.environ["BASE_URL"]` and `os.environ["DATAROBOT_ENDPOINT"]` |
```python
from dmm import CustomMetric

cm = CustomMetric.from_id()
```
Optionally, specify batch mode (`is_batch=True`):

```python
from dmm import CustomMetric

cm = CustomMetric.from_id(is_batch=True)
```
The `report` method submits custom metric values to a custom metric defined in DataRobot. To use this method, report a DataFrame in the shape of the output from the metric evaluator:

```python
print(aggregated_metric_per_time_bucket.to_string())
#                     timestamp  samples  median_absolute_error
# 1  01/06/2005 14:00:00.000000        2                  0.001

response = cm.report(df=aggregated_metric_per_time_bucket)
print(response.status_code)
# 202
```
The `dry_run` parameter determines whether the custom metric values transfer is a dry run (the values aren't saved in the database) or a production data transfer. This parameter is `False` by default (the values are saved):

```python
response = cm.report(df=aggregated_metric_per_time_bucket, dry_run=True)
print(response.status_code)
# 202
```