Configure an environment for DMM
There are two primary ecosystems where you can develop a custom metric with the DataRobot Model Metrics library (DMM):
- Within the DataRobot application (via a notebook or a scheduled job).
- In a local development environment.
Environment considerations
Installing Python modules:
- When running DMM from within the DataRobot application, the ecosystem is pre-configured with all required Python modules.
- When running DMM locally, you need to install the `dmm` module. This automatically updates the Python environment with all required modules.
Setting DMM parameters:
- When running DMM from within the DataRobot application, the parameters are set through environment variables.
- When running DMM locally, it is recommended that you set these parameters by passing command-line arguments rather than by setting environment variables.
Initialize the environment
The `CustomMetricArgumentParser` class wraps the standard `argparse.ArgumentParser`. It provides convenience functions that read each value either from an environment variable or through normal command-line argument parsing. When `CustomMetricArgumentParser.parse_args()` is called, it checks for any missing values.
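To make the "environment variable or argument" behavior concrete, here is a minimal sketch built on the standard `argparse` module alone. It is not the DMM implementation: `add_env_arg()`, `build_parser()`, and `parse_checked()` are hypothetical helpers, and the precedence shown (an explicit argument overrides the environment value, and a value still missing after parsing is an error) is an assumption used to illustrate the idea, not a description of `CustomMetricArgumentParser` internals.

```python
import argparse
import os
from collections.abc import Mapping, Sequence


def add_env_arg(parser, flag, env_var, environ, required=False, default=None, **kwargs):
    """Hypothetical helper: register `flag`, falling back to `env_var` in `environ`."""
    # The environment value (if present) becomes the default, so an explicit
    # command-line argument always overrides it. `required` is not passed to
    # argparse because the environment alone may satisfy it; see parse_checked().
    kwargs["default"] = environ.get(env_var, default)
    suffix = ", required." if required else "."
    kwargs["help"] = f"{kwargs.get('help', '')} Settable via '{env_var}'{suffix}"
    parser.add_argument(flag, **kwargs)


def build_parser(environ: Mapping[str, str] = os.environ) -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Env-or-argument sketch")
    add_env_arg(parser, "--deployment-id", "DEPLOYMENT_ID", environ,
                required=True, help="Deployment ID.")
    # argparse applies `type` to string defaults, so MAX_ROWS="500" becomes 500.
    add_env_arg(parser, "--max-rows", "MAX_ROWS", environ,
                default=100000, type=int, help="Maximum number of rows.")
    return parser


def parse_checked(parser: argparse.ArgumentParser, argv: Sequence[str],
                  required: Sequence[str]) -> argparse.Namespace:
    """Parse `argv`, then report any required value that is still missing."""
    args = parser.parse_args(argv)
    missing = [name for name in required if getattr(args, name) is None]
    if missing:
        parser.error("missing required values: " + ", ".join(missing))  # exits with status 2
    return args


# Usage: a plain dict stands in for os.environ.
env = {"DEPLOYMENT_ID": "from-env", "MAX_ROWS": "500"}
args = parse_checked(build_parser(env), [], required=["deployment_id"])
print(args.deployment_id, args.max_rows)  # from-env 500
args = parse_checked(build_parser(env), ["--deployment-id", "from-cli"], required=["deployment_id"])
print(args.deployment_id)  # from-cli
```

Passing the environment in as a mapping keeps the sketch testable; when running locally you would simply supply the arguments on the command line.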
The `log_manager` module provides a set of functions to help with logging. The DMM library and the DataRobot public API client use standard Python `logging` primitives. A complete list of loggers and their current levels is available from `get_log_levels()`. The `initialize_loggers()` function initializes all loggers; once they are initialized, log output looks like the following:
```
2024-08-09 02:19:50 PM - dmm.data_source.datarobot_source - INFO - fetching the next predictions dataframe... 2024-07-15 00:00:00 - 2024-08-09 14:19:46.643722
2024-08-09 02:19:56 PM - urllib3.connectionpool - DEBUG - https://app.datarobot.com:443 "POST /api/v2/deployments/66a90a711zd81645df8c469c/predictionDataExports/ HTTP/1.1" 202 368
```
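Because these are ordinary Python loggers, the standard `logging` module can also report which `dmm` and `datarobot` loggers exist and at what effective level they emit. The following is a minimal sketch of that idea; `list_logger_levels()` is a hypothetical helper, not the DMM `get_log_levels()` function, whose return format is not shown here. It relies on `logging.Logger.manager.loggerDict`, which is an undocumented (though long-standing) attribute of the standard library.

```python
import logging


def list_logger_levels(prefixes=("dmm", "datarobot")):
    """Hypothetical helper: map each matching logger name to its effective level name."""
    levels = {}
    # loggerDict holds every logger created so far. It is an undocumented but
    # long-standing attribute of the logging module.
    for name, logger in logging.Logger.manager.loggerDict.items():
        # Skip PlaceHolder entries created for intermediate package names.
        if not isinstance(logger, logging.Logger):
            continue
        if name.split(".")[0] in prefixes:
            # getEffectiveLevel() walks up the hierarchy when a level is NOTSET.
            levels[name] = logging.getLevelName(logger.getEffectiveLevel())
    return dict(sorted(levels.items()))


# Usage in a fresh interpreter:
logging.getLogger("dmm").setLevel(logging.WARNING)
logging.getLogger("dmm.data_source.datarobot_source").setLevel(logging.INFO)
print(list_logger_levels())
# {'dmm': 'WARNING', 'dmm.data_source.datarobot_source': 'INFO'}
```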
The following snippet shows how to set up your runtime environment using the parser class and logging functions described above:
```python
import sys

from dmm import CustomMetricArgumentParser
from dmm.log_manager import initialize_loggers

parser = CustomMetricArgumentParser(description="My new custom metric")
parser.add_base_args()  # adds the standard arguments
# Add more with standard ArgumentParser primitives, or with convenience functions such as add_environment_arg()

# Parse the program arguments (if any) into an argparse.Namespace.
args = parser.parse_args(sys.argv[1:])

# Initialize logging based on the 'LOG' environment variable or the --log option.
initialize_loggers(args.log)
```
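`initialize_loggers()` receives the raw `LOG` value or `--log` option. Its exact parsing rules are not documented in this section, so the following is only a sketch under stated assumptions: the value is a space-separated string or a list of entries, a bare `LEVEL` applies to the `dmm` and `datarobot` loggers, `NAME:LEVEL` targets a single logger, and an empty value falls back to `WARNING`. `apply_log_spec()` is a hypothetical helper built on the standard `logging` module, not part of DMM.

```python
import logging

DEFAULT_LOGGERS = ("dmm", "datarobot")


def apply_log_spec(spec, default_loggers=DEFAULT_LOGGERS):
    """Hypothetical helper: apply entries such as 'INFO' or 'urllib3:DEBUG'.

    Returns a dict mapping each configured logger name to its level name.
    """
    if not spec:
        spec = "WARNING"  # assumption: mirrors the documented WARNING default
    # An environment variable arrives as one string; --log may arrive as a list.
    entries = spec.split() if isinstance(spec, str) else list(spec)
    applied = {}
    for entry in entries:
        # rpartition leaves `name` empty for a bare level such as 'INFO'.
        name, sep, level = entry.rpartition(":")
        level = level.upper()
        # For a registered level name, getLevelName() returns the numeric level.
        numeric = logging.getLevelName(level)
        if not isinstance(numeric, int):
            raise ValueError(f"unknown log level: {level!r}")
        for target in ([name] if sep else default_loggers):
            logging.getLogger(target).setLevel(numeric)
            applied[target] = level
    return applied


print(apply_log_spec("INFO urllib3:DEBUG"))
# {'dmm': 'INFO', 'datarobot': 'INFO', 'urllib3': 'DEBUG'}
```

A per-logger entry such as `urllib3:DEBUG` is what would surface the `urllib3.connectionpool` request lines in the sample log output while leaving everything else quieter.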
The standard (base) arguments include the following:

| Argument | Description |
|---|---|
| BASE_URL | The URL of the public API. |
| API_KEY | The API token used to authenticate to the server at BASE_URL. |
| DEPLOYMENT_ID | The deployment ID from the application. |
| CUSTOM_METRIC_ID | The custom metric ID from the application. |
| DRY_RUN | A flag indicating whether to report the custom metric result to the deployment. With the DRY_RUN runtime parameter set to 1, the run is a test run and does not report metric data. |
| START_TS | The start of the time range for metric calculation. |
| END_TS | The end of the time range for metric calculation. |
| MAX_ROWS | The maximum number of prediction rows to process. |
| LOG | The logging configuration. Defaults to setting all dmm and datarobot modules to WARNING. |
The following is an example of the help output produced by `CustomMetricArgumentParser`:
```
(model-runner) $ python3 custom.py --help
usage: custom.py [-h] [--api-key KEY] [--base-url URL] [--deployment-id ID] [--custom-metric-id ID] [--dry-run] [--start-ts TIMESTAMP] [--end-ts TIMESTAMP] [--max-rows ROWS] [--required] [--log [[NAME:]LEVEL ...]]

My new custom metric

optional arguments:
  -h, --help            show this help message and exit
  --api-key KEY         API key used to authenticate to server. Settable via 'API_KEY', required.
  --base-url URL        URL for server. Settable via 'BASE_URL' (default: https://staging.datarobot.com/api/v2), required.
  --deployment-id ID    Deployment ID. Settable via 'DEPLOYMENT_ID' (default: None), required.
  --custom-metric-id ID
                        Custom metric ID. Settable via 'CUSTOM_METRIC_ID' (default: None), required.
  --dry-run             Dry run. Settable via 'DRY_RUN' (default: False).
  --start-ts TIMESTAMP  Start timestamp. Settable with 'START_TS', or 'LAST_SUCCESSFUL_RUN_TS' (when not dry run). Default is 2024-08-08 14:27:55.493027
  --end-ts TIMESTAMP    End timestamp. Settable with 'END_TS' or 'CURRENT_RUN_TS'. Default is 2024-08-09 14:27:55.493044.
  --max-rows ROWS       Maximum number of rows. Settable via 'MAX_ROWS' (default: 100000).
  --required            List the required properties and exit.
  --log [[NAME:]LEVEL ...]
                        Logging level list. Settable via 'LOG' (default: WARNING).
(model-runner) $
```
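The help output shows that the start timestamp can come from `START_TS` or, outside a dry run, `LAST_SUCCESSFUL_RUN_TS`, and that the end timestamp can come from `END_TS` or `CURRENT_RUN_TS`. The following sketch resolves a calculation window from those variables. It illustrates the idea under assumptions and is not DMM's logic: `resolve_window()` is hypothetical, and it assumes that `START_TS` and `END_TS` take precedence over the scheduler-provided values, that timestamps are ISO 8601 strings, and that the fallback start is one day before the end (the gap between the two sample defaults).

```python
from datetime import datetime, timedelta


def resolve_window(env, dry_run, now=None):
    """Hypothetical helper: return (start, end) datetimes for a metric calculation."""
    if now is None:
        now = datetime.now()
    # Assumed precedence: END_TS, then CURRENT_RUN_TS, then the current time.
    end_raw = env.get("END_TS") or env.get("CURRENT_RUN_TS")
    end = datetime.fromisoformat(end_raw) if end_raw else now
    # START_TS wins; LAST_SUCCESSFUL_RUN_TS is only consulted for real (non-dry) runs.
    start_raw = env.get("START_TS") or (None if dry_run else env.get("LAST_SUCCESSFUL_RUN_TS"))
    # Assumption: with no start value, look back one day from the end.
    start = datetime.fromisoformat(start_raw) if start_raw else end - timedelta(days=1)
    if start >= end:
        raise ValueError("start timestamp must be earlier than end timestamp")
    return start, end


env = {"LAST_SUCCESSFUL_RUN_TS": "2024-08-09 00:00:00", "CURRENT_RUN_TS": "2024-08-09 12:00:00"}
print(resolve_window(env, dry_run=False))  # resumes from the last successful run
print(resolve_window(env, dry_run=True))   # ignores it and looks back one day from the end
```

Ignoring the last successful run during a dry run keeps test runs from depending on scheduler state, which matches the "(when not dry run)" note in the help text.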
Using the save_to_csv() utility
During development, it is common to run your code over the same data multiple times to see how changes affect the results. The `save_to_csv()` utility saves your results to a CSV file, so you can compare the results of successive runs on the same data.
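The signature and options of `save_to_csv()` are not covered in this section, so the following sketch uses only the standard `csv` module to show the compare-between-runs workflow. `save_results()` and `diff_runs()` are hypothetical helpers, the rows are illustrative values, and keying rows by a `timestamp` column is an assumption about the shape of your metric output; substitute whatever identifies a row in your results.

```python
import csv
import tempfile
from pathlib import Path


def save_results(rows, path):
    """Hypothetical stand-in for save_to_csv(): write a non-empty list of dicts to CSV."""
    if not rows:
        raise ValueError("nothing to save")
    path = Path(path)
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    return path


def diff_runs(old_path, new_path, key="timestamp"):
    """Return {key: (old_row, new_row)} for every row that differs between two runs."""
    def load(path):
        with Path(path).open(newline="") as f:
            return {row[key]: row for row in csv.DictReader(f)}

    old, new = load(old_path), load(new_path)
    # A key present in only one run shows up with None on the other side.
    return {
        k: (old.get(k), new.get(k))
        for k in sorted(old.keys() | new.keys())
        if old.get(k) != new.get(k)
    }


# Usage: two runs over the same two hours, where only the second value changed.
with tempfile.TemporaryDirectory() as tmp:
    run1 = save_results([{"timestamp": "2024-08-09 00:00", "value": 0.5},
                         {"timestamp": "2024-08-09 01:00", "value": 0.7}], Path(tmp) / "run1.csv")
    run2 = save_results([{"timestamp": "2024-08-09 00:00", "value": 0.5},
                         {"timestamp": "2024-08-09 01:00", "value": 0.9}], Path(tmp) / "run2.csv")
    changed = diff_runs(run1, run2)
print(list(changed))  # ['2024-08-09 01:00']
```

Both files are read back as strings, so the comparison is exact text equality; that is deliberate for spotting any change between runs, but you may prefer a numeric tolerance for floating-point metrics.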