# MLflow integration for DataRobot
**Availability information**

The MLflow integration for DataRobot is a preview feature. Contact your DataRobot representative or administrator for information on using this feature.
The MLflow integration for DataRobot allows you to export a model from MLflow and import it into the DataRobot Model Registry, creating key values from the training parameters, metrics, tags, and artifacts in the MLflow model.
## Prerequisites for the MLflow integration
The MLflow integration for DataRobot requires the following:
- Python >= 3.9
- DataRobot >= 9.0
This integration library uses a preview API endpoint; the DataRobot user associated with your API token must have Owner or User permissions for the DataRobot model package.
## Install the MLflow integration for DataRobot
You can install the `datarobot-mlflow` integration with `pip`:

```
pip install datarobot-mlflow
```

If you are running the integration on Azure, use the following command:

```
pip install "datarobot-mlflow[azure]"
```
## Configure command line options

The following command line options are available for `drflow_cli`:
| Option | Description |
|---|---|
| `--mlflow-url` | Defines the MLflow tracking URL; for example, `http://localhost:8080`. |
| `--mlflow-model` | Defines the MLflow model name; for example, `"cost-model"`. |
| `--mlflow-model-version` | Defines the MLflow model version; for example, `"2"`. |
| `--dr-url` | Provides the main URL of the DataRobot instance; for example, `https://app.datarobot.com`. |
| `--dr-model` | Defines the ID of the registered model for key value upload; for example, `64227b4bf82db411c90c3209`. |
| `--prefix` | Provides a string to prepend to the names of all key values imported to DataRobot. The default value is empty. |
| `--debug` | Sets the Python logging level to `logging.DEBUG`. The default level is `logging.WARNING`. |
| `--verbose` | Prints additional information to stdout while an action runs. |
| `--with-artifacts` | Downloads MLflow model artifacts to `/tmp/model`. |
| `--service-provider-type` | Defines the service provider for `validate-auth`. The supported value is `azure-databricks` for Databricks MLflow in Azure. |
| `--auth-type` | Defines the authentication type for `validate-auth`. The supported value is `azure-service-principal` for Azure Service Principal. |
| `--action` | Defines the operation you want the MLflow integration for DataRobot to perform. |
The following operations are available for the `--action` option:

| Action | Description |
|---|---|
| `sync` | Imports parameters, tags, metrics, and artifacts from an MLflow model into a DataRobot model package as key values. This action requires `--mlflow-url`, `--mlflow-model`, `--mlflow-model-version`, `--dr-url`, and `--dr-model`. |
| `list-mlflow-keys` | Lists parameters, tags, metrics, and artifacts in an MLflow model. This action requires `--mlflow-url`, `--mlflow-model`, and `--mlflow-model-version`. |
| `validate-auth` | Validates the Azure AD Service Principal credentials for troubleshooting purposes. This action requires `--auth-type` and `--service-provider-type`. |
## Set environment variables
In addition to the command line options above, you should also provide any environment variables required for your use case:
| Environment variable | Description |
|---|---|
| `MLOPS_API_TOKEN` | A DataRobot API key, found in the DataRobot Developer Tools. |
| `AZURE_TENANT_ID` | The Azure Tenant ID for your Azure Databricks MLflow instance, found in the Azure portal. |
| `AZURE_CLIENT_ID` | The Azure Client ID for your Azure Databricks MLflow instance, found in the Azure portal. |
| `AZURE_CLIENT_SECRET` | The Azure Client Secret for your Azure Databricks MLflow instance, found in the Azure portal. |
You can use `export` to define these environment variables with the information required for your use case:

```
export MLOPS_API_TOKEN="<dr-api-key>"
export AZURE_TENANT_ID="<tenant-id>"
export AZURE_CLIENT_ID="<client-id>"
export AZURE_CLIENT_SECRET="<secret>"
```
## Run the sync action to import a model from MLflow into DataRobot
You can use the command line options and actions defined above to export MLflow model information from MLflow and import it into the DataRobot Model Registry:
```
DR_MODEL_ID="<MODEL_PACKAGE_ID>"

env PYTHONPATH=./ \
python datarobot_mlflow/drflow_cli.py \
    --mlflow-url http://localhost:8080 \
    --mlflow-model cost-model \
    --mlflow-model-version 2 \
    --dr-model $DR_MODEL_ID \
    --dr-url https://app.datarobot.com \
    --with-artifacts \
    --verbose \
    --action sync
```
After you run this command successfully, you can see MLflow information on the Key Values tab of a Registered Model version. In addition, in the Activity log of the Key Values tab, you can view a record of the key value creation events.
## Troubleshoot Azure AD Service Principal credentials
To validate Azure AD Service Principal credentials for troubleshooting purposes, you can use the following command line example:
```
export MLOPS_API_TOKEN="n/a"  # not used for the Azure auth check, but the environment variable must be present

env PYTHONPATH=./ \
python datarobot_mlflow/drflow_cli.py \
    --verbose \
    --auth-type azure-service-principal \
    --service-provider-type azure-databricks \
    --action validate-auth
```
This command should produce the following output if you haven't configured the required environment variables:
```
Required environment variable is not defined: AZURE_TENANT_ID
Required environment variable is not defined: AZURE_CLIENT_ID
Required environment variable is not defined: AZURE_CLIENT_SECRET
Azure AD Service Principal credentials are not valid; check environment variables
```
If you see this error, provide the required Azure AD Service Principal credentials as environment variables:
```
export AZURE_TENANT_ID="<tenant-id>"
export AZURE_CLIENT_ID="<client-id>"
export AZURE_CLIENT_SECRET="<secret>"
```
When the environment variables for the Azure AD Service Principal credentials are defined, you should see the following output:
```
Azure AD Service Principal credentials are valid for obtaining access token
```