The MLOps library communicates with the agent through a spooler, so it is important that the agent and library spooler configurations match. When configuring spooler settings for the MLOps agent and library, some settings are required and some are optional (optional settings are identified in each table under Optional configuration). The required settings can be configured programmatically or through the environment variables documented in the General configuration and Spooler-specific configurations sections. If you configure a setting both programmatically and through an environment variable, the environment variable takes precedence.
Java requirement
The MLOps monitoring library requires Java 11 or higher. Without monitoring, a model's Scoring Code JAR file requires Java 8 or higher; however, when using the MLOps library to instrument monitoring, a model's Scoring Code JAR file requires Java 11 or higher. For Self-Managed AI Platform installations, the Java 11 requirement applies to DataRobot v11.0 and higher.
MLOps agent and library communication can be configured to use any of the following spoolers: the local filesystem, Amazon SQS, RabbitMQ, Apache Kafka (including Azure Event Hubs), and Google Cloud Pub/Sub.
When running the monitoring agent as a separate service, specify the spooler configuration in mlops.agent.conf.yaml by uncommenting the channelConfigs section and entering the required settings. For more information on setting channelConfigs, see Configure the monitoring agent.
The MLOps library can be configured programmatically or by using environment variables. To configure the spooler programmatically, specify the spooler during the MLOps init call; for example, to configure the filesystem spooler using the Python library:
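The following is a minimal sketch using the Python library (the method names follow the datarobot-mlops builder interface, and the deployment ID, model ID, and spool directory are placeholders; verify the calls against your library version):

from datarobot.mlops.mlops import MLOps

# Configure the filesystem spooler programmatically during the init call.
# The spool directory must match the directory the monitoring agent reads.
mlops = MLOps() \
    .set_deployment_id("YOUR_DEPLOYMENT_ID") \
    .set_model_id("YOUR_MODEL_ID") \
    .set_filesystem_spooler("/tmp/ta") \
    .init()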
Use the following environment variables to configure the MLOps agent and library and to select a spooler type:
Variable
Description
MLOPS_DEPLOYMENT_ID
The deployment ID of the DataRobot deployment that should receive metrics from the MLOps library.
MLOPS_MODEL_ID
The model ID of the DataRobot model that should be reported on by the MLOps library.
MLOPS_SPOOLER_TYPE
The spooler type that the MLOps library will use to communicate with the monitoring agent. The following are valid spooler types:
FILESYSTEM: Enable local filesystem spooler.
SQS: Enable Amazon SQS spooler.
RABBITMQ: Enable RabbitMQ spooler.
KAFKA: Enable Apache Kafka or Azure Event Hubs spooler.
PUBSUB: Enable Google Cloud Pub/Sub spooler.
NONE: Disable MLOps library reporting.
STDOUT: Print the reported metrics to stdout rather than forward them to the agent.
API: Enable DataRobot API spooler.
Optional configuration
MLOPS_SPOOLER_DEQUEUE_ACK_RECORDS
Set this option to true to ensure that the monitoring agent does not dequeue a record until processing is complete, preventing records from being dropped due to connection errors. Enabling this option is highly recommended. The dequeuing operation behaves as follows for the spooler channels:
SQS: Deletes a message.
RABBITMQ and PUBSUB: Acknowledges the message as complete.
KAFKA and FILESYSTEM: Moves the offset.
MLOPS_ASYNC_REPORTING
Enable the MLOps library to asynchronously report metrics to the spooler.
MLOPS_FEATURE_DATA_ROWS_IN_ONE_MESSAGE
The number of feature rows that will be in a single message to the spooler.
MLOPS_SPOOLER_CONFIG_RECORD_DELIMITER
The delimiter to replace the default value of ; between key-value pairs in a spooler configuration string (e.g., key1=value1;key2=value2 to key1=value1:key2=value2).
MLOPS_SPOOLER_CONFIG_KEY_VALUE_SEPARATOR
The separator to replace the default value of = between keys and values in a spooler configuration string (e.g., key1=value1 to key1:value1).
Note
Setting the environment variable here takes precedence over variable definitions specified in the configuration file or configured programmatically.
After setting a spooler type, you can configure the spooler-specific environment variables.
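For example, a sketch that configures the Python library entirely through environment variables before initialization (the IDs are placeholders; STDOUT is used because it requires no spooler-specific settings):

import os

# Select a spooler and identify the deployment before the library initializes.
os.environ["MLOPS_SPOOLER_TYPE"] = "STDOUT"
os.environ["MLOPS_DEPLOYMENT_ID"] = "YOUR_DEPLOYMENT_ID"
os.environ["MLOPS_MODEL_ID"] = "YOUR_MODEL_ID"

from datarobot.mlops.mlops import MLOps

# With no programmatic settings, init() picks up the environment variables;
# if both were present, the environment variables would take precedence.
mlops = MLOps().init()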
When using Amazon SQS as a spooler, you can provide your credentials in either of two ways:
Set your credentials in the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_REGION or AWS_DEFAULT_REGION environment variables. Only AWS software packages use these credentials; DataRobot doesn't access them.
Alternatively, rely on the AWS SDK's standard credential discovery, such as a shared ~/.aws/credentials file or an attached IAM role; these credentials are likewise used only by AWS software packages.
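As an illustration, a sketch of the environment variable route from Python (all values are placeholders):

import os

# Read only by AWS software packages; DataRobot does not access them.
os.environ["AWS_ACCESS_KEY_ID"] = "YOUR_ACCESS_KEY_ID"
os.environ["AWS_SECRET_ACCESS_KEY"] = "YOUR_SECRET_ACCESS_KEY"
os.environ["AWS_DEFAULT_REGION"] = "us-east-1"

# Select the SQS spooler; queue-specific settings are documented in the
# SQS spooler configuration.
os.environ["MLOPS_SPOOLER_TYPE"] = "SQS"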
When using Google Cloud Pub/Sub as a spooler, you must provide the appropriate credentials in the GOOGLE_APPLICATION_CREDENTIALS environment variable. Only Google Cloud software packages use these credentials; DataRobot doesn't access them.
Use the following environment variables to configure the PUBSUB spooler:
Variable
Description
MLOPS_PUBSUB_PROJECT_ID
The Pub/Sub project ID of the project used by the spooler; this should be the full path of the project ID.
MLOPS_PUBSUB_TOPIC_NAME
The Pub/Sub topic name of the topic used by the spooler; this should be the topic name within the project, not the fully qualified topic name path that includes the project ID.
MLOPS_PUBSUB_SUBSCRIPTION_NAME
The Pub/Sub subscription name of the subscription used by the spooler.
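For example, a sketch selecting the Pub/Sub spooler through environment variables (all values are placeholders):

import os

# Read only by Google Cloud software packages; DataRobot does not access them.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/service-account.json"

os.environ["MLOPS_SPOOLER_TYPE"] = "PUBSUB"
os.environ["MLOPS_PUBSUB_PROJECT_ID"] = "YOUR_PROJECT_ID"
# Topic name only, not the fully qualified path that includes the project ID.
os.environ["MLOPS_PUBSUB_TOPIC_NAME"] = "YOUR_TOPIC_NAME"
os.environ["MLOPS_PUBSUB_SUBSCRIPTION_NAME"] = "YOUR_SUBSCRIPTION_NAME"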
Pub/Sub service account permissions
The following service account permissions are required to use Google Cloud Pub/Sub as a spooler for the monitoring agent:
Level
Permission(s)
Project
Project Viewer
Topic
Pub/Sub Publisher
Subscription
Pub/Sub Viewer, Pub/Sub Subscriber
The base permissions required for communicating with the monitoring agent outside of DataRobot are Pub/Sub Publisher and Pub/Sub Subscriber. Pub/Sub Viewer and Project Viewer are required by DataRobot prediction environments.
Use the following environment variables to configure the Apache KAFKA spooler:
Variable
Description
MLOPS_KAFKA_TOPIC_NAME
The name of the specific Kafka topic to produce to or consume from. Apache Kafka Reference: Main Concepts and Terminology
MLOPS_KAFKA_BOOTSTRAP_SERVERS
The list of servers that the agent connects to. Use the same syntax as the bootstrap.servers config used upstream. Apache Kafka Reference: bootstrap.servers
Optional configuration
MLOPS_KAFKA_CONSUMER_POLL_TIMEOUT_MS
The amount of time to wait while consuming messages before processing them and sending them to DataRobot. Default value: 3000 ms.
MLOPS_KAFKA_CONSUMER_GROUP_ID
A unique string that identifies the consumer group this consumer belongs to. Default value: tracking-agent. Apache Kafka Reference: group.id
MLOPS_KAFKA_CONSUMER_MAX_NUM_MESSAGES
The maximum number of messages to consume at one time before processing them and sending the results to DataRobot MLOps. Default value: 500. Apache Kafka Reference: max.poll.records
MLOPS_KAFKA_SESSION_TIMEOUT_MS
The timeout used to detect client failures in the consumer group. Apache Kafka Reference: session.timeout.ms
MLOPS_KAFKA_MESSAGE_BYTE_SIZE_LIMIT
The maximum chunk size when producing events to the channel. Default value: 1000000 bytes
MLOPS_KAFKA_DELIVERY_TIMEOUT_MS
The absolute upper bound on the amount of time to spend sending a message before considering it permanently failed. Apache Kafka Reference: delivery.timeout.ms
MLOPS_KAFKA_REQUEST_TIMEOUT_MS
The maximum amount of time a client will wait for a response to a request before retrying. Apache Kafka Reference: request.timeout.ms
MLOPS_KAFKA_METADATA_MAX_AGE_MS
The maximum amount of time (in ms) the client will wait before refreshing its cluster metadata. Apache Kafka Reference: metadata.max.age.ms
MLOPS_KAFKA_SECURITY_PROTOCOL
The protocol used to connect to the brokers. Apache Kafka Reference: security.protocol (valid values).
MLOPS_KAFKA_SASL_MECHANISM
The mechanism clients use to authenticate with the broker. Apache Kafka Reference: sasl.mechanism
MLOPS_KAFKA_SASL_JAAS_CONFIG (Java only)
Connection settings in a format used by JAAS configuration files. Apache Kafka Reference: sasl.jaas.config
MLOPS_KAFKA_CONNECTIONS_MAX_IDLE_MS
The maximum amount of time (in ms) before the client closes an inactive connection. This value should be set lower than any timeouts your network infrastructure may impose. Apache Kafka Reference: connections.max.idle.ms
MLOPS_KAFKA_SASL_USERNAME (Python only)
SASL username for use with the PLAIN and SASL-SCRAM-* mechanisms. Reference: See the sasl.username setting in librdkafka.
MLOPS_KAFKA_SASL_PASSWORD (Python only)
SASL password for use with the PLAIN and SASL-SCRAM-* mechanisms. Reference: See the sasl.password setting in librdkafka.
MLOPS_KAFKA_SASL_OAUTHBEARER_CONFIG (Python only)
Custom configuration to pass to the OAuth login callback. Reference: See the sasl.oauthbearer.config setting in librdkafka.
MLOPS_KAFKA_SOCKET_KEEPALIVE (Python only)
Enable TCP keep-alive on network connections, sending packets over those connections periodically to prevent the required connections from being closed due to inactivity. Reference: See the socket.keepalive.enable setting in librdkafka.
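For example, a minimal sketch selecting the Kafka spooler through environment variables (the topic name and broker addresses are placeholders):

import os

os.environ["MLOPS_SPOOLER_TYPE"] = "KAFKA"
os.environ["MLOPS_KAFKA_TOPIC_NAME"] = "mlops-spooler-topic"
os.environ["MLOPS_KAFKA_BOOTSTRAP_SERVERS"] = "broker1:9092,broker2:9092"

# Recommended: don't dequeue a record until processing is complete, so records
# are not dropped due to connection errors.
os.environ["MLOPS_SPOOLER_DEQUEUE_ACK_RECORDS"] = "true"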
The process for configuring the DataRobot API spooler differs from typical spooler configuration. Usually, the monitoring agent connects to the spooler, gathers information, and sends that information to DataRobot MLOps. With the DataRobot API spooler, you do not actually connect to a spooler, and the calls you make to the MLOps library are unchanged; however, instead of going to a spooler and the monitoring agent, they go directly to DataRobot MLOps via HTTPS. In this case, you do not need to configure a spooler and monitoring agent.
Use the following parameters to configure the DataRobot Python API spooler:
Parameter
Description
MLOPS_SERVICE_URL
Specify the service URL to access MLOps via this environment variable instead of specifying it in the YAML configuration file.
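For example, a sketch selecting the API spooler from Python (the URL is a placeholder, and the API token variable named below is an assumption rather than confirmed here; check the API spooler documentation for your library version):

import os

# Metrics go directly to DataRobot MLOps over HTTPS; no agent is involved.
os.environ["MLOPS_SPOOLER_TYPE"] = "API"
os.environ["MLOPS_SERVICE_URL"] = "https://app.datarobot.com"
# Assumed variable name for the required authentication token.
os.environ["MLOPS_API_TOKEN"] = "YOUR_API_TOKEN"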
Azure supports the Kafka protocol for Event Hubs only for the Standard and Premium pricing tiers. The Basic tier does not offer Kafka API support, so it is not supported as a spooler for the monitoring agent. See Azure Event Hubs quotas and limits for details.
To use Azure Event Hubs as a spooler, you need to set up authentication for the monitoring agent and MLOps library using one of these methods: shared access signature (SAS)-based authentication or Azure Active Directory OAuth 2.0.
To use Event Hubs SAS-based authentication for the monitoring agent and MLOps library, set the following environment variables using the example shell fragment below:
Sample environment variables script for SAS-based authentication
# Azure recommends setting the following values; see:
# https://docs.microsoft.com/en-us/azure/event-hubs/apache-kafka-configurations
export MLOPS_KAFKA_REQUEST_TIMEOUT_MS='60000'
export MLOPS_KAFKA_SESSION_TIMEOUT_MS='30000'
export MLOPS_KAFKA_METADATA_MAX_AGE_MS='180000'

# Common configuration variables for both Java- and Python-based libraries.
export MLOPS_KAFKA_BOOTSTRAP_SERVERS='XXXX.servicebus.windows.net:9093'
export MLOPS_KAFKA_SECURITY_PROTOCOL='SASL_SSL'
export MLOPS_KAFKA_SASL_MECHANISM='PLAIN'

# The following setting is specific to the Java SDK (and the monitoring agent daemon)
export MLOPS_KAFKA_SASL_JAAS_CONFIG='org.apache.kafka.common.security.plain.PlainLoginModule required username="$ConnectionString" password="Endpoint=sb://XXXX.servicebus.windows.net/;SharedAccessKeyName=XXXX;SharedAccessKey=XXXX";'

# For the Python SDK, you will need the following settings (in addition to the common ones above)
export MLOPS_KAFKA_SASL_USERNAME='$ConnectionString'
export MLOPS_KAFKA_SASL_PASSWORD='Endpoint=sb://XXXX.servicebus.windows.net/;SharedAccessKeyName=XXX;SharedAccessKey=XXXX'
Note
The environment variable values above use single quotes (') to ensure that the special characters $ and " are not interpreted by the shell when setting variables. If you are setting environment variables via Databricks, follow its guidelines on escaping special characters for the version of the platform you are using.
DataRobot supports Azure Active Directory OAuth 2.0 for Event Hubs authentication. To use this authentication method, you must create a new Application Registration with the necessary permissions over your Event Hubs Namespace (i.e., Azure Event Hubs Data Owner). See Authenticate an application with Azure AD to access Event Hubs resources for details.
To use Event Hubs Azure Active Directory OAuth 2.0 authentication, set the following environment variables using the example shell fragment below:
Sample environment variables script for Azure AD OAuth 2.0 authentication
# Azure recommends setting the following values; see:
# https://docs.microsoft.com/en-us/azure/event-hubs/apache-kafka-configurations
export MLOPS_KAFKA_REQUEST_TIMEOUT_MS='60000'
export MLOPS_KAFKA_SESSION_TIMEOUT_MS='30000'
export MLOPS_KAFKA_METADATA_MAX_AGE_MS='180000'

# Common configuration variables for both Java- and Python-based libraries.
export MLOPS_KAFKA_BOOTSTRAP_SERVERS='XXXX.servicebus.windows.net:9093'
export MLOPS_KAFKA_SECURITY_PROTOCOL='SASL_SSL'
export MLOPS_KAFKA_SASL_MECHANISM='OAUTHBEARER'

# The following setting is specific to the Java SDK (and the tracking-agent daemon)
export MLOPS_KAFKA_SASL_JAAS_CONFIG='org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required aad.tenant.id="XXXX" aad.client.id="XXXX" aad.client.secret="XXXX";'
export MLOPS_KAFKA_SASL_LOGIN_CALLBACK_CLASS='com.datarobot.mlops.spooler.kafka.ActiveDirectoryAuthenticateCallbackHandler'

# For the Python SDK, you will need the following settings (in addition to the common ones above)
export MLOPS_KAFKA_SASL_OAUTHBEARER_CONFIG='aad.tenant.id=XXXX-XXXX-XXXX-XXXX-XXXX, aad.client.id=XXXX-XXXX-XXXX-XXXX-XXXX, aad.client.secret=XXXX'
Note
Some environment variable values contain double quotes ("). Take care when setting environment variables that include this special character (or others).
Dynamically load required spoolers in a Java application
To support spoolers that rely on third-party code, the monitoring agent can dynamically load a separate JAR file for the required spooler. This configuration is required for the Amazon SQS, RabbitMQ, Google Cloud Pub/Sub, and Apache Kafka spoolers. The natively supported filesystem spooler is configurable without loading a JAR file.
Note
Previously, the datarobot-mlops and mlops-agent packages included all spooler types by default; however, that configuration meant the code was always present, even if it was unused.
Include spooler dependencies in the project object model
To use a third-party spooler in your MLOps Java application, you must include the required spoolers as dependencies in your POM (Project Object Model) file, along with datarobot-mlops.
The spooler JAR files are included in the MLOps agent tarball. They are also available individually as downloadable JAR files in the public Maven repository for the DataRobot MLOps Agent.
To use a third-party spooler with the executable agent JAR file, add the path to the spooler JAR to the java classpath when starting the agent.
The start-agent.sh script provided as an example automatically performs this task, adding any spooler JAR files found in the lib directory to the classpath. If your spooler JAR files are in a different directory, set the MLOPS_SPOOLER_JAR_PATH environment variable.
If a dynamic spooler is loaded successfully, the Monitoring Agent logs an INFO message: Creating spooler type <type>: success.
If loading a dynamic spooler fails, the Monitoring Agent logs an ERROR message: Creating spooler type <type>: failed, followed by the reason (a class not found error, indicating a missing dependency) or more details (a system exception message, helping you diagnose the issue). If the class was not found, ensure the dependency for the spooler is included in the application's POM. Missing dependencies will not be discovered until runtime.
When running the standalone agent, the same INFO and ERROR messages are logged; in that case, if the class was not found, ensure the matching JAR file for that spooler is included in the classpath of the java command that starts the agent.
Tip
If the agent is configured with a predictionEnvironmentId and can connect to DataRobot, the agent sends an MLOps Spooler Channel Failed event to DataRobot MLOps with information from the log message. These events appear in the event log on the Service Health page of any deployment associated with that prediction environment. You can also create a notification channel and policy to be notified (by email, Slack, or webhook) of these errors.