The MLOps library communicates with the agent through a spooler, so it is important that the agent and library spooler configurations match. When configuring spooler settings for the MLOps agent and library, some settings are required and some are optional (optional settings are identified in each table under Optional configuration). The required settings can be configured programmatically or through the environment variables documented in the General configuration and Spooler-specific configurations sections. If you configure a setting both programmatically and through an environment variable, the environment variable takes precedence.
Java requirement
The MLOps monitoring library requires Java 11 or higher. Without monitoring, a model's Scoring Code JAR file requires Java 8 or higher; however, when using the MLOps library to instrument monitoring, a model's Scoring Code JAR file requires Java 11 or higher. For Self-Managed AI Platform installations, the Java 11 requirement applies to DataRobot v11.0 and higher.
MLOps agent and library communication can be configured to use any of the following spoolers: the local filesystem, Amazon SQS, RabbitMQ, Apache Kafka (including Azure Event Hubs), and Google Cloud Pub/Sub.
When running the monitoring agent as a separate service, specify the spooler configuration in mlops.agent.conf.yaml by uncommenting the channelConfigs section and entering the required settings. For more information on setting channelConfigs, see Configure the monitoring agent.
The MLOps library can be configured programmatically or by using environment variables. To configure the spooler programmatically, specify the spooler during the MLOps init call; for example, to configure the filesystem spooler using the Python library:
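The following is a minimal sketch using the Python library (the method names follow the datarobot-mlops builder interface, and the deployment ID, model ID, and spool directory are placeholders; verify the calls against your library version):

from datarobot.mlops.mlops import MLOps

# Configure the filesystem spooler programmatically during the init call.
# The spool directory must match the directory the monitoring agent reads.
mlops = MLOps() \
    .set_deployment_id("YOUR_DEPLOYMENT_ID") \
    .set_model_id("YOUR_MODEL_ID") \
    .set_filesystem_spooler("/tmp/ta") \
    .init()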
Use the following environment variables to configure the MLOps agent and library and to select a spooler type:
Variable
Description
MLOPS_DEPLOYMENT_ID
The deployment ID of the DataRobot deployment that should receive metrics from the MLOps library.
MLOPS_MODEL_ID
The model ID of the DataRobot model that should be reported on by the MLOps library.
MLOPS_SPOOLER_TYPE
The spooler type that the MLOps library will use to communicate with the monitoring agent. The following are valid spooler types:
FILESYSTEM: Enable local filesystem spooler.
SQS: Enable Amazon SQS spooler.
RABBITMQ: Enable RabbitMQ spooler.
KAFKA: Enable Apache Kafka or Azure Event Hubs spooler.
PUBSUB: Enable Google Cloud Pub/Sub spooler.
NONE: Disable MLOps library reporting.
STDOUT: Print the reported metrics to stdout rather than forward them to the agent.
API: Enable DataRobot API spooler.
Optional configuration
MLOPS_SPOOLER_DEQUEUE_ACK_RECORDS
Set this option to true to ensure that the monitoring agent does not dequeue a record until processing is complete, preventing records from being dropped due to connection errors. Enabling this option is highly recommended. The dequeuing operation behaves as follows for the spooler channels:
SQS: Deletes a message.
RABBITMQ and PUBSUB: Acknowledges the message as complete.
KAFKA and FILESYSTEM: Moves the offset.
MLOPS_ASYNC_REPORTING
Enable the MLOps library to asynchronously report metrics to the spooler.
MLOPS_FEATURE_DATA_ROWS_IN_ONE_MESSAGE
The number of feature rows that will be in a single message to the spooler.
MLOPS_SPOOLER_CONFIG_RECORD_DELIMITER
The delimiter to replace the default value of ; between key-value pairs in a spooler configuration string (e.g., key1=value1;key2=value2 to key1=value1:key2=value2).
MLOPS_SPOOLER_CONFIG_KEY_VALUE_SEPARATOR
The separator to replace the default value of = between keys and values in a spooler configuration string (e.g., key1=value1 to key1:value1).
Note
Setting the environment variable here takes precedence over variable definitions specified in the configuration file or configured programmatically.
After setting a spooler type, you can configure the spooler-specific environment variables.
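For example, a sketch that configures the Python library entirely through environment variables before initialization (the IDs are placeholders; STDOUT is used because it requires no spooler-specific settings):

import os

# Select a spooler and identify the deployment before the library initializes.
os.environ["MLOPS_SPOOLER_TYPE"] = "STDOUT"
os.environ["MLOPS_DEPLOYMENT_ID"] = "YOUR_DEPLOYMENT_ID"
os.environ["MLOPS_MODEL_ID"] = "YOUR_MODEL_ID"

from datarobot.mlops.mlops import MLOps

# With no programmatic settings, init() picks up the environment variables;
# if both were present, the environment variables would take precedence.
mlops = MLOps().init()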
When using Amazon SQS as a spooler, you can provide your credentials in either of two ways:
Set your credentials in the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_REGION or AWS_DEFAULT_REGION environment variables. Only AWS software packages use these credentials; DataRobot doesn't access them.
Alternatively, rely on the AWS SDK's standard credential discovery, such as a shared ~/.aws/credentials file or an attached IAM role; these credentials are likewise used only by AWS software packages.
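As an illustration, a sketch of the environment variable route from Python (all values are placeholders):

import os

# Read only by AWS software packages; DataRobot does not access them.
os.environ["AWS_ACCESS_KEY_ID"] = "YOUR_ACCESS_KEY_ID"
os.environ["AWS_SECRET_ACCESS_KEY"] = "YOUR_SECRET_ACCESS_KEY"
os.environ["AWS_DEFAULT_REGION"] = "us-east-1"

# Select the SQS spooler; queue-specific settings are documented in the
# SQS spooler configuration.
os.environ["MLOPS_SPOOLER_TYPE"] = "SQS"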
When using Google Cloud Pub/Sub as a spooler, you must provide the appropriate credentials in the GOOGLE_APPLICATION_CREDENTIALS environment variable. Only Google Cloud software packages use these credentials; DataRobot doesn't access them.
Use the following environment variables to configure the PUBSUB spooler:
Variable
Description
MLOPS_PUBSUB_PROJECT_ID
The Pub/Sub project ID of the project used by the spooler; this should be the full path of the project ID.
MLOPS_PUBSUB_TOPIC_NAME
The Pub/Sub topic name of the topic used by the spooler; this should be the topic name within the project, not the fully qualified topic name path that includes the project ID.
MLOPS_PUBSUB_SUBSCRIPTION_NAME
The Pub/Sub subscription name of the subscription used by the spooler.
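For example, a sketch selecting the Pub/Sub spooler through environment variables (all values are placeholders):

import os

# Read only by Google Cloud software packages; DataRobot does not access them.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/service-account.json"

os.environ["MLOPS_SPOOLER_TYPE"] = "PUBSUB"
os.environ["MLOPS_PUBSUB_PROJECT_ID"] = "YOUR_PROJECT_ID"
# Topic name only, not the fully qualified path that includes the project ID.
os.environ["MLOPS_PUBSUB_TOPIC_NAME"] = "YOUR_TOPIC_NAME"
os.environ["MLOPS_PUBSUB_SUBSCRIPTION_NAME"] = "YOUR_SUBSCRIPTION_NAME"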
Pub/Sub service account permissions
The following service account permissions are required to use Google Cloud Pub/Sub as a spooler for the monitoring agent:
Level
Permission(s)
Project
Project Viewer
Topic
Pub/Sub Publisher
Subscription
Pub/Sub Viewer, Pub/Sub Subscriber
The base permissions required for communicating with the monitoring agent outside of DataRobot are Pub/Sub Publisher and Pub/Sub Subscriber. Pub/Sub Viewer and Project Viewer are required by DataRobot prediction environments.
Use the following environment variables to configure the Apache KAFKA spooler:
Variable
Description
MLOPS_KAFKA_TOPIC_NAME
The name of the specific Kafka topic to produce to or consume from. Apache Kafka Reference: Main Concepts and Terminology
MLOPS_KAFKA_BOOTSTRAP_SERVERS
The list of servers that the agent connects to. Use the same syntax as the bootstrap.servers config used upstream. Apache Kafka Reference: bootstrap.servers
Optional configuration
MLOPS_KAFKA_CONSUMER_POLL_TIMEOUT_MS
The amount of time to wait while consuming messages before processing them and sending them to DataRobot. Default value: 3000 ms.
MLOPS_KAFKA_CONSUMER_GROUP_ID
A unique string that identifies the consumer group this consumer belongs to. Default value: tracking-agent. Apache Kafka Reference: group.id
MLOPS_KAFKA_CONSUMER_MAX_NUM_MESSAGES
The maximum number of messages to consume at one time before processing them and sending the results to DataRobot MLOps. Default value: 500. Apache Kafka Reference: max.poll.records
MLOPS_KAFKA_SESSION_TIMEOUT_MS
The timeout used to detect client failures in the consumer group. Apache Kafka Reference: session.timeout.ms
MLOPS_KAFKA_MESSAGE_BYTE_SIZE_LIMIT
The maximum chunk size when producing events to the channel. Default value: 1000000 bytes
MLOPS_KAFKA_DELIVERY_TIMEOUT_MS
The absolute upper bound on the amount of time to spend sending a message before considering it permanently failed. Apache Kafka Reference: delivery.timeout.ms
MLOPS_KAFKA_REQUEST_TIMEOUT_MS
The maximum amount of time a client will wait for a response to a request before retrying. Apache Kafka Reference: request.timeout.ms
MLOPS_KAFKA_METADATA_MAX_AGE_MS
The maximum amount of time (in ms) the client will wait before refreshing its cluster metadata. Apache Kafka Reference: metadata.max.age.ms
MLOPS_KAFKA_SECURITY_PROTOCOL
The protocol used to connect to the brokers. Apache Kafka Reference: security.protocol (valid values).
MLOPS_KAFKA_SASL_MECHANISM
The mechanism clients use to authenticate with the broker. Apache Kafka Reference: sasl.mechanism
MLOPS_KAFKA_SASL_JAAS_CONFIG (Java only)
Connection settings in a format used by JAAS configuration files. Apache Kafka Reference: sasl.jaas.config
MLOPS_KAFKA_CONNECTIONS_MAX_IDLE_MS
The maximum amount of time (in ms) before the client closes an inactive connection. This value should be set lower than any timeouts your network infrastructure may impose. Apache Kafka Reference: connections.max.idle.ms
MLOPS_KAFKA_SASL_USERNAME (Python only)
SASL username for use with the PLAIN and SASL-SCRAM-* mechanisms. Reference: See the sasl.username setting in librdkafka.
MLOPS_KAFKA_SASL_PASSWORD (Python only)
SASL password for use with the PLAIN and SASL-SCRAM-* mechanisms. Reference: See the sasl.password setting in librdkafka.
MLOPS_KAFKA_SASL_OAUTHBEARER_CONFIG (Python only)
Custom configuration to pass to the OAuth login callback. Reference: See the sasl.oauthbearer.config setting in librdkafka.
MLOPS_KAFKA_SOCKET_KEEPALIVE (Python only)
Enable TCP keep-alive on network connections, sending packets over those connections periodically to prevent the required connections from being closed due to inactivity. Reference: See the socket.keepalive.enable setting in librdkafka.
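For example, a minimal sketch selecting the Kafka spooler through environment variables (the topic name and broker addresses are placeholders):

import os

os.environ["MLOPS_SPOOLER_TYPE"] = "KAFKA"
os.environ["MLOPS_KAFKA_TOPIC_NAME"] = "mlops-spooler-topic"
os.environ["MLOPS_KAFKA_BOOTSTRAP_SERVERS"] = "broker1:9092,broker2:9092"

# Recommended: don't dequeue a record until processing is complete, so records
# are not dropped due to connection errors.
os.environ["MLOPS_SPOOLER_DEQUEUE_ACK_RECORDS"] = "true"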
The process for configuring the DataRobot API spooler differs from typical spooler configuration. Usually, the monitoring agent connects to the spooler, gathers information, and sends that information to DataRobot MLOps. With the DataRobot API spooler, you do not actually connect to a spooler, and the calls you make to the MLOps library are unchanged; however, instead of going to a spooler and the monitoring agent, they go directly to DataRobot MLOps via HTTPS. In this case, you do not need to configure a spooler and monitoring agent.
Use the following parameters to configure the DataRobot Python API spooler:
Parameter
Description
MLOPS_SERVICE_URL
Specify the service URL to access MLOps via this environment variable instead of specifying it in the YAML configuration file.
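For example, a sketch selecting the API spooler from Python (the URL is a placeholder, and the API token variable named below is an assumption rather than confirmed here; check the API spooler documentation for your library version):

import os

# Metrics go directly to DataRobot MLOps over HTTPS; no agent is involved.
os.environ["MLOPS_SPOOLER_TYPE"] = "API"
os.environ["MLOPS_SERVICE_URL"] = "https://app.datarobot.com"
# Assumed variable name for the required authentication token.
os.environ["MLOPS_API_TOKEN"] = "YOUR_API_TOKEN"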
Azure supports the Kafka protocol for Event Hubs only for the Standard and Premium pricing tiers. The Basic tier does not offer Kafka API support, so it is not supported as a spooler for the monitoring agent. See Azure Event Hubs quotas and limits for details.
To use Azure Event Hubs as a spooler, you need to set up authentication for the monitoring agent and MLOps library using one of these methods: shared access signature (SAS)-based authentication or Azure Active Directory OAuth 2.0.
To use Event Hubs SAS-based authentication for the monitoring agent and MLOps library, set the following environment variables using the example shell fragment below:
Sample environment variables script for SAS-based authentication
# Azure recommends setting the following values; see:
# https://docs.microsoft.com/en-us/azure/event-hubs/apache-kafka-configurations
export MLOPS_KAFKA_REQUEST_TIMEOUT_MS='60000'
export MLOPS_KAFKA_SESSION_TIMEOUT_MS='30000'
export MLOPS_KAFKA_METADATA_MAX_AGE_MS='180000'

# Common configuration variables for both Java- and Python-based libraries.
export MLOPS_KAFKA_BOOTSTRAP_SERVERS='XXXX.servicebus.windows.net:9093'
export MLOPS_KAFKA_SECURITY_PROTOCOL='SASL_SSL'
export MLOPS_KAFKA_SASL_MECHANISM='PLAIN'

# The following setting is specific to the Java SDK (and the monitoring agent daemon)
export MLOPS_KAFKA_SASL_JAAS_CONFIG='org.apache.kafka.common.security.plain.PlainLoginModule required username="$ConnectionString" password="Endpoint=sb://XXXX.servicebus.windows.net/;SharedAccessKeyName=XXXX;SharedAccessKey=XXXX";'

# For the Python SDK, you will need the following settings (in addition to the common ones above)
export MLOPS_KAFKA_SASL_USERNAME='$ConnectionString'
export MLOPS_KAFKA_SASL_PASSWORD='Endpoint=sb://XXXX.servicebus.windows.net/;SharedAccessKeyName=XXX;SharedAccessKey=XXXX'
Note
The environment variable values above use single quotes (') to ensure that the special characters $ and " are not interpreted by the shell when setting variables. If you are setting environment variables via Databricks, follow its guidelines on escaping special characters for the version of the platform you are using.
DataRobot supports Azure Active Directory OAuth 2.0 for Event Hubs authentication. To use this authentication method, you must create a new Application Registration with the necessary permissions over your Event Hubs Namespace (i.e., Azure Event Hubs Data Owner). See Authenticate an application with Azure AD to access Event Hubs resources for details.
To use Event Hubs Azure Active Directory OAuth 2.0 authentication, set the following environment variables using the example shell fragment below:
Sample environment variables script for Azure AD OAuth 2.0 authentication
# Azure recommends setting the following values; see:
# https://docs.microsoft.com/en-us/azure/event-hubs/apache-kafka-configurations
export MLOPS_KAFKA_REQUEST_TIMEOUT_MS='60000'
export MLOPS_KAFKA_SESSION_TIMEOUT_MS='30000'
export MLOPS_KAFKA_METADATA_MAX_AGE_MS='180000'

# Common configuration variables for both Java- and Python-based libraries.
export MLOPS_KAFKA_BOOTSTRAP_SERVERS='XXXX.servicebus.windows.net:9093'
export MLOPS_KAFKA_SECURITY_PROTOCOL='SASL_SSL'
export MLOPS_KAFKA_SASL_MECHANISM='OAUTHBEARER'

# The following setting is specific to the Java SDK (and the tracking-agent daemon)
export MLOPS_KAFKA_SASL_JAAS_CONFIG='org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required aad.tenant.id="XXXX" aad.client.id="XXXX" aad.client.secret="XXXX";'
export MLOPS_KAFKA_SASL_LOGIN_CALLBACK_CLASS='com.datarobot.mlops.spooler.kafka.ActiveDirectoryAuthenticateCallbackHandler'

# For the Python SDK, you will need the following settings (in addition to the common ones above)
export MLOPS_KAFKA_SASL_OAUTHBEARER_CONFIG='aad.tenant.id=XXXX-XXXX-XXXX-XXXX-XXXX, aad.client.id=XXXX-XXXX-XXXX-XXXX-XXXX, aad.client.secret=XXXX'
Note
Some environment variable values contain double quotes ("). Take care when setting environment variables that include this special character (or others).
Dynamically load required spoolers in a Java application
To support spoolers that rely on third-party code, the monitoring agent can dynamically load a separate JAR file for the required spooler. This configuration is required for the Amazon SQS, RabbitMQ, Google Cloud Pub/Sub, and Apache Kafka spoolers. The natively supported filesystem spooler is configurable without loading a JAR file.
Note
Previously, the datarobot-mlops and mlops-agent packages included all spooler types by default; however, that configuration meant the code was always present, even if it was unused.
Include spooler dependencies in the project object model
To use a third-party spooler in your MLOps Java application, you must include the required spoolers as dependencies in your POM (Project Object Model) file, along with datarobot-mlops.
The spooler JAR files are included in the MLOps agent tarball. They are also available individually as downloadable JAR files in the public Maven repository for the DataRobot MLOps Agent.
To use a third-party spooler with the executable agent JAR file, add the path to the spooler JAR to the java classpath when starting the agent.
The start-agent.sh script provided as an example automatically performs this task, adding any spooler JAR files found in the lib directory to the classpath. If your spooler JAR files are in a different directory, set the MLOPS_SPOOLER_JAR_PATH environment variable.
If a dynamic spooler is loaded successfully, the Monitoring Agent logs an INFO message: Creating spooler type <type>: success.
If loading a dynamic spooler fails, the Monitoring Agent logs an ERROR message: Creating spooler type <type>: failed, followed by the reason (a class not found error, indicating a missing dependency) or more details (a system exception message, helping you diagnose the issue). If the class was not found, ensure the dependency for the spooler is included in the application's POM. Missing dependencies will not be discovered until runtime.
When running the standalone agent, the same INFO and ERROR messages are logged; in that case, if the class was not found, ensure the matching JAR file for that spooler is included in the classpath of the java command that starts the agent.
Tip
If the agent is configured with a predictionEnvironmentId and can connect to DataRobot, the agent sends an MLOps Spooler Channel Failed event to DataRobot MLOps with information from the log message. These events appear in the event log on the Service Health page of any deployment associated with that prediction environment. You can also create a notification channel and policy to be notified (by email, Slack, or webhook) of these errors.