MLOps library and agent spooler configuration

The MLOps library communicates with the agent through a spooler, so the agent and library spooler configurations must match. When configuring the MLOps agent and library spooler settings, some settings are required and some are optional (optional settings are identified in each table under Optional configuration). The required settings can be configured programmatically or through the environment variables documented in the General configuration and Spooler-specific configurations sections. If you configure a setting both programmatically and through an environment variable, the environment variable takes precedence.

MLOps agent and library communication can be configured to use any of the following spoolers:

  • Filesystem
  • Amazon SQS
  • RabbitMQ
  • Google Cloud Pub/Sub
  • Apache Kafka
  • DataRobot API
  • Azure Event Hubs (through the Kafka spooler type)

MLOps agent configuration

When running the monitoring agent as a separate service, specify the spooler configuration in mlops.agent.conf.yaml by uncommenting the channelConfigs section and entering the required settings. For more information on setting channelConfigs, see Configure the monitoring agent.

MLOps library configuration

The MLOps library can be configured programmatically or by using environment variables. To configure the spooler programmatically, specify the spooler during the MLOps init call; for example, to configure the filesystem spooler using the Python library:

mlops = MLOps().set_filesystem_spooler("your_spooler_directory").init()

Note

You must create the directory specified in the code above; the program will not create it for you.
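
For instance, a minimal sketch that creates the directory before initializing (the path is illustrative, and the import assumes the datarobot-mlops Python package's documented module layout):

```python
import os

from datarobot.mlops.mlops import MLOps  # datarobot-mlops package

SPOOL_DIR = "/tmp/mlops_spool"  # illustrative path; choose your own

# The library won't create the spool directory, so create it first.
os.makedirs(SPOOL_DIR, exist_ok=True)

mlops = MLOps().set_filesystem_spooler(SPOOL_DIR).init()
```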

Equivalent interfaces exist for other spooler types.

To configure the MLOps library and agent using environment variables, see the General configuration and Spooler-specific configurations sections.

General configuration

Use the following environment variables to configure the MLOps agent and library and to select a spooler type:

| Variable | Description |
|----------|-------------|
| MLOPS_DEPLOYMENT_ID | The ID of the DataRobot deployment that should receive metrics from the MLOps library. |
| MLOPS_MODEL_ID | The ID of the DataRobot model that the MLOps library reports on. |
| MLOPS_SPOOLER_TYPE | The spooler type the MLOps library uses to communicate with the monitoring agent. The following are valid spooler types:<br>• FILESYSTEM: Enable the local filesystem spooler.<br>• SQS: Enable the Amazon SQS spooler.<br>• RABBITMQ: Enable the RabbitMQ spooler.<br>• KAFKA: Enable the Apache Kafka or Azure Event Hubs spooler.<br>• PUBSUB: Enable the Google Cloud Pub/Sub spooler.<br>• NONE: Disable MLOps library reporting.<br>• STDOUT: Print the reported metrics to stdout rather than forwarding them to the agent.<br>• API: Enable the DataRobot API spooler. |

Optional configuration

| Variable | Description |
|----------|-------------|
| MLOPS_SPOOLER_DEQUEUE_ACK_RECORDS | Ensure that the monitoring agent does not dequeue a record until processing is complete. Set this option to true to ensure records are not dropped due to connection errors; enabling it is highly recommended. The dequeue operation behaves as follows for each spooler channel:<br>• SQS: Deletes the message.<br>• RABBITMQ and PUBSUB: Acknowledges the message as complete.<br>• KAFKA and FILESYSTEM: Moves the offset. |
| MLOPS_ASYNC_REPORTING | Enable the MLOps library to report metrics to the spooler asynchronously. |
| MLOPS_FEATURE_DATA_ROWS_IN_ONE_MESSAGE | The number of feature rows included in a single message to the spooler. |
| MLOPS_SPOOLER_CONFIG_RECORD_DELIMITER | The delimiter to replace the default ; between key-value pairs in a spooler configuration string (e.g., key1=value1;key2=value2 becomes key1=value1:key2=value2). |
| MLOPS_SPOOLER_CONFIG_KEY_VALUE_SEPARATOR | The separator to replace the default = between keys and values in a spooler configuration string (e.g., key1=value1 becomes key1:value1). |

Note

Setting an environment variable here takes precedence over the equivalent setting specified in the configuration file or configured programmatically.

After setting a spooler type, you can configure the spooler-specific environment variables.
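
For example, a minimal sketch of environment-variable configuration in Python, using the STDOUT spooler type to sanity-check reporting without a running agent (the IDs are placeholders):

```python
import os

# Configure the library before calling init(); these variables take
# precedence over programmatic settings.
os.environ["MLOPS_DEPLOYMENT_ID"] = "YOUR_DEPLOYMENT_ID"  # placeholder
os.environ["MLOPS_MODEL_ID"] = "YOUR_MODEL_ID"            # placeholder
os.environ["MLOPS_SPOOLER_TYPE"] = "STDOUT"  # print metrics instead of spooling

from datarobot.mlops.mlops import MLOps

mlops = MLOps().init()  # picks up the environment variables above
```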

Spooler-specific configurations

Depending on the MLOPS_SPOOLER_TYPE you set, you can provide configuration information as environment variables unique to the supported spoolers.

Filesystem

Use the following environment variable to configure the FILESYSTEM spooler:

| Variable | Description |
|----------|-------------|
| MLOPS_FILESYSTEM_DIRECTORY | The directory in which to store the metrics reported to DataRobot. You must create this directory; the program will not create it for you. |

Optional configuration

| Variable | Description |
|----------|-------------|
| MLOPS_FILESYSTEM_MAX_FILE_SIZE | Override the default maximum file size (in bytes). Default: 1 GB. |
| MLOPS_FILESYSTEM_MAX_NUM_FILE | Override the default maximum number of files. Default: 10 files. |
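
As an illustration, the equivalent environment-variable setup for the filesystem spooler might look like this in Python (the path and limits are examples only):

```python
import os

os.environ["MLOPS_SPOOLER_TYPE"] = "FILESYSTEM"
os.environ["MLOPS_FILESYSTEM_DIRECTORY"] = "/tmp/mlops_spool"          # example path
os.environ["MLOPS_FILESYSTEM_MAX_FILE_SIZE"] = str(100 * 1024 * 1024)  # optional: 100 MB
os.environ["MLOPS_FILESYSTEM_MAX_NUM_FILE"] = "5"                      # optional: 5 files

# Remember: you must create the directory yourself.
os.makedirs(os.environ["MLOPS_FILESYSTEM_DIRECTORY"], exist_ok=True)
```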

Amazon SQS

When using Amazon SQS as a spooler, you can provide your credential set in either of two ways:

  • Set your credentials in the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_REGION or AWS_DEFAULT_REGION environment variables. Only AWS software packages use these credentials; DataRobot doesn't access them.

  • If you are in an AWS environment, create an AWS IAM (Identity and Access Management) role for credential authentication.

Use one of the following environment variables to configure the SQS spooler:

| Variable | Description |
|----------|-------------|
| MLOPS_SQS_QUEUE_URL | The URL of the SQS queue used for the spooler. |
| MLOPS_SQS_QUEUE_NAME | The name of the SQS queue used for the spooler. |

Note

When using the SQS spooler type, provide either the queue name or the queue URL, not both.
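
For instance, a sketch of an SQS spooler configuration (the queue URL and credentials are placeholders; the AWS_* variables are consumed by the AWS SDK, not by DataRobot):

```python
import os

os.environ["MLOPS_SPOOLER_TYPE"] = "SQS"
# Provide the queue URL or the queue name, not both.
os.environ["MLOPS_SQS_QUEUE_URL"] = (
    "https://sqs.us-east-1.amazonaws.com/123456789012/mlops-spool"  # placeholder
)

# Credentials read by the AWS SDK (omit when using an IAM role instead).
os.environ["AWS_ACCESS_KEY_ID"] = "YOUR_ACCESS_KEY_ID"          # placeholder
os.environ["AWS_SECRET_ACCESS_KEY"] = "YOUR_SECRET_ACCESS_KEY"  # placeholder
os.environ["AWS_DEFAULT_REGION"] = "us-east-1"                  # placeholder
```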

RabbitMQ

Use the following environment variables to configure the RABBITMQ spooler:

| Variable | Description |
|----------|-------------|
| MLOPS_RABBITMQ_QUEUE_URL | The URL of the RabbitMQ queue used for the spooler. |
| MLOPS_RABBITMQ_QUEUE_NAME | The name of the RabbitMQ queue used for the spooler. |

Optional configuration

| Variable | Description |
|----------|-------------|
| MLOPS_RABBITMQ_SSL_CA_CERTIFICATE_PATH | The path to the CA certificate file (.pem). |
| MLOPS_RABBITMQ_SSL_CERTIFICATE_PATH | The path to the client certificate (.pem). |
| MLOPS_RABBITMQ_SSL_KEYFILE_PATH | The path to the client key (.pem). |
| MLOPS_RABBITMQ_SSL_TLS_VERSION | The TLS version used for the client; it must match the server's TLS version. |

Note

RabbitMQ configuration requires keys in RSA format without a password. You can convert keys from PKCS8 to RSA as follows:

openssl rsa -in mykey_pkcs8_format.pem -text > mykey_rsa_format.pem

To generate keys, see RabbitMQ TLS Support.
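
Putting it together, a sketch of a TLS-enabled RabbitMQ spooler configuration (the URL, queue name, certificate paths, and TLS version are placeholders):

```python
import os

os.environ["MLOPS_SPOOLER_TYPE"] = "RABBITMQ"
os.environ["MLOPS_RABBITMQ_QUEUE_URL"] = "amqps://rabbitmq.example.com:5671"  # placeholder
os.environ["MLOPS_RABBITMQ_QUEUE_NAME"] = "mlops-spool"                       # placeholder

# Optional TLS settings; the client key must be in RSA format without a password.
os.environ["MLOPS_RABBITMQ_SSL_CA_CERTIFICATE_PATH"] = "/etc/ssl/ca.pem"    # placeholder
os.environ["MLOPS_RABBITMQ_SSL_CERTIFICATE_PATH"] = "/etc/ssl/client.pem"   # placeholder
os.environ["MLOPS_RABBITMQ_SSL_KEYFILE_PATH"] = "/etc/ssl/client_rsa.pem"   # placeholder
os.environ["MLOPS_RABBITMQ_SSL_TLS_VERSION"] = "TLSv1.2"  # must match the server
```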

Google Cloud Pub/Sub

When using Google Cloud PUBSUB as a spooler, you must provide the appropriate credentials in the GOOGLE_APPLICATION_CREDENTIALS environment variable. Only Google Cloud software packages use these credentials; DataRobot doesn't access them.

Use the following environment variables to configure the PUBSUB spooler:

| Variable | Description |
|----------|-------------|
| MLOPS_PUBSUB_PROJECT_ID | The ID of the Pub/Sub project used by the spooler; provide the full path of the project ID. |
| MLOPS_PUBSUB_TOPIC_NAME | The name of the Pub/Sub topic used by the spooler; provide the topic name within the project, not the fully qualified topic name path that includes the project ID. |
| MLOPS_PUBSUB_SUBSCRIPTION_NAME | The name of the Pub/Sub subscription used by the spooler. |
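
For example (the project, topic, subscription, and key file path are placeholders; GOOGLE_APPLICATION_CREDENTIALS is read by the Google Cloud client libraries, not by DataRobot):

```python
import os

os.environ["MLOPS_SPOOLER_TYPE"] = "PUBSUB"
os.environ["MLOPS_PUBSUB_PROJECT_ID"] = "projects/my-gcp-project"  # placeholder (full path)
os.environ["MLOPS_PUBSUB_TOPIC_NAME"] = "mlops-spool"              # placeholder (name only)
os.environ["MLOPS_PUBSUB_SUBSCRIPTION_NAME"] = "mlops-spool-sub"   # placeholder (agent side)

# Service account credentials consumed by the Google Cloud client.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/service-account.json"  # placeholder
```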

Apache Kafka

Use the following environment variables to configure the Apache KAFKA spooler:

| Variable | Description |
|----------|-------------|
| MLOPS_KAFKA_TOPIC_NAME | The name of the Kafka topic to produce to or consume from. Apache Kafka reference: Main Concepts and Terminology. |
| MLOPS_KAFKA_BOOTSTRAP_SERVERS | The list of servers the agent connects to, using the same syntax as the upstream bootstrap.servers config. Apache Kafka reference: bootstrap.servers. |

Optional configuration

| Variable | Description |
|----------|-------------|
| MLOPS_KAFKA_CONSUMER_POLL_TIMEOUT_MS | The amount of time to wait while consuming messages before processing them and sending them to DataRobot. Default: 3000 ms. |
| MLOPS_KAFKA_CONSUMER_GROUP_ID | A unique string identifying the consumer group this consumer belongs to. Default: tracking-agent. Apache Kafka reference: group.id. |
| MLOPS_KAFKA_CONSUMER_MAX_NUM_MESSAGES | The maximum number of messages to consume at one time before processing them and sending the results to DataRobot MLOps. Default: 500. Apache Kafka reference: max.poll.records. |
| MLOPS_KAFKA_SESSION_TIMEOUT_MS | The timeout used to detect client failures in the consumer group. Apache Kafka reference: session.timeout.ms. |
| MLOPS_KAFKA_MESSAGE_BYTE_SIZE_LIMIT | The maximum chunk size when producing events to the channel. Default: 1000000 bytes. |
| MLOPS_KAFKA_DELIVERY_TIMEOUT_MS | The absolute upper bound on the time to spend sending a message before considering it permanently failed. Apache Kafka reference: delivery.timeout.ms. |
| MLOPS_KAFKA_REQUEST_TIMEOUT_MS | The maximum amount of time the client waits for a response to a request before retrying. Apache Kafka reference: request.timeout.ms. |
| MLOPS_KAFKA_METADATA_MAX_AGE_MS | The maximum amount of time (in ms) the client waits before refreshing its cluster metadata. Apache Kafka reference: metadata.max.age.ms. |
| MLOPS_KAFKA_SECURITY_PROTOCOL | The protocol used to connect to the brokers. Apache Kafka reference: security.protocol. |
| MLOPS_KAFKA_SASL_MECHANISM | The mechanism clients use to authenticate with the broker. Apache Kafka reference: sasl.mechanism. |
| MLOPS_KAFKA_SASL_JAAS_CONFIG | (Java only) Connection settings in the format used by JAAS configuration files. Apache Kafka reference: sasl.jaas.config. |
| MLOPS_KAFKA_SASL_LOGIN_CALLBACK_CLASS | (Java only) A custom login handler class. Apache Kafka reference: sasl.login.callback.handler.class. |
| MLOPS_KAFKA_CONNECTIONS_MAX_IDLE_MS | (Java only) The maximum amount of time (in ms) before the client closes an inactive connection. Set this value lower than any timeouts your network infrastructure may impose. Apache Kafka reference: connections.max.idle.ms. |
| MLOPS_KAFKA_SASL_USERNAME | (Python only) The SASL username for use with the PLAIN and SASL-SCRAM-* mechanisms. Reference: the sasl.username setting in librdkafka. |
| MLOPS_KAFKA_SASL_PASSWORD | (Python only) The SASL password for use with the PLAIN and SASL-SCRAM-* mechanisms. Reference: the sasl.password setting in librdkafka. |
| MLOPS_KAFKA_SASL_OAUTHBEARER_CONFIG | (Python only) Custom configuration to pass to the OAuth login callback. Reference: the sasl.oauthbearer.config setting in librdkafka. |
| MLOPS_KAFKA_SOCKET_KEEPALIVE | (Python only) Enable TCP keep-alive on network connections, periodically sending packets to prevent required connections from being closed due to inactivity. Reference: the socket.keepalive.enable setting in librdkafka. |
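
A sketch of a minimal Kafka spooler configuration from the Python library's side (the broker, topic, and credentials are placeholders; the SASL settings apply only if your brokers require authentication):

```python
import os

os.environ["MLOPS_SPOOLER_TYPE"] = "KAFKA"
os.environ["MLOPS_KAFKA_BOOTSTRAP_SERVERS"] = "broker1.example.com:9092"  # placeholder
os.environ["MLOPS_KAFKA_TOPIC_NAME"] = "mlops-spool"                      # placeholder

# Optional SASL authentication (Python-only variables from the table above).
os.environ["MLOPS_KAFKA_SECURITY_PROTOCOL"] = "SASL_SSL"
os.environ["MLOPS_KAFKA_SASL_MECHANISM"] = "PLAIN"
os.environ["MLOPS_KAFKA_SASL_USERNAME"] = "YOUR_USERNAME"  # placeholder
os.environ["MLOPS_KAFKA_SASL_PASSWORD"] = "YOUR_PASSWORD"  # placeholder
```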

DataRobot API

The process to configure the DataRobot API spooler differs from typical spooler configuration. Usually, the monitoring agent connects to the spooler, gathers the reported information, and sends it to DataRobot MLOps. With the DataRobot API spooler, you don't connect to a spooler at all: the calls you make to the MLOps library are unchanged, but instead of going to a spooler and the monitoring agent, they go directly to DataRobot MLOps over HTTPS. In this case, you don't need to configure a spooler or the monitoring agent.

Use the following parameters to configure the DataRobot Python API spooler:

| Parameter | Description |
|-----------|-------------|
| MLOPS_SERVICE_URL | The service URL used to access DataRobot MLOps; set it with this environment variable instead of specifying it in the YAML configuration file. |
| MLOPS_API_TOKEN | The DataRobot API key. |
| VERIFY SSL (Boolean) | (Optional) Determines whether the client verifies the SSL connection. Default: True. |
| MLOPS_HTTP_RETRY | (Optional) The number of times to retry an HTTP call until it succeeds. |
| API_POST_TIMEOUT_SECONDS | (Optional) The timeout, in seconds, for POST calls to DataRobot MLOps. |
| API_HTTP_RETRY_WAIT_SECONDS | (Optional) The number of seconds to wait after a timeout before retrying. |
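
For instance, a sketch that reports directly to DataRobot MLOps over HTTPS, assuming the API spooler type value listed in the General configuration table (the URL and token are placeholders):

```python
import os

os.environ["MLOPS_SPOOLER_TYPE"] = "API"  # bypass the spooler and agent entirely
os.environ["MLOPS_SERVICE_URL"] = "https://app.datarobot.com"  # placeholder
os.environ["MLOPS_API_TOKEN"] = "YOUR_API_TOKEN"               # placeholder

from datarobot.mlops.mlops import MLOps

mlops = MLOps().init()  # reporting calls now go directly to DataRobot MLOps
```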

Azure Event Hubs

DataRobot allows you to use Microsoft Azure Event Hubs as a monitoring agent spooler by leveraging the existing Kafka spooler type. To set this up, see Using Azure Event Hubs from Apache Kafka applications.

Note

Azure supports the Kafka protocol for Event Hubs only for the Standard and Premium pricing tiers. The Basic tier does not offer Kafka API support, so it is not supported as a spooler for the monitoring agent. See Azure Event Hubs quotas and limits for details.

To use Azure Event Hubs as a spooler, you need to set up authentication for the monitoring agent and MLOps library using one of these methods:

SAS-based authentication for Event Hubs

To use Event Hubs SAS-based authentication for the monitoring agent and MLOps library, set the following environment variables using the example shell fragment below:

Sample environment variables script for SAS-based authentication
# Azure recommends setting the following values; see:
# https://docs.microsoft.com/en-us/azure/event-hubs/apache-kafka-configurations
export MLOPS_KAFKA_REQUEST_TIMEOUT_MS='60000'
export MLOPS_KAFKA_SESSION_TIMEOUT_MS='30000'
export MLOPS_KAFKA_METADATA_MAX_AGE_MS='180000'

# Common configuration variables for both Java- and Python-based libraries.
export MLOPS_KAFKA_BOOTSTRAP_SERVERS='XXXX.servicebus.windows.net:9093'
export MLOPS_KAFKA_SECURITY_PROTOCOL='SASL_SSL'
export MLOPS_KAFKA_SASL_MECHANISM='PLAIN'

# The following setting is specific to the Java SDK (and the monitoring agent daemon)
export MLOPS_KAFKA_SASL_JAAS_CONFIG='org.apache.kafka.common.security.plain.PlainLoginModule required username="$ConnectionString" password="Endpoint=sb://XXXX.servicebus.windows.net/;SharedAccessKeyName=XXXX;SharedAccessKey=XXXX";'

# For the Python SDK, you will need the following settings (in addition to the common ones above)
export MLOPS_KAFKA_SASL_USERNAME='$ConnectionString'
export MLOPS_KAFKA_SASL_PASSWORD='Endpoint=sb://XXXX.servicebus.windows.net/;SharedAccessKeyName=XXX;SharedAccessKey=XXXX'

Note

The environment variable values above use single quotes (') to ensure the shell does not interpret the special characters $ and " when setting the variables. If you set environment variables via Databricks, follow their guidelines on escaping special characters for the version of the platform you are using.

Azure Active Directory OAuth 2.0 for Event Hubs

DataRobot supports Azure Active Directory OAuth 2.0 for Event Hubs authentication. To use this authentication method, you must create a new Application Registration with the necessary permissions over your Event Hubs Namespace (i.e., Azure Event Hubs Data Owner). See Authenticate an application with Azure AD to access Event Hubs resources for details.

To use Event Hubs Azure Active Directory OAuth 2.0 authentication, set the following environment variables using the example shell fragment below:

Sample environment variables script for Azure AD OAuth 2.0 authentication
# Azure recommends setting the following values; see:
# https://docs.microsoft.com/en-us/azure/event-hubs/apache-kafka-configurations
export MLOPS_KAFKA_REQUEST_TIMEOUT_MS='60000'
export MLOPS_KAFKA_SESSION_TIMEOUT_MS='30000'
export MLOPS_KAFKA_METADATA_MAX_AGE_MS='180000'

# Common configuration variables for both Java- and Python-based libraries.
export MLOPS_KAFKA_BOOTSTRAP_SERVERS='XXXX.servicebus.windows.net:9093'
export MLOPS_KAFKA_SECURITY_PROTOCOL='SASL_SSL'
export MLOPS_KAFKA_SASL_MECHANISM='OAUTHBEARER'

# The following setting is specific to the Java SDK (and the tracking-agent daemon)
export MLOPS_KAFKA_SASL_JAAS_CONFIG='org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required aad.tenant.id="XXXX" aad.client.id="XXXX" aad.client.secret="XXXX";'
export MLOPS_KAFKA_SASL_LOGIN_CALLBACK_CLASS='com.datarobot.mlops.spooler.kafka.ActiveDirectoryAuthenticateCallbackHandler'

# For the Python SDK, you will need the following settings (in addition to the common ones above)
export MLOPS_KAFKA_SASL_OAUTHBEARER_CONFIG='aad.tenant.id=XXXX-XXXX-XXXX-XXXX-XXXX, aad.client.id=XXXX-XXXX-XXXX-XXXX-XXXX, aad.client.secret=XXXX'

Note

Some environment variable values contain double quotes ("). Take care when setting environment variables that include this special character (or others).

Dynamically load required spoolers in a Java application

To configure monitoring agent spoolers that rely on third-party code, you can dynamically load a separate JAR file for the required spooler. This is required for the Amazon SQS, RabbitMQ, Google Cloud Pub/Sub, and Apache Kafka spoolers. The natively supported filesystem spooler is configurable without loading a JAR file.

Note

Previously, the datarobot-mlops and mlops-agent packages included all spooler types by default; however, that configuration meant the code was always present, even if it was unused.

Include spooler dependencies in the project object model

To use a third-party spooler in your MLOps Java application, you must include the required spoolers as dependencies in your POM (Project Object Model) file, along with datarobot-mlops:

Dependencies in a POM file
<properties>
    <mlops.version>8.3.0</mlops.version>
</properties>

<dependencies>
    <dependency>
        <groupId>com.datarobot</groupId>
        <artifactId>datarobot-mlops</artifactId>
        <version>${mlops.version}</version>
    </dependency>
    <dependency>
        <groupId>com.datarobot</groupId>
        <artifactId>spooler-sqs</artifactId>
        <version>${mlops.version}</version>
    </dependency>
    <dependency>
        <groupId>com.datarobot</groupId>
        <artifactId>spooler-rabbitmq</artifactId>
        <version>${mlops.version}</version>
    </dependency>
    <dependency>
        <groupId>com.datarobot</groupId>
        <artifactId>spooler-pubsub</artifactId>
        <version>${mlops.version}</version>
    </dependency>
    <dependency>
        <groupId>com.datarobot</groupId>
        <artifactId>spooler-kafka</artifactId>
        <version>${mlops.version}</version>
    </dependency>
</dependencies>

Provide an executable JAR file for the spooler

The spooler JAR files are included in the MLOps agent tarball. They are also available individually as downloadable JAR files in the public Maven repository for the DataRobot MLOps Agent.

To use a third-party spooler with the executable agent JAR file, add the path to the spooler to the classpath:

Classpath without spooler
java ... -cp path/to/mlops-agent-8.3.0.jar com.datarobot.mlops.agent.Agent
Classpath with Kafka spooler
java ... -cp path/to/mlops-agent-8.3.0.jar:path/to/spooler-kafka-8.3.0.jar com.datarobot.mlops.agent.Agent

The start-agent.sh script provided as an example automatically performs this task, adding any spooler JAR files found in the lib directory to the classpath. If your spooler JAR files are in a different directory, set the MLOPS_SPOOLER_JAR_PATH environment variable.

  • If a dynamic spooler loads successfully, the monitoring agent logs an INFO message: Creating spooler type <type>: success.

  • If loading a dynamic spooler fails, the monitoring agent logs an ERROR message: Creating spooler type <type>: failed, followed by the reason (a class-not-found error, indicating a missing dependency or JAR file) or more details (a system exception message to help you diagnose the issue). If the class was not found, ensure the spooler's dependency is included in the application's POM, or that the matching spooler JAR file is on the classpath of the java command that starts the agent. Missing dependencies are not discovered until runtime.

Tip

If the agent is configured with a predictionEnvironmentId and can connect to DataRobot, the agent sends an MLOps Spooler Channel Failed event to DataRobot MLOps with information from the log message. These events appear in the event log on the Service Health page of any deployment associated with that prediction environment. You can also create a notification channel and policy to be notified (by email, Slack, or webhook) of these errors.

