Spooler configuration

The MLOps library communicates with the agent through a spooler, so the library and agent spooler configurations must match. The following spoolers are available for configuration:

  • Filesystem
  • SQS
  • RabbitMQ
  • PubSub
  • Kafka

In addition to the configuration steps outlined below, reference the environment variables available for the agent and each spooler.

Agent configuration

When running the agent as a separate service, specify the spooler configuration in mlops.agent.conf.yaml by uncommenting and filling out the channelConfigs section.

For a filesystem spooler, when the agent is run through the MLOps library, the library configures the agent.

MLOps library configuration

The MLOps library can be configured programmatically or by using environment variables. To configure the spooler programmatically, specify the spooler during the MLOps init call. For example, to configure the filesystem spooler using the Python library:

    mlops = MLOps().set_filesystem_spooler(your_spooler_directory).init()

Equivalent interfaces exist for the other spooler types; see the MLOps API documentation for the complete list. To access the MLOps API documentation, sign in to DataRobot, click the question mark in the upper right, and select API Documentation. In the table of contents, select MLOps.

To configure the library via environment variables, see below.

Environment variables

The MLOps library and agent can be configured with the following environment variables.

General configuration

Variable Description
MLOPS_DEPLOYMENT_ID Set the deployment ID that the MLOps library reports to.
MLOPS_MODEL_ID Set the model ID that the MLOps library reports on.
MLOPS_SPOOLER_TYPE Set the spooler type that the MLOps library uses to communicate with the agent. Valid options are FILESYSTEM, SQS, RABBITMQ, KAFKA, and PUBSUB. Use NONE to disable MLOps library reporting, or STDOUT to have the MLOps library print the reported metrics to stdout rather than forward them to the agent.

Advanced configuration

Variable Description
MLOPS_ASYNC_REPORTING Have the MLOps library report its metrics asynchronously to the spooler.
MLOPS_FEATURE_DATA_ROWS_IN_ONE_MESSAGE The number of feature rows to include in a single message to the spooler.

Spooler-specific configuration

The following sections detail the environment variables unique to each spooler.

Filesystem

Variable Description
MLOPS_FILESYSTEM_DIRECTORY When using the FILESYSTEM spooler type, use this directory to store the metrics.
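
As a rough illustration, the general and filesystem variables could be assembled and checked in Python before the MLOps library is initialized. This is only a sketch: the helper name build_mlops_env, the placeholder IDs, and the spool directory are hypothetical. Only the variable names and the valid spooler types come from the tables above.

```python
import os

# Spooler types listed for MLOPS_SPOOLER_TYPE in the general configuration table.
VALID_SPOOLER_TYPES = {"FILESYSTEM", "SQS", "RABBITMQ", "KAFKA", "PUBSUB", "NONE", "STDOUT"}


def build_mlops_env(deployment_id, model_id, spooler_type, spool_dir=None):
    """Return a dict of MLOps environment variables (hypothetical helper)."""
    spooler_type = spooler_type.upper()
    if spooler_type not in VALID_SPOOLER_TYPES:
        raise ValueError(f"Unknown spooler type: {spooler_type}")
    env = {
        "MLOPS_DEPLOYMENT_ID": deployment_id,
        "MLOPS_MODEL_ID": model_id,
        "MLOPS_SPOOLER_TYPE": spooler_type,
    }
    if spooler_type == "FILESYSTEM":
        # The filesystem spooler needs a directory to store the metrics.
        if not spool_dir:
            raise ValueError("FILESYSTEM spooler needs a directory")
        env["MLOPS_FILESYSTEM_DIRECTORY"] = spool_dir
    return env


# Placeholder values; substitute your own deployment ID, model ID, and directory.
env = build_mlops_env("my-deployment-id", "my-model-id", "filesystem", "/tmp/mlops-spool")

# Export the variables so that a library initialized later in this process can read them.
os.environ.update(env)
```

Validating the spooler type up front surfaces a typo in the configuration before any metrics are reported.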

SQS

When using Amazon SQS as a spooler, you must set the environment variables AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and either AWS_REGION or AWS_DEFAULT_REGION. These credentials are used only by AWS software packages and are not accessed by DataRobot.

Variable Description
MLOPS_SQS_QUEUE_URL When using the SQS spooler type, use this queue URL for the spooler. Either the queue name or the queue URL must be provided.
MLOPS_SQS_QUEUE_NAME When using the SQS spooler type, use this queue name for the spooler. Either the queue name or the queue URL must be provided.
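
The SQS requirements above (AWS credentials, a region, and either a queue URL or a queue name) can be checked before starting the library or agent. The following is a minimal sketch, not part of the MLOps library; the function name missing_sqs_settings and the sample values are hypothetical.

```python
def missing_sqs_settings(env):
    """Return a list of problems found in an SQS spooler environment (hypothetical helper)."""
    problems = []
    # Both AWS credential variables are required.
    for name in ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"):
        if not env.get(name):
            problems.append(f"{name} is not set")
    # Either region variable is acceptable.
    if not (env.get("AWS_REGION") or env.get("AWS_DEFAULT_REGION")):
        problems.append("neither AWS_REGION nor AWS_DEFAULT_REGION is set")
    # Either the queue URL or the queue name must be provided.
    if not (env.get("MLOPS_SQS_QUEUE_URL") or env.get("MLOPS_SQS_QUEUE_NAME")):
        problems.append("neither MLOPS_SQS_QUEUE_URL nor MLOPS_SQS_QUEUE_NAME is set")
    return problems


# Sample environment with placeholder values; the region is deliberately missing.
sample_env = {
    "AWS_ACCESS_KEY_ID": "placeholder-key-id",
    "AWS_SECRET_ACCESS_KEY": "placeholder-secret",
    "MLOPS_SQS_QUEUE_NAME": "my-queue",
}
problems = missing_sqs_settings(sample_env)
```

In practice the function would be called with os.environ instead of a sample dictionary.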

RabbitMQ

Variable Description
MLOPS_RABBITMQ_QUEUE_URL When using the RABBITMQ spooler type, use this URL for the spooler.
MLOPS_RABBITMQ_QUEUE_NAME When using the RABBITMQ spooler type, use this queue name.
MLOPS_RABBITMQ_SSL_CA_CERTIFICATE_PATH Path to the CA certificate file (.pem file).
MLOPS_RABBITMQ_SSL_CERTIFICATE_PATH Path to the client certificate (.pem file).
MLOPS_RABBITMQ_SSL_KEYFILE_PATH Path to the client key (.pem file).
MLOPS_RABBITMQ_SSL_TLS_VERSION The TLS version used by the client. It must match the TLS version used by the server.

Note

RabbitMQ configuration requires keys in RSA format, without a password. You can convert keys from PKCS8 to RSA as follows:

    openssl rsa -in mykey_pkcs8_format.pem -text > mykey_rsa_format.pem

To generate keys, see RabbitMQ TLS Support.
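
To tell whether a key file needs the conversion described in the note, one option is to inspect its PEM header. The sketch below relies on the conventional PEM labels (BEGIN RSA PRIVATE KEY for the traditional RSA format, BEGIN PRIVATE KEY for unencrypted PKCS8, BEGIN ENCRYPTED PRIVATE KEY for password-protected PKCS8). The function name pem_private_key_format is hypothetical and is not part of the MLOps library.

```python
def pem_private_key_format(pem_text):
    """Classify a PEM private key by its header line (hypothetical helper)."""
    if "-----BEGIN RSA PRIVATE KEY-----" in pem_text:
        # Traditional RSA keys protected by a password carry this extra header.
        if "Proc-Type: 4,ENCRYPTED" in pem_text:
            return "rsa-encrypted"
        return "rsa"
    if "-----BEGIN ENCRYPTED PRIVATE KEY-----" in pem_text:
        return "pkcs8-encrypted"
    if "-----BEGIN PRIVATE KEY-----" in pem_text:
        return "pkcs8"
    return "unknown"


# Only the header matters for this check; the key body is elided here.
sample = "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n"
needs_conversion = pem_private_key_format(sample) != "rsa"
```

Only a result of "rsa" satisfies the requirement stated in the note; any other result suggests that the key should be converted or regenerated without a password.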

PubSub

When using PubSub, you must provide appropriate credentials by setting the environment variable GOOGLE_APPLICATION_CREDENTIALS. These credentials are used only by Google's software packages and are not accessed by DataRobot.

Variable Description
MLOPS_PUBSUB_PROJECT_ID When using the PUBSUB spooler type, use this project ID. Note that this should be the full path of the project ID.
MLOPS_PUBSUB_TOPIC_NAME When using the PUBSUB spooler type, use this topic name. Note that this should not include the project ID.

Kafka

Variable Description
MLOPS_KAFKA_TOPIC_NAME The name of the specific Kafka topic to produce to or consume from. Reference the Kafka terminology for more information.
MLOPS_KAFKA_BOOTSTRAP_SERVERS The list of servers that the agent connects to. Use the same syntax as the upstream Kafka bootstrap.servers configuration.
MLOPS_KAFKA_CONSUMER_POLL_TIMEOUT_MS The amount of time to wait while consuming messages before processing them and sending them to DataRobot (default 3000 ms).
MLOPS_KAFKA_MESSAGE_BYTE_SIZE_LIMIT The maximum chunk size when producing events to the channel (default 1048588 bytes).
MLOPS_KAFKA_CONFIG_LOCATION The filesystem path to an optional config file used to specify any of the additional config options that the Kafka producer or consumer supports. The file must be in INI format. Parameters must be in one of the following sections: all, Java, or Python (default path is ~/.datarobot-mlops/kafka.conf).
MLOPS_KAFKA_AUTO_RELEASE_OFFSET Determines how the consumer behaves when reconnecting to a topic. Note that changing this value may cause duplicate events to be sent to DataRobot MLOps. Reference the Kafka documentation for a list of values.
MLOPS_KAFKA_MAX_FLUSH_MS The maximum time, in milliseconds, to spend flushing pending events to the channel. Reference the Kafka documentation for more information.
MLOPS_KAFKA_CONSUMER_GROUP_ID A unique string that identifies the consumer group this consumer belongs to. Default value: tracking-agent. Reference the Kafka documentation for more information.
MLOPS_KAFKA_CONSUMER_MAX_NUM_MESSAGES The maximum number of messages to consume at one time before processing them and sending the results to DataRobot MLOps. Default value: 500. Reference the Kafka documentation for more information.
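
To illustrate the INI layout described for MLOPS_KAFKA_CONFIG_LOCATION, the sketch below parses a sample file with Python's configparser and combines the all section with one language-specific section. The option names are standard Kafka client settings chosen as examples, the paths are placeholders, and the merge order is an assumption; how the MLOps library itself combines the sections is not stated here.

```python
import configparser

# Sample contents for a file such as ~/.datarobot-mlops/kafka.conf (placeholder values).
SAMPLE_KAFKA_CONF = """\
[all]
security.protocol = SSL

[Python]
ssl.ca.location = /path/to/ca.pem

[Java]
ssl.truststore.location = /path/to/truststore.jks
"""


def load_kafka_options(text, language):
    """Merge the 'all' section with one language section (assumed merge order)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    options = {}
    for section in ("all", language):
        if parser.has_section(section):
            # Later sections override earlier ones in this sketch.
            options.update(dict(parser.items(section)))
    return options


python_options = load_kafka_options(SAMPLE_KAFKA_CONF, "Python")
```

Options shared by both clients go in the all section, while client-specific options, such as certificate or truststore locations, go in the Python or Java section.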

Updated November 11, 2021