Monitoring agent installation and configuration
When the monitoring agent is running, it looks for buffered messages in the configured directory or a message queuing system and forwards them. To forward buffered messages from the MLOps library to DataRobot MLOps, install and configure the monitoring agent as indicated below.
-
Unpack the MLOps package tarball:
tar -xvf datarobot_mlops_package-*.tar.gz
-
Update the configuration file:
cd datarobot_mlops_package-*; <your-favorite-editor> ./conf/mlops.agent.conf.yaml
-
Configure the monitoring agent:
In the agent configuration file, conf/mlops.agent.conf.yaml, you must update the values for mlopsUrl and apiToken. By default, the agent uses the filesystem channel. If you use the filesystem channel, make sure you create the spooler directory (by default, /tmp/ta).
Important
For the filesystem spooler channel, the directory path you provide must be an absolute path (containing the complete directory list) so that the agent can access the /tmp/ta directory (or a custom directory you create).
If you want to use a different channel, follow the comments in the agent configuration file to update the channel settings.
mlops.agent.conf.yaml
# This file contains configuration for the MLOps agent

# URL to the DataRobot MLOps service
mlopsUrl: "https://<MLOPS_HOST>"

# DataRobot API token
apiToken: "<MLOPS_API_TOKEN>"

# Execute the agent once, then exit
runOnce: false

# When dryrun mode is true, do not report the metrics to MLOps service
dryRun: false

# When verifySSL is true, SSL certification validation will be performed when
# connecting to MLOps DataRobot. When verifySSL is false, these checks are skipped.
# Note: It is highly recommended to keep this config variable as true.
verifySSL: true

# Path to write agent stats
statsPath: "/tmp/tracking-agent-stats.json"

# Prediction Environment served by this agent.
# Events and errors not specific to a single deployment are reported against this Prediction Environment.
# predictionEnvironmentId: "<PE_ID_FROM_DATAROBOT_UI>"

# Number of times the agent will retry sending a request to the MLOps service on failure.
httpRetry: 3

# Http client timeout in milliseconds (30sec timeout)
httpTimeout: 30000

# Number of concurrent http request, default=1 -> synchronous mode; > 1 -> asynchronous
httpConcurrentRequest: 10

# Number of HTTP Connections to establish with the MLOps service, Default: 1
numMLOpsConnections: 1

# Comment out and configure the lines below for the spooler type(s) you are using.
# Note: The spooler configuration must match that used by the MLOps library.
# Note: The filesystem spooler directory must be an absolute path to the "/tmp/ta" directory.
# Note: Spoolers must be set up before using them.
#       - For the filesystem spooler, create the directory that will be used.
#       - For the SQS spooler, create the queue.
#       - For the PubSub spooler, create the project and topic.
#       - For the Kafka spooler, create the topic.
channelConfigs:
  - type: "FS_SPOOL"
    details: {name: "filesystem", directory: "<path_to_spooler_directory>/tmp/ta"}
#  - type: "SQS_SPOOL"
#    details: {name: "sqs", queueUrl: "your SQS queue URL", queueName: "<your AWS SQS queue name>"}
#  - type: "RABBITMQ_SPOOL"
#    details: {name: "rabbit", queueName: <your rabbitmq queue name>, queueUrl: "amqp://<ip address>",
#              caCertificatePath: "<path_to_ca_certificate>",
#              certificatePath: "<path_to_client_certificate>",
#              keyfilePath: "<path_to_key_file>"}
#  - type: "PUBSUB_SPOOL"
#    details: {name: "pubsub", projectId: <your project ID>, topicName: <your topic name>, subscriptionName: <your sub name>}
#  - type: "KAFKA_SPOOL"
#    details: {name: "kafka", topicName: "<your topic name>", bootstrapServers: "<ip address 1>,<ip address 2>,..."}

# The number of threads that the agent will launch to process data records.
agentThreadPoolSize: 4

# The maximum number of records each thread will process per fetchNewDataFreq interval.
agentMaxRecordsTask: 100

# Maximum number of records to aggregate before sending to DataRobot MLOps
agentMaxAggregatedRecords: 500

# A timeout for pending records before aggregating and submitting
agentPendingRecordsTimeoutMs: 5000
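The filesystem spooler directory has to exist, and has to be given as an absolute path, before the agent starts. The following is a minimal shell sketch of that setup step. The function name ensure_spool_dir is hypothetical and is not part of the MLOps package; /tmp/ta is simply the default directory named above.

```shell
# Hypothetical helper (not shipped with the MLOps package):
# create the filesystem spooler directory, refusing relative paths.
ensure_spool_dir() {
  dir="$1"
  case "$dir" in
    /*) ;;  # starts with "/": absolute path, accepted
    *)
      echo "spooler directory must be an absolute path: $dir" >&2
      return 1
      ;;
  esac
  mkdir -p "$dir"  # succeeds quietly if the directory already exists
}
```

Calling ensure_spool_dir /tmp/ta would create the default directory. The same absolute path then goes into the directory field of the FS_SPOOL channel, and the MLOps library needs to be configured with that same directory.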
-
To run the monitoring agent natively in Docker, first build the datarobot/mlops-tracking-agent image from the MLOps agent tarball:
make build -C tools/agent_docker
-
Configure the monitoring agent in Docker, mounted to the default directory or a custom location:
-
To run the monitoring agent with the configuration mounted to the default directory:
docker run \
  -v /path/to/mlops.agent.conf.yaml:/opt/datarobot/mlops/agent/conf/mlops.agent.conf.yaml \
  datarobot/mlops-tracking-agent
-
To run the monitoring agent with the configuration mounted to a custom location:
docker run \
  -v /path/to/mlops.agent.conf.yaml:/var/tmp/mlops.agent.conf.yaml \
  -e MLOPS_AGENT_CONFIG_YAML=/var/tmp/mlops.agent.conf.yaml \
  datarobot/mlops-tracking-agent
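When the agent runs in a container with the filesystem channel, the container can only read a spooler directory that is mounted into it. The sketch below prints (but does not run) a docker command that mounts both the configuration file and a spooler directory. The helper name print_agent_docker_cmd is hypothetical, and mounting the spooler at the same path inside the container is an assumption chosen so that the directory value in the configuration file resolves identically on the host and in the container.

```shell
# Hypothetical helper: print a docker command that mounts the agent config
# at the default location and a spooler directory at the same path it has
# on the host. Both host paths must be absolute.
print_agent_docker_cmd() {
  conf="$1"   # absolute host path to mlops.agent.conf.yaml
  spool="$2"  # absolute host path to the filesystem spooler directory
  case "$conf" in
    /*) ;;
    *) echo "config path must be absolute: $conf" >&2; return 1 ;;
  esac
  case "$spool" in
    /*) ;;
    *) echo "spooler path must be absolute: $spool" >&2; return 1 ;;
  esac
  printf 'docker run -v %s:%s -v %s:%s datarobot/mlops-tracking-agent\n' \
    "$conf" "/opt/datarobot/mlops/agent/conf/mlops.agent.conf.yaml" \
    "$spool" "$spool"
}
```

For example, print_agent_docker_cmd /path/to/mlops.agent.conf.yaml /tmp/ta prints a command that can be reviewed before it is run.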
Use the monitoring agent
Once the monitoring agent is configured, you can run the agent, check the agent status, and shut down the agent.
Run the monitoring agent
Start the agent using the config file:
cd datarobot_mlops_package-*;
./bin/start-agent.sh
Alternatively, start the agent using environment variables:
export AGENT_CONFIG_YAML=<path/to/conf/mlops.agent.conf.yaml>
export AGENT_LOG_PROPERTIES=<path/to/conf/mlops.log4j2.properties>
export AGENT_JVM_OPT=-Xmx4G
export AGENT_JAR_PATH=<path/to/bin/mlops-agent-ver.jar>
./bin/start-agent.sh
For a complete reference of the available environment variables, see MLOps agent environment variables.
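Before starting the agent, it can help to confirm that the configuration file is readable and that the template placeholders for mlopsUrl and apiToken have been replaced. The following is a minimal pre-flight sketch; the function name check_agent_conf is hypothetical and is not part of the MLOps package, and the placeholder strings are the ones shown in the sample configuration file.

```shell
# Hypothetical pre-flight check (not part of the MLOps package): confirm the
# config file is readable and the template placeholders have been replaced.
check_agent_conf() {
  conf="$1"
  if [ ! -r "$conf" ]; then
    echo "cannot read config file: $conf" >&2
    return 1
  fi
  # Skip comment lines, then look for the literal placeholder strings (-F).
  if grep -v '^[[:space:]]*#' "$conf" \
      | grep -q -F -e '<MLOPS_HOST>' -e '<MLOPS_API_TOKEN>'; then
    echo "mlopsUrl or apiToken still contains a template placeholder" >&2
    return 1
  fi
  return 0
}
```

One possible use is to chain it with the start script, for example check_agent_conf ./conf/mlops.agent.conf.yaml && ./bin/start-agent.sh, so the agent starts only when the check passes.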
Check the agent's status
To check the agent's status:
./bin/status-agent.sh
For more detailed status output, add the --verbose flag:
./bin/status-agent.sh --verbose
Shut down the agent
To shut down the agent:
./bin/stop-agent.sh