
Monitoring agent installation and configuration

When the monitoring agent is running, it looks for buffered messages in the configured directory or a message queuing system and forwards them. To forward buffered messages from the MLOps library to DataRobot MLOps, install and configure the monitoring agent as indicated below.

Run on a host machine

Unpack the MLOps .tar file:


tar -xvf datarobot_mlops_package-*.tar.gz

Update the configuration file:


cd datarobot_mlops_package-*;
<your-favorite-editor> ./conf/mlops.agent.conf.yaml

Configure the monitoring agent:

In the agent configuration file, conf/mlops.agent.conf.yaml, you must update the values for mlopsUrl and apiToken. By default, the agent uses the filesystem channel. If you use the filesystem channel, make sure you create the spooler directory (by default, /tmp/ta).

Important

For the filesystem spooler channel, you must provide the spooler directory as an absolute path (the complete path, starting from the root directory) so that the agent can access the /tmp/ta directory (or a custom directory you create).
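
As a minimal sketch of the step above, the following snippet creates the spooler directory only if its path is absolute. The variable `SPOOL_DIR` and the helper `is_absolute_path` are hypothetical names introduced for illustration; they are not part of the MLOps package.

```shell
# Assumption: SPOOL_DIR is a name chosen for this sketch; it defaults to the
# documented default spooler directory, /tmp/ta.
SPOOL_DIR="${SPOOL_DIR:-/tmp/ta}"

# Hypothetical helper: succeeds only when the path starts at the root directory.
is_absolute_path() {
    case "$1" in
        /*) return 0 ;;
        *)  return 1 ;;
    esac
}

if is_absolute_path "$SPOOL_DIR"; then
    # -p creates missing parent directories and does not fail if the
    # directory already exists.
    mkdir -p "$SPOOL_DIR"
    echo "Spooler directory ready: $SPOOL_DIR"
else
    echo "Spooler directory must be an absolute path: $SPOOL_DIR" >&2
fi
```

Use the same absolute path for the `directory` value of the `FS_SPOOL` channel in the agent configuration file.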

If you want to use a different channel, follow the comments in the agent configuration file to update the path.

mlops.agent.conf.yaml


# This file contains configuration for the MLOps agent

# URL to the DataRobot MLOps service
mlopsUrl: "https://<MLOPS_HOST>"

# DataRobot API token
apiToken: "<MLOPS_API_TOKEN>"

# Execute the agent once, then exit
runOnce: false

# When dryrun mode is true, do not report the metrics to MLOps service
dryRun: false

# When verifySSL is true, SSL certificate validation is performed when
# connecting to DataRobot MLOps. When verifySSL is false, these checks are skipped.
# Note: It is highly recommended to keep this config variable as true.
verifySSL: true

# Path to write agent stats
statsPath: "/tmp/tracking-agent-stats.json"

# Prediction Environment served by this agent.
# Events and errors not specific to a single deployment are reported against this Prediction Environment.
# predictionEnvironmentId: "<PE_ID_FROM_DATAROBOT_UI>"

# Number of times the agent will retry sending a request to the MLOps service on failure.
httpRetry: 3

# Http client timeout in milliseconds (30sec timeout)
httpTimeout: 30000

# Number of concurrent HTTP requests, default=1 -> synchronous mode; > 1 -> asynchronous
httpConcurrentRequest: 10

# Number of HTTP Connections to establish with the MLOps service, Default: 1
numMLOpsConnections: 1

# Uncomment and configure the lines below for the spooler type(s) you are using.
# Note: The spooler configuration must match that used by the MLOps library.
# Note: The filesystem spooler directory must be an absolute path to the "/tmp/ta" directory.
# Note: Spoolers must be set up before using them.
#       - For the filesystem spooler, create the directory that will be used.
#       - For the SQS spooler, create the queue.
#       - For the PubSub spooler, create the project and topic.
#       - For the Kafka spooler, create the topic.
channelConfigs:
  - type: "FS_SPOOL"
    details: {name: "filesystem", directory: "<path_to_spooler_directory>/tmp/ta"}
#  - type: "SQS_SPOOL"
#    details: {name: "sqs", queueUrl: "<your SQS queue URL>", queueName: "<your AWS SQS queue name>"}
#  - type: "RABBITMQ_SPOOL"
#    details: {name: "rabbit",  queueName: "<your rabbitmq queue name>",  queueUrl: "amqp://<ip address>",
#              caCertificatePath: "<path_to_ca_certificate>",
#              certificatePath: "<path_to_client_certificate>",
#              keyfilePath: "<path_to_key_file>"}

#  - type: "PUBSUB_SPOOL"
#    details: {name: "pubsub", projectId: "<your project ID>", topicName: "<your topic name>", subscriptionName: "<your sub name>"}
#  - type: "KAFKA_SPOOL"
#    details: {name: "kafka", topicName: "<your topic name>", bootstrapServers: "<ip address 1>,<ip address 2>,..."}

# The number of threads that the agent will launch to process data records.
agentThreadPoolSize: 4

# The maximum number of records each thread will process per fetchNewDataFreq interval.
agentMaxRecordsTask: 100

# Maximum number of records to aggregate before sending to DataRobot MLOps
agentMaxAggregatedRecords: 500

# A timeout for pending records before aggregating and submitting
agentPendingRecordsTimeoutMs: 5000
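
Because mlopsUrl and apiToken must be updated before the agent can report to DataRobot MLOps, a quick check for leftover template placeholders can help. The sketch below assumes the configuration file lives at ./conf/mlops.agent.conf.yaml; `CONFIG_FILE` and `has_unset_placeholders` are hypothetical names introduced for illustration only.

```shell
# Hypothetical helper: succeeds when the mlopsUrl or apiToken line still
# contains the template placeholder <MLOPS_HOST> or <MLOPS_API_TOKEN>.
has_unset_placeholders() {
    grep -Eq '^[[:space:]]*(mlopsUrl|apiToken):.*<MLOPS_(HOST|API_TOKEN)>' "$1"
}

# Assumption: CONFIG_FILE is a name chosen for this sketch.
CONFIG_FILE="${CONFIG_FILE:-./conf/mlops.agent.conf.yaml}"

if [ ! -f "$CONFIG_FILE" ]; then
    echo "Config file not found: $CONFIG_FILE" >&2
elif has_unset_placeholders "$CONFIG_FILE"; then
    echo "Update mlopsUrl and apiToken in $CONFIG_FILE before starting the agent." >&2
else
    echo "mlopsUrl and apiToken appear to be set in $CONFIG_FILE."
fi
```

This only detects unchanged placeholders; it does not verify that the URL is reachable or that the token is valid.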

Run natively in Docker

To run the monitoring agent natively in Docker, first build the datarobot/mlops-tracking-agent image from the MLOps agent tarball:
```
make build -C tools/agent_docker
```

Then run the monitoring agent in Docker with the configuration file mounted to either the default directory or a custom location:

To run the monitoring agent with the configuration mounted to the default directory:


docker run \
    -v /path/to/mlops.agent.conf.yaml:/opt/datarobot/mlops/agent/conf/mlops.agent.conf.yaml \
    datarobot/mlops-tracking-agent

To run the monitoring agent with the configuration mounted to a custom location:


docker run \
    -v /path/to/mlops.agent.conf.yaml:/var/tmp/mlops.agent.conf.yaml \
    -e MLOPS_AGENT_CONFIG_YAML=/var/tmp/mlops.agent.conf.yaml \
    datarobot/mlops-tracking-agent

Use the monitoring agent

Once the monitoring agent is configured, you can run the agent, check the agent status, and shut down the agent.

Run the monitoring agent

Start the agent using the config file:


cd datarobot_mlops_package-*;
./bin/start-agent.sh

Alternatively, start the agent using environment variables:


export AGENT_CONFIG_YAML=<path/to/conf/mlops.agent.conf.yaml>
export AGENT_LOG_PROPERTIES=<path/to/conf/mlops.log4j2.properties>
export AGENT_JVM_OPT=-Xmx4G
export AGENT_JAR_PATH=<path/to/bin/mlops-agent-ver.jar>
./bin/start-agent.sh
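
Before starting the agent this way, it can be useful to confirm that the exported paths point at real files. The following is a minimal sketch, not part of the MLOps package: `check_agent_env` is a hypothetical helper, and it only inspects the three file-path variables shown above. Whether the start script falls back to defaults for unset variables is not stated here, so unset variables are reported but not treated as failures.

```shell
# Hypothetical preflight check: fails only when a variable is set but the
# file it names does not exist.
check_agent_env() {
    missing=0
    for var in AGENT_CONFIG_YAML AGENT_LOG_PROPERTIES AGENT_JAR_PATH; do
        # Read the value of the variable whose name is stored in $var.
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            echo "Note: $var is not set" >&2
        elif [ ! -f "$value" ]; then
            echo "$var points to a missing file: $value" >&2
            missing=1
        fi
    done
    return "$missing"
}

if check_agent_env; then
    echo "No missing files detected; run ./bin/start-agent.sh next."
fi
```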

For a complete reference of the available environment variables, see MLOps agent environment variables.

Check the agent's status

To check the agent's status:

Check status


./bin/status-agent.sh

Check status with real-time resource usage


./bin/status-agent.sh --verbose

Shut down the agent

To shut down the agent:


./bin/stop-agent.sh
