Monitoring agent installation and configuration

While running, the monitoring agent looks for messages that the MLOps library has buffered in the configured directory or message queuing system and forwards them to DataRobot MLOps. To set this up, install and configure the monitoring agent as follows:

  1. Unpack the MLOps .tar file:

    tar -xvf datarobot_mlops_package-*.tar.gz
    
  2. Update the configuration file:

    cd datarobot_mlops_package-*/
    <your-favorite-editor> ./conf/mlops.agent.conf.yaml
    
  3. Configure the monitoring agent:

    In the agent configuration file, conf/mlops.agent.conf.yaml, you must update the values for mlopsUrl and apiToken. By default, the agent uses the filesystem channel. If you use the filesystem channel, create the spooler directory (by default, /tmp/ta) before starting the agent. To use a different channel, update the agent configuration file following the comments in the file.

    mlops.agent.conf.yaml
    # This file contains configuration for the MLOps agent
    
    # URL to the DataRobot MLOps service
    mlopsUrl: "https://<MLOPS_HOST>"
    
    # DataRobot API token
    apiToken: "<MLOPS_API_TOKEN>"
    
    # Execute the agent once, then exit
    runOnce: false
    
    # When dryrun mode is true, do not report the metrics to MLOps service
    dryRun: false
    
    # When verifySSL is true, SSL certificate validation is performed when
    # connecting to DataRobot MLOps. When verifySSL is false, these checks are skipped.
    # Note: It is highly recommended to keep this config variable as true.
    verifySSL: true
    
    # Path to write agent stats
    statsPath: "/tmp/tracking-agent-stats.json"
    
    # Prediction Environment served by this agent.
    # Events and errors not specific to a single deployment are reported against this Prediction Environment.
    # predictionEnvironmentId: "<PE_ID_FROM_DATAROBOT_UI>"
    
    # Number of times the agent will retry sending a request to the MLOps service on failure.
    httpRetry: 3
    
    # Http client timeout in milliseconds (30sec timeout)
    httpTimeout: 30000
    
    # Number of concurrent HTTP requests, default=1 -> synchronous mode; > 1 -> asynchronous
    httpConcurrentRequest: 10
    
    # Number of HTTP Connections to establish with the MLOps service, Default: 1
    numMLOpsConnections: 1
    
    # Uncomment and configure the lines below for the spooler type(s) you are using.
    # Note: the spooler configuration must match that used by the MLOps library.
    # Note: Spoolers must be set up before using them.
    #       - For the filesystem spooler, create the directory that will be used.
    #       - For the SQS spooler, create the queue.
    #       - For the PubSub spooler, create the project and topic.
    #       - For the Kafka spooler, create the topic.
    channelConfigs:
      - type: "FS_SPOOL"
        details: {name: "filesystem", directory: "/tmp/ta"}
    #  - type: "SQS_SPOOL"
    #    details: {name: "sqs", queueUrl: "<your SQS queue URL>", queueName: "<your AWS SQS queue name>"}
    #  - type: "RABBITMQ_SPOOL"
    #    details: {name: "rabbit",  queueName: "<your rabbitmq queue name>",  queueUrl: "amqp://<ip address>",
    #              caCertificatePath: "<path_to_ca_certificate>",
    #              certificatePath: "<path_to_client_certificate>",
    #              keyfilePath: "<path_to_key_file>"}
    
    #  - type: "PUBSUB_SPOOL"
    #    details: {name: "pubsub", projectId: "<your project ID>", topicName: "<your topic name>", subscriptionName: "<your sub name>"}
    #  - type: "KAFKA_SPOOL"
    #    details: {name: "kafka", topicName: "<your topic name>", bootstrapServers: "<ip address 1>,<ip address 2>,..."}
    
    # The number of threads that the agent will launch to process data records.
    agentThreadPoolSize: 4
    
    # The maximum number of records each thread will process per fetchNewDataFreq interval.
    agentMaxRecordsTask: 100
    
    # Maximum number of records to aggregate before sending to MMM
    agentMaxAggregatedRecords: 500
    
    # A timeout for pending records before aggregating and submitting
    agentPendingRecordsTimeoutMs: 5000
    
  4. To run the monitoring agent in Docker, first build the datarobot/mlops-tracking-agent image from the MLOps agent tarball:

    make build -C tools/agent_docker
    
  5. Run the monitoring agent in Docker with the configuration file mounted to the default directory or a custom location:

    • To run the monitoring agent with the configuration mounted to the default directory:

      docker run \
          -v /path/to/mlops.agent.conf.yaml:/opt/datarobot/mlops/agent/conf/mlops.agent.conf.yaml \
          datarobot/mlops-tracking-agent
      
    • To run the monitoring agent with the configuration mounted to a custom location:

      docker run \
          -v /path/to/mlops.agent.conf.yaml:/var/tmp/mlops.agent.conf.yaml \
          -e MLOPS_AGENT_CONFIG_YAML=/var/tmp/mlops.agent.conf.yaml \
          datarobot/mlops-tracking-agent
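
The configuration requirements from the "Configure the monitoring agent" step (replace the mlopsUrl and apiToken placeholders, and create the filesystem spooler directory) can be checked before the agent is started. The following is a minimal sketch, not part of the MLOps package: check_agent_conf is a hypothetical helper name, and the sketch assumes the configuration file still uses the <MLOPS_HOST> and <MLOPS_API_TOKEN> placeholder strings shown in the template above.

```shell
# Pre-flight sketch (assumption: not shipped with the MLOps package).
# check_agent_conf is a hypothetical helper.
# Usage: check_agent_conf <path-to-conf-yaml> <spooler-directory>
check_agent_conf() {
    local conf="$1" spool_dir="$2"
    local status=0

    # The template uses <MLOPS_HOST> as the placeholder for the service host.
    if grep -Eq '^[[:space:]]*mlopsUrl:.*<MLOPS_HOST>' "$conf"; then
        echo "mlopsUrl still contains the <MLOPS_HOST> placeholder"
        status=1
    fi

    # The template uses <MLOPS_API_TOKEN> as the placeholder for the API token.
    if grep -Eq '^[[:space:]]*apiToken:.*<MLOPS_API_TOKEN>' "$conf"; then
        echo "apiToken still contains the <MLOPS_API_TOKEN> placeholder"
        status=1
    fi

    # The filesystem spooler directory must exist before it is used.
    if [ ! -d "$spool_dir" ]; then
        echo "spooler directory $spool_dir does not exist"
        status=1
    fi

    return "$status"
}
```

For the default filesystem channel, a typical call would be `mkdir -p /tmp/ta && check_agent_conf ./conf/mlops.agent.conf.yaml /tmp/ta`. The function only reports problems and returns a nonzero status; it does not modify the configuration file.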
      

Use the monitoring agent

Once the monitoring agent is configured, you can run the agent, check the agent status, and shut down the agent.

Run the monitoring agent

Start the agent using the config file:

cd datarobot_mlops_package-*/
./bin/start-agent.sh

Alternatively, start the agent using environment variables:

export AGENT_CONFIG_YAML=<path/to/conf/mlops.agent.conf.yaml>
export AGENT_LOG_PROPERTIES=<path/to/conf/mlops.log4j2.properties>
export AGENT_JVM_OPT=-Xmx4G
export AGENT_JAR_PATH=<path/to/bin/mlops-agent-ver.jar>
./bin/start-agent.sh

For a complete reference of the available environment variables, see MLOps agent environment variables.
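
When starting the agent from environment variables, a mistyped path is an easy mistake to make. The sketch below checks that the three path-valued variables from the example above are set and point to readable files. agent_env_ok is a hypothetical helper name and is not part of the MLOps package.

```shell
# Hypothetical helper (assumption: not shipped with the MLOps package).
# Confirms that each path-valued variable is set and points to a readable file.
agent_env_ok() {
    local name value status=0
    for name in AGENT_CONFIG_YAML AGENT_LOG_PROPERTIES AGENT_JAR_PATH; do
        # Indirect lookup of the variable named in $name (empty if unset).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set"
            status=1
        elif [ ! -r "$value" ]; then
            echo "$name points to an unreadable path: $value"
            status=1
        fi
    done
    return "$status"
}
```

One way to use it is `agent_env_ok && ./bin/start-agent.sh`, so that the start script runs only when all three paths resolve.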

Check the agent's status

To check the agent's status:

./bin/status-agent.sh

To check the agent's status with real-time resource usage:

./bin/status-agent.sh --verbose
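
In automation, a script may need to wait until the agent is up before continuing. The sketch below is a generic retry loop; wait_for is a hypothetical helper name. Using it with the status script assumes that ./bin/status-agent.sh exits with status 0 when the agent is running, which this page does not confirm, so verify that behavior for your version first.

```shell
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Usage: wait_for <attempts> <delay_seconds> <command> [args...]
wait_for() {
    local attempts="$1" delay="$2" i=1
    shift 2
    while [ "$i" -le "$attempts" ]; do
        # Succeed as soon as the command returns exit status 0.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        # Pause between attempts, but not after the final one.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}
```

Under the exit-status assumption above, `wait_for 10 3 ./bin/status-agent.sh` would poll up to 10 times, 3 seconds apart, and return nonzero if the agent never reports as running.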

Shut down the agent

To shut down the agent:

./bin/stop-agent.sh

Updated August 2, 2022