

Agent event log

On a deployment's Service Health tab, under Recent Activity, you can view Management events (e.g., deployment actions) and Monitoring events (e.g., spooler channel and rate limit events).

The Monitoring events can help you quickly diagnose MLOps agent issues. The spooler channel error events can help you diagnose and fix spooler configuration issues. The rate limit enforcement events can help you identify if service health stats, data drift values, or accuracy values aren't updating because you exceeded the API request rate limit.

Enable agent event log

To view Monitoring events, you must provide a predictionEnvironmentId in the agent configuration file (conf\mlops.agent.conf.yaml), as shown below. If you haven't already installed and configured the MLOps agent, see the Installation and configuration guide.

# This file contains configuration for the MLOps agent

# URL to the DataRobot MLOps service
mlopsUrl: "https://<MLOPS_HOST>"

# DataRobot API token
apiToken: "<MLOPS_API_TOKEN>"

# Execute the agent once, then exit
runOnce: false

# When dryRun is true, do not report metrics to the MLOps service
dryRun: false

# When verifySSL is true, SSL certificate validation is performed when
# connecting to DataRobot MLOps. When verifySSL is false, these checks are skipped.
# Note: It is highly recommended to keep this config variable as true.
verifySSL: true

# Path to write agent stats
statsPath: "/tmp/tracking-agent-stats.json"

# Prediction Environment served by this agent.
# Events and errors not specific to a single deployment are reported against this Prediction Environment.
predictionEnvironmentId: "<PE_ID_FROM_DATAROBOT_UI>"

# Number of times the agent will retry sending a request to the MLOps service on failure.
httpRetry: 3

# HTTP client timeout in milliseconds (30-second timeout)
httpTimeout: 30000

# Number of concurrent HTTP requests; default=1 -> synchronous mode; > 1 -> asynchronous
httpConcurrentRequest: 10

# Number of HTTP Connections to establish with the MLOps service, Default: 1
numMLOpsConnections: 1

# Uncomment and configure the lines below for the spooler type(s) you are using.
# Note: the spooler configuration must match that used by the MLOps library.
# Note: Spoolers must be set up before using them.
#       - For the filesystem spooler, create the directory that will be used.
#       - For the SQS spooler, create the queue.
#       - For the PubSub spooler, create the project and topic.
#       - For the Kafka spooler, create the topic.
channelConfigs:
  - type: "FS_SPOOL"
    details: {name: "filesystem", directory: "/tmp/ta"}
#  - type: "SQS_SPOOL"
#    details: {name: "sqs", queueUrl: "your SQS queue URL", queueName: "<your AWS SQS queue name>"}
#  - type: "RABBITMQ_SPOOL"
#    details: {name: "rabbit", queueName: "<your rabbitmq queue name>", queueUrl: "amqp://<ip address>",
#              caCertificatePath: "<path_to_ca_certificate>",
#              certificatePath: "<path_to_client_certificate>",
#              keyfilePath: "<path_to_key_file>"}

#  - type: "PUBSUB_SPOOL"
#    details: {name: "pubsub", projectId: "<your project ID>", topicName: "<your topic name>", subscriptionName: "<your sub name>"}
#  - type: "KAFKA_SPOOL"
#    details: {name: "kafka", topicName: "<your topic name>", bootstrapServers: "<ip address 1>,<ip address 2>,..."}

# The number of threads that the agent will launch to process data records.
agentThreadPoolSize: 4

# The maximum number of records each thread will process per fetchNewDataFreq interval.
agentMaxRecordsTask: 100

# Maximum number of records to aggregate before sending to DataRobot MLOps
agentMaxAggregatedRecords: 500

# A timeout for pending records before aggregating and submitting
agentPendingRecordsTimeoutMs: 5000
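
Because Monitoring events depend on predictionEnvironmentId being set, it can be useful to check the configuration file before starting the agent. The following is a minimal sketch, not part of the MLOps agent: the helper names read_setting and has_prediction_environment_id are hypothetical, and the check assumes the setting is a top-level key on its own line, as in the template above. It uses a regular expression rather than a YAML parser to stay dependency-free, so it is not a substitute for full YAML validation.

```python
import re


def read_setting(config_text, key):
    """Return the value of a top-level `key: value` line, or None if absent.

    Hypothetical helper for illustration; it ignores commented-out lines
    because the key must start at the beginning of a line.
    """
    pattern = re.compile(r"^" + re.escape(key) + r"[ \t]*:[ \t]*(.+)$", re.MULTILINE)
    match = pattern.search(config_text)
    if match is None:
        return None
    # Drop surrounding whitespace and quotes.
    return match.group(1).strip().strip("\"'")


def has_prediction_environment_id(config_text):
    """Return True if predictionEnvironmentId is set to a non-placeholder value."""
    value = read_setting(config_text, "predictionEnvironmentId")
    if not value:
        return False
    # Treat the template placeholder (wrapped in angle brackets) as "not configured".
    return not (value.startswith("<") and value.endswith(">"))
```

For example, you could read conf\mlops.agent.conf.yaml into a string and pass it to has_prediction_environment_id; a False result suggests that Monitoring events will not appear until the setting is filled in.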

View agent activity

To view the agent event log, on the Service Health tab, navigate to the Recent Activity section. The most recent events appear at the top of the list.

Event information

Each event shows the time it occurred, a description, and an icon indicating its status:

| Status icon | Description |
|---|---|
| Green / Passing | No action needed. |
| Red / Failing | Immediate action needed. |
| Gray / Informational | Details a deployment action (e.g., deployment launch has started). |

Recent activity log

In the Recent Activity log, you can filter the activity list and access additional information:

| | Element | Description |
|---|---|---|
| 1 | Filters | Set the Event Type filter to limit the list to Management events (e.g., deployment actions) or Monitoring events (e.g., spooler channel and rate limit events). |
| 2 | Events | Click an event in the log to view additional Event Details for that event. The Event Details include the Event name, a Timestamp, a Channel Name, the event Type, the associated Prediction Environment, and an event Message. |
| 3 | Event Details | Click the Prediction Environment name to open the Prediction Environments tab, where you can create, manage, and share prediction environments. |

Monitoring events

Monitoring events can help you diagnose and fix MLOps agent issues. Currently, the following events can appear in the Recent Activity log:

| Event | Description |
|---|---|
| Monitoring Spooler Channel | Identify spooler configuration issues so you can resolve them. |
| Rate limit was enforced | Identify when an operation exceeds the API request rate limit, causing updates to service health stats, data drift calculations, or accuracy calculations to stall. This event reports how long the affected operation is suspended. Rate limits are applied per deployment, per operation. |

What are the rate limits for the deployments API?

| Operation | Endpoint (POST) | Limit |
|---|---|---|
| Submit Metrics (Service Health) | api/v2/deployments/<id>/predictionRequests/fromJSON/ | 1M requests / hour |
| Submit Prediction Results (Data Drift) | api/v2/deployments/<id>/predictionInputs/fromJSON/ | 1M requests / hour |
| Submit Actuals (Accuracy) | api/v2/deployments/<id>/actuals/fromJSON/ | 40 requests / second |
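
To compare these limits with your own traffic, it can help to express them in a common unit. The sketch below is illustrative only: the dictionary keys and function names are hypothetical, and it compares average rates, since this page does not state how the platform measures the window (for example, rolling versus fixed). The limit values are copied from the table above.

```python
# Limits from the table above, as (allowed requests, window in seconds).
# The operation keys are hypothetical labels chosen for this sketch.
RATE_LIMITS = {
    "service_health": (1_000_000, 3600),  # predictionRequests/fromJSON/
    "data_drift": (1_000_000, 3600),      # predictionInputs/fromJSON/
    "accuracy": (40, 1),                  # actuals/fromJSON/
}


def requests_per_second(operation):
    """Average allowed request rate for one operation on one deployment."""
    limit, window_seconds = RATE_LIMITS[operation]
    return limit / window_seconds


def min_interval_seconds(operation):
    """Smallest average spacing between requests that stays at the limit."""
    return 1.0 / requests_per_second(operation)


def exceeds_limit(operation, planned_requests, period_seconds):
    """Return True if the planned average rate is above the allowed rate."""
    return planned_requests / period_seconds > requests_per_second(operation)
```

Under these assumptions, 1M requests per hour works out to roughly 278 requests per second on average, while the actuals endpoint allows 40 per second, so sending 3,000 actuals requests in one minute (50 per second) would be over the limit. Because limits apply per deployment, per operation, each deployment and operation is evaluated separately.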

Updated February 15, 2024