
Monitoring agent use cases

Reference the use cases below for examples of how to apply the monitoring agent:

Enable large-scale monitoring

To support large-scale monitoring, the MLOps library provides a way to calculate statistics from raw data on the client side. Then, instead of reporting raw features and predictions to the DataRobot MLOps service, the client reports anonymized statistics without the feature and prediction data. Reporting statistics calculated on the client side is far more efficient than reporting raw data, especially at scale (billions of rows of features and predictions). In addition, because client-side aggregation only sends aggregates of feature values, it is suitable for environments where you don't want to disclose the actual feature values.
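Conceptually, client-side aggregation replaces rows of raw values with summary statistics. The sketch below is an illustration only (the file name and statistics are hypothetical, not the MLOps library's actual wire format): it reduces a column of numeric feature values to a count, mean, min, and max, which is the kind of anonymized summary that gets reported instead of the raw rows.

```shell
# Hypothetical raw values for one numeric feature, one per line.
# In practice the MLOps library computes aggregates for you; this only
# illustrates what "reporting statistics instead of raw data" means.
printf '12.5\n7.0\n9.5\n11.0\n' > /tmp/feature_values.txt

awk '
  { sum += $1; n += 1 }
  NR == 1 { min = $1; max = $1 }
  $1 < min { min = $1 }
  $1 > max { max = $1 }
  END { printf "count=%d mean=%.2f min=%.1f max=%.1f\n", n, sum / n, min, max }
' /tmp/feature_values.txt
# → count=4 mean=10.00 min=7.0 max=12.5
```

Only the four summary numbers leave the client; the individual values 12.5, 7.0, 9.5, and 11.0 are never transmitted.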

The large-scale monitoring functionality is available for the Java Software Development Kit (SDK) and the MLOps Spark Utils Library:

Replace calls to reportPredictionsData() with calls to reportAggregatePredictionsData().

Replace calls to reportPredictions() with calls to predictionStatisticsParameters.report().

You can find an example of this use case in the agent .tar file in examples/java/PredictionStatsSparkUtilsExample.

Note

To support the use of challenger models, you must send raw features. For large datasets, you can report a small sample of raw feature and prediction data to support challengers and reporting; then, you can send the remaining data in aggregate format.
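The sample-plus-aggregate split described in the note can be sketched as follows; the file names and the 5% sample size are illustrative assumptions, not values prescribed by DataRobot.

```shell
# Hypothetical scored dataset (names and sizes are illustrative only).
seq 1 1000 > /tmp/scored_rows.txt

# Keep a small raw sample (here 5%) to report with full feature and
# prediction data, which is what challenger models require.
head -n 50 /tmp/scored_rows.txt > /tmp/raw_sample.txt

# Send the remaining rows in aggregate format only.
tail -n +51 /tmp/scored_rows.txt > /tmp/aggregate_only.txt

wc -l < /tmp/raw_sample.txt       # 50 rows reported raw
wc -l < /tmp/aggregate_only.txt   # 950 rows reported as aggregates
```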

Map supported feature types

Currently, large-scale monitoring supports numeric and categorical features. When configuring this monitoring method, you must map each feature name to the corresponding feature type (either numeric or categorical). The mapping method differs between Scoring Code models and all other models.

Often, a model can output the feature name and the feature type through an existing access method; however, if no such method is available, you may have to manually categorize each feature you want to aggregate as Numeric or Categorical.

  • For most models, map a feature type (Numeric or Categorical) to each feature name using the setFeatureTypes method on predictionStatisticsParameters.

  • For Scoring Code models, map a feature type (Numeric or Categorical) to each feature name after using the getFeatures query on the Predictor object to obtain the features.

You can find an example of this use case in the agent .tar file in examples/java/PredictionStatsSparkUtilsExample/src/main/scala/com/datarobot/dr_mlops_spark/Main.scala.

Report metrics

If your prediction environment cannot be network-connected to DataRobot, you can instead use monitoring agent reporting in a disconnected manner.

  1. In the prediction environment, configure the MLOps library to use the filesystem spooler type. The MLOps library will report metrics into its configured directory, e.g., /disconnected/predictions_dir.

  2. Run the monitoring agent on a machine that is network-connected to DataRobot.

  3. Configure the agent to use the filesystem spooler type and receive its input from a local directory, e.g., /connected/predictions_dir.

  4. Migrate the contents of the directory /disconnected/predictions_dir to the connected environment /connected/predictions_dir.
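The four steps above can be simulated locally as a sketch; the directory names follow the example paths, the spool file contents are a made-up stand-in, and the `cp` stands in for whatever transfer mechanism (rsync, scp, removable media) your air-gapped environment allows.

```shell
# Simulate the disconnected flow with the example directory layout.
mkdir -p /tmp/disconnected/predictions_dir /tmp/connected/predictions_dir

# Stand-in for a spool file written by the MLOps library in the
# disconnected prediction environment (contents are illustrative).
echo 'spooled prediction metrics' > /tmp/disconnected/predictions_dir/spool.0

# Transfer step: in a real air-gapped setup this is rsync, scp, or
# removable media rather than a local copy.
cp -R /tmp/disconnected/predictions_dir/. /tmp/connected/predictions_dir/

# The spool file is now in the directory the connected agent reads from.
ls /tmp/connected/predictions_dir
```

Once the file lands in the agent's configured input directory, the agent picks it up and forwards the metrics to DataRobot as usual.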

Reports for Scoring Code models

You can also use monitoring agent reporting to send monitoring metrics to DataRobot for downloaded Scoring Code models. Reference an example of this use case in the MLOps agent tarball at examples/java/CodeGenExample.

Monitor a Spark environment

A common use case for the monitoring agent is monitoring scoring in Spark environments, where scoring happens in Spark and you want to report the predictions and features to DataRobot. Since Spark usually uses a multi-node setup, it is difficult to use the agent's filesystem spooler channel because a shared, consistent file system is uncommon in Spark installations.

To work around this, use a network-based channel like RabbitMQ or AWS SQS. These channels can work with multiple writers and single (or multiple) readers.

The following example outlines how to set up agent monitoring on a Spark system using the MLOps Spark Util module, which provides a way to report scoring results on the Spark framework. Reference the documentation for the MLOpsSparkUtils module in the MLOps Java examples directory at examples/java/SparkUtilsExample/.

The Spark example's source code performs three steps:

  1. Scores the data using a given scoring JAR file and delivers the results in a DataFrame.
  2. Merges the features DataFrame and the prediction results into a single DataFrame.
  3. Calls the mlops_spark_utils.MLOpsSparkUtils.reportPredictions helper to report the predictions using the merged DataFrame.

You can use mlops_spark_utils.MLOpsSparkUtils.reportPredictions to report predictions generated by any model as long as the function retrieves the data via a DataFrame.

This example uses RabbitMQ as the channel of communication and includes channel setup. Since Spark is a distributed framework, DataRobot requires a network-based channel like RabbitMQ or AWS SQS in order for the Spark workers to be able to send the monitoring data to the same channel regardless of the node the worker is running on.

Spark prerequisites

The following steps outline the prerequisites necessary to execute the Spark monitoring use case.

  1. Run a spooler (RabbitMQ in this example) in a container:

    docker run -d -p 15672:15672 -p 5672:5672 --name rabbitmq-spark-example rabbitmq:3-management
    

    • This Docker command will also run the management console for RabbitMQ.
    • You can access the console via your browser at http://localhost:15672 (username=guest, password=guest).
    • In the console, view the message queue in action when you run the ./run_example.sh script below.
  2. Configure and start the monitoring agent.

    • Follow the quickstart guide provided in the agent tarball.
    • Set up the agent to communicate with RabbitMQ.
    • Edit the agent channel config to match the following:
         - type: "RABBITMQ_SPOOL"
           details: {name: "rabbit", queueUrl: "amqp://localhost:5672", queueName: "spark_example"}
      
  3. If you are using mvn, install the datarobot-mlops JAR into your local mvn repository before testing the examples by running:

    ./examples/java/install_jar_into_maven.sh
    

  4. Create the example JAR files:

    • If you are using Spark3/Java11: export USE_JAVA11=true
    • Run make to compile.

  5. Set your JAVA_HOME environment variable:

    • For Spark2/Java8: export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)
    • For Spark3/Java11: export JAVA_HOME=$(/usr/libexec/java_home -v 11)

  6. Install and run Spark locally:

    • Download Spark 3.2.1 built for Hadoop 3.3+ onto your local machine: http://spark.apache.org/downloads.html
    • Unarchive the tarball: tar xvf ~/Downloads/spark-3.2.*-bin-hadoop3.*.tgz
    • In the created spark-3.2.*-bin-hadoop3.* directory, start the Spark cluster:

      sbin/start-master.sh -i localhost
      sbin/start-slave.sh -i localhost -c 8 -m 2G spark://localhost:7077

    • Verify that the installation is successful:

      bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://localhost:7077 --num-executors 1 --driver-memory 512m --executor-memory 512m --executor-cores 1 examples/jars/spark-examples_*.jar 10
    

Spark use case

After meeting the prerequisites outlined above, run the Spark example.

  1. Create the model package and initialize the deployment:

    ./create_deployment.sh
    
    Alternatively, use the DataRobot UI to create an external model package and deploy it.

  2. Copy the deployment ID and model ID returned when you created the deployment and paste them into your shell to set the following environment variables:

    export MLOPS_DEPLOYMENT_ID=<deployment_id>
    export MLOPS_MODEL_ID=<model_id>
    

  3. Generate predictions and report statistics to DataRobot:

    ./run_example.sh
    
    You may need to point the script at your Spark bin directory first, for example:
    env SPARK_BIN=/opt/ml/spark-3.2.1-bin-hadoop3.*/bin ./run_example.sh
    

  4. If you want to change the spooler type (the communication channel between the Spark job and the monitoring agent):

    • Edit the Scala code under src/main/scala/com/datarobot/dr_mlops_spark/Main.scala and modify the following line to contain the required channel configuration:

      val channelConfig = "output_type=rabbitmq;rabbitmq_url=amqp://localhost;rabbitmq_queue_name=spark_example"

    • Recompile the code by running make.

Monitor using the MLOps CLI

MLOps supports a command line interface (CLI) for interacting with the MLOps application. You can use the CLI for most MLOps actions, including creating deployments and model packages, uploading datasets, reporting metrics on predictions and actuals, and more.

Use the MLOps CLI help page for a list of available operations and syntax examples:

mlops-cli [-h]

Monitoring agent vs. MLOps CLI

Like the monitoring agent, the MLOps CLI can post prediction data to the MLOps service, but its usage differs. The MLOps CLI is a Python application that sends an HTTP request to the MLOps service with the current contents of the spool file. It does not run the monitoring agent or call any Java process internally, and it does not continuously poll or wait for new spool data; once the existing spool data is consumed, it exits. The monitoring agent, by contrast, is a Java process that continuously polls for new data in the spool file for as long as it runs and posts that data in an optimized form to the MLOps service.


Updated July 7, 2022