
---
title: Model Management {: #model-management }
description: Model Management configuration
platform: self-managed-only
---


Model management

Dedicated prediction servers

In DataRobot <=8.X, deploying DataRobot models to dedicated prediction servers and using advanced model monitoring features, such as target or feature drift tracking, required separate hosts. In DataRobot 10.0, this functionality moved into a Kubernetes deployment service and no longer requires extra hosts.

Deployment monitoring functionality (e.g., service health, drift tracking, and accuracy tracking) is also built in.

Extended drift tracking

Extended Drift Tracking is enabled by default.

Features include the following:

  • Tracking up to 25 features (limit raised from 10 in previous DataRobot versions). This applies to existing deployments after a successful model replacement and to all new deployments.
  • Tracking text features.
  • Removing the previously recommended limit of 5MB for the size of prediction requests that are eligible to be tracked by Model Monitoring.

Fine-tuning

Analytics for deployment monitoring are processed periodically by background jobs, so they might not become available immediately after predictions are made. By default, a prediction request can take up to 30 seconds to affect the Data Drift / Accuracy results of a deployment. More frequent processing makes statistics available sooner, but at the cost of higher I/O and less efficient processing.

To configure these options, refer to the Tuning DataRobot Environment Variables section of this guide.

Example of a customization:

# helm chart values snippet
core:
  config_env_vars:
    MMM_PREDICTIONS_DATA_FLUSH_INTERVAL_SECONDS: 10 

The centerpiece of the monitoring system is the set of modmon-* services. They use DataRobot model metadata and feature information for analytical purposes. For better performance, they maintain a model cache; the default cache size is 4.

A bigger cache size allows the service to retain more models in memory and increases the throughput of the modmon-* services; however, it may also increase the memory footprint of the services.

In most cases, because of the asynchronous, delayed nature of the work this service performs, consider increasing the cache size only for high-load systems that serve predictions from many different models.

In general, the memory required to load one model is a fraction of the model file size (available under the Describe -> Model Info section of a Leaderboard model), unless it's an imported .mlpkg file. Model package files are loaded in full, so their file size can be used as an approximate estimate of memory usage.

Example of a customization:

# helm chart values snippet
core:
  config_env_vars:
    # Increased size of the Monitoring model cache size from 4 up to 10.
    MMM_PREDICTIONS_DATA_MODEL_LOADER_CACHE_SIZE: 10 

To scale down so that no more than one model is kept in memory, set the cache size to 1.
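
Following the same pattern as the customization examples above, a minimal snippet for this down-scaled configuration:

```yaml
# helm chart values snippet
core:
  config_env_vars:
    # Keep at most one model in the Monitoring model cache.
    MMM_PREDICTIONS_DATA_MODEL_LOADER_CACHE_SIZE: 1
```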

Prediction row storage

DataRobot can store prediction request data at the row level for deployments, including both the prediction scoring data and the prediction results. Storing prediction request rows enables an organization to thoroughly audit the predictions it made and to use that data to troubleshoot operational issues, for example by examining the data to understand an anomalous prediction result.

Contact Support to discuss prediction auditing needs or to troubleshoot previously made predictions for deployments with enabled prediction row storage.

To enable the collection of prediction request rows, navigate to the Data Drift Settings modal for a deployment from the Actions menu or during deployment creation by enabling the "Enable prediction rows storage" toggle. Note that during deployment creation this toggle appears under the Inference Data section.

Once toggled on, prediction requests made against that deployment are collected by DataRobot. Note that requests are collected only if the scoring data is in a valid format that DataRobot can interpret (e.g., valid CSV or JSON). Requests in a valid format that don't satisfy all input feature requirements of the underlying deployed model are still eligible.
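
The toggle can also be set programmatically through the deployment settings API. The following sketch is a hedged example only: the exact endpoint path and payload field names are assumptions and should be verified against your version's API reference.

```
# Assumed endpoint and payload shape; verify against the API reference.
curl -X PATCH "https://<datarobot-host>/api/v2/deployments/<deploymentId>/settings/" \
  -H "Authorization: Bearer ${DATAROBOT_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"predictionsDataCollection": {"enabled": true}}'
```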

Data is stored in DataRobot's backend storage.

Disk space requirements:

  • Expect the stored prediction data to take up 20-100% of the size of the prediction requests made.
  • For example, if you make predictions against a deployment every second using a 10KB CSV file, that amounts to 10KB × 60 × 60 × 24 = 864MB of prediction requests daily.

Plan for roughly that much free disk space for each day of collected data. Due to compression, actual storage usage may be considerably lower.
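
The sizing arithmetic above can be scripted for other request rates. A minimal sketch, assuming one request per second at 10KB each:

```shell
# Back-of-the-envelope sizing for prediction row storage.
# Assumption: one prediction request per second, 10 KB each, for 24 hours.
REQUEST_KB=10
REQUESTS_PER_DAY=$((60 * 60 * 24))
DAILY_KB=$((REQUEST_KB * REQUESTS_PER_DAY))
echo "Raw request volume: ${DAILY_KB} KB/day (~$((DAILY_KB / 1000)) MB/day)"
# Stored data is expected to be 20-100% of that volume.
echo "Estimated storage: $((DAILY_KB / 5))-${DAILY_KB} KB/day"
```

Adjust REQUEST_KB and REQUESTS_PER_DAY to match your actual traffic.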

The following configuration disables data collection for the entire DataRobot cluster and prevents prediction row storage from being enabled on any deployment:

# helm chart values snippet
core:
  config_env_vars:
    DISABLE_MMM_PREDICTIONS_DATA_COLLECTION: true 

Data isn't removed automatically. Removal of deployment data can be performed as part of permanent deployment deletion, available via the API endpoint /api/v2/deletedDeployments/. Using this endpoint requires the CAN_DELETE_APP_PROJECTS user permission. See the API reference for more details.

Deploy modmon in high availability mode

Starting with the 9.2 release, the modmon deployment defaults to 1 replica for each component, instead of the previous 2 to 8 replicas for different components. If intensive ModelManagement/MLOps operations are used or required, the modmon deployment should be installed in HA mode.

installer-tools/example_tshirt_size_values/modmon_medium.yaml contains HA setup.

modmon:
  component:
    access-processor:
      replicaCount: 4
    actuals-processor:
      replicaCount: 4
    custom-metrics-processor:
      replicaCount: 2
    worker-predictions-data:
      replicaCount: 8
    worker-scheduled-job:
      replicaCount: 6 

To apply the change to the cluster, run the helm upgrade command to upgrade the release.
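
For example, assuming a release named datarobot installed from a chart of the same name in the datarobot namespace (substitute your actual release name, chart reference, and namespace), the upgrade might look like:

```
helm upgrade datarobot datarobot/datarobot \
  -n datarobot \
  --reuse-values \
  -f installer-tools/example_tshirt_size_values/modmon_medium.yaml
```

The --reuse-values flag keeps previously set values and layers the HA replica counts on top.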

Configuring additional OpenTelemetry exporters

You can configure the DataRobot OpenTelemetry (OTel) Collector to send telemetry data (metrics, traces, and logs) to additional destinations besides the default Elasticsearch backend. This is useful for integrating with external monitoring and observability platforms like DataDog. However, it's important to note that only the DataDog and standard OTLP (OpenTelemetry Protocol) exporters are officially supported by DataRobot.

Configuration is managed by setting the OTEL_EXTRA_CONFIG environment variable on the DataRobot-otel-collector deployment. This variable accepts a semicolon-separated list of configuration filenames that are merged on top of the base configuration.


Method 1: Using a pre-built configuration (example: DataDog)

The collector includes pre-built configuration files for common exporters. This example shows how to enable the DataDog exporter.

1. Set Required Credentials

You must provide your DataDog credentials as environment variables for the collector deployment. The required variables are:

  • DATADOG_API_KEY: Your DataDog API key.
  • DATADOG_HOST: Your DataDog intake site (e.g., datadoghq.com).

2. Enable the DataDog exporter

Set the OTEL_EXTRA_CONFIG environment variable to use the pre-built DataDog configuration file. Assuming the collector is installed in the mmm-otel-collector namespace, run the following command:

kubectl set env deployment/datarobot-otel-collector \
  -n mmm-otel-collector \
  OTEL_EXTRA_CONFIG=datadog_with_elastic_exporter_config.yaml 

Method 2: Using a custom ConfigMap

For destinations that don't have a pre-built configuration, or for more advanced setups, you can provide a custom configuration snippet via a Kubernetes ConfigMap.

1. Define your custom configuration

You can create or edit the provided ConfigMap to define your own exporters and update the service pipelines. The collector mounts a file named extra-config.yaml from this ConfigMap.

Below is an example that adds an OTLP exporter to send data to an external endpoint defined by environment variables.

apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "..fullname" . }}-config
  labels:
    {{- include "..labels" . | nindent 4 }}
data:
  extra-config.yaml: |
    exporters:
      otlp:
        endpoint: ${env:DR_OTEL_OTLP_HTTP_ENDPOINT}
        tls:
          insecure: ${env:DR_OTEL_OTLP_HTTP_INSECURE:-true}

    service:
      pipelines:
        traces/fast:
          exporters: [elasticsearch, otlp]
        traces/default:
          exporters: [elasticsearch, otlp]
        metrics:
          exporters: [elasticsearch, otlp]
        logs:
          exporters: [elasticsearch, otlp] 

2. Enable the custom configuration

To apply your custom settings, add extra-config.yaml to the OTEL_EXTRA_CONFIG environment variable.

kubectl set env deployment/datarobot-otel-collector \
  -n mmm-otel-collector \
  OTEL_EXTRA_CONFIG=extra-config.yaml 

Combining multiple configurations

To enable multiple extra configurations simultaneously, provide a semicolon-separated list of filenames in the OTEL_EXTRA_CONFIG variable.

For example, to enable both the pre-built DataDog exporter and your custom OTLP exporter, run:

kubectl set env deployment/datarobot-otel-collector \
  -n mmm-otel-collector \
  "OTEL_EXTRA_CONFIG=datadog_with_elastic_exporter_config.yaml;extra-config.yaml" 

Important: configuration merging behavior

The OpenTelemetry Collector merges the configuration files, meaning settings from a later file override settings from earlier ones. When you define a service pipeline in your custom configuration, it replaces the entire pipeline definition from the base config.

To ensure that DataRobot's internal monitoring continues to function correctly, you must always include elasticsearch in your list of exporters for any pipeline you modify. Failure to do so stops telemetry data from being sent to the platform's default backend.
