Enterprise monitoring guide

This section gives an overview of, and links to, the detailed documentation for monitoring the health, performance, and availability of your DataRobot platform. Effective monitoring helps ensure that your DataRobot installation operates reliably and that potential issues are identified and resolved quickly.

For general health monitoring, DataRobot provides a series of REST API endpoints that can be integrated with most standard monitoring tools. These endpoints provide status checks for core services and end-to-end test jobs.
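
As a rough illustration of how such endpoints might be polled from a script or a monitoring tool's custom check, the sketch below probes a list of paths and reports which ones answered successfully. The base URL and endpoint paths are not given in this overview, so they are passed in as parameters. Treating an HTTP 200 response as "healthy" is an assumption made for this sketch, and the helper names (`fetch_status`, `check_endpoints`) are hypothetical, not part of any DataRobot client.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=5.0):
    """Send a GET request; return (http_status, body_text), or (None, error_text) on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (for example 503).
        return exc.code, str(exc)
    except OSError as exc:
        # Covers URLError, connection failures, and timeouts.
        return None, str(exc)


def check_endpoints(base_url, paths, fetch=fetch_status):
    """Probe each path under base_url and map it to True if it returned HTTP 200.

    Assumption: a 200 response means the check passed. Adjust this rule to
    match the actual response contract of the endpoints you monitor.
    """
    results = {}
    for path in paths:
        status, _body = fetch(base_url.rstrip("/") + path)
        results[path] = (status == 200)
    return results
```

A scheduled job could call `check_endpoints` with your installation's base URL and the endpoint paths from the detailed documentation, then raise an alert when any value in the result is `False`. The `fetch` parameter exists so that the probing logic can be exercised with a stub instead of a live cluster.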

Kubernetes Availability Monitor (Kavmon)

DataRobot includes the Kubernetes Availability Monitor (Kavmon), a command-line tool that runs a comprehensive suite of health checks against your installation.

Observability with OpenTelemetry

DataRobot includes an observability subchart, datarobot-observability-core, that installs the agents required to observe a DataRobot cluster. The subchart can be configured to work with a number of observability providers.