User-Metrics-Collector

This service collects DataRobot Deployment metrics (e.g., CPU, memory, network, and disk), as well as NVIDIA NIM metrics if the DataRobot Deployment is GPU-enabled.

User-metrics-collector uses a cluster role binding that may not be allowed in certain on-premises environments. It needs this binding because the Kubernetes nodes themselves are the source of truth for container and pod metrics.

Modes of Operation:

* Kubelet API (default mode). Metrics are collected from the kubelet API via a cluster role binding.
    * Pros: Works in all clusters.
    * Cons: Requires read-only access to Node resources, acquired via cluster role bindings.
* Metrics server.
    * Pros: Does not require elevated permissions.
    * Cons: Does not work in environments without metrics server enabled. Metrics are basic, and many are not available.
* No metrics.
    * Cons: Deployment metrics are not available.
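
To make the permission requirement of the default mode concrete, the following is a rough sketch of the kind of read-only, cluster-scoped access to Node resources that such a binding grants. It is an illustration only, not the manifests shipped with the chart: the object names, service account, namespace, and the exact list of node subresources are all assumptions.

```yaml
# Illustrative sketch only; the chart's real manifests may differ.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: user-metrics-collector-node-reader   # hypothetical name
rules:
  - apiGroups: [""]
    # Assumed set of node subresources used to reach kubelet endpoints
    resources: ["nodes", "nodes/metrics", "nodes/stats", "nodes/proxy"]
    verbs: ["get", "list"]                    # read-only access
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: user-metrics-collector-node-reader   # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: user-metrics-collector-node-reader
subjects:
  - kind: ServiceAccount
    name: user-metrics-collector             # hypothetical service account
    namespace: datarobot                     # hypothetical namespace
```

Because Node resources are cluster-scoped, a namespaced Role cannot grant this access, which is why environments that forbid cluster role bindings need one of the other modes.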

If the deployment fails or the cluster role binding is not allowed, reconfigure user-metrics-collector to use metrics-server, or disable it altogether.

This can be done in the usual way through values.yaml:

user-metrics-collector:
  enabled: true # Set false to completely disable the service
  clusterRoleCreation: true # Set false to use metrics-server as the metrics source when cluster role bindings are not allowed
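
As a sketch of the two fallback options, the overrides below use only the two keys shown above; nothing else about the chart is assumed. The first keeps the service running but switches it to metrics-server:

```yaml
# Fallback 1: keep the collector, but read metrics from metrics-server
user-metrics-collector:
  enabled: true
  clusterRoleCreation: false   # no cluster role binding is created
```

The second turns the service off entirely, in which case Deployment metrics are not available:

```yaml
# Fallback 2: disable the collector completely
user-metrics-collector:
  enabled: false
```

The first option only helps if metrics server is enabled in the cluster, and the resulting metrics are more basic than in the default mode.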