LRS Auto-scaling with KEDA¶

Overview¶

DataRobot's Long Running Service (LRS) auto-scaling feature provides dynamic horizontal scaling capabilities for custom model deployments. This feature leverages KEDA (Kubernetes Event-Driven Autoscaling) and the KEDA HTTP Add-on to automatically scale LRS instances based on HTTP workload and CPU usage, including the ability to scale to zero for cost optimization.

The auto-scaling functionality provides a standardized scaling approach that enables:

CPU-based scaling: Scale based on CPU utilization thresholds
HTTP-based scaling: Scale based on HTTP request metrics
Scale-to-zero: Automatically scale down to zero replicas during periods of inactivity
Cost optimization: Reduce infrastructure costs by only running resources when needed

Note: This feature is initially available for deployed custom models on LRS and not for other LRS-based workloads.
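
To make the combination of CPU-based and HTTP-based scaling concrete, here is a rough sketch of how such rules are commonly expressed as a KEDA ScaledObject. It is illustrative only and is not necessarily what the LRS operator generates. The resource names, replica bounds, CPU threshold, and scaler address are all assumptions.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: example-custom-model          # hypothetical name
spec:
  scaleTargetRef:
    name: example-custom-model        # hypothetical Deployment to scale
  minReplicaCount: 0                  # permits scale-to-zero
  maxReplicaCount: 5                  # assumed upper bound
  cooldownPeriod: 86400               # seconds of inactivity before scaling to zero
  triggers:
    # CPU-based scaling: add replicas when average utilization exceeds 70% (assumed threshold)
    - type: cpu
      metricType: Utilization
      metadata:
        value: "70"
    # HTTP-based scaling: metrics pushed by the HTTP Add-on's external scaler.
    # The address below is assumed; any further metadata the scaler expects is omitted here.
    - type: external-push
      metadata:
        scalerAddress: keda-add-ons-http-external-scaler.keda-http-addon:9090
```

A CPU trigger on its own cannot bring a workload back from zero replicas, because no pods are running to report CPU usage. In a setup like this, the HTTP trigger is what makes scale-to-zero workable.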

Prerequisites¶

Before enabling LRS auto-scaling, the following components must be installed and configured in your Kubernetes cluster:

Required components¶

KEDA (Kubernetes Event-Driven Autoscaling)
    Version 2.17 or later recommended
    Must be installed cluster-wide
    https://github.com/kedacore/keda?tab=readme-ov-file#deploying-keda
KEDA HTTP Add-on
    Compatible with your KEDA version
    Provides HTTP-specific scaling capabilities
    https://github.com/kedacore/http-add-on/?tab=readme-ov-file#installation

Installation requirements¶

KEDA and the KEDA HTTP Add-on must be installed by your platform administrator before you enable auto-scaling. Contact your DataRobot administrator or refer to the KEDA documentation for installation instructions.

Configuration values for the KEDA HTTP Add-on¶

When installing the KEDA HTTP Add-on, ensure the following default values are configured:

keda-add-ons-http:
  interceptor:
    responseHeaderTimeout: 600s
    replicas:
      waitTimeout: 300s

This configuration ensures proper handling of long-running requests and prevents premature timeouts.

For additional configuration options, see https://github.com/kedacore/charts/blob/main/http-add-on/Chart.yaml

Configuration values for LRS¶

These are the default values set in the LRS chart:

lrs-operator:
   keda:
      enabled: false
      kedaNamespace: "keda-operator"
      kedaHTTPNamespace: "keda-http-addon"
      proxyServiceName: "keda-add-ons-http-interceptor-proxy"
      proxyServicePort: "8080"
      externalScalerName: "keda-add-ons-http-external-scaler"
      externalScalerPort: "9090"
   operator:
      ...
      config:
         scaledObjectCooldownSeconds: 86400 #  24 hours
      ...

To enable scaling, set lrs-operator.keda.enabled to true. Update the remaining keda values to match your installation of KEDA and the KEDA HTTP Add-on, so that the LRS controller knows which namespaces the components are installed in and what their services are called. The scaledObjectCooldownSeconds value controls how long a deployment can sit idle before it scales to zero. To scale to zero after 7 days of idle time instead of the default 24 hours, set scaledObjectCooldownSeconds to 604800.
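
As a sketch of the override described above, a values file could look like the following. The file name is hypothetical, the keys mirror the chart defaults listed earlier, and the namespace and service names are those defaults rather than values verified against any particular cluster.

```yaml
# values-override.yaml (hypothetical file name)
lrs-operator:
  keda:
    enabled: true
    # Adjust these to match where KEDA and the HTTP Add-on are installed.
    kedaNamespace: "keda-operator"
    kedaHTTPNamespace: "keda-http-addon"
    proxyServiceName: "keda-add-ons-http-interceptor-proxy"
    proxyServicePort: "8080"
    externalScalerName: "keda-add-ons-http-external-scaler"
    externalScalerPort: "9090"
  operator:
    config:
      # 7 days * 24 hours * 3600 seconds = 604800 seconds
      scaledObjectCooldownSeconds: 604800
```

Only the keys that differ from the defaults strictly need to appear in an override file. The full keda block is repeated here so that every name the LRS controller depends on is visible in one place.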

Networking¶

The KEDA HTTP Add-on includes a proxy component (the interceptor) that records metrics and routes traffic from the ingress to the LRS. It is therefore crucial that the HTTP Add-on is able to send traffic to the LRS pods.
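
If your cluster restricts pod-to-pod traffic with Kubernetes NetworkPolicy resources, one way to grant this access is a policy along the lines below. This is a minimal sketch under stated assumptions: the policy name, the LRS namespace, and the pod label are hypothetical, and keda-http-addon is simply the default kedaHTTPNamespace from the LRS chart values.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-keda-http-interceptor    # hypothetical policy name
  namespace: lrs-namespace             # hypothetical namespace where LRS pods run
spec:
  # Select the LRS pods that should receive proxied traffic.
  podSelector:
    matchLabels:
      app: lrs                         # hypothetical label; use the labels on your LRS pods
  policyTypes:
    - Ingress
  ingress:
    # Allow traffic from every pod in the namespace hosting the HTTP Add-on.
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: keda-http-addon
```

The kubernetes.io/metadata.name label is set automatically on namespaces by recent Kubernetes versions, so the selector should match without extra labeling. Clusters that do not enforce network policies need no such rule.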