
LRS Auto-Scaling with KEDA

Overview

DataRobot's Long Running Service (LRS) auto-scaling feature provides dynamic horizontal scaling capabilities for custom model deployments. This feature leverages KEDA (Kubernetes Event-Driven Autoscaling) and the KEDA HTTP Add-on to automatically scale LRS instances based on HTTP workload and CPU usage, including the ability to scale to zero for cost optimization.

The auto-scaling functionality provides a standardized scaling approach that enables:

  • CPU-based scaling: Scale based on CPU utilization thresholds
  • HTTP-based scaling: Scale based on HTTP request metrics
  • Scale-to-zero: Automatically scale down to zero replicas during periods of inactivity
  • Cost optimization: Reduce infrastructure costs by running resources only when they are needed

Note: Initially, this feature is available only for custom models deployed on LRS, not for other LRS-based workloads.
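The scaling behavior described above can be approximated with a small model. The following Python sketch is a simplification, not KEDA's or DataRobot's implementation. It assumes KEDA's documented split of responsibilities: KEDA itself activates a workload from zero to one replica and deactivates it back to zero after a cooldown, while the Kubernetes Horizontal Pod Autoscaler (HPA) scales between one and N replicas using desired = ceil(current * metric / target). It ignores the HPA tolerance, stabilization windows, and minimum replica settings, and the function name next_replica_count is hypothetical.

```python
import math


def next_replica_count(
    current: int,
    metric: float,
    target: float,
    idle_seconds: float,
    cooldown_seconds: float,
    max_replicas: int,
) -> int:
    """Simplified model of KEDA-driven scaling; not the exact KEDA/HPA algorithm.

    current:          replicas running now
    metric:           observed load (per-replica average while replicas are running)
    target:           per-replica target for the same metric
    idle_seconds:     seconds since the metric was last above zero
    cooldown_seconds: idle time before scaling to zero (cf. scaledObjectCooldownSeconds)
    max_replicas:     upper bound on replicas
    """
    if current == 0:
        # Activation (0 -> 1) is handled by KEDA itself: any pending load,
        # such as a queued HTTP request, wakes the workload.
        return 1 if metric > 0 else 0
    if metric == 0 and idle_seconds >= cooldown_seconds:
        # Deactivation (-> 0) once the workload has been idle for the full cooldown.
        return 0
    # 1 <-> N scaling is delegated to the HPA: ceil(current * metric / target),
    # clamped so it never drops below one replica or exceeds max_replicas.
    desired = math.ceil(current * metric / target)
    return max(1, min(max_replicas, desired))


# Example: two replicas running at 150% of the target are scaled out to three.
print(next_replica_count(2, 150, 100, 0, 86400, 10))
```

In this model, an idle LRS keeps one replica until idle_seconds reaches the cooldown, which is why the cooldown setting described later on this page directly controls when scale-to-zero happens.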

Prerequisites

Before enabling LRS auto-scaling, the following components must be installed and configured in your Kubernetes cluster:

Required Components

  1. KEDA (Kubernetes Event-Driven Autoscaling)
       • Version 2.17 or later recommended
       • Must be installed cluster-wide
       • Installation: https://github.com/kedacore/keda?tab=readme-ov-file#deploying-keda

  2. KEDA HTTP Add-on
       • Compatible with your KEDA version
       • Provides HTTP-specific scaling capabilities
       • Installation: https://github.com/kedacore/http-add-on/?tab=readme-ov-file#installation
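If you are unsure whether an existing KEDA installation meets the version recommendation, a check like the following can help. This is a hypothetical helper (meets_min_version is not part of KEDA or DataRobot); it assumes you have already obtained a version string, for example the tag of the KEDA operator container image, and it compares only the major and minor components.

```python
def meets_min_version(version: str, minimum: tuple[int, int] = (2, 17)) -> bool:
    """Return True if a version such as 'v2.17.1' or '2.18.0-rc1' is >= minimum.

    Only the major and minor components are compared; a pre-release suffix
    such as '-rc1' is ignored.
    """
    core = version.strip().lstrip("v").split("-")[0]  # drop 'v' prefix and suffix
    major, minor = (int(part) for part in core.split(".")[:2])
    return (major, minor) >= minimum


for tag in ("v2.16.1", "2.17.0", "v2.18.0-rc1"):
    print(tag, meets_min_version(tag))
```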

Installation Requirements

KEDA and the KEDA HTTP Add-on must be installed by your platform administrator before you enable auto-scaling. Contact your DataRobot administrator, or refer to the KEDA documentation for installation instructions.

Configuration Values for the KEDA HTTP Add-on

When installing the KEDA HTTP Add-on, ensure that the following values are set:

keda-add-ons-http:
  interceptor:
    responseHeaderTimeout: 600s
    replicas:
      waitTimeout: 300s

This configuration ensures that long-running requests are handled properly and are not cut off by premature timeouts.

For additional configuration options, see: https://github.com/kedacore/charts/blob/main/http-add-on/Chart.yaml
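To verify an existing values file against the settings above, you could use a check like the following sketch. It assumes the recommended values are minimums (that larger timeouts are also acceptable is an assumption; this page does not state it), it supports only the h, m, s, and ms units of Go-style duration strings, and the function names are hypothetical. The values argument is the mapping under keda-add-ons-http, for example as loaded with PyYAML's yaml.safe_load.

```python
import re

# Go-style durations as used by the chart, e.g. "600s", "5m", "1m30s", "500ms".
_DURATION = re.compile(r"(?:\d+(?:\.\d+)?(?:ms|h|m|s))+")
_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}

# Settings recommended on this page, treated here as minimums (an assumption).
RECOMMENDED = {
    ("interceptor", "responseHeaderTimeout"): "600s",
    ("interceptor", "replicas", "waitTimeout"): "300s",
}


def duration_seconds(value: str) -> float:
    """Convert a duration string such as '1m30s' into seconds."""
    if not _DURATION.fullmatch(value):
        raise ValueError(f"unsupported duration: {value!r}")
    return sum(float(num) * _UNIT_SECONDS[unit] for num, unit in _PART.findall(value))


def check_interceptor_values(values: dict) -> list[str]:
    """Return problems found in the mapping under 'keda-add-ons-http' (empty if OK)."""
    problems = []
    for path, minimum in RECOMMENDED.items():
        node = values
        for key in path:  # walk the nested keys, tolerating missing levels
            node = node.get(key) if isinstance(node, dict) else None
        name = ".".join(path)
        if not isinstance(node, str):
            problems.append(f"{name} is not set")
        elif duration_seconds(node) < duration_seconds(minimum):
            problems.append(f"{name}={node} is below {minimum}")
    return problems


print(check_interceptor_values({"interceptor": {"responseHeaderTimeout": "500ms"}}))
```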

Configuration Values for LRS

The LRS chart sets the following default values:

lrs-operator:
   keda:
      enabled: false
      kedaNamespace: "keda-operator"
      kedaHTTPNamespace: "keda-http-addon"
      proxyServiceName: "keda-add-ons-http-interceptor-proxy"
      proxyServicePort: "8080"
      externalScalerName: "keda-add-ons-http-external-scaler"
      externalScalerPort: "9090"
   operator:
      ...
      config:
         scaledObjectCooldownSeconds: 86400 #  24 hours
      ... 

To enable auto-scaling, set lrs-operator.keda.enabled to true. Also update the remaining values under lrs-operator.keda to match your KEDA and KEDA HTTP Add-on installation, so that the LRS controller knows which namespaces the components are installed in and what their services are called. We recommend setting scaledObjectCooldownSeconds to 604800, which scales an LRS to zero after 7 days of idle time (the default of 86400 does so after 24 hours).
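The overrides described above can also be generated programmatically. This Python sketch builds a Helm values override that enables KEDA for LRS, keeps the chart's default service names and ports, and sets the recommended 7-day cooldown (7 * 86400 = 604800 seconds). The function name lrs_keda_overrides and the example namespace keda are assumptions; substitute the namespaces where KEDA and the HTTP Add-on are actually installed. Because JSON is valid YAML, the printed output can be saved to a file and passed to helm upgrade with -f.

```python
import json

SECONDS_PER_DAY = 24 * 60 * 60  # 86400, the chart default for scaledObjectCooldownSeconds


def lrs_keda_overrides(
    keda_namespace: str = "keda-operator",
    keda_http_namespace: str = "keda-http-addon",
    idle_days: int = 7,
) -> dict:
    """Build Helm value overrides that enable KEDA-based auto-scaling for LRS."""
    return {
        "lrs-operator": {
            "keda": {
                "enabled": True,
                # Namespaces where KEDA and the KEDA HTTP Add-on are installed.
                "kedaNamespace": keda_namespace,
                "kedaHTTPNamespace": keda_http_namespace,
                # Chart defaults; change them if your HTTP Add-on services are named differently.
                "proxyServiceName": "keda-add-ons-http-interceptor-proxy",
                "proxyServicePort": "8080",
                "externalScalerName": "keda-add-ons-http-external-scaler",
                "externalScalerPort": "9090",
            },
            "operator": {
                "config": {
                    # Idle time before an LRS is scaled to zero (7 days -> 604800 s).
                    "scaledObjectCooldownSeconds": idle_days * SECONDS_PER_DAY,
                },
            },
        },
    }


if __name__ == "__main__":
    # Example: KEDA and the HTTP Add-on both installed in a namespace named "keda".
    print(json.dumps(lrs_keda_overrides("keda", "keda"), indent=2))
```

Helm merges override files into the chart defaults key by key, so only the keys listed here change; other operator settings keep their existing values.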

Networking

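The KEDA HTTP Add-on includes a proxy component (the interceptor) that records request metrics and routes traffic from the ingress to the LRS. The HTTP Add-on must therefore be able to send traffic to the LRS pods; any network policies or firewall rules in your cluster must allow traffic from the HTTP Add-on to reach them.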
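If your cluster restricts pod-to-pod traffic with Kubernetes NetworkPolicies, the LRS pods need a rule that admits traffic from the HTTP Add-on. The following Python sketch generates such a NetworkPolicy as JSON, which kubectl apply -f accepts. Assumptions: the LRS namespace (datarobot-lrs here), the policy name, and any pod labels are hypothetical placeholders; the HTTP Add-on runs in keda-http-addon (the chart default for kedaHTTPNamespace); and your cluster sets the standard kubernetes.io/metadata.name label on namespaces (Kubernetes 1.21 and later). Be aware that a NetworkPolicy isolates the pods it selects: if no other policy selects those pods yet, applying this one alone blocks all other ingress to them, so narrow the selection with pod_labels as needed.

```python
import json
from typing import Optional


def allow_interceptor_policy(
    lrs_namespace: str,
    keda_http_namespace: str = "keda-http-addon",
    pod_labels: Optional[dict] = None,
) -> dict:
    """NetworkPolicy admitting ingress from the KEDA HTTP Add-on namespace to LRS pods."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        # Hypothetical policy name; any valid name works.
        "metadata": {"name": "allow-keda-http-interceptor", "namespace": lrs_namespace},
        "spec": {
            # An empty selector matches every pod in lrs_namespace.
            "podSelector": {"matchLabels": pod_labels} if pod_labels else {},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    "from": [
                        {
                            # Namespaces carry this label automatically (Kubernetes 1.21+).
                            "namespaceSelector": {
                                "matchLabels": {
                                    "kubernetes.io/metadata.name": keda_http_namespace
                                }
                            }
                        }
                    ]
                }
            ],
        },
    }


if __name__ == "__main__":
    # "datarobot-lrs" is a placeholder; use the namespace your LRS pods run in.
    print(json.dumps(allow_interceptor_policy("datarobot-lrs"), indent=2))
```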