LRS Auto-Scaling with KEDA
Overview
DataRobot's Long Running Service (LRS) auto-scaling feature provides dynamic horizontal scaling capabilities for custom model deployments. This feature leverages KEDA (Kubernetes Event-Driven Autoscaling) and the KEDA HTTP Add-on to automatically scale LRS instances based on HTTP workload and CPU usage, including the ability to scale to zero for cost optimization.
The auto-scaling functionality provides a standardized scaling approach that enables:
- CPU-based scaling: Scale based on CPU utilization thresholds
- HTTP-based scaling: Scale based on HTTP request metrics
- Scale-to-zero: Automatically scale down to zero replicas during periods of inactivity
- Cost optimization: Reduce infrastructure costs by only running resources when needed
Note: This feature is initially available for deployed custom models on LRS and not for other LRS-based workloads.
Prerequisites
Before enabling LRS auto-scaling, the following components must be installed and configured in your Kubernetes cluster:
Required Components
- KEDA (Kubernetes Event-Driven Autoscaling)
  - Version 2.17 or later recommended
  - Must be installed cluster-wide
  - Installation: https://github.com/kedacore/keda?tab=readme-ov-file#deploying-keda
- KEDA HTTP Add-on
  - Must be compatible with your KEDA version
  - Provides HTTP-specific scaling capabilities
  - Installation: https://github.com/kedacore/http-add-on/?tab=readme-ov-file#installation
Installation Requirements
KEDA and the KEDA HTTP Add-on must be installed by your platform administrator before you enable auto-scaling. Contact your DataRobot administrator or refer to the KEDA documentation for installation instructions.
Configuration Values for the KEDA HTTP Add-on
When installing the KEDA HTTP Add-on, configure the following values:
```yaml
keda-add-ons-http:
  interceptor:
    responseHeaderTimeout: 600s
    replicas:
      waitTimeout: 300s
```
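These timeouts keep long-running requests from failing prematurely: responseHeaderTimeout lets the interceptor wait up to 600 seconds for the LRS to start returning a response, and replicas.waitTimeout lets it hold a request for up to 300 seconds while a scaled-to-zero LRS starts up.
For additional configuration options, see https://github.com/kedacore/charts/blob/main/http-add-on/Chart.yaml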
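The keda-add-ons-http top-level key above suggests the add-on is deployed as a subchart under that name. If your administrator instead installs the add-on directly from the kedacore Helm chart, the same settings would likely sit at the root of that chart's values file. The following is a minimal sketch under that assumption; verify the key paths against the chart version you install.

```yaml
# Assumption: standalone install of the kedacore keda-add-ons-http chart,
# so the settings are not nested under a keda-add-ons-http: key.
interceptor:
  responseHeaderTimeout: 600s   # wait up to 10 minutes for the LRS to return response headers
  replicas:
    waitTimeout: 300s           # hold requests up to 5 minutes while a scaled-to-zero LRS starts
```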
Configuration Values for LRS
The LRS chart sets the following default values:
```yaml
lrs-operator:
  keda:
    enabled: false
    kedaNamespace: "keda-operator"
    kedaHTTPNamespace: "keda-http-addon"
    proxyServiceName: "keda-add-ons-http-interceptor-proxy"
    proxyServicePort: "8080"
    externalScalerName: "keda-add-ons-http-external-scaler"
    externalScalerPort: "9090"
  operator:
    ...
    config:
      scaledObjectCooldownSeconds: 86400 # 24 hours
      ...
```
To enable auto-scaling, set lrs-operator.keda.enabled to true.
Update the remaining keda values to match your KEDA and KEDA HTTP Add-on installation, so that the LRS controller knows which namespaces the components are installed in and which service names and ports they use.
We recommend setting scaledObjectCooldownSeconds: 604800, which scales an LRS to zero after 7 days (604,800 seconds) without traffic.
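Putting these settings together, a values override for the LRS chart might look like the following sketch. It assumes the operator.config nesting shown in the default values listing and that your KEDA components use the default namespaces and service names listed there; adjust those to your installation. Helm merges this override with the chart defaults, so keys you omit keep their default values.

```yaml
lrs-operator:
  keda:
    enabled: true                        # turn on KEDA-based auto-scaling
    # Chart defaults shown; change these to match your KEDA / KEDA HTTP Add-on install.
    kedaNamespace: "keda-operator"
    kedaHTTPNamespace: "keda-http-addon"
    proxyServiceName: "keda-add-ons-http-interceptor-proxy"
    proxyServicePort: "8080"
    externalScalerName: "keda-add-ons-http-external-scaler"
    externalScalerPort: "9090"
  operator:
    config:
      scaledObjectCooldownSeconds: 604800   # 7 days x 86400 seconds per day
```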
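Networking
The KEDA HTTP Add-on includes an interceptor proxy that records request metrics and routes traffic from the ingress to the LRS. The add-on must therefore be able to send traffic to LRS pods; if your cluster restricts pod-to-pod traffic (for example, with NetworkPolicies), allow traffic from the KEDA HTTP Add-on namespace to the LRS pods.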
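If your cluster enforces NetworkPolicies, a policy along the following lines could admit interceptor traffic to the LRS pods. This is a hypothetical sketch: the policy name, the LRS namespace (datarobot-lrs), and the pod label in podSelector are placeholders to replace with the values from your installation. The namespaceSelector relies on the kubernetes.io/metadata.name label that Kubernetes 1.21 and later set on every namespace automatically. Because NetworkPolicies act as allow-lists, a pod selected by any policy accepts only the traffic some policy allows, so merge this rule with your existing policies rather than applying it in isolation.

```yaml
# Hypothetical example: allow the KEDA HTTP Add-on interceptor to reach LRS pods.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-keda-http-interceptor      # hypothetical name
  namespace: datarobot-lrs               # hypothetical: namespace where LRS pods run
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/component: lrs   # hypothetical label; use your LRS pod labels
  policyTypes:
    - Ingress
  ingress:
    - from:
        # Admit traffic from any pod in the KEDA HTTP Add-on namespace
        # (the kedaHTTPNamespace default from the LRS chart).
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: keda-http-addon
```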