# Serverless Predictions

## Default "Serverless Compute" prediction environment

A default Serverless prediction environment (named "Serverless Compute") is created automatically for new organizations when the `CREATE_DEFAULT_SERVERLESS_PREDICTION_ENVIRONMENT` setting is enabled (default: enabled) in the System Configuration > Predictions section.
### Configuration

- Modify your `values.yaml`:

  ```yaml
  core:
    config_env_vars:
      CREATE_DEFAULT_SERVERLESS_PREDICTION_ENVIRONMENT: "True"
  ```
**Note:** This setting only affects newly created organizations, not existing ones. Existing organizations must create a serverless prediction environment themselves (Console > Prediction environments > Add prediction environment).
## Deployment auto-stopping policy

Idle inference servers for deployments are auto-stopped (their Kubernetes resources are scaled to zero) when the `INFERENCE_SERVER_AUTO_STOPPING` setting is enabled (default: disabled) in the System Configuration > Predictions section. For the auto-stopping policy to take effect, the auto-stop job must also be enabled and configured.
### Configuration

- Modify your `values.yaml`:

  ```yaml
  core:
    config_env_vars:
      INFERENCE_SERVER_AUTO_STOPPING: "True"

  pred-environments-api:
    jobs:
      auto_stop:
        schedule: "*/15 * * * *"  # cron schedule for the auto-stop check
        enabled: true
        grpc_secure: false
        dry_run: false
        inactivity: 1800
  ```
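As an illustrative variation (the values below are examples, not defaults), the same job can be tuned to run its check hourly with a longer idle threshold:

```yaml
pred-environments-api:
  jobs:
    auto_stop:
      enabled: true
      schedule: "0 * * * *"  # run the check at the top of every hour
      inactivity: 3600       # example: double the default idle threshold
```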
## Inference server memory for custom model deployments

The default requested memory (in MB) for the Inference Server of a custom model deployment is controlled by the `INFERENCE_SERVER_MEMORY_FOR_CUSTOM_DEPLOYMENTS` setting (default: 512) in the System Configuration > Predictions section.
### Configuration

- Modify your `values.yaml`:

  ```yaml
  core:
    config_env_vars:
      INFERENCE_SERVER_MEMORY_FOR_CUSTOM_DEPLOYMENTS: 512
  ```
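If your custom models need more headroom than the default, the same key accepts a larger value (2048 below is an illustrative choice, not a recommendation):

```yaml
core:
  config_env_vars:
    INFERENCE_SERVER_MEMORY_FOR_CUSTOM_DEPLOYMENTS: 2048  # 2048 MB instead of the 512 MB default
```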
## Number of threads for custom model deployments

The number of threads per uWSGI worker in the Inference Server of a custom model deployment is controlled by the `INFERENCE_SERVER_THREADS_FOR_CUSTOM_DEPLOYMENTS` setting (default: 50) in the System Configuration > Predictions section. If the value is 0, only process workers are used, without per-worker threads.
### Configuration

- Modify your `values.yaml`:

  ```yaml
  core:
    config_env_vars:
      INFERENCE_SERVER_THREADS_FOR_CUSTOM_DEPLOYMENTS: 50
  ```
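Thread workers mainly help when prediction requests spend time waiting on I/O rather than computing. As a rough, non-DataRobot illustration of the effect (all names here are hypothetical, not Inference Server code), the following Python sketch runs 50 simulated I/O-bound requests concurrently on a 50-thread pool, so they overlap instead of queuing behind a single process:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def handle_request(i: int) -> int:
    # Simulate an I/O-bound prediction call (e.g., waiting on a model backend).
    time.sleep(0.01)
    return i * 2

# With thread workers (50 per uWSGI worker by default, per the setting above),
# many requests can wait on I/O concurrently inside one process.
with ThreadPoolExecutor(max_workers=50) as pool:
    results = list(pool.map(handle_request, range(50)))
```

For CPU-bound custom models, threads provide little benefit (Python's GIL serializes computation), which is why setting the value to 0 and relying on process workers only can be the better choice.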