Serverless Predictions

Default "Serverless Compute" prediction environment

A default Serverless prediction environment (named "Serverless Compute") is created automatically for new organizations when the CREATE_DEFAULT_SERVERLESS_PREDICTION_ENVIRONMENT setting is enabled (default: enabled) in the System Configuration > Predictions section.

Configuration

  • Modify your values.yaml:
core:
  config_env_vars:
    # Create the default "Serverless Compute" environment for new organizations
    CREATE_DEFAULT_SERVERLESS_PREDICTION_ENVIRONMENT: "True"

Note: This only affects newly created organizations, not existing ones. Existing organizations must create a serverless prediction environment themselves (Console > Prediction environments > Add prediction environment).

Deployment auto-stopping policy

Idle inference servers for deployments are auto-stopped (their Kubernetes resources are scaled to zero) if the INFERENCE_SERVER_AUTO_STOPPING setting is enabled (default: disabled) in the System Configuration > Predictions section. For the auto-stopping policy to take effect, the auto-stop job must also be enabled and configured.

Configuration

  • Modify your values.yaml:
core:
  config_env_vars:
    INFERENCE_SERVER_AUTO_STOPPING: "True"

pred-environments-api:
  jobs:
    auto_stop:
      schedule: "*/15 * * * *"  # cron schedule; the idleness check runs every 15 minutes
      enabled: true             # the job itself must be enabled for the policy to take effect
      grpc_secure: false        # whether gRPC calls use TLS (assumption based on the key name)
      dry_run: false            # if true, presumably reports candidates without stopping them
      inactivity: 1800          # idle threshold before stopping, presumably in seconds (30 minutes)
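
With the values above, the job wakes every 15 minutes and stops any inference server that has been idle longer than the inactivity threshold of 1800 (presumably seconds, i.e. 30 minutes), so in the worst case an idle server is scaled to zero roughly 45 minutes after its last request.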

Inference server memory for custom model deployments

The default requested memory (in MB) for the Inference Server of a custom model deployment is controlled by the INFERENCE_SERVER_MEMORY_FOR_CUSTOM_DEPLOYMENTS setting (default: 512) in the System Configuration > Predictions section.

Configuration

  • Modify your values.yaml:
core:
  config_env_vars:
    INFERENCE_SERVER_MEMORY_FOR_CUSTOM_DEPLOYMENTS: 512  # requested memory in MB
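
Because this value is the requested memory, it presumably surfaces as the memory request on the Inference Server container. A minimal sketch of the assumed effect on the container spec (illustrative only; the actual field layout is managed by the platform):

resources:
  requests:
    memory: "512M"  # assumed direct mapping from INFERENCE_SERVER_MEMORY_FOR_CUSTOM_DEPLOYMENTS (MB)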

Number of threads for custom model deployments

The number of threads per uWSGI worker in the Inference Server of a custom model deployment is controlled by the INFERENCE_SERVER_THREADS_FOR_CUSTOM_DEPLOYMENTS setting (default: 50) in the System Configuration > Predictions section. If the value is 0, the Inference Server uses process workers only, with no per-worker threading.

Configuration

  • Modify your values.yaml:
core:
  config_env_vars:
    INFERENCE_SERVER_THREADS_FOR_CUSTOM_DEPLOYMENTS: 50  # threads per uWSGI worker; 0 disables threading
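
For reference, the settings above can be combined in a single values.yaml, since the first, third, and fourth simply extend the same core.config_env_vars map; a consolidated sketch using the values shown in this section:

core:
  config_env_vars:
    CREATE_DEFAULT_SERVERLESS_PREDICTION_ENVIRONMENT: "True"
    INFERENCE_SERVER_AUTO_STOPPING: "True"
    INFERENCE_SERVER_MEMORY_FOR_CUSTOM_DEPLOYMENTS: 512
    INFERENCE_SERVER_THREADS_FOR_CUSTOM_DEPLOYMENTS: 50

pred-environments-api:
  jobs:
    auto_stop:
      schedule: "*/15 * * * *"
      enabled: true
      grpc_secure: false
      dry_run: false
      inactivity: 1800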