Serverless Predictions

Default "Serverless Compute" prediction environment

A default Serverless prediction environment (named "Serverless Compute") is created automatically for new organizations when the CREATE_DEFAULT_SERVERLESS_PREDICTION_ENVIRONMENT setting is enabled (default: enabled) in the System Configuration > Predictions section.

Configuration

  • Modify your values.yaml:
core:
  config_env_vars:
    # Create the default "Serverless Compute" environment for new organizations
    CREATE_DEFAULT_SERVERLESS_PREDICTION_ENVIRONMENT: "True"

Note: This only affects newly created organizations, not existing ones. Existing organizations must create a serverless prediction environment themselves (Console > Prediction environments > Add prediction environment).

Deployment auto-stopping policy

Idle inference servers for deployments are auto-stopped (their Kubernetes resources are scaled to zero) if the INFERENCE_SERVER_AUTO_STOPPING setting is enabled (default: disabled) in the System Configuration > Predictions section. For the auto-stopping policy to take effect, the auto-stop job must also be enabled and configured.

Configuration

  • Modify your values.yaml:
core:
  config_env_vars:
    INFERENCE_SERVER_AUTO_STOPPING: "True"

pred-environments-api:
  jobs:
    auto_stop:
      schedule: "*/15 * * * *"  # cron schedule; the idleness check runs every 15 minutes
      enabled: true             # the job itself must be enabled for the policy to take effect
      grpc_secure: false        # whether gRPC calls use TLS (assumption based on the key name)
      dry_run: false            # if true, presumably reports candidates without stopping them
      inactivity: 1800          # idle threshold before stopping, presumably in seconds (30 minutes)
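
With the values above, the job wakes every 15 minutes and stops any inference server that has been idle longer than the inactivity threshold of 1800 (presumably seconds, i.e. 30 minutes), so in the worst case an idle server is scaled to zero roughly 45 minutes after its last request.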

Inference server memory for custom model deployments

The default requested memory (in MB) for the Inference Server of a custom model deployment is controlled by the INFERENCE_SERVER_MEMORY_FOR_CUSTOM_DEPLOYMENTS setting (default: 512) in the System Configuration > Predictions section.

Configuration

  • Modify your values.yaml:
core:
  config_env_vars:
    INFERENCE_SERVER_MEMORY_FOR_CUSTOM_DEPLOYMENTS: 512  # requested memory in MB
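
Because this value is the requested memory, it presumably surfaces as the memory request on the Inference Server container. A minimal sketch of the assumed effect on the container spec (illustrative only; the actual field layout is managed by the platform):

resources:
  requests:
    memory: "512M"  # assumed direct mapping from INFERENCE_SERVER_MEMORY_FOR_CUSTOM_DEPLOYMENTS (MB)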

Number of threads for custom model deployments

The number of threads per uWSGI worker in the Inference Server of a custom model deployment is controlled by the INFERENCE_SERVER_THREADS_FOR_CUSTOM_DEPLOYMENTS setting (default: 50) in the System Configuration > Predictions section. If the value is 0, the Inference Server uses process workers only, with no per-worker threading.

Configuration

  • Modify your values.yaml:
core:
  config_env_vars:
    INFERENCE_SERVER_THREADS_FOR_CUSTOM_DEPLOYMENTS: 50  # threads per uWSGI worker; 0 disables threading
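
For reference, the settings above can be combined in a single values.yaml, since the first, third, and fourth simply extend the same core.config_env_vars map; a consolidated sketch using the values shown in this section:

core:
  config_env_vars:
    CREATE_DEFAULT_SERVERLESS_PREDICTION_ENVIRONMENT: "True"
    INFERENCE_SERVER_AUTO_STOPPING: "True"
    INFERENCE_SERVER_MEMORY_FOR_CUSTOM_DEPLOYMENTS: 512
    INFERENCE_SERVER_THREADS_FOR_CUSTOM_DEPLOYMENTS: 50

pred-environments-api:
  jobs:
    auto_stop:
      schedule: "*/15 * * * *"
      enabled: true
      grpc_secure: false
      dry_run: false
      inactivity: 1800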