
Generative AI Service

Enabling the service

In order to enable the service, edit the values.yaml file: set enabled: true for the two services (core and LLM Gateway), and set up object storage (see the Storage and Encryption configuration section below):

buzok-onprem:
  # -- Enable Generative AI core sub-chart
  enabled: true

buzok-llm-gateway:
  # -- Enable Generative AI LLM Gateway sub-chart
  enabled: true

Additionally, enable the LLM_GATEWAY feature flag (a.k.a. "Enable consumption based pricing access to LLMs in production") in the core configuration env by adding the following environment variable. IMPORTANT: ENABLE_LLM_GATEWAY should be set to false for STS customers with DR-provided credentials who have not purchased the consumption-based pricing SKU; please check with Customer Support regarding the details of the customer deal:

core:
  config_env_vars:
    ENABLE_LLM_GATEWAY: true

Please refer to the next section for large language model (LLM) access configuration.

Storage and Encryption configuration

The Generative AI Service uses global DataRobot settings. For details please refer to the Object Storage Configuration guide.

If you define a FILE_STORAGE_PREFIX in the global DataRobot settings, the Generative AI Service will use that prefix in all storage types and append genai/ to it when storing its files. For example, with a prefix of my-prefix, Generative AI files are stored under my-prefix/genai/.

Large Language Models (LLM) Configuration

The Generative AI (GenAI) service supports managed LLMs from the following providers: Azure OpenAI, Amazon Bedrock, Google Vertex AI, Anthropic, Cerebras, and TogetherAI. To enable these models, you need to provision LLM resources with the corresponding providers.

Please refer to the LLM vendor documentation for provisioning LLM resources (links are subject to change over time).

Once LLMs are configured on the provider side, set up provider credentials in the GenAI service (the credentials used to access LLMs in Azure/GCP/AWS/etc.). For customer-managed credentials (always used for self-managed installations and often for single-tenant SaaS installations), it is recommended to use secure configurations in DataRobot, as they provide the most flexibility and security. In certain cases, for instance a single-tenant SaaS environment with DataRobot-managed credentials, a DataRobot admin sets up credentials directly in values.yaml.

Set up credentials via secure configurations

Prior to installation, ensure the respective provider is enabled under the llmGateway configuration section within the buzok-llm-gateway sub-chart, for example:

buzok-llm-gateway:
  enabled: true
  llmGateway:
    providers:
      <supported-provider>:
        enabled: true
        credentialsSecureConfigName: genai-<provider>-llm-credentials

NOTE: credentialsSecureConfigName: genai-<provider>-llm-credentials is only needed when the Secure Configuration setup is used.

List of supported providers and corresponding LLM credentials secure config names:

anthropic:
  credentialsSecureConfigName: genai-anthropic-llm-credentials
aws:
  credentialsSecureConfigName: genai-aws-llm-credentials
azure:
  credentialsSecureConfigName: genai-azure-llm-credentials
cerebras:
  credentialsSecureConfigName: genai-cerebras-llm-credentials
google:
  credentialsSecureConfigName: genai-gcp-llm-credentials
togetherai:
  credentialsSecureConfigName: genai-togetherai-llm-credentials

NOTE: The following provider secure configs may be listed but are not yet supported:

  • cohere (genai-cohere-llm-credentials)
  • openai (genai-openai-llm-credentials)
  • groq (genai-groq-llm-credentials)

After installation is complete, perform the following steps to provide model credentials to DataRobot Generative AI:

  • Format the access credentials into JSON using the structure described in the provider-specific sections below
  • In the DataRobot Secure Configuration admin UI, create a new Secure Configuration with the name and type described in the sections below
  • Paste the formatted JSON credentials into the configuration value
  • Share the newly created Secure Configuration with the GenAI Admin system user (genai-admin@datarobot.com), which is automatically created during installation. The configuration must be shared with either the Owner or Editor role

The Generative AI Service caches credentials retrieved from Secure Configuration for a short period of time; when rotating these credentials it may take a couple of minutes for the new credentials to be used.

Set up credentials directly during installation

NOTE: In this case, updating credentials requires reinstalling/updating DataRobot, and the credentials will be visible to the person performing the installation. This is a less secure and less flexible way of setting up credentials for the GenAI service.

Credentials can be specified directly in the buzok-llm-gateway configuration section; Secure Configuration won't be used in this case. The example below shows a configuration with AWS credentials (the exact keys differ across providers; see the provider-specific sections below for credential structure details):

buzok-llm-gateway:
  enabled: true
  llmGateway:
    providers:
      aws:
        enabled: true
        credentialsSecureConfigName: ""
        credentials:
          endpoints:
          - region: "<aws_region>"
            access_key_id: "<aws_access_key_id>"
            secret_access_key: "<aws_secret_access_key>"

Azure OpenAI Models

Azure OpenAI models must be configured with specific deployment names, where the deployment name can be equal to the model name. For example, a deployment for the gpt-35-turbo-16k model can be named gpt-35-turbo-16k. Note that deployment names can also differ from the actual model name, such as creating a deployment named gpt-4o-2024-11-20 for the model gpt-4o.
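As an illustrative sketch only, such a deployment can be created with the Azure CLI; the resource group and resource names below are placeholders, and the exact flags may vary between Azure CLI versions:

# Hypothetical example: create an Azure OpenAI deployment named after the model it serves
az cognitiveservices account deployment create \
  --resource-group <your-resource-group> \
  --name <your-azure-openai-resource> \
  --deployment-name gpt-4o \
  --model-name gpt-4o \
  --model-version "2024-11-20" \
  --model-format OpenAI \
  --sku-name Standard \
  --sku-capacity 1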


After configuring Azure OpenAI LLM deployment, format the credentials using the following JSON structure:

{
    "endpoints": [
        {
            "region": "<region-code>",
            "api_type": "azure",
            "api_base": "https://<your-llm-deployment-endpoint>.openai.azure.com/",
            "api_version": "2024-10-21",
            "api_key": "<your-api-key>"
        }
    ]
}

When creating a Secure Configuration with these credentials, use the [GenAI] Azure OpenAI LLM Credentials configuration type, and use genai-azure-llm-credentials as the configuration name.

NOTE: "2024-10-21" is the latest stable version of API at the moment of writing this guide. To use latest models, you may require to use preview version of API. Please refer to the official Azure API documentation for details on Azure Inference API versions.

The provider Secure Configuration section should look like:

buzok-llm-gateway:
  enabled: true
  llmGateway:
    providers:
      azure:
        enabled: true
        credentialsSecureConfigName: genai-azure-llm-credentials

Amazon Bedrock Models

The Generative AI Service supports various first-party (e.g., Amazon Nova) and third-party (e.g., Anthropic Claude, Meta Llama, and Mistral) models from Amazon Bedrock.

There are two options for setting up Amazon Bedrock models: using IAM roles and using static AWS credentials.

Using IAM roles (IRSA) is recommended as the more secure option because it uses dynamic, short-lived AWS credentials. However, if your Kubernetes cluster is not in the AWS cloud, this authorization mechanism requires setting up Workload Identity Federation. For this reason, the Generative AI Service also supports static AWS credentials. When using static credentials, it is recommended for security reasons to create a separate AWS user that only has permissions to access AWS Bedrock.
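A minimal sketch of creating such a dedicated user with the AWS CLI; the user name is a placeholder, and AmazonBedrockFullAccess is just one possible managed policy (a custom least-privilege policy may be preferable):

# Hypothetical user name; attach whichever Bedrock-scoped policy fits your security requirements
aws iam create-user --user-name genai-bedrock-access
aws iam attach-user-policy --user-name genai-bedrock-access \
  --policy-arn arn:aws:iam::aws:policy/AmazonBedrockFullAccess
# Produces the access_key_id / secret_access_key used in the credentials JSON below
aws iam create-access-key --user-name genai-bedrock-access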

After enabling model access in AWS Bedrock, format the credentials using the following JSON structure for using static credentials:

{
    "endpoints": [
        {
            "region": "<your-aws-region, e.g. us-east-1>",
            "access_key_id": "<your-aws-key-id>",
            "secret_access_key": "<your-aws-secret-access-key>",
            "session_token": null,
        }
    ]
}

Use the following JSON structure when using a role other than the IRSA role (ensure there is a policy that allows the IRSA role to assume this role):

{
    "endpoints": [
        {
            "region": "<your-aws-region, e.g. us-east-1>",
            "role_arn": "<your-aws-role-arn>"
        }
    ]
}

Using IRSA to access Bedrock is available in 11.1.2 and later. Set "use_web_identity": true to use AssumeRoleWithWebIdentity from the pod's service account token.

{
    "endpoints": [
        {
            "region": "<your-aws-region, e.g. us-east-1>",
            "role_arn": "<your-aws-role-arn>",
            "use_web_identity": true
        }
    ]
}
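Depending on your EKS setup, the service account used by the LLM Gateway pods may also need the standard IRSA annotation so the pods receive a web identity token for the role; the namespace and service account name below are assumptions, so check the actual names in your installation:

kubectl -n <k8s-namespace> annotate serviceaccount <llm-gateway-service-account> \
  eks.amazonaws.com/role-arn=<your-aws-role-arn> --overwrite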

Using FIPS-enabled endpoints is available in 11.1.2 and later. Set "use_fips_endpoint": true to do that. Note that model availability may vary between FIPS-enabled and regular endpoints.

{
    "endpoints": [
        {
            "region": "<your-aws-region, e.g. us-east-1>",
            "role_arn": "<your-aws-role-arn>",
            "use_fips_endpoint": true
        }
    ]
}

When creating a Secure Configuration with these credentials, use the [GenAI] AWS Bedrock LLM Credentials configuration type, and use genai-aws-llm-credentials as the configuration name.

The provider Secure Configuration section should look like:

buzok-llm-gateway:
  enabled: true
  llmGateway:
    providers:
      aws:
        enabled: true
        credentialsSecureConfigName: genai-aws-llm-credentials

Google Vertex AI Models

The Generative AI Service supports multiple models from Google Vertex AI, including first-party Gemini models and third-party Claude/Llama/Mistral models.

After provisioning the model, you should receive a JSON file with access credentials. Format the contents of this file into the following JSON structure to provide credentials to DataRobot Generative AI:

{
    "endpoints": [
        {
            "region": "us-central1",
            "service_account_info": {
                "type": "service_account",
                "project_id": "<your-project-id>",
                "private_key_id": "<your-private-key>",
                "private_key": "----- <your-private-key>-----\n",
                "client_email": "<your-email>.iam.gserviceaccount.com",
                "client_id": "<your-client-id>",
                "auth_uri": "https://accounts.google.com/o/oauth2/auth",
                "token_uri": "https://oauth2.googleapis.com/token",
                "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
                "client_x509_cert_url": "https://<your-cert-url>.iam.gserviceaccount.com",
                "universe_domain": "googleapis.com"
            }
        }
    ]
}

When creating a Secure Configuration with these credentials, use the [GenAI] Google VertexAI LLM Credentials configuration type, and use genai-gcp-llm-credentials as the configuration name.

The provider Secure Configuration section should look like:

buzok-llm-gateway:
  enabled: true
  llmGateway:
    providers:
      google:
        enabled: true
        credentialsSecureConfigName: genai-gcp-llm-credentials

Anthropic Models

The Generative AI Service provides first-party integration with Anthropic, giving access to various Claude models.

To configure access to Anthropic, provide the API key:

{
    "endpoints": [
        {
            "region": "us",
            "api_key": "<your-anthropic-api-key>"
        }
    ]
}

When creating a Secure Configuration with these credentials, use the [GenAI] Anthropic LLM Credentials configuration type, and use genai-anthropic-llm-credentials as the configuration name.

The provider Secure Configuration section should look like:

buzok-llm-gateway:
  enabled: true
  llmGateway:
    providers:
      anthropic:
        enabled: true
        credentialsSecureConfigName: genai-anthropic-llm-credentials

Cerebras Models

The Generative AI Service provides integration with Cerebras, giving access to high-performance inference models.

To configure access to Cerebras, sign in to the Cerebras platform and create a new API key. Format the credentials using the following JSON structure:

{
    "endpoints": [
        {
            "region": "global",
            "api_key": "<your-cerebras-api-key>"
        }
    ]
}

When creating a Secure Configuration with these credentials, use the [GenAI] Cerebras LLM Credentials configuration type, and use genai-cerebras-llm-credentials as the configuration name.

The provider Secure Configuration section should look like:

buzok-llm-gateway:
  enabled: true
  llmGateway:
    providers:
      cerebras:
        enabled: true
        credentialsSecureConfigName: genai-cerebras-llm-credentials

TogetherAI Models

The Generative AI Service provides integration with TogetherAI, offering access to a wide range of open-source language models.

To configure access to TogetherAI, sign in to the TogetherAI console and create a new API key. Format the credentials using the following JSON structure:

{
    "endpoints": [
        {
            "region": "global",
            "api_key": "<your-togetherai-api-key>"
        }
    ]
}

When creating a Secure Configuration with these credentials, use the [GenAI] TogetherAI LLM Credentials configuration type, and use genai-togetherai-llm-credentials as the configuration name.

The provider Secure Configuration section should look like:

buzok-llm-gateway:
  enabled: true
  llmGateway:
    providers:
      togetherai:
        enabled: true
        credentialsSecureConfigName: genai-togetherai-llm-credentials

LLM Catalog updates

Occasionally it may be necessary to update some information about LLMs in the catalog, for instance to set retirement_date or modify available_regions. In this release, this can be done by manually updating the buzok-llm-gateway-catalog ConfigMap in the Kubernetes cluster after deployment. Once the ConfigMap is updated, restart the buzok-llm-gateway-app deployment for the changes to be applied.
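For example, the catalog can be edited and the gateway restarted with standard kubectl commands (the namespace is a placeholder):

# Edit the catalog ConfigMap in place (opens it in your default editor)
kubectl -n <k8s-namespace> edit configmap buzok-llm-gateway-catalog

# Restart the gateway so the modified catalog is picked up
kubectl -n <k8s-namespace> rollout restart deployment buzok-llm-gateway-app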

NOTE: do not modify the LLM Catalog unless absolutely necessary; it is easy to unintentionally break things.

NOTE: since the changes are made to a ConfigMap in Kubernetes, they are not persisted across redeploys. Make sure to save a copy of the modified Kubernetes resource and reapply it after redeployment.
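A minimal sketch of keeping and re-applying such a copy (the namespace and file name are placeholders; you may need to strip server-generated metadata such as resourceVersion before re-applying):

# Save a copy of the modified catalog
kubectl -n <k8s-namespace> get configmap buzok-llm-gateway-catalog -o yaml > llm-catalog-modified.yaml

# After a redeployment, re-apply the saved copy and restart the gateway
kubectl -n <k8s-namespace> apply -f llm-catalog-modified.yaml
kubectl -n <k8s-namespace> rollout restart deployment buzok-llm-gateway-app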

Refreshing LLM Catalog cache

The LLM Catalog cache has a TTL (time-to-live) of 1 hour and refreshes automatically every hour. If you need to manually refresh the LLM catalog cache, such as after configuring new LLM provider credentials, you can do so by restarting the buzok-llm-gateway-app deployment in your Kubernetes cluster. This will force the service to update the LLM Catalog data using the latest configured LLM credentials.
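For example (the namespace is a placeholder):

kubectl -n <k8s-namespace> rollout restart deployment buzok-llm-gateway-app
# Optionally wait for the rollout to complete
kubectl -n <k8s-namespace> rollout status deployment buzok-llm-gateway-app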

Disabling internet dependent features

The Generative AI service in DataRobot, by default, relies on internet connectivity to interact with various Language Model (LLM) types for chat prompting. However, in scenarios where internet access is restricted, the availability of specific LLM types can be managed by organization administrators.

Administrators can disable selected LLM types via the DataRobot UI. Each LLM type is controlled by a dedicated feature flag. A few examples of such flags are listed below; the full list is available under the Generative AI section of the DataRobot Feature Flags configuration UI.

  • Disable Azure OpenAI GPT-3.5 Turbo (DISABLE_LLM_AZURE_OPENAI_GPT_35_TURBO): Controls access to Azure OpenAI GPT-3.5 Turbo.
  • Disable Anthropic Claude 2 (DISABLE_LLM_AWS_ANTHROPIC_CLAUDE_2): Controls access to Anthropic Claude 2.
  • Disable Google Bison (DISABLE_LLM_GCP_BISON): Controls access to Google Bison.

All of the above LLM types require internet access. When disabled, the respective LLMs will not be available in the LLM selection list when configuring an LLM blueprint in a playground.

The only LLM model type that does not require internet access is the custom model type. Access to custom models is also controlled by a dedicated "Disable Custom Model LLMs in the LLM playground" (DISABLE_LLM_CUSTOM_MODEL) feature flag. This can be disabled like any other LLM type if necessary.

NOTE: As of the 11.1.0 release, there is a known bug with the LLM Gateway /catalog/ API route where a "404 Not Found" error is returned when all LLM feature flags are disabled and all external LLM providers are disabled in the chart. A workaround is to enable at least one LLM feature flag, such as LLM_AZURE_OPENAI_GPT_4_O, which fixes the error and allows the LLM Gateway service to return the list of supported LLMs (i.e., enabled provider LLMs plus validated BYO LLMs/Deployed LLMs).

Connected Vector Database (VDB) Configuration

The Generative AI Service supports connected VDBs from the following vendors: Pinecone and Elasticsearch. The connection is configured when the VDB is created, with the user supplying an API token credential for Pinecone, or a basic or API token credential plus a URL or cloud ID for their Elasticsearch instance.

By default, the Generative AI service has network egress access to the necessary endpoints. If a network configuration restricts this access, one or more of the following patterns need to be added to an allowlist:

  1. *.pinecone.io and *.*.*.pinecone.io for Pinecone
  2. *.*.*.elastic-cloud.com for Elasticsearch Cloud instances
  3. The URL of an on-premises Elasticsearch instance
  4. The URL of a hosted or managed Elasticsearch instance

Static Files Migration

The Generative AI Service uses embedding models for the creation of Vector Databases and the augmentation of LLM queries. The service expects the weights for these models in blob storage. During the deployment of the service, a migration job is executed that downloads a tar archive (and checksum file) with the weights of the embedding models from the DataRobot CDN and stores them in blob storage.

For offline installations without access to the internet, the deployment requires additional steps from you.

Install in offline clusters

You have two options:

  1. Disable the migration job during the deployment, upload the files to a worker pod, and trigger the migration script manually.
  2. Make the files available to the migration job through a self-hosted HTTP server.

Option 1: Execute the migration script manually

  1. Find the static files archive URL in the chart:
> helm template /path/to/datarobot/chart.tgz --set buzok-onprem.enabled=true | grep "genai.cdn"
     - https://genai.cdn.app.datarobot.com/weights/datarobot-genai-weights-2024-01-12-18.tar.xz
  2. Disable the migration job in the values.yaml file.

    buzok-onprem:
      buzok-worker:
        jobs:
          job-migration-static-files:
            enable: false
    
  3. Install DataRobot including the Generative AI service.

  4. Download the tar archive and the checksum file to your machine. The checksum URL is the archive URL with an appended .sha256sum.

    export ARCHIVE_URL="THE_ARCHIVE_URL_FROM_STEP_ONE"
    
    curl -fLo datarobot-genai-weights.tar.xz $ARCHIVE_URL
    curl -fLo datarobot-genai-weights.tar.xz.sha256sum $ARCHIVE_URL.sha256sum
    
  5. Find a running worker pod.

     kubectl -n <k8s-namespace> -l role=buzok-worker-app-io get pods
    
  6. Copy the downloaded files into the worker pod.

    export POD_NAME="WORKER_POD_NAME_FROM_THE_PREVIOUS_STEP"
    kubectl -n <k8s-namespace> cp datarobot-genai-weights.tar.xz $POD_NAME:/app/ephemeral
    kubectl -n <k8s-namespace> cp datarobot-genai-weights.tar.xz.sha256sum $POD_NAME:/app/ephemeral
    
  7. Execute the migration script:

    kubectl exec -n <k8s-namespace> $POD_NAME -- bash engine/docker_entrypoint.sh python -m worker.scripts.migrate_static_files with-this-file /app/ephemeral/datarobot-genai-weights.tar.xz
    

Option 2: Make files accessible through a self-hosted HTTP server

For this method you will need an HTTP server that is accessible from pods running in the Kubernetes cluster. It can be the machine you use to install DataRobot, or some other machine accessible by the cluster. Be aware that you could run into networking issues, and you are advised to verify that pods can access your HTTP server before you proceed with the installation.
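One way to verify connectivity is to run a short-lived pod with curl against the server; the namespace, image, and URL below are placeholders:

kubectl -n <k8s-namespace> run http-check --rm -it --restart=Never \
  --image=curlimages/curl -- curl -fI http://<your-http-server>:8080/datarobot-genai-weights.tar.xz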

  1. Find the static files archive URL in the chart:
   > helm template /path/to/datarobot/chart.tgz --set buzok-onprem.enabled=true | grep "genai.cdn"
      - https://genai.cdn.app.datarobot.com/weights/datarobot-genai-weights-2024-01-12-18.tar.xz
  2. Download the tar archive and the checksum file to your HTTP server. The checksum URL is the archive URL with an appended .sha256sum.

    export ARCHIVE_URL="THE_ARCHIVE_URL_FROM_STEP_ONE"
    
    curl -fLo datarobot-genai-weights.tar.xz $ARCHIVE_URL
    curl -fLo datarobot-genai-weights.tar.xz.sha256sum $ARCHIVE_URL.sha256sum
    
  3. If you don't have an HTTP server running yet, you can start one, e.g., via python3 -m http.server 8080.

  4. (Optional) Test that pods can access your HTTP server (see the verification example above).
  5. Override the staticFilesArchiveUrl in your values.yaml file.

    buzok-onprem:
      buzok-worker:
        staticFilesArchiveUrl: "YOUR_NEW_URL"
    
  6. Proceed with the installation of the Generative AI service.

  7. (Optional) Remove the HTTP server or the files from it; they no longer need to be served after the installation.

Troubleshooting Static Files Migration

If the static files migration job fails, or if you change the file storage settings (for example, the storage prefix), you need to force the job to rerun by taking the following steps. This section assumes your cluster has internet access; if that is not the case, please refer to the options above.

  1. Find the static files archive URL in the chart:
   > helm template /path/to/datarobot/chart.tgz --set buzok-onprem.enabled=true | grep "genai.cdn"
      - https://genai.cdn.app.datarobot.com/weights/datarobot-genai-weights-2024-01-12-18.tar.xz
  2. Find a CPU worker pod:

    kubectl -n <k8s-namespace> -l role=buzok-worker-app-io get pods
    
  3. Exec into the pod and force a re-run of the migration:

kubectl exec -n <k8s-namespace> -it <pod-name> -- bash engine/docker_entrypoint.sh bash

# Run the following command inside the pod
FORCE_STATIC_FILE_MIGRATION=true python -m worker.scripts.migrate_static_files with-this-url <datarobot-genai-weights-file-url>

Migration of Execution Environments

Generative AI supports the generation of custom models, but this requires execution environments to be installed on the cluster. Execution environments are automatically installed or updated by a migration job during the deployment of the Generative AI service.

For offline installations without access to the internet, the deployment requires additional steps.

Install in offline clusters

This job requires an external network connection: the execution environments pull additional dependencies from the internet. If there is no internet access, the job will fail. You can install the execution environments manually from a prebuilt package.

  1. Disable the migration job in the values.yaml file:
buzok-onprem:
   buzok-worker:
     jobs:
       job-migration-execution-environments:
         enable: false
  2. Find the environment version id in the chart:
> helm template /path/to/datarobot/chart.tgz --set buzok-onprem.enabled=true | grep CUSTOM_MODEL_ENVIRONMENT_VERSION_ID:
CUSTOM_MODEL_ENVIRONMENT_VERSION_ID: "660577ba61d4c9b498e6bdaf"
  3. Download and unpack the pre-built image from the DataRobot CDN:
export CUSTOM_MODEL_ENVIRONMENT_VERSION_ID=<CUSTOM_MODEL_ENVIRONMENT_VERSION_ID>
curl -O https://genai.cdn.app.datarobot.com/execution-environments/execution-environments-v$CUSTOM_MODEL_ENVIRONMENT_VERSION_ID.tar.gz
tar -xvf execution-environments-v$CUSTOM_MODEL_ENVIRONMENT_VERSION_ID.tar.gz
  4. Obtain the API token for the GenAI service account:
GENAI_API_TOKEN=$(kubectl get secret buzok-secrets -o jsonpath='{.data.DATAROBOT_ADMIN_V2_API_TOKEN}' | base64 -d)
  5. Install the pre-built image using a generated script:
source installer_env/bin/activate
python install_public_environments.py <V2_API_ENDPOINT> $GENAI_API_TOKEN

You have to use the DATAROBOT_ADMIN_V2_API_TOKEN from the buzok-secrets secret specifically: it has the necessary permissions, and this service account should be the owner of the execution environment for GenAI. If you create the environment using any other user, we won't be able to update it automatically.

Troubleshooting Execution Environments

If the automated or manual installation of execution environments fails or times out, you can find logs in the job:

kubectl logs job.batch/buzok-worker-job-migration-execution-environments-<revision>

If the installation job times out waiting for the environment build, check the logs of the build service:

kubectl logs deployment/build-service | grep ERROR
kubectl logs deployment/queue-exec-manager-build-service | grep ERROR

In case of a recoverable error, you can retry the installation with the same execution environment version id.

GPU Configuration

Warning: GPU configuration for the Generative AI service changed in 10.2. When upgrading from an earlier version of DataRobot with the Generative AI feature enabled, please follow the guide below to configure it from scratch.

The Generative AI service supports the use of GPUs for the creation of Vector Databases. It doesn't require dedicated nodes; GPU-targeted workloads use shared cluster resources. GPUs can significantly reduce the time required to build Vector Databases, but they can also be significantly more expensive than regular instances. The Generative AI service can run with or without GPU nodes, so it is entirely up to you whether to use them.

The following table shows the time it takes to create a Vector Database for datasets of different sizes. Dataset size indicates the size of the uncompressed text files that make up the dataset. For the GPU runtime, an instance with an Nvidia T4 GPU was used. For the CPU runtime, 5 cores of a 2nd Generation Intel Xeon Scalable CPU were used.

| Dataset size | Embedding Model | CPU runtime | GPU runtime | GPU x times faster than CPU | Creation time improvement on GPU |
|---|---|---|---|---|---|
| 6 MB | jinaai/jina-embedding-t-en-v1 | 00h 02m | 00h 00m | 7 | 00h 01m |
| 6 MB | intfloat/multilingual-e5-base | 00h 20m | 00h 01m | 11 | 00h 18m |
| 6 MB | intfloat/e5-large-v2 | 01h 20m | 00h 06m | 12 | 01h 14m |
| 47 MB | jinaai/jina-embedding-t-en-v1 | 00h 14m | 00h 02m | 7 | 00h 12m |
| 47 MB | intfloat/multilingual-e5-base | 02h 14m | 00h 12m | 11 | 02h 02m |
| 47 MB | intfloat/e5-large-v2 | 09h 08m | 00h 44m | 12 | 08h 24m |
| 200 MB | jinaai/jina-embedding-t-en-v1 | 01h 08m | 00h 10m | 7 | 00h 58m |
| 200 MB | intfloat/multilingual-e5-base | 09h 22m | 00h 53m | 11 | 08h 28m |

GPU Requirements

If you plan to utilize GPUs, be aware of the following requirements for the GPU nodes, based on a maximum dataset size of 10 GB:

  • Nvidia GPU with one of these compute capabilities: 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, or 10.0. See Nvidia's CUDA documentation for a list of GPUs and their compute capabilities.
  • 16GiB of GPU memory
  • 50GiB of RAM
  • 75GiB of storage
  • x86-64 processor with at least 2 CPU cores

Enable the use of GPUs

After you have prepared your Kubernetes cluster and your GPU nodes (see the general GPU instructions), enable the usage of GPU nodes for the Generative AI service in the values.yaml file of the DataRobot helm chart. First, update the core settings and describe the GPU nodes that should be used by the Generative AI service.

core:
  config_env_vars:
    # An array of supported GPU types configured for worker job pods. Used to map vram and GPU
    # capabilities to the tolerations and node selectors Kubernetes administrators have set on GPU
    # enabled node groups.
    # NOTE: This setting is a JSON-formatted string. Please ensure it's properly formatted.
    KUBEWORKERS_GPU_SUPPORTED_TYPES: |-
      [
        {
          "priority": 1,
          "description": "gpu-node",
          "nodeSelectors": ["datarobot.com/node-capability: gpu"],
          "tolerations": ["nvidia.com/gpu=true:NoExecute"],
          "vram": 16,
          "capabilities": ["7.5"]
        }
      ]
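The nodeSelectors and tolerations above must match labels and taints that actually exist on your GPU nodes. One way to apply matching ones manually (the node name is a placeholder; managed GPU node groups may already set these for you):

kubectl label node <gpu-node-name> datarobot.com/node-capability=gpu
kubectl taint node <gpu-node-name> nvidia.com/gpu=true:NoExecute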

Then, enable GPUs for VDB creation in the Buzok configuration. The default configuration will be requested in this case (16 GB of VRAM and CUDA capabilities ["7.5"]; this should match the configuration/capabilities of the GPU nodes described in KUBEWORKERS_GPU_SUPPORTED_TYPES).

buzok-onprem:
  buzok-worker:
    computeApi:
      enableGpuForVDB: true

If your cluster has GPUs with a different set of CUDA capabilities or less VRAM, you can customize these settings as shown below:

buzok-onprem:
  buzok-worker:
    computeApi:
      enableGpuForVDB: true
      jobResourcesMapOverrides:
        jobs:
          create_vector_database:
            gpu_vram: "8"
            gpu_capabilities:
              - "8.0"

Finally, to explicitly override all settings for a particular GenAI job executed via the Compute API, use the jobResourcesMapOverrides parameter (for example, here is how overriding settings for the create_vector_database_with_custom_embeddings job would look):

buzok-onprem:
  buzok-worker:
    computeApi:
      jobResourcesMapOverrides:
        jobs:
          create_vector_database_with_custom_embeddings:
            cpu_request: "1"
            cpu_limit: "2"
            memory_request: 26Gi
            memory_limit: 32Gi
            storage_request: 75Gi
            gpu_enabled: false
            env:
              NVIDIA_TF32_OVERRIDE: "0"
              TDQM_DISABLE: "1"
              DEVICE_FOR_NEURAL_NETWORK_COMPUTATIONS: cpu
              NVIDIA_VISIBLE_DEVICES: none
              CUDA_VISIBLE_DEVICES: none

Here is a short description of the configuration fields above:

  • cpu_request / cpu_limit - CPU cores required (passed to K8s on job creation)
  • memory_request / memory_limit - RAM required (passed to K8s on job creation)
  • storage_request - disk space required (passed to K8s on job creation)
  • gpu_enabled (false by default) - enables the GPU request for the job (used by the DR Kubeworkers Controller)
  • gpu_vram - VRAM required (used by the DR Kubeworkers Controller)
  • gpu_capabilities - compute capabilities required by the job to execute. Refer to the section above for details on the capabilities supported by the Vector Database creation job and different GPU models.
  • env - environment variables to set for a job. These usually do not require any modifications; just use one of the sets below, depending on GPU usage.

For jobs using GPU:

  NVIDIA_TF32_OVERRIDE: "0"                     # Disables TensorFloat-32 precision on Nvidia GPUs
  TDQM_DISABLE: "1"                             # Disables logs from certain libraries
  DEVICE_FOR_NEURAL_NETWORK_COMPUTATIONS: cuda  # Internal instruction to enable use of GPU in PyTorch

For regular jobs that don't use GPU (NVIDIA_VISIBLE_DEVICES and CUDA_VISIBLE_DEVICES are used to explicitly prevent the job from using a GPU, even if the pod is scheduled on a GPU-enabled node):

  NVIDIA_TF32_OVERRIDE: "0"                     # Disables TensorFloat-32 precision on Nvidia GPUs
  TDQM_DISABLE: "1"                             # Disables logs from certain libraries
  DEVICE_FOR_NEURAL_NETWORK_COMPUTATIONS: cpu   # Internal instruction to use CPU in PyTorch
  NVIDIA_VISIBLE_DEVICES: none                  # Used to restrict GPU access at the container level
  CUDA_VISIBLE_DEVICES: none                    # Used to hide GPU from application inside container

If a job requests a GPU (gpu_enabled: true), the VRAM and GPU capability settings are used by the DataRobot Kubeworkers Controller to match the job request with one of the node groups described in KUBEWORKERS_GPU_SUPPORTED_TYPES (the job's VRAM request must be less than or equal to the node group's, and the requested capabilities must be a subset of the node group's). Other request settings (CPU/RAM/storage) are handled directly by K8s, as are the nodeSelectors and tolerations from the selected node group.

Installation runtime

Installing the umbrella chart with the Generative AI service enabled takes up to 15-20 minutes. Add --timeout 20m0s to helm commands to avoid timeouts.
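For example (the release name and chart path are placeholders):

helm upgrade --install datarobot /path/to/datarobot/chart.tgz \
  -f values.yaml \
  --timeout 20m0s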

The runtime is mainly increased by two heavy migrations: static files and execution environments. They are set to run post-upgrade, so Helm runs them sequentially and tracks their status. If needed, you can disable the use of hooks to run the migration jobs in parallel:

buzok-onprem:
   buzok-worker:
     runMigrationsPostInstall: false

NOTE: this option will speed up installation, but Helm will not track the status of the migrations, and the installation script will not report their failure, so you need to verify manually that those jobs complete.
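A minimal sketch of checking the migration jobs manually (the namespace is a placeholder; the exact job names include a revision suffix):

# List the migration jobs and their completion status
kubectl -n <k8s-namespace> get jobs | grep migration

# Wait for a specific job to complete (substitute the full job name from the previous command)
kubectl -n <k8s-namespace> wait --for=condition=complete --timeout=30m job/<full-migration-job-name>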