Image build service

This service supports all of DataRobot's requirements for container image building. Its main feature is accepting REST API calls to build Docker images from a given set of artifacts, running those builds on Kubernetes infrastructure, and publishing the resulting image to a target repository.

The known internal consumers (DataRobot components) of the Image Build Service are:

  • Custom Models
  • Custom Apps
  • Custom Jobs
  • GenAI (Buzok)
  • Notebooks

For additional information on object storage, please refer to object-storage-configuration

Configurations for onprem

IBS runs BuildKit behind the scenes. For known limitations, troubleshooting guidance, and instructions for running the BuildKit daemon as a non-root user, refer to https://github.com/moby/buildkit/blob/master/docs/rootless

For best performance, the node group running IBS should use Linux kernel 5.11 or newer, with buildService.envApp.secret.BUILDKIT_OCI_WORKER_SNAPSHOTTER set to "overlayfs" in the IBS charts. For example:

build-service:
  buildService:
    envApp:
      secret:
        BUILDKIT_OCI_WORKER_SNAPSHOTTER: "overlayfs"

Example AWS AMIs with supported Linux kernels:

  • Amazon Linux 2023 (AL2023)
  • ubuntu-eks/k8s_1.28/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20240301

To check the kernel version, run the following on the node:

    uname -r
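Since the overlayfs snapshotter requires kernel 5.11 or newer, the check can be scripted; this is a minimal sketch (the ver_ge helper is illustrative, not part of IBS):

```shell
# Compare the node's kernel version against the 5.11 minimum required
# for the overlayfs snapshotter. ver_ge succeeds when $1 >= $2.
ver_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

kernel="$(uname -r | cut -d- -f1)"
if ver_ge "$kernel" "5.11"; then
  echo "kernel $kernel: safe to set BUILDKIT_OCI_WORKER_SNAPSHOTTER=overlayfs"
else
  echo "kernel $kernel is older than 5.11: keep the default snapshotter"
fi
```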

Image signing

In 10.2.3, the ability to sign images using the Notary project (Notation) and AWS Signer was added. A few prerequisites are needed to use this feature.

Follow the AWS Signer guide to set up image signing prerequisites on AWS:

  • Create a signing profile: https://docs.aws.amazon.com/signer/latest/developerguide/signing-profiles.html
  • Make sure the IAM identity has the right permissions: https://docs.aws.amazon.com/signer/latest/developerguide/image-signing-prerequisites.html#signer-iam-policy

Once the AWS configuration is done, provide the Build Service with the required values. An example is provided below; note that NOTATION_PROFILE and NOTATION_REGION must match the previously created signing profile.

build-service:
  buildService:
    envApp:
      secret:
        IMAGE_SIGNER_TYPE: "notation"
        NOTATION_PROFILE: "arn:aws:signer:us-east-1:1234567890:/signing-profiles/your-signing-profiles"
        NOTATION_PLUGIN: "com.amazonaws.signer.notation.plugin"
        NOTATION_REGION: "us-east-1"

After an image is built, the Build Service pushes its signature to ECR, so make sure the service is granted the necessary access.
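Because NOTATION_PROFILE is an ARN that embeds the signer region, a simple consistency check can catch region mismatches before deployment; a sketch (the ARN value is the placeholder from the example above):

```shell
# The region is the fourth colon-separated field of an AWS ARN.
# Verify it matches NOTATION_REGION before applying the chart values.
NOTATION_PROFILE="arn:aws:signer:us-east-1:1234567890:/signing-profiles/your-signing-profiles"
NOTATION_REGION="us-east-1"

arn_region="$(echo "$NOTATION_PROFILE" | cut -d: -f4)"
if [ "$arn_region" = "$NOTATION_REGION" ]; then
  echo "NOTATION_REGION matches the signing profile ARN"
else
  echo "mismatch: ARN says $arn_region but NOTATION_REGION is $NOTATION_REGION"
fi
```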

Object storage and other options in springProfile

The springProfile values in the values.yaml file support various configurations, including different object storage options. The format is <cluster_type>,<registry_type>,<storage_type>. The supported configurations are:

  • rhos,private_registry,{s3,azure,gcs,minio}
  • aws,private_registry,{s3,azure,gcs,minio}
  • rhos,ecr,{s3,azure,gcs,minio}
  • aws,ecr,{s3,azure,gcs,minio}
  • azure,acr,{s3,azure,gcs,minio}
  • gcp,gcr,{s3,azure,gcs,minio}

If not specified, springProfile is generated automatically by checking registry and storage-specific environment variables.

The first value (cluster type) is always aws, unless the service is installed in an OpenShift cluster, in which case it is rhos.

The second value (registry type) is always set to private_registry unless it should use ECR and authenticate to it with IRSA (IAM roles for ServiceAccounts).

The third value (storage type) is chosen between s3, azure, gcs, and minio.

Example for GKE installation:

build-service:
  buildService:
    springProfile: aws,private_registry,gcs

OAuth2 registry authentication (ACR / GCR)

Build Service supports OAuth2 Bearer token authentication for Private Registry, ACR, and GCR, with automatic fallback to Basic authentication.

Azure Container Registry

AZURE_TENANT_ID is required. The service principal must have AcrPush role on the target registry.

Credentials Setup

To configure your registry, you must first authenticate and obtain a Service Principal. You can then choose to use the Service Principal password directly or generate a temporary Access Token.

1. Authenticate and create a Service Principal

If you do not already have a Service Principal, create one using the Azure CLI:

az ad sp create-for-rbac --name "my-build-service-sp" --role Contributor

Record the appId (this is your DOCKERHUB_USERNAME) and the password (this is your DOCKERHUB_PASSWORD and the Service Principal Password).

2. Generate an Access Token (optional)

If you prefer to use a short-lived token instead of the Service Principal password as the DOCKERHUB_PASSWORD, first log in and then request the token:

# Log in using the Service Principal created above
az login --service-principal -u "APP_ID" -p "SERVICE_PRINCIPAL_PASSWORD" --tenant "TENANT_ID"

# Generate a registry access token (returned as accessToken in the JSON output)
az acr login --name "<registry_name>" --expose-token

3. Configure values.yaml

In your values.yaml file, configure the username and password using the values obtained above:

build-service:
  buildService:
    springProfile: "azure,acr,s3"
    envApp:
      secret:
        DOCKERHUB_USERNAME: "<service_principal_appId>"
        DOCKERHUB_PASSWORD: "<access-token> or <service_principal_password>"
        AZURE_TENANT_ID: "<your_tenant_id>"

Google Container Registry / Artifact Registry

build-service:
  buildService:
    springProfile: "gcp,gcr,s3"
    envApp:
      secret:
        DOCKERHUB_USERNAME: "_json_key"
        DOCKERHUB_PASSWORD: "<raw_service_account_json>"

DOCKERHUB_PASSWORD must be the raw JSON from the service account key file (not base64-encoded). The service account must have roles/artifactregistry.writer.
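Pasting a base64-encoded key is a common mistake here; before setting DOCKERHUB_PASSWORD, you can confirm the key file is raw JSON. A sketch (the sa-key.json path is illustrative):

```shell
# A valid service account key parses as JSON and reports type
# "service_account"; a base64-encoded file fails to parse.
key_file="sa-key.json"   # illustrative path to the downloaded key
python3 -c 'import json,sys; print(json.load(open(sys.argv[1]))["type"])' "$key_file"
```

If this prints service_account, the file contents can be used verbatim as DOCKERHUB_PASSWORD.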

Access tokens can also be used as the DOCKERHUB_PASSWORD for private registries; no additional changes are required in the values.yaml file.

Extra image pull credentials

By default, IBS mounts global image pull credentials into Image Builder pods so they can pull base images from the same registry during custom image builds. In cases where custom images (models and/or apps) aren't trusted, an administrator might want to disable this behavior during installation by setting the following Helm values:

build-service:
  imagePullSecrets:
    mount: false

In such cases, or when multiple different image registries are going to be used for pulling base images, additional image pull credentials can be provided for Image Builder:

build-service:
  imageBuilder:
    # -- Image pull credentials for the image builder
    imagePullCredentials:
      # -- Plain text credentials for OCI (docker) registries
      # Each entry should have host, user, and password fields
      plain:
        - host: docker.io
          user: dockerhub_user1
          password: dockerhub_password
        - host: private-registry.example
          user: user1
          password:

      # -- Secrets of type kubernetes.io/dockerconfigjson for OCI (docker) registries
      # Each entry is an existing k8s secret name
      secret:
        - name: dockerhub-image-pull-secret-read-only

      # -- External secrets for OCI (docker) registries
      # Works only if buildService.secretManager.enabled is true
      # Each entry is an external secret reference to create a k8s secret from
      # Each entry should have secretName, remoteRefKey, and remoteRefProperty fields
      externalSecret:
        - secretName: dockerhub-image-pull-secret-external
          remoteRefKey: /ibs/image-builder-pull-credentials
          remoteRefProperty: dockerconfigjson

Wildcard registry credentials

To use wildcard registry credentials (for example, *.myregistry.com) in image builds with QEMU-based Image Builder images, follow the steps below.

Prerequisites

A Kubernetes secret of type kubernetes.io/dockerconfigjson containing a wildcard entry in the auths section, for example:

{
  "auths": {
    "*.myregistry.com": {
      "auth": "<base64-encoded-username:password>"
    }
  }
}
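The auth field is the base64 encoding of username:password. A sketch for generating the file, and then creating the secret from it (the secret name and credentials are illustrative):

```shell
# Build the dockerconfigjson payload with a wildcard auth entry.
user="myuser"; pass="mypass"   # illustrative credentials
auth="$(printf '%s:%s' "$user" "$pass" | base64)"
printf '{"auths":{"*.myregistry.com":{"auth":"%s"}}}\n' "$auth" > config.json

# Create the secret from it (requires cluster access):
# kubectl create secret generic my-registry-creds \
#   --type=kubernetes.io/dockerconfigjson \
#   --from-file=.dockerconfigjson=config.json
```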

Configure build-service

Set the following environment variables on the build-service deployment:

| Variable | Description | Example |
| --- | --- | --- |
| IMAGE_BUILDER_IMAGE_PULL_SECRETS_TO_MOUNT | Comma-separated list of secret names to mount into image-builder pods | my-registry-creds |
| IMAGE_BUILDER_WILDCARD_REGISTRY_HOSTS | Comma-separated list of exact hostnames to expand wildcard credentials to | repo1.myregistry.com,repo2.myregistry.com |

Notes

  • IMAGE_BUILDER_WILDCARD_REGISTRY_HOSTS must explicitly list every hostname that should inherit the wildcard credentials — BuildKit requires exact hostname matches and does not support wildcards natively.
  • Multiple secrets can be provided as a comma-separated list in IMAGE_BUILDER_IMAGE_PULL_SECRETS_TO_MOUNT.
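As a sketch, assuming these variables can be set under the same envApp.secret block as the chart's other environment variables (the secret name and hostnames reuse the examples from the table above):

```yaml
build-service:
  buildService:
    envApp:
      secret:
        IMAGE_BUILDER_IMAGE_PULL_SECRETS_TO_MOUNT: "my-registry-creds"
        IMAGE_BUILDER_WILDCARD_REGISTRY_HOSTS: "repo1.myregistry.com,repo2.myregistry.com"
```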

Resources requests and limits

To support building extra-large custom images (typically 4 GB and up), Image Builder might need to be configured with increased CPU, memory, and storage requests and limits:

build-service:
  imageBuilder:
    resources:
      requests:
        cpu: "1"
        memory: "4G"
      limits:
        cpu: "2"
        memory: "4G"

The default requests/limits are: CPU 1/1, RAM 1GB/1GB, Storage 1GB/100GB.

Environment variables under envApp

The envApp section in the values.yaml file defines the environment variables required for different profiles. Here are the supported environment variables and their explanations:

Secrets

  • LOGS_BUCKET: The bucket name for storing logs (data by default).
  • ALLOW_SELF_SIGNED_CERTS: When true, disables TLS verification for requests to external resources such as MinIO, the container registry, and Ingress (false by default).
  • DISABLE_HTTPS: When true, uses HTTP instead of HTTPS for the private container registry (false by default).

Database parameters

  • POSTGRES_USER: The username for the PostgreSQL database.
  • POSTGRES_PASSWORD: The password for the PostgreSQL database.
  • POSTGRES_HOST: The host address of the PostgreSQL database.
  • POSTGRES_PORT: The port number for the PostgreSQL database (5432 by default).
  • POSTGRES_DB: The database name for PostgreSQL.
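These follow the same envApp.secret pattern as the other examples in this document; an illustrative sketch (host, credentials, and database name are placeholders):

```yaml
build-service:
  buildService:
    envApp:
      secret:
        POSTGRES_USER: "ibs_user"                   # placeholder
        POSTGRES_PASSWORD: "change-me"              # placeholder
        POSTGRES_HOST: "postgres.internal.example"  # placeholder
        POSTGRES_PORT: "5432"
        POSTGRES_DB: "image_build_service"          # placeholder
```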

Parameters required by MinIO profile

  • MINIO_SERVER_HOST: The endpoint URL for the MinIO server (e.g., http://<minio_endpoint>:9000).
  • MINIO_SERVER_ROOT_USER: The root username for the MinIO server.
  • MINIO_SERVER_ROOT_PASSWORD: The root password for the MinIO server.

Parameters required by private_registry profile

  • DOCKERHUB_USERNAME: The username for DockerHub or other private OCI registry.
  • DOCKERHUB_PASSWORD: The password for DockerHub or other private OCI registry.

External secrets

The externalsecret section allows defining environment variables that are retrieved from an external secret store. These values can override any keys defined in the secret section. Example:

build-service:
  buildService:
    envApp:
      externalsecret:
        POSTGRES_USER:
          name: external-postgres-secret
          key: username
        POSTGRES_PASSWORD:
          name: external-postgres-secret
          key: password

This configuration retrieves the POSTGRES_USER and POSTGRES_PASSWORD from an external secret named external-postgres-secret.

QEMU Image Builder for rootless mode without capabilities (security)

In order to comply with pod security policies, Image Builder pods might be required to run with the following security context applied:

build-service:
  imageBuilder:
    securityContext:
      allowPrivilegeEscalation: false
      seccompProfileType: "RuntimeDefault"
      seLinuxType: ""  # Leave empty for non-SELinux clusters; use "container_t" for SELinux-enabled environments
      capabilities:
        drop:
        - ALL
        add: []

Without the SETUID and SETGID capabilities, BuildKit is unable to build images in rootless mode, so it needs to run inside a QEMU virtual machine instead. To make that possible, set useQemu: true in the imageBuilder configuration; this automatically selects the correct QEMU image tag:

build-service:
  imageBuilder:
    useQemu: true
    resources:
      requests:
        cpu: "2"
        memory: "4G"
      limits:
        cpu: "4"
        memory: "8G"

Note: Ensure the LOGS_BUCKET in values.yaml is set to your actual S3 bucket name (default is "data"):

buildService:
  envApp:
    secret:
      LOGS_BUCKET: your-s3-bucket-name

Note that running Image Builder within QEMU degrades its performance significantly and requires more compute resources (CPU and RAM) than the default mode.

OpenShift security context constraints (SCC) for QEMU

When running Image Builder in QEMU mode on OpenShift, you can use the nonroot-v2 SCC instead of requiring privileged access. This provides better security by avoiding privileged mode while still allowing QEMU-based image builds.

To configure this in values.yaml:

build-service:
  buildService:
    serviceAccount:
      securityContextConstraints: nonroot-v2

Note

This SCC configuration is only applicable for OpenShift clusters and should only be used when running Image Builder in QEMU mode as described above.

Kaniko image builder

Kaniko is an alternative build engine that builds container images entirely in user space, without requiring a daemon or privileged containers. It is available in 11.7.0 and onwards, and is a common choice when BuildKit's default pod security or runtime requirements are not acceptable on your cluster. Note that Kaniko still uses a container security context tuned for successful builds (root in-container, specific capabilities, and related settings); align imageBuilder.kaniko.containerSecurityContext and namespace policies with your admission rules as needed (see Kaniko worker security context below).

To enable Kaniko, set the following in values.yaml:

build-service:
  imageBuilder:
    buildEngine: "kaniko"

Kaniko builder image

The container image used for Kaniko build pods (KANIKO_IMAGE) is chosen as follows:

  • Default: The same registry and repository prefix as the standard Image Builder image (IMAGE_BUILDER_REPOSITORY in the build-service deployment—typically derived from your chart image settings when you mirror build-service), plus imageBuilder.name (default image-builder), with an image tag of <version>-kaniko-image. You do not need a separate Docker Hub path for Kaniko if build-service and Image Builder already use your private registry mirror.
  • Override: Set imageBuilder.kaniko.image to a full image reference (registry, path, tag, or digest) if the Kaniko image is hosted elsewhere or you need a fixed digest.

Kaniko worker security context

imageBuilder.securityContext applies to the BuildKit image-builder workload (and related env vars such as IMAGE_BUILDER_SECURITY_CONTEXT_*). It does not control Kaniko build pods.

When buildEngine: "kaniko", the build service reads imageBuilder.kaniko.containerSecurityContext from values.yaml and applies it to Kaniko containers (and Kaniko init containers in the scanner flow) via KANIKO_CONTAINER_SECURITY_CONTEXT_* on the build-service deployment.

Chart defaults (you can override any field under imageBuilder.kaniko.containerSecurityContext):

runAsUser: 0
runAsGroup: 0
runAsNonRoot: false
privileged: false
allowPrivilegeEscalation: false
readOnlyRootFilesystem: false
seccompProfileType: "Unconfined"
seLinuxType: "spc_t"
capabilities:
  drop: [ALL]
  add: [SETUID, SETGID, CHOWN, FOWNER]

The FOWNER capability is required for many Dockerfiles (for example Debian-based images): without it, layer unpack or package scripts can fail with chmod: operation not permitted on paths such as /var/cache/apt.

Overriding capabilities

If you set containerSecurityContext.capabilities.add or .drop in values.yaml, you must supply the full list you need. The chart merge replaces each capabilities list as a whole; partial lists are not merged field-by-field.
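For example, to keep the chart defaults while adding one capability, the full add list must be restated (DAC_OVERRIDE here is purely illustrative):

```yaml
build-service:
  imageBuilder:
    kaniko:
      containerSecurityContext:
        capabilities:
          drop: [ALL]
          # Restate the default list plus the new capability; a partial
          # list would replace the defaults entirely.
          add: [SETUID, SETGID, CHOWN, FOWNER, DAC_OVERRIDE]
```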

Strict admission policies (Kyverno, Pod Security, SELinux)

Defaults such as runAsUser: 0, added capabilities, readOnlyRootFilesystem: false, seccompProfileType: Unconfined, and seLinuxType: spc_t are chosen so typical Kaniko builds succeed. Stricter cluster policies may still reject these settings until you tune containerSecurityContext (for example seccompProfileType: RuntimeDefault or a different seLinuxType where your platform allows it) or negotiate policy exceptions for the image-builder namespace. There is no single configuration that satisfies every policy while supporting all customer Dockerfiles; validate against your admission rules.

Example override (illustrative only—adjust to match what your cluster allows):

build-service:
  imageBuilder:
    buildEngine: "kaniko"
    kaniko:
      containerSecurityContext:
        seccompProfileType: "RuntimeDefault"
        seLinuxType: "container_t"
        capabilities:
          drop: [ALL]
          add: [SETUID, SETGID, CHOWN, FOWNER]

OpenTelemetry for Kaniko builds (optional)

Kaniko build pods receive OpenTelemetry-related environment variables from the build service when OTLP export is enabled on the chart.

build-service:
  imageBuilder:
    buildEngine: "kaniko"
    otlpEnabled: true
  otlpExporter: "http://your-otel-collector:4318"

Optional intervals can be set on the build-service deployment (for example via extra env in your umbrella chart): OTEL_METRIC_EXPORT_INTERVAL, OTEL_BLRP_SCHEDULE_DELAY.

User namespaces (hostUsers: false)

For clusters with policies that explicitly require runAsNonRoot: true, you can use Kubernetes user namespaces alongside Kaniko. With hostUsers: false, UID 0 inside the container is mapped to an unprivileged UID (65536+) on the host, satisfying host-level security policies while still allowing Kaniko to function normally inside the pod.

By default, the build service sets hostUsers: false on Kaniko pods when Kaniko is enabled. You do not need extra values.yaml keys for that default behavior—however, user namespaces have strict infrastructure requirements:

| Requirement | Minimum version |
| --- | --- |
| Kubernetes | 1.30 (feature is Beta) |
| Linux kernel | 6.3 |
| containerd | 2.0 |
| runc | 1.2 |

Note

User namespace support is a Beta feature in Kubernetes 1.30+ and requires coordination with your platform and security teams to enable. Verify your node OS and container runtime versions before relying on this for policy compliance.

If your cluster cannot use user namespaces (unsupported Kubernetes or kernel/runtime, or admission policies that reject hostUsers: false), disable the default by setting imageBuilder.kaniko.userNamespaceIsolation to false (with buildEngine: "kaniko"). The build service then omits hostUsers on Kaniko pods so the cluster default applies. Tradeoff: you lose UID remapping to the host (UID 0 in the container no longer maps to an unprivileged host UID); use this only when required by your platform.

build-service:
  imageBuilder:
    buildEngine: "kaniko"
    kaniko:
      userNamespaceIsolation: false

Network requirements

During image builds, IBS build pods require outbound access to external resources. The same requirements apply to all IBS consumers: custom models, applications, custom jobs, GenAI, and custom execution environments in notebooks. In restricted environments, these endpoints must be explicitly allowed.

Container registries

Build pods pull base images from external container registries. The following registries are supported by default:

| Registry | Description |
| --- | --- |
| registry-1.docker.io, *.docker.io | Docker Hub: DataRobot base images and user-supplied images. |
| gcr.io | Google Container Registry. |
| ghcr.io | GitHub Container Registry. |
| quay.io | Red Hat Quay. |
| public.ecr.aws | Amazon ECR Public Gallery. |
| registry.gitlab.com | GitLab Container Registry. |
| *.jfrog.io | JFrog Artifactory. |
| *.azurecr.io | Azure Container Registry. |
| *.pkg.dev | Google Artifact Registry. |
| *.dkr.ecr.<region>.amazonaws.com | AWS Elastic Container Registry (ECR). |

Note

Docker Hub also requires access to its Content Delivery Network (CDN) layer. See Docker Hub artifacts for a full list of required Docker Hub endpoints.

The set of registries that users can specify as base images is controlled by the EXECUTION_ENVIRONMENT_VERSION_URI_VALIDATION_ALLOWLIST environment variable, configured in core.config_env_vars in your values.yaml file. By default, docker.io/datarobot/* and docker.io/datarobotdev/* are blocked via EXECUTION_ENVIRONMENT_VERSION_URI_VALIDATION_DENYLIST to prevent users from overriding internal DataRobot images.
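A sketch of how such an allowlist might look, assuming it accepts a comma-separated list of image URI patterns (the exact value format should be confirmed against your DataRobot release):

```yaml
core:
  config_env_vars:
    # Illustrative: restrict user base images to a few public registries.
    EXECUTION_ENVIRONMENT_VERSION_URI_VALIDATION_ALLOWLIST: "docker.io/*,ghcr.io/*,quay.io/*"
```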

Python package index

When an execution environment declares Python dependencies (requirements.txt), packages are installed from PyPI during the image build:

| Endpoint | Description |
| --- | --- |
| pypi.org | Python package index. |
| files.pythonhosted.org | Python package file downloads. |

To use an internal PyPI mirror instead, configure the following environment variables in core.config_env_vars in your values.yaml file:

| Variable | Description |
| --- | --- |
| CUSTOM_MODEL_DEPENDENCIES_PYTHON_INDEX | Override the pip index URL (e.g., your Artifactory PyPI mirror). |
| CUSTOM_MODEL_DEPENDENCIES_PYTHON_TRUSTED_HOST | Skip SSL verification for the mirror host. |
| CUSTOM_MODEL_DEPENDENCIES_HTTP_PROXY | Route pip installs through an HTTP proxy. |
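These variable names mirror standard pip options; if you want to validate the mirror outside of IBS, the same settings map onto a pip.conf like the following sketch (hostnames are illustrative):

```ini
# Equivalent pip configuration for an internal mirror (illustrative values):
#   CUSTOM_MODEL_DEPENDENCIES_PYTHON_INDEX        -> index-url
#   CUSTOM_MODEL_DEPENDENCIES_PYTHON_TRUSTED_HOST -> trusted-host
#   CUSTOM_MODEL_DEPENDENCIES_HTTP_PROXY          -> proxy
[global]
index-url = https://artifactory.internal.example/api/pypi/pypi-remote/simple
trusted-host = artifactory.internal.example
proxy = http://proxy.internal.example:3128
```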

R package repository

For R-based execution environments, packages are installed from CRAN during image builds:

| Endpoint | Description |
| --- | --- |
| cran.rstudio.com | Default CRAN mirror. |

To use an internal CRAN mirror, configure the following environment variable in core.config_env_vars in your values.yaml file:

| Variable | Description |
| --- | --- |
| CUSTOM_MODEL_DEPENDENCIES_CRAN_MIRROR | Override the default CRAN mirror URL. |

OS-level package managers

Dockerfile-based execution environments may invoke OS package managers during the build. The required endpoints depend on the base image operating system:

| OS | Endpoints |
| --- | --- |
| Debian | deb.debian.org, security.debian.org |
| Ubuntu | archive.ubuntu.com, security.ubuntu.com |
| Alpine | dl-cdn.alpinelinux.org |
| RHEL / CentOS | mirror.centos.org |

npm (Node.js)

If execution environment Dockerfiles install Node.js packages, access to the npm registry is required:

| Endpoint | Description |
| --- | --- |
| registry.npmjs.org | npm package registry. |

AWS Signer (image signing only)

If image signing is enabled, IBS calls the AWS Signer API after each build to sign the resulting image and push the signature to ECR:

| Endpoint | Description |
| --- | --- |
| signer.<region>.amazonaws.com | AWS Signer API. |
| *.dkr.ecr.<region>.amazonaws.com | ECR: signature push destination. |

Replace <region> with the AWS region of your signing profile and ECR registry.

Image scanning

Image Build Service supports pre-push security scanning of container images. This feature is opt-in and disabled by default. To enable it, configure the following in values.yaml:

build-service:
  imageBuilder:
    imageScanner:
      enabled: true
      image: "customer-registry.com/customer-scanner:v1.0.0"  # Custom scanner image (must include curl)
      command: ["/bin/sh", "-c"]  # Required for report upload functionality
      args:
        - |
          snyk container test --file=/shared/image.tar \
            --severity-threshold=medium \
            --json-file-output=/shared/scan-report.json \
            || exit 1
      env:
        SNYK_TOKEN: "customer-token-here"  # Scanner-specific credentials
        SNYK_CACHE_PATH: "./shared/.snyk-cache"
      reportUploadPath: "s3://customer-bucket/scan-reports"
      resources:  # Optional: defaults to 512Mi/256Mi memory, 500m/250m CPU
        limits:
          memory: "1Gi"
          cpu: "1000m"
        requests:
          memory: "512Mi"
          cpu: "500m"

Requirements:

  • Custom scanner image: You must provide a custom scanner container image that includes both your scanning tool and curl. Base scanner images (e.g., aquasec/trivy:latest) can't be used directly.
  • Scanner interface: The scanner must read from /shared/image.tar and write a JSON report to /shared/scan-report.json. Exit code 0 allows the build to continue; a non-zero exit code stops the build.
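As an illustration of the custom-image requirement, a minimal Dockerfile that bundles a scanning tool together with curl might look like the sketch below (Trivy and the Alpine version are illustrative; adapt the scan command in imageScanner.args to whatever tool you ship):

```dockerfile
# Illustrative scanner image: scanning tool + curl in one image.
FROM aquasec/trivy:latest AS trivy

FROM alpine:3.19
RUN apk add --no-cache curl
COPY --from=trivy /usr/local/bin/trivy /usr/local/bin/trivy
# Leave the entrypoint empty so imageScanner.command/args control execution.
ENTRYPOINT []
```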