Image build service¶
This service aims to support all of DataRobot's requirements for container image building. It accepts REST API calls to build Docker images from a given set of artifacts, runs those builds on Kubernetes (k8s) infrastructure, and publishes the resulting images to a target repository.
This is the list of known internal consumers (DataRobot components) of Image Build Service:
- Custom Models
- Custom Apps
- Custom Jobs
- GenAI (Buzok)
- Notebooks
For additional information on object storage, please refer to object-storage-configuration
Configurations for onprem¶
IBS runs BuildKit behind the scenes. For known limitations, troubleshooting, and instructions for running the BuildKit daemon as a non-root user, refer to https://github.com/moby/buildkit/blob/master/docs/rootless
For best performance, the node group running IBS should use Linux kernel 5.11+, and buildService.envApp.secret.BUILDKIT_OCI_WORKER_SNAPSHOTTER should be set to "overlayfs" in the IBS charts. For example:
build-service:
  buildService:
    envApp:
      secret:
        BUILDKIT_OCI_WORKER_SNAPSHOTTER: "overlayfs"
Example AWS AMIs with the required kernel support:
- Amazon Linux 2023 (AL2023)
- ubuntu-eks/k8s_1.28/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20240301
To check the kernel version, run the following on the node:
uname -r
Image signing¶
In 10.2.3, the ability to sign images using the Notary project and AWS Signer was added.
To use this feature, follow the AWS Signer guide to set up the image signing prerequisites on AWS:
- Create a signing profile: https://docs.aws.amazon.com/signer/latest/developerguide/signing-profiles.html
- Make sure the IAM identity has the right permissions: https://docs.aws.amazon.com/signer/latest/developerguide/image-signing-prerequisites.html#signer-iam-policy
Once the AWS configuration is done, provide the Build Service with the required values. An example is provided below; note that NOTATION_PROFILE and NOTATION_REGION have to match the previously created signing profile.
build-service:
  buildService:
    envApp:
      secret:
        IMAGE_SIGNER_TYPE: "notation"
        NOTATION_PROFILE: "arn:aws:signer:us-east-1:1234567890:/signing-profiles/your-signing-profiles"
        NOTATION_PLUGIN: "com.amazonaws.signer.notation.plugin"
        NOTATION_REGION: "us-east-1"
The Build Service pushes the signature to ECR after an image has been built, so make sure it has push access to the target repository.
Object storage and other options in springProfile¶
The springProfile values in the values.yaml file support various configurations, including different object storage options. The format is <cluster_type>,<registry_type>,<storage_type>. The supported configurations are:
- rhos,private_registry,{s3,azure,gcs,minio}
- aws,private_registry,{s3,azure,gcs,minio}
- rhos,ecr,{s3,azure,gcs,minio}
- aws,ecr,{s3,azure,gcs,minio}
- azure,acr,{s3,azure,gcs,minio}
- gcp,gcr,{s3,azure,gcs,minio}
If not specified, springProfile is generated automatically by checking registry and storage-specific environment variables.
The first value (cluster type) is always set to aws, unless the service is installed in an OpenShift cluster, in which case it is rhos.
The second value (registry type) is always set to private_registry unless it should use ECR and authenticate to it with IRSA (IAM roles for ServiceAccounts).
The third value (storage type) is chosen between s3, azure, gcs, and minio.
Example for GKE installation:
build-service:
  buildService:
    springProfile: aws,private_registry,gcs
OAuth2 registry authentication (ACR / GCR)¶
Build Service supports OAuth2 Bearer token authentication for Private Registry, ACR, and GCR, with automatic fallback to Basic authentication.
Azure Container registry¶
AZURE_TENANT_ID is required. The service principal must have AcrPush role on the target registry.
Credentials Setup¶
To configure your registry, you must first authenticate and obtain a Service Principal. You can then choose to use the Service Principal password directly or generate a temporary Access Token.
1. Authenticate and Create a Service Principal

If you do not already have a Service Principal, create one using the Azure CLI:
az ad sp create-for-rbac --name "my-build-service-sp" --role Contributor
Record the appId (this is your DOCKERHUB_USERNAME) and the password (this is your DOCKERHUB_PASSWORD and the Service Principal Password).
2. Generate an Access Token (Optional)
If you prefer to use a short-lived token instead of the Service Principal password as the DOCKERHUB_PASSWORD, first log in and then request the token:
# Log in using the Service Principal created above
az login --service-principal -u "APP_ID" -p "SERVICE_PRINCIPAL_PASSWORD" --tenant "TENANT_ID"
# Generate the access token
az account get-access-token --resource-type oss-rdbms
Configure values.yaml¶
In your values.yaml file, configure the username and password using the values obtained above:
build-service:
  buildService:
    springProfile: "azure,acr,s3"
    envApp:
      secret:
        DOCKERHUB_USERNAME: "<service_principal_appId>"
        DOCKERHUB_PASSWORD: "<access-token> or <service_principal_password>"
        AZURE_TENANT_ID: "<your_tenant_id>"
Google Container Registry / Artifact Registry¶
build-service:
  buildService:
    springProfile: "gcp,gcr,s3"
    envApp:
      secret:
        DOCKERHUB_USERNAME: "_json_key"
        DOCKERHUB_PASSWORD: "<raw_service_account_json>"
DOCKERHUB_PASSWORD must be the raw JSON from the service account key file (not base64-encoded). The service account must have roles/artifactregistry.writer.
Access tokens can also be used as the DOCKERHUB_PASSWORD for private registries; no additional changes are required in the values.yaml file.
Extra image pull credentials¶
By default, IBS is configured to mount global image pull credentials into Image Builder pods, so it can pull base images from the same registry during custom image builds. In cases where custom images (models and/or apps) aren't trusted, an administrator might want to disable them during installation by setting the following Helm values:
build-service:
  imagePullSecrets:
    mount: false
In such cases, or when multiple different image registries are going to be used for pulling base images, additional image pull credentials can be provided for Image Builder:
build-service:
  imageBuilder:
    # -- Image pull credentials for the image builder
    imagePullCredentials:
      # -- Plain text credentials for OCI (docker) registries
      # Each entry should have host, user, and password fields
      plain:
        - host: docker.io
          user: dockerhub_user1
          password: dockerhub_password
        - host: private-registry.example
          user: user1
          password:
      # -- Secrets of type kubernetes.io/dockerconfigjson for OCI (docker) registries
      # Each entry is an existing k8s secret name
      secret:
        - name: dockerhub-image-pull-secret-read-only
      # -- External secrets for OCI (docker) registries
      # Works only if buildService.secretManager.enabled is true
      # Each entry is an external secret reference to create a k8s secret from
      # Each entry should have secretName, remoteRefKey, and remoteRefProperty fields
      externalSecret:
        - secretName: dockerhub-image-pull-secret-external
          remoteRefKey: /ibs/image-builder-pull-credentials
          remoteRefProperty: dockerconfigjson
Wildcard registry credentials¶
To use wildcard registry credentials (for example, *.myregistry.com) in image builds, including QEMU-based builds, follow the steps below.
Prerequisites¶
A Kubernetes secret of type kubernetes.io/dockerconfigjson containing a wildcard entry in the auths section, for example:
{
  "auths": {
    "*.myregistry.com": {
      "auth": "<base64-encoded-username:password>"
    }
  }
}
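The auth value is the base64 encoding of username:password. A quick way to generate it (credentials myuser/mypassword are hypothetical placeholders):

```shell
# Encode placeholder credentials for the "auth" field of the dockerconfigjson
AUTH=$(printf 'myuser:mypassword' | base64)
echo "$AUTH"   # use this value as "auth" for "*.myregistry.com"
```

You can then create the secret with, for example, kubectl create secret generic my-registry-creds --type=kubernetes.io/dockerconfigjson --from-file=.dockerconfigjson=<file>.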
Configure build-service¶
Set the following environment variables on the build-service deployment:
| Variable | Description | Example |
|---|---|---|
| IMAGE_BUILDER_IMAGE_PULL_SECRETS_TO_MOUNT | Comma-separated list of secret names to mount into image-builder pods | my-registry-creds |
| IMAGE_BUILDER_WILDCARD_REGISTRY_HOSTS | Comma-separated list of exact hostnames to expand wildcard credentials to | repo1.myregistry.com,repo2.myregistry.com |
Notes¶
- IMAGE_BUILDER_WILDCARD_REGISTRY_HOSTS must explicitly list every hostname that should inherit the wildcard credentials — BuildKit requires exact hostname matches and does not support wildcards natively.
- Multiple secrets can be provided as a comma-separated list in IMAGE_BUILDER_IMAGE_PULL_SECRETS_TO_MOUNT.
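One way to wire these variables into the deployment, assuming they are set through envApp.secret like the other build-service environment variables shown in this document:

```yaml
build-service:
  buildService:
    envApp:
      secret:
        IMAGE_BUILDER_IMAGE_PULL_SECRETS_TO_MOUNT: "my-registry-creds"
        IMAGE_BUILDER_WILDCARD_REGISTRY_HOSTS: "repo1.myregistry.com,repo2.myregistry.com"
```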
Resources requests and limits¶
To support building extra-large custom images (usually 4+ GB), Image Builder might need increased CPU, memory, and storage requests and limits:
build-service:
  imageBuilder:
    resources:
      requests:
        cpu: "1"
        memory: "4G"
      limits:
        cpu: "2"
        memory: "4G"
The default requests/limits are: CPU 1/1, RAM 1GB/1GB, Storage 1GB/100GB.
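If storage is the bottleneck, ephemeral storage can be raised alongside CPU and memory. This is a sketch assuming the chart passes imageBuilder.resources through to the pod spec unchanged, so the standard Kubernetes ephemeral-storage resource name applies:

```yaml
build-service:
  imageBuilder:
    resources:
      requests:
        cpu: "1"
        memory: "4G"
        ephemeral-storage: "10G"   # assumption: raised from the 1GB default request
      limits:
        cpu: "2"
        memory: "4G"
        ephemeral-storage: "100G"  # matches the default limit
```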
Environment variables under envApp¶
The envApp section in the values.yaml file defines the environment variables required for different profiles. Here are the supported environment variables and their explanations:
Secrets¶
- LOGS_BUCKET: Specifies the bucket name for storing logs (data by default).
- ALLOW_SELF_SIGNED_CERTS: Controls whether TLS verification is enabled for requests to external resources like Minio, Container Registry, and Ingress (false by default).
- DISABLE_HTTPS: Determines whether HTTP or HTTPS is used for the Private Container Registry (false by default).
Database parameters¶
- POSTGRES_USER: The username for the PostgreSQL database.
- POSTGRES_PASSWORD: The password for the PostgreSQL database.
- POSTGRES_HOST: The host address of the PostgreSQL database.
- POSTGRES_PORT: The port number for the PostgreSQL database (5432 by default).
- POSTGRES_DB: The database name for PostgreSQL.
Parameters required by MinIO profile¶
- MINIO_SERVER_HOST: The endpoint URL for the Minio server (e.g., http://<minio_endpoint>:9000).
- MINIO_SERVER_ROOT_USER: The root username for the Minio server.
- MINIO_SERVER_ROOT_PASSWORD: The root password for the Minio server.
Parameters required by private_registry profile¶
- DOCKERHUB_USERNAME: The username for DockerHub or other private OCI registry.
- DOCKERHUB_PASSWORD: The password for DockerHub or other private OCI registry.
External secrets¶
The externalsecret section allows defining environment variables that are retrieved from an external secret store. These values can override any keys defined in the secret section. Example:
build-service:
  buildService:
    envApp:
      externalsecret:
        POSTGRES_USER:
          name: external-postgres-secret
          key: username
        POSTGRES_PASSWORD:
          name: external-postgres-secret
          key: password
This configuration retrieves the POSTGRES_USER and POSTGRES_PASSWORD from an external secret named external-postgres-secret.
QEMU Image builder for a rootless mode without capabilities (security)¶
In order to comply with pod security policies, Image Builder pods might be required to run with the following security context applied:
build-service:
  imageBuilder:
    securityContext:
      allowPrivilegeEscalation: false
      seccompProfileType: "RuntimeDefault"
      seLinuxType: "" # Leave empty for non-SELinux clusters; use "container_t" for SELinux-enabled environments
      capabilities:
        drop:
          - ALL
        add: []
Without SETUID and SETGID capabilities added, BuildKit is unable to build images (in rootless mode), so it needs to be run inside a QEMU virtual machine. To make that possible, set useQemu: true in the ImageBuilder configuration—this automatically selects the correct QEMU image tag:
build-service:
  imageBuilder:
    useQemu: true
    resources:
      requests:
        cpu: "2"
        memory: "4G"
      limits:
        cpu: "4"
        memory: "8G"
Note: Ensure the LOGS_BUCKET in values.yaml is set to your actual S3 bucket name (default is "data"):
buildService:
  envApp:
    secret:
      LOGS_BUCKET: your-s3-bucket-name
Note that running Image Builder within QEMU degrades its performance significantly and requires more compute resources (CPU and RAM) compared to the default mode.
OpenShift Security context constraints (scc) for QEMU¶
When running Image Builder in QEMU mode on OpenShift, you can use the nonroot-v2 SCC instead of requiring privileged access. This provides better security by avoiding privileged mode while still allowing QEMU-based image builds.
To configure this in values.yaml:
build-service:
  buildService:
    serviceAccount:
      securityContextConstraints: nonroot-v2
Note
This SCC configuration is only applicable for OpenShift clusters and should only be used when running Image Builder in QEMU mode as described above.
Kaniko image builder¶
Kaniko is an alternative build engine that builds container images entirely in user space, without requiring a daemon or privileged containers. It is available in 11.7.0 and onwards, and is a common choice when BuildKit's default pod security or runtime requirements are not acceptable on your cluster. Kaniko still uses a container security context tuned for successful builds (root in-container, specific capabilities, and related settings); align imageBuilder.kaniko.containerSecurityContext and namespace policies with your admission rules as needed (see Kaniko worker security context).
To enable Kaniko, set the following in values.yaml:
build-service:
  imageBuilder:
    buildEngine: "kaniko"
Kaniko builder image¶
The container image used for Kaniko build pods (KANIKO_IMAGE) is chosen as follows:
- Default: The same registry and repository prefix as the standard Image Builder image (IMAGE_BUILDER_REPOSITORY in the build-service deployment — typically derived from your chart image settings when you mirror build-service), plus imageBuilder.name (default image-builder), with an image tag of <version>-kaniko-image. You do not need a separate Docker Hub path for Kaniko if build-service and Image Builder already use your private registry mirror.
- Override: Set imageBuilder.kaniko.image to a full image reference (registry, path, tag, or digest) if the Kaniko image is hosted elsewhere or you need a fixed digest.
Kaniko worker security context¶
imageBuilder.securityContext applies to the BuildKit image-builder workload (and related env vars such as IMAGE_BUILDER_SECURITY_CONTEXT_*). It does not control Kaniko build pods.
When buildEngine: "kaniko", the build service reads imageBuilder.kaniko.containerSecurityContext from values.yaml and applies it to Kaniko containers (and Kaniko init containers in the scanner flow) via KANIKO_CONTAINER_SECURITY_CONTEXT_* on the build-service deployment.
Chart defaults (you can override any field under imageBuilder.kaniko.containerSecurityContext):
runAsUser: 0
runAsGroup: 0
runAsNonRoot: false
privileged: false
allowPrivilegeEscalation: false
readOnlyRootFilesystem: false
seccompProfileType: "Unconfined"
seLinuxType: "spc_t"
capabilities:
  drop: [ALL]
  add: [SETUID, SETGID, CHOWN, FOWNER]
The FOWNER capability is required for many Dockerfiles (for example Debian-based images): without it, layer unpack or package scripts can fail with chmod: operation not permitted on paths such as /var/cache/apt.
Overriding capabilities
If you set containerSecurityContext.capabilities.add or .drop in values.yaml, you must supply the full list you need. The chart merge replaces each capabilities list as a whole; partial lists are not merged field-by-field.
Strict admission policies (Kyverno, Pod Security, SELinux)
Defaults such as runAsUser: 0, added capabilities, readOnlyRootFilesystem: false, seccompProfileType: Unconfined, and seLinuxType: spc_t are chosen so typical Kaniko builds succeed. Stricter cluster policies may still reject these settings until you tune containerSecurityContext (for example seccompProfileType: RuntimeDefault or a different seLinuxType where your platform allows it) or negotiate policy exceptions for the image-builder namespace. There is no single configuration that satisfies every policy while supporting all customer Dockerfiles; validate against your admission rules.
Example override (illustrative only—adjust to match what your cluster allows):
build-service:
  imageBuilder:
    buildEngine: "kaniko"
    kaniko:
      containerSecurityContext:
        seccompProfileType: "RuntimeDefault"
        seLinuxType: "container_t"
        capabilities:
          drop: [ALL]
          add: [SETUID, SETGID, CHOWN, FOWNER]
OpenTelemetry for Kaniko builds (optional)¶
Kaniko build pods receive OpenTelemetry-related environment variables from the build service when OTLP export is enabled on the chart.
build-service:
  imageBuilder:
    buildEngine: "kaniko"
    otlpEnabled: true
    otlpExporter: "http://your-otel-collector:4318"
Optional intervals can be set on the build-service deployment (for example via extra env in your umbrella chart): OTEL_METRIC_EXPORT_INTERVAL, OTEL_BLRP_SCHEDULE_DELAY.
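For example, to adjust the export cadence (values are in milliseconds, per the OpenTelemetry SDK environment-variable specification), assuming these variables can be supplied through envApp.secret like the other build-service variables in this document:

```yaml
build-service:
  buildService:
    envApp:
      secret:
        OTEL_METRIC_EXPORT_INTERVAL: "60000"  # export metrics every 60s
        OTEL_BLRP_SCHEDULE_DELAY: "5000"      # batch log processor delay of 5s
```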
User namespaces (hostUsers: false)¶
For clusters with policies that explicitly require runAsNonRoot: true, you can use Kubernetes user namespaces alongside Kaniko. With hostUsers: false, UID 0 inside the container is mapped to an unprivileged UID (65536+) on the host, satisfying host-level security policies while still allowing Kaniko to function normally inside the pod.
By default, the build service sets hostUsers: false on Kaniko pods when Kaniko is enabled. You do not need extra values.yaml keys for that default behavior—however, user namespaces have strict infrastructure requirements:
| Requirement | Minimum version |
|---|---|
| Kubernetes | 1.30 (feature is Beta) |
| Linux kernel | 6.3 |
| containerd | 2.0 |
| runc | 1.2 |
Note
User namespace support is a Beta feature in Kubernetes 1.30+ and requires coordination with your platform and security teams to enable. Verify your node OS and container runtime versions before relying on this for policy compliance.
If your cluster cannot use user namespaces (unsupported Kubernetes or kernel/runtime, or admission policies that reject hostUsers: false), disable the default by setting imageBuilder.kaniko.userNamespaceIsolation to false (with buildEngine: "kaniko"). The build service then omits hostUsers on Kaniko pods so the cluster default applies. Tradeoff: you lose UID remapping to the host (UID 0 in the container no longer maps to an unprivileged host UID); use this only when required by your platform.
build-service:
  imageBuilder:
    buildEngine: "kaniko"
    kaniko:
      userNamespaceIsolation: false
Network requirements¶
During image builds, IBS build pods require outbound access to external resources. The same requirements apply to all IBS consumers: custom models, applications, custom jobs, GenAI, and custom execution environments in notebooks. In restricted environments, these endpoints must be explicitly allowed.
Container registries¶
Build pods pull base images from external container registries. The following registries are supported by default:
| Registry | Description |
|---|---|
| registry-1.docker.io, *.docker.io | Docker Hub — DataRobot base images and user-supplied images. |
| gcr.io | Google Container Registry. |
| ghcr.io | GitHub Container Registry. |
| quay.io | Red Hat Quay. |
| public.ecr.aws | Amazon ECR Public Gallery. |
| registry.gitlab.com | GitLab Container Registry. |
| *.jfrog.io | JFrog Artifactory. |
| *.azurecr.io | Azure Container Registry. |
| *.pkg.dev | Google Artifact Registry. |
| *.dkr.ecr.<region>.amazonaws.com | AWS Elastic Container Registry (ECR). |
Note
Docker Hub also requires access to its Content Delivery Network (CDN) layer. See Docker Hub artifacts for a full list of required Docker Hub endpoints.
The set of registries that users can specify as base images is controlled by the EXECUTION_ENVIRONMENT_VERSION_URI_VALIDATION_ALLOWLIST environment variable, configured in core.config_env_vars in your values.yaml file. By default, docker.io/datarobot/* and docker.io/datarobotdev/* are blocked via EXECUTION_ENVIRONMENT_VERSION_URI_VALIDATION_DENYLIST to prevent users from overriding internal DataRobot images.
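A hypothetical allowlist/denylist configuration is shown below; the exact pattern syntax these variables accept is not documented here, so the entries are illustrative only:

```yaml
core:
  config_env_vars:
    # Illustrative values only -- confirm the accepted pattern syntax for your release
    EXECUTION_ENVIRONMENT_VERSION_URI_VALIDATION_ALLOWLIST: "docker.io/*,ghcr.io/*,quay.io/*"
    EXECUTION_ENVIRONMENT_VERSION_URI_VALIDATION_DENYLIST: "docker.io/datarobot/*,docker.io/datarobotdev/*"
```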
Python package index¶
When an execution environment declares Python dependencies (requirements.txt), packages are installed from PyPI during the image build:
| Endpoint | Description |
|---|---|
| pypi.org | Python package index. |
| files.pythonhosted.org | Python package file downloads. |
To use an internal PyPI mirror instead, configure the following environment variables in core.config_env_vars in your values.yaml file:
| Variable | Description |
|---|---|
| CUSTOM_MODEL_DEPENDENCIES_PYTHON_INDEX | Override the pip index URL (e.g., your Artifactory PyPI mirror). |
| CUSTOM_MODEL_DEPENDENCIES_PYTHON_TRUSTED_HOST | Skip SSL verification for the mirror host. |
| CUSTOM_MODEL_DEPENDENCIES_HTTP_PROXY | Route pip installs through an HTTP proxy. |
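For example, to point pip installs at an internal Artifactory mirror (the hostname and repository path are illustrative):

```yaml
core:
  config_env_vars:
    CUSTOM_MODEL_DEPENDENCIES_PYTHON_INDEX: "https://artifactory.example.com/artifactory/api/pypi/pypi-remote/simple"
    CUSTOM_MODEL_DEPENDENCIES_PYTHON_TRUSTED_HOST: "artifactory.example.com"
```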
R package repository¶
For R-based execution environments, packages are installed from CRAN during image builds:
| Endpoint | Description |
|---|---|
| cran.rstudio.com | Default CRAN mirror. |
To use an internal CRAN mirror, configure the following environment variable in core.config_env_vars in your values.yaml file:
| Variable | Description |
|---|---|
| CUSTOM_MODEL_DEPENDENCIES_CRAN_MIRROR | Override the default CRAN mirror URL. |
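For example (the mirror URL is illustrative):

```yaml
core:
  config_env_vars:
    CUSTOM_MODEL_DEPENDENCIES_CRAN_MIRROR: "https://cran.mirror.example.com"
```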
OS-level package managers¶
Dockerfile-based execution environments may invoke OS package managers during the build. The required endpoints depend on the base image operating system:
| OS | Endpoints |
|---|---|
| Debian | deb.debian.org, security.debian.org |
| Ubuntu | archive.ubuntu.com, security.ubuntu.com |
| Alpine | dl-cdn.alpinelinux.org |
| RHEL / CentOS | mirror.centos.org |
npm (Node.js)¶
If execution environment Dockerfiles install Node.js packages, access to the npm registry is required:
| Endpoint | Description |
|---|---|
| registry.npmjs.org | npm package registry. |
AWS Signer (image signing only)¶
If image signing is enabled, IBS calls the AWS Signer API after each build to sign the resulting image and push the signature to ECR:
| Endpoint | Description |
|---|---|
| signer.<region>.amazonaws.com | AWS Signer API |
| *.dkr.ecr.<region>.amazonaws.com | ECR — signature push destination |
Replace <region> with the AWS region of your signing profile and ECR registry.
Image scanning¶
Image Build Service supports pre-push security scanning of container images. This feature is opt-in and disabled by default. To enable it, configure the following in values.yaml:
build-service:
  imageBuilder:
    imageScanner:
      enabled: true
      image: "customer-registry.com/customer-scanner:v1.0.0" # Custom scanner image (must include curl)
      command: ["/bin/sh", "-c"] # Required for report upload functionality
      args:
        - |
          snyk container test --file=/shared/image.tar \
            --severity-threshold=medium \
            --json-file-output=/shared/scan-report.json \
            || exit 1
      env:
        SNYK_TOKEN: "customer-token-here" # Scanner-specific credentials
        SNYK_CACHE_PATH: "./shared/.snyk-cache"
      reportUploadPath: "s3://customer-bucket/scan-reports"
      resources: # Optional: defaults to 512Mi/256Mi memory, 500m/250m CPU
        limits:
          memory: "1Gi"
          cpu: "1000m"
        requests:
          memory: "512Mi"
          cpu: "500m"
Requirements:
- Custom scanner image: You must provide a custom scanner container image that includes both your scanning tool and curl. Base scanner images (e.g., aquasec/trivy:latest) can't be used directly.
- Scanner interface: The scanner must read from /shared/image.tar and write a JSON report to /shared/scan-report.json. Exit code 0 allows the build to continue; non-zero stops the build.
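The scanner interface above can be sketched as a minimal wrapper script. This demo uses a temporary directory in place of /shared so it can run anywhere; in a real scanner image the paths are exactly /shared/image.tar and /shared/scan-report.json:

```shell
# Demo of the scanner contract using a temp dir instead of /shared
SHARED=$(mktemp -d)
touch "$SHARED/image.tar"            # stand-in for the exported image tarball
REPORT="$SHARED/scan-report.json"

# A real scanner inspects the tarball here; this stub reports no findings.
if [ -f "$SHARED/image.tar" ]; then
  printf '{"findings": []}\n' > "$REPORT"
else
  echo "image tarball missing" >&2
  exit 1                             # non-zero exit stops the build
fi

cat "$REPORT"
```

A non-zero exit from the script marks the scan as failed and stops the build; exit 0 lets the build continue to the push step.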