DataRobot container images

The DataRobot platform uses Docker Hub as its primary container registry. While the DataRobot Helm chart is configured to use Docker Hub by default, it also supports using a dedicated private ECR container registry if required.

Note

For additional information about Docker Hub, see the Docker Hub documentation.

Create repositories for custom models

If you plan to use custom models, you must create the following repositories in your private ECR Registry. These repositories are used by the build-service component to store the generated images.

Optionally, set an additional prefix for OCI repository names:

export CUSTOM_MODELS_REPO_PREFIX="" # Optional: Set additional prefix for Custom Models repository names

Create the repositories:

EXTRAREPO=(
    "base-image"
    "services/custom-model-conversion"
    "managed-image"
    "ephemeral-image"
    "custom-apps/managed-image"
    "custom-jobs/managed-image"
)
for REPOPREFIX in "${EXTRAREPO[@]}"; do
    echo "Processing: $REPOPREFIX"
    REPO="$CUSTOM_MODELS_REPO_PREFIX/$REPOPREFIX"
    REPO="${REPO#/}"
    echo "creating $REPO"
    aws ecr describe-repositories --repository-names "${REPO}" || aws ecr create-repository --repository-name "${REPO}"
done
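
The `${REPO#/}` expansion in the loop above strips the leading slash that appears when `CUSTOM_MODELS_REPO_PREFIX` is empty. The following is a minimal sketch of that name resolution with no AWS calls. The helper name `resolve_repo_name` and the prefix `team-a` are hypothetical and used only for illustration.

```shell
# Hypothetical helper: join an optional prefix and a repository suffix
# the same way the loop above does.
resolve_repo_name() (
    prefix="$1"
    suffix="$2"
    repo="$prefix/$suffix"
    # Drop one leading slash, which is present only when the prefix is empty.
    printf '%s\n' "${repo#/}"
)

resolve_repo_name "" "base-image"
resolve_repo_name "team-a" "custom-apps/managed-image"
```

With an empty prefix the repository name is unchanged; with a prefix, the prefix becomes the first path segment of the repository name.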

Using private Amazon Elastic Container Registry (ECR)

You can install the DataRobot platform by pulling application images from either Docker Hub or a private Amazon Elastic Container Registry (ECR). If you plan to use a private ECR Registry to store the DataRobot platform images, follow the steps listed below.

Set environment variables

Set the following environment variables in the shell session where you will run the remaining commands, which rely on these values:

export DOCKERHUB_USERNAME="DATAROBOT_CUSTOMER_USERNAME"
export DOCKERHUB_PASSWORD="DATAROBOT_CUSTOMER_PASSWORD"
export DATAROBOT_VERSION="X.X.X"
export AWS_ACCOUNT_ID="DESIGNATED_AWS_ACCOUNT_ID"
export AWS_ECR_URL="DESIGNATED_AWS_ECR_URL"
export AWS_REGION="DESIGNATED_AWS_REGION"

Note

  • Replace DATAROBOT_CUSTOMER_USERNAME and DATAROBOT_CUSTOMER_PASSWORD with your Docker Hub credentials. You can request these credentials through the DataRobot Support Portal or by emailing support@datarobot.com.
  • Replace X.X.X with the latest release chart version.
  • Replace DESIGNATED_AWS_ACCOUNT_ID with your actual AWS account ID.
  • Replace DESIGNATED_AWS_ECR_URL with your actual ECR URL.
  • Replace DESIGNATED_AWS_REGION with your actual AWS region.
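
Because the later commands silently expand unset variables to empty strings, it may help to confirm the variables are set before continuing. The following is a minimal sketch of such a pre-flight check. The function name `check_required_vars` is hypothetical and not part of any DataRobot tooling.

```shell
# Hypothetical pre-flight check: report every variable in the argument list
# that is unset or empty. Returns non-zero if at least one is missing.
check_required_vars() (
    missing=0
    for name in "$@"; do
        # Indirect lookup of the variable named by $name (POSIX-compatible).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    [ "$missing" -eq 0 ]
)

check_required_vars DOCKERHUB_USERNAME DOCKERHUB_PASSWORD DATAROBOT_VERSION \
    AWS_ACCOUNT_ID AWS_ECR_URL AWS_REGION \
    || echo "Set the variables listed above before continuing." >&2
```

The check only tests for non-empty values; it does not verify that the credentials or the region are valid.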

Optionally, set an additional prefix for OCI repository names:

export DATAROBOT_REPO_PREFIX="" # Optional: Set additional prefix for DataRobot repository names
export SPARK_REPO_PREFIX="" # Optional: Set additional prefix for Spark repository names

Note

DataRobot images are prefixed with datarobot/ by default.

Initial setup

To authenticate your Docker Hub session, run the following command:

echo ${DOCKERHUB_PASSWORD} | docker login -u ${DOCKERHUB_USERNAME} --password-stdin

Download the main DataRobot chart from Docker Hub:

echo ${DOCKERHUB_PASSWORD} | helm registry login registry-1.docker.io -u ${DOCKERHUB_USERNAME} --password-stdin
helm pull oci://registry-1.docker.io/datarobot/datarobot-prime --version ${DATAROBOT_VERSION}

The process of loading images into your private registry is handled by a dedicated Helm plugin. To install the plugin, use the following command:

helm plugin install https://github.com/datarobot-oss/helm-datarobot-plugin.git

Create repositories for the DataRobot platform

To create the required repositories in ECR, you can use a for loop. The following script checks if a repository exists, and if one isn't found, the script creates one:

for REPOLIST in $(helm datarobot image datarobot-prime-${DATAROBOT_VERSION}.tgz | grep image: | sed 's|image: ||;' ); do
    REPOPREFIX=$(echo "$REPOLIST" | sed 's|^docker.io/||; s|:.*||')
    REPO="$DATAROBOT_REPO_PREFIX/$REPOPREFIX"
    REPO="${REPO#/}"
    echo "creating $REPO"
    aws ecr describe-repositories --repository-names "${REPO}" || aws ecr create-repository --repository-name "${REPO}"
done
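
The loop above derives each ECR repository name from an image reference by removing the `docker.io/` registry prefix and the tag. The following is a minimal sketch of that transformation in isolation. The helper name `image_to_repo` is hypothetical, and the sample image reference is an assumption about what the plugin output looks like.

```shell
# Hypothetical helper mirroring the sed expression used in the loop above.
image_to_repo() (
    # Remove a leading "docker.io/" and everything from the first colon on (the tag).
    printf '%s\n' "$1" | sed 's|^docker\.io/||; s|:.*||'
)

image_to_repo "docker.io/datarobot/datarobot-runtime:X.X.X"
```

Running the helper on a few lines of the plugin output is a quick way to preview the repository names before any are created.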

Load images into ECR

Load images into your private registry by pulling DataRobot images from Docker Hub:

TOKEN=$(aws ecr get-login-password --region ${AWS_REGION})
[ -z "${DATAROBOT_REPO_PREFIX}" ] || export PREFIX_ARG="--prefix ${DATAROBOT_REPO_PREFIX}"
helm datarobot sync datarobot-prime-${DATAROBOT_VERSION}.tgz ${PREFIX_ARG} -r ${AWS_ECR_URL} -u AWS -p ${TOKEN}

In a restricted network, load the images from the compressed tarball artifact instead:

TOKEN=$(aws ecr get-login-password --region ${AWS_REGION})
[ -z "${DATAROBOT_REPO_PREFIX}" ] || export PREFIX_ARG="--prefix ${DATAROBOT_REPO_PREFIX}"
helm datarobot load tarball-datarobot-${DATAROBOT_VERSION}.tar.zst ${PREFIX_ARG} -r ${AWS_ECR_URL} -u AWS -p ${TOKEN}
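
Both commands above add `--prefix` only when `DATAROBOT_REPO_PREFIX` is non-empty. Because `PREFIX_ARG` is exported, a value left over from an earlier run persists in the same shell even if the prefix is later cleared. The following is a minimal sketch that recomputes the argument on each call and prints the command as a dry run. The helper name `prefix_arg`, the prefix `team-a`, and the `<token>` placeholder are hypothetical.

```shell
# Hypothetical helper: print the optional --prefix argument, or nothing
# when no prefix is configured.
prefix_arg() (
    if [ -n "$1" ]; then
        printf '%s %s' '--prefix' "$1"
    fi
)

# Dry run: print the sync command instead of executing it.
# The chart version, registry URL, and token below are placeholders.
echo helm datarobot sync "datarobot-prime-X.X.X.tgz" $(prefix_arg "team-a") \
    -r "DESIGNATED_AWS_ECR_URL" -u AWS -p '<token>'
```

The command substitution is left unquoted on purpose so that the flag and its value are passed as two separate arguments.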

Optional: If you need to enable feature discovery in distributed mode, copy the Spark batch image:

export SAFER_VERSION="0.22.5"
REPO="$SPARK_REPO_PREFIX/spark-batch-image"
REPO=${REPO#/}
echo "creating $REPO"
aws ecr describe-repositories --repository-names ${REPO} || aws ecr create-repository --repository-name ${REPO}
docker pull datarobot/dr-docker-spark:3.5.1___safer___aws-emr-serverless-$SAFER_VERSION
NEW_TAG=${AWS_ECR_URL}/${REPO}:3.5.1-safer-$SAFER_VERSION
docker tag datarobot/dr-docker-spark:3.5.1___safer___aws-emr-serverless-$SAFER_VERSION ${NEW_TAG}
aws ecr get-login-password --region ${AWS_REGION} | docker login --username AWS --password-stdin ${AWS_ECR_URL}
docker push ${NEW_TAG}
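
The Spark steps above pull an image under one tag and push it under a shorter one. The following is a minimal sketch that only prints the source and target references, so they can be checked before any `docker` command runs. The helper names `spark_source_image` and `spark_target_image` are hypothetical.

```shell
# Hypothetical helpers that reproduce the image references used above.
spark_source_image() (
    # $1 = SAFER version
    printf 'datarobot/dr-docker-spark:3.5.1___safer___aws-emr-serverless-%s\n' "$1"
)

spark_target_image() (
    # $1 = ECR URL, $2 = optional repository prefix, $3 = SAFER version
    repo="$2/spark-batch-image"
    repo="${repo#/}"
    printf '%s/%s:3.5.1-safer-%s\n' "$1" "$repo" "$3"
)

spark_source_image "0.22.5"
spark_target_image "DESIGNATED_AWS_ECR_URL" "" "0.22.5"
```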

Verify images are in ECR

To verify that the repositories have been created and the images are loaded into AWS ECR, use the following commands.

To list the repositories whose URI contains your ECR URL:

aws ecr describe-repositories --region ${AWS_REGION} | grep -B1 -A9 ${AWS_ECR_URL}

To describe the images for one of the returned repositories:

aws ecr describe-images --registry-id ${AWS_ACCOUNT_ID} --region ${AWS_REGION} --repository-name datarobot/datarobot-runtime
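
To check many repositories at once, the expected names can be compared against the names that exist in ECR. The following is a minimal sketch of that comparison. The helper name `missing_repos` is hypothetical, and the example uses a hard-coded listing rather than a live AWS call.

```shell
# Hypothetical helper: read existing repository names on stdin (one per line)
# and print each expected name (passed as an argument) that is not among them.
missing_repos() (
    existing="$(cat)"
    for repo in "$@"; do
        printf '%s\n' "$existing" | grep -Fxq -- "$repo" || printf '%s\n' "$repo"
    done
)

# Example with a hard-coded listing. In practice the input could come from:
#   aws ecr describe-repositories --region "${AWS_REGION}" \
#       --query 'repositories[].repositoryName' --output text | tr '\t' '\n'
printf '%s\n' "base-image" "managed-image" | missing_repos "base-image" "ephemeral-image"
```

An empty result means every expected repository exists; it does not confirm that images were pushed, which is what `describe-images` is for.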