DataRobot container images¶
The DataRobot platform uses Docker Hub as its primary container registry. While the DataRobot Helm chart is configured to use Docker Hub by default, it also supports using a dedicated private ECR container registry if required.
Note
For additional information about Docker Hub, see the Docker Hub documentation.
Create repositories for custom models¶
If you plan to use custom models, you must create the following repositories in your private ECR registry. These repositories are used by the build-service component to store the generated images.
Optionally, set an additional prefix for OCI repository names:
export CUSTOM_MODELS_REPO_PREFIX="" # Optional: Set additional prefix for Custom Models repository names
Create the repositories:
EXTRAREPO=(
"base-image"
"services/custom-model-conversion"
"managed-image"
"ephemeral-image"
"custom-apps/managed-image"
"custom-jobs/managed-image"
)
for REPOPREFIX in "${EXTRAREPO[@]}"; do
echo "Processing: $item"
REPO="$CUSTOM_MODELS_REPO_PREFIX/$REPOPREFIX"
REPO="${REPO#/}"
echo "creating $REPO"
aws ecr describe-repositories --repository-names ${REPO} || aws ecr create-repository --repository-name ${REPO}
done
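To confirm that the repositories were created, you can list them by name. The check below is an optional sketch that uses the standard AWS CLI and filters for the repository names created above:
aws ecr describe-repositories --query 'repositories[].repositoryName' --output text | tr '\t' '\n' | grep -E 'base-image|custom-model-conversion|managed-image|ephemeral-image'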
Using private Amazon Elastic Container Registry (ECR)¶
You can install the DataRobot platform by pulling application images from either Docker Hub or a private Amazon Elastic Container Registry (ECR). If you plan to use a private ECR registry to store the DataRobot platform images, follow the steps below.
Set environment variables¶
To store the configuration values used in the commands below, set the following environment variables (replace the placeholder values as described in the note that follows):
export DOCKERHUB_USERNAME="DATAROBOT_CUSTOMER_USERNAME"
export DOCKERHUB_PASSWORD="DATAROBOT_CUSTOMER_PASSWORD"
export DATAROBOT_VERSION="X.X.X"
export AWS_ACCOUNT_ID="DESIGNATED_AWS_ACCOUNT_ID"
export AWS_ECR_URL="DESIGNATED_AWS_ECR_URL"
export AWS_REGION="DESIGNATED_AWS_REGION"
Note
- Replace DATAROBOT_CUSTOMER_USERNAME and DATAROBOT_CUSTOMER_PASSWORD with your Docker Hub credentials. You can request these credentials through the DataRobot Support Portal or by emailing support@datarobot.com.
- Replace X.X.X with the latest release chart version.
- Replace DESIGNATED_AWS_ACCOUNT_ID with your AWS account ID.
- Replace DESIGNATED_AWS_ECR_URL with your ECR URL.
- Replace DESIGNATED_AWS_REGION with your AWS region.
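For example, a completed configuration might look like the following. All values shown here are hypothetical placeholders; substitute your own credentials, chart version, account ID, ECR URL, and region:
export DOCKERHUB_USERNAME="acme-datarobot"
export DOCKERHUB_PASSWORD="********"
export DATAROBOT_VERSION="11.0.0"
export AWS_ACCOUNT_ID="123456789012"
export AWS_ECR_URL="123456789012.dkr.ecr.us-east-1.amazonaws.com"
export AWS_REGION="us-east-1"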
Optionally, set an additional prefix for OCI repository names:
export DATAROBOT_REPO_PREFIX="" # Optional: Set additional prefix for DataRobot repository names
export SPARK_REPO_PREFIX="" # Optional: Set additional prefix for Spark repository names
Note
DataRobot images are prefixed with datarobot/ by default.
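To illustrate how a prefix affects repository naming, the scripts below join the prefix to the image path and strip any leading slash. With a hypothetical prefix of my-team, an image such as datarobot/datarobot-runtime would be stored as my-team/datarobot/datarobot-runtime:
# Hypothetical example only; leave DATAROBOT_REPO_PREFIX empty if you do not need a prefix
export DATAROBOT_REPO_PREFIX="my-team"
REPO="${DATAROBOT_REPO_PREFIX}/datarobot/datarobot-runtime"
REPO="${REPO#/}"   # same normalization used by the scripts below
echo "${REPO}"     # my-team/datarobot/datarobot-runtime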
Initial setup¶
To authenticate your Docker Hub session, run the following command:
echo ${DOCKERHUB_PASSWORD} | docker login -u ${DOCKERHUB_USERNAME} --password-stdin
Download the main DataRobot chart from Docker Hub:
echo ${DOCKERHUB_PASSWORD} | helm registry login registry-1.docker.io -u ${DOCKERHUB_USERNAME} --password-stdin
helm pull oci://registry-1.docker.io/datarobot/datarobot-prime --version ${DATAROBOT_VERSION}
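To confirm that the chart archive was downloaded, you can optionally inspect it with helm show chart:
helm show chart datarobot-prime-${DATAROBOT_VERSION}.tgz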
The process of loading images into your private registry is handled by a dedicated Helm plugin. To install the plugin, use the following command:
helm plugin install https://github.com/datarobot-oss/helm-datarobot-plugin.git
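You can verify that the plugin is available by listing the installed Helm plugins:
helm plugin list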
Create repositories for the DataRobot platform¶
To create the required repositories in ECR, you can use a for loop. The following script checks if a repository exists, and if one isn't found, the script creates one:
for REPOLIST in $(helm datarobot image datarobot-prime-${DATAROBOT_VERSION}.tgz | grep image: | sed 's|image: ||;' ); do
REPOPREFIX=$(echo "$REPOLIST" | sed 's|^docker.io/||; s|:.*||')
REPO="$DATAROBOT_REPO_PREFIX/$REPOPREFIX"
REPO="${REPO#/}"
echo "creating $REPO"
aws ecr describe-repositories --repository-names ${REPO} || aws ecr create-repository --repository-name ${REPO}
done
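To preview the repository names the loop will create without modifying ECR, you can optionally run the same extraction pipeline on its own (a sketch that reuses the commands above):
helm datarobot image datarobot-prime-${DATAROBOT_VERSION}.tgz | grep image: | sed 's|image: ||; s|^docker.io/||; s|:.*||'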
Load images into ECR¶
Load images into your private registry by pulling DataRobot images from Docker Hub:
TOKEN=$(aws ecr get-login-password --region ${AWS_REGION})
[ -z "${DATAROBOT_REPO_PREFIX}" ] || export PREFIX_ARG="--prefix ${DATAROBOT_REPO_PREFIX}"
helm datarobot sync datarobot-prime-${DATAROBOT_VERSION}.tgz ${PREFIX_ARG} -r ${AWS_ECR_URL} -u AWS -p ${TOKEN}
If you are installing in a restricted (air-gapped) network, load the images from the compressed tarball artifact instead:
TOKEN=$(aws ecr get-login-password --region ${AWS_REGION})
[ -z "${DATAROBOT_REPO_PREFIX}" ] || export PREFIX_ARG="--prefix ${DATAROBOT_REPO_PREFIX}"
helm datarobot load tarball-datarobot-${DATAROBOT_VERSION}.tar.zst ${PREFIX_ARG} -r ${AWS_ECR_URL} -u AWS -p ${TOKEN}
Optional: Copy the Spark batch image if you need to enable feature discovery in distributed mode:
export SAFER_VERSION="0.22.5"
REPO="$SPARK_REPO_PREFIX/spark-batch-image"
REPO=${REPO#/}
echo "creating $REPO"
aws ecr describe-repositories --repository-names ${REPO} || aws ecr create-repository --repository-name ${REPO}
docker pull datarobot/dr-docker-spark:3.5.1___safer___aws-emr-serverless-$SAFER_VERSION
NEW_TAG=${AWS_ECR_URL}/${REPO}:3.5.1-safer-$SAFER_VERSION
docker tag datarobot/dr-docker-spark:3.5.1___safer___aws-emr-serverless-$SAFER_VERSION ${NEW_TAG}
aws ecr get-login-password --region ${AWS_REGION} | docker login --username AWS --password-stdin ${AWS_ECR_URL}
docker push ${NEW_TAG}
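To confirm that the Spark image was pushed, you can optionally describe it by tag:
aws ecr describe-images --region ${AWS_REGION} --repository-name ${REPO} --image-ids imageTag=3.5.1-safer-${SAFER_VERSION}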
Verify images are in ECR¶
To verify that the repositories have been created and the images are loaded into AWS ECR, use the following commands.
To get a list of the repositories in your registry:
aws ecr describe-repositories --region ${AWS_REGION} | grep -B1 -A9 ${AWS_ECR_URL}
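Alternatively, to print only the repository names, you can use a JMESPath query (an illustrative variant of the command above):
aws ecr describe-repositories --region ${AWS_REGION} --query 'repositories[].repositoryName' --output table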
To describe the images in one of the returned repositories, pass its repository name (including any prefix you configured), for example:
aws ecr describe-images --registry-id ${AWS_ACCOUNT_ID} --region ${AWS_REGION} --repository-name datarobot/datarobot-runtime
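To print only the image tags for that repository, you can add a JMESPath query (this example assumes no custom repository prefix):
aws ecr describe-images --registry-id ${AWS_ACCOUNT_ID} --region ${AWS_REGION} --repository-name datarobot/datarobot-runtime --query 'imageDetails[].imageTags[]' --output text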