Migrate Custom Templates Job¶
The code for the custom templates is maintained in a DataRobot proprietary repository. The main artifact from that repository is a container that is responsible for two things:

1. Creating/updating the execution environments used by custom templates
2. Creating/updating the custom templates for custom metrics, custom applications, and NVIDIA NIM containers
The custom template code is kept in a separate repository so it can be maintained independently of the main DataRobot repository. This allows on-prem installations to receive custom template updates without upgrading the entire DataRobot application. The separate repository also allows more focused testing of the template code.
As of DataRobot 11.1.1, there are four execution environments. The list is subject to change over time.
Included execution environments¶
* [DataRobot] NodeJS 22.14.0 Applications Base
* [DataRobot] Python 3.11 Custom Metrics
* [DataRobot] Python 3.12 Applications Base
* DRUM NIM Sidecar
There are more than a hundred custom templates covering a variety of template types. These templates are created or updated as required when the migration runs.
Execution¶
The custom templates installation/migration consists of running the datarobot-custom-templates container, which contains the information required to create/update the execution environments and custom templates. An admin API key is required because normal users cannot manage globally provided templates. Here is a sample of the permissions required to successfully install/migrate the custom templates:
* Enable Admin Api Access
* Enable Custom Templates (Enabled by default)
* Enable Custom Jobs Template Gallery (Enabled by default)
* Can Manage Public Custom Environments
* Enable Resource Bundles
The following table highlights some of the options available in the container that control the installation behavior. The container's --help option provides the most accurate information and takes precedence over this table.
| CLI Option | Environment Variable | Default | Notes |
|---|---|---|---|
| `--admin-api-token` | `DR_API_KEY` | Must be provided | The admin API key. |
| `--dr-host` | `DR_HOST` | Must be provided | URL to the base host (e.g. `https://staging.datarobot.com/`), with or without `/api/v2/`. |
| `--[no-]update-exec-envs` | `UPDATE_EXEC_ENVS` | `False` (no updates) | When `True`, creates or updates the execution environments. When `False`, only creates missing execution environments. |
| `--template-types` | `TEMPLATE_TYPES` | `all` | Allows updating a limited set of template types for unusual circumstances. Allowed values: `all`, `applications`, `jobs`, `metrics`, `models`. |
| `--max-version-wait` | `VERSION_TIMEOUT` | 1200 seconds | Updating an execution-environment version involves building a new container, which may exceed the "standard" 20-minute period. |
| `--force` | `FORCE_TEMPLATE_UPDATE` | `False` | Forces an update of existing templates even when they match the server. Useful if there are issues detecting whether an update is needed. |
| `--use-prebuilt-images` | `USE_PREBUILT_IMAGES` | `False` | Use prebuilt execution-environment images instead of building them in the DataRobot application. |
| `--sleep-seconds` | | 0.25 seconds | Sleep time between template updates to avoid failures due to rate limiting. |
| `--log-level` | | `INFO` | Raising the log level can provide more information about execution. |
| `--dry-run` | | `False` | Checks whether templates need updating without updating them. |
The above options/flags can be directly provided as docker/podman arguments for running the container. For
options/flags with an environment variable, the value can be set using an environment variable or a direct CLI argument.
The following command snippets are functionally equivalent:
```shell
docker run -i datarobot/datarobot-custom-templates:<tag> --max-version-wait 300
docker run -e VERSION_TIMEOUT=300 datarobot/datarobot-custom-templates:<tag>
```
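For example, to preview which templates would change without modifying anything, the documented --dry-run flag can be combined with the required connection options. This is a sketch; the token, host, and tag values are placeholders for your installation:

```shell
# Preview pending template changes without applying them.
# <ADMIN_API_TOKEN>, <DR_HOST>, and <tag> are placeholders.
docker run -i datarobot/datarobot-custom-templates:<tag> \
  --admin-api-token <ADMIN_API_TOKEN> \
  --dr-host <DR_HOST> \
  --dry-run
```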
Image¶
When a datarobot-custom-templates container is built, it contains all the execution-environment and template
information such that running the container always produces the same result. Building the execution environments requires external network access, but all the template information/code is frozen at the time the datarobot-custom-templates container is built.
The image is included in the datarobot-app-charts repository's core-integration-tasks package. In your helm artifact, it should be located at charts/core-integration-tasks/Chart.yaml.
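To confirm which image version your helm artifact ships, one approach (assuming the chart layout described above and a locally unpacked artifact) is to inspect the chart files directly:

```shell
# The paths follow the layout described above and may differ in your artifact.
cat charts/core-integration-tasks/Chart.yaml
grep -ri "datarobot-custom-templates" charts/core-integration-tasks/
```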
No external network case¶
This job typically requires an external network connection because the execution environment images are built by the DataRobot application while the job is executing. You can use the --use-prebuilt-images option (or the USE_PREBUILT_IMAGES environment variable) to use a set of prebuilt images that are included in the datarobot-custom-templates image.
Please contact DataRobot support for further assistance related to this topic.
Cronjob¶
This task is run as a cronjob because it can take a long time to build all the execution environments.
The migrate_custom_templates cronjob is defined in the chart values and is expected to run at the top of every hour (0 * * * *). It attempts to install all missing execution environments and custom templates, and to update any custom templates that require it.
> [!WARNING]
> Local changes will be overwritten by the cronjob.
The cronjob can be described using kubectl like this:
```shell
kubectl describe cronjob core-integration-tasks-custom-templates -n $NAMESPACE
```
The pod (while running) can be seen here:
```shell
kubectl get pods -n $NAMESPACE -l role=core-integration-tasks-custom-templates
```
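To inspect the output of the most recent run, the same label selector can be used to fetch the pod logs (a sketch assuming the label shown above):

```shell
# Fetch recent log output from the custom-templates job pod(s).
kubectl logs -n $NAMESPACE -l role=core-integration-tasks-custom-templates --tail=100
```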
Upgrading execution environments¶
When upgrading DataRobot versions, execution environments provisioned by the custom templates job are NOT automatically upgraded to the latest versions. You must trigger the upgrade of the execution environments in order to receive the latest updates to the affected features. This operation upgrades all the execution environments listed above.
```shell
docker run -i datarobot/datarobot-custom-templates:<tag> \
  --admin-api-token <ADMIN_API_TOKEN> \
  --dr-host <DR_HOST> \
  --use-prebuilt-images \
  --update-exec-envs
```

Using the prebuilt images avoids involving the Image Build Service; the prebuilt images are included in the datarobot-custom-templates image.
This can also be accomplished using kubectl with a manifest like the following (update the image tag, DR_HOST, and DR_API_KEY for your installation):
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: custom-template-env-upgrade
  namespace: datarobot
spec:
  containers:
    - name: custom-template-env-upgrade
      image: datarobot/datarobot-custom-templates:11.1.438-image
      env:
        - name: USE_PREBUILT_IMAGES
          value: 'true'
        - name: UPDATE_EXEC_ENVS
          value: 'true'
        - name: DR_HOST
          value: http://datarobot-public-api:8004
        - name: DR_API_KEY
          valueFrom:
            secretKeyRef:
              name: ui-admin-credentials
              key: api_key
```
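Assuming the manifest above is saved locally (the file name below is illustrative; the pod name and namespace follow the example), it can be applied and monitored like this:

```shell
# Apply the one-off upgrade pod and follow its logs.
kubectl apply -f custom-template-env-upgrade.yaml
kubectl logs -f custom-template-env-upgrade -n datarobot
# Remove the pod once the upgrade completes.
kubectl delete pod custom-template-env-upgrade -n datarobot
```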
The biggest difference between the example manifest above and the standard helm chart is that UPDATE_EXEC_ENVS is set to true. This setting tells the pod to upload new containers for all the execution environments. Setting USE_PREBUILT_IMAGES to true means that the execution-environment containers included in the datarobot-custom-templates image are uploaded.
Deprecated Execution Environments¶
In release/11.4, some of the execution environments may be renamed to include [Deprecated] in the name, and new execution environments may be created. The "deprecated" execution environments may still be in use, so they are not deleted; however, no new versions (e.g. CVE fixes) are applied to them. Once the "deprecated" environments are no longer used, they can be removed.
Troubleshooting¶
Failed to build execution environment¶
In the event of Image Build Service errors or misconfiguration, the symptom on the application side is a failed or perpetually stuck (a submitted status that never resolves) execution environment installed by the custom templates job, e.g. [DataRobot] Python 3.11 Custom Metrics.
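One way to confirm the stuck state is to list the execution environments and their versions through the public API. This is a sketch that assumes the standard /api/v2/executionEnvironments/ endpoint and Bearer-token authentication are available on your installation:

```shell
# <ADMIN_API_TOKEN> and <DR_HOST> are placeholders for your installation's values.
curl -s -H "Authorization: Bearer <ADMIN_API_TOKEN>" \
  "<DR_HOST>/api/v2/executionEnvironments/"
```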
In this case, you can retrigger the job with the --update-exec-envs flag to force the latest updates to be pushed as new versions of these environments.
The easiest way to do this is to manually run the job via:
```shell
docker run -i datarobot/datarobot-custom-templates:<tag> \
  --admin-api-token <ADMIN_API_TOKEN> \
  --dr-host <DR_HOST> \
  --use-prebuilt-images \
  --update-exec-envs
```
The above command will update the execution environments, and then point all the templates to the latest successful version of the respective execution environment.
Custom-metric jobs fail for odd reasons (e.g. function not found)¶
A custom-metric job fails with a traceback similar to:

```
Traceback (most recent call last):
  File "/opt/code/main.py", line 24, in <module>
    from dmm import log_parameters
ImportError: cannot import name 'log_parameters' from 'dmm' (/usr/local/lib/python3.11/dist-packages/dmm/__init__.py)
```
This can happen when the templates are updated to use functionality from a newer client library that is NOT in the [DataRobot] Python 3.11 Custom Metrics container. In this case, the execution environment needs to be updated; see the Failed to build execution environment section above.
File is not utf-8 encoded¶
I see an error saying:
```
datarobot.errors.ClientError: 422 client error: {'message': 'File is not utf-8 encoded.'}
```
The installation/migration compares all the files in a template to see if they're changed before making a decision about whether the template needs to be updated. In cases where binary files are used, the comparison currently ignores these files for the sake of determining if the template needs to be updated.
If the binary file(s) are the only changes to the template and it is important to use the latest binary file(s), use the --force flag to force all the templates to be updated.
```shell
docker run -i datarobot/datarobot-custom-templates:<tag> \
  --admin-api-token <ADMIN_API_TOKEN> \
  --dr-host <DR_HOST> \
  --no-update-exec-envs \
  --use-prebuilt-images \
  --force
```
Unrecognized flag during migration¶
```
usage: migrate_templates.py [-h] [--admin-api-token ADMIN_API_TOKEN] [--dr-host DR_HOST]
                            [--update-exec-envs | --no-update-exec-envs]
                            [--template-types {all,applications,jobs,metrics,models}]
                            [--delete-outdated-templates | --no-delete-outdated-templates]
                            [--use-prebuilt-images | --no-use-prebuilt-images] [--dry-run] [--force]
                            [--sleep-seconds SLEEP_TIME] [--max-version-wait MAX_VERSION_WAIT]
                            [--log-level LEVEL]
migrate_templates.py: error: unrecognized arguments: --use-generic-custom-templates-api
```
The --use-generic-custom-templates-api and --no-use-generic-custom-templates-api flags were used to control the API
endpoint that was used during migration. These flags were added when the API was moving from only custom-metrics (using
/api/v2/customMetricsTemplates/) to custom-metrics, custom-jobs, custom-applications, and custom-models (using the
generic path /api/v2/customTemplates/). Those flags were deprecated in 11.1, and the container is always run with the
equivalent of --use-generic-custom-templates-api.
References to the old flags should be removed.