Migrate Custom Templates Job

The code for the custom templates is maintained in a DataRobot proprietary repository. The main artifact from that repository is a container that is responsible for two things:

1. Creating/updating the execution environments used by custom templates
2. Creating/updating the custom templates for custom-metrics, custom-applications, and NVIDIA NIM Containers

The custom template code is kept in a separate repository to allow maintenance independent of the main DataRobot repository. This allows on-prem installations to receive custom template updates without updating the entire DataRobot application. The separate repository also has the benefit of allowing more focused testing of the template code.

As of DataRobot 11.1.1, there are four execution environments. The list is subject to change over time.

Included execution environments

  • [DataRobot] NodeJS 22.14.0 Applications Base
  • [DataRobot] Python 3.11 Custom Metrics
  • [DataRobot] Python 3.12 Applications Base
  • DRUM NIM Sidecar

There are more than a hundred custom templates covering a variety of template types. These templates are created or updated as required when the migration runs.

Execution

The custom templates installation/migration means running the datarobot-custom-templates container, which contains the information required to create/update the execution environments and custom templates. An admin API key is required, because normal users cannot manage globally provided templates. Here's a sample of the permissions required to successfully install/migrate the custom templates:

  • Enable Admin Api Access
  • Enable Custom Templates (enabled by default)
  • Enable Custom Jobs Template Gallery (enabled by default)
  • Can Manage Public Custom Environments
  • Enable Resource Bundles

The following table highlights some of the options available in the container that control the installation behavior. The container's --help output provides the most accurate information and takes precedence over this table.

| CLI Option | Environment Variable | Default | Notes |
| --- | --- | --- | --- |
| --admin-api-token | DR_API_KEY | Must be provided | This is the admin API key. |
| --dr-host | DR_HOST | Must be provided | URL of the base host (e.g. https://staging.datarobot.com/), with or without /api/v2/. |
| --[no-]update-exec-envs | UPDATE_EXEC_ENVS | False (no updates) | When True, creates or updates the execution environments. When False, only creates missing execution environments. |
| --template-types | TEMPLATE_TYPES | all | Allows updating a limited set of template types for unusual circumstances. Allowed values: all, applications, jobs, metrics, or models. |
| --max-version-wait | VERSION_TIMEOUT | 1200 seconds | Updating an execution-environment version involves building a new container, which may exceed the "standard" 20-minute period. |
| --force | FORCE_TEMPLATE_UPDATE | False | Forces an update of existing templates even when they match the server. Useful if there are issues detecting whether an update is needed. |
| --use-prebuilt-images | USE_PREBUILT_IMAGES | False | Uses prebuilt execution-environment images instead of building them in the DataRobot application. |
| --sleep-seconds | (none) | 0.25 seconds | Sleep time between template updates to avoid failures due to rate limiting. |
| --log-level | (none) | INFO | Changing the log level can provide more information about execution. |
| --dry-run | (none) | False | Checks whether templates need updating without updating them. |
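For example, to check whether only the custom-metric templates need updating without changing anything, a run might look like this (a sketch; the token, host, and image tag are placeholders):

docker run -i datarobot/datarobot-custom-templates:<tag> \
  --admin-api-token <ADMIN_API_TOKEN> \
  --dr-host <DR_HOST> \
  --template-types metrics \
  --dry-run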

The above options/flags can be provided directly as arguments when running the container with docker or podman. For options/flags with an environment variable, the value can be set either through the environment variable or as a direct CLI argument.

The following command snippets are functionally equivalent:

docker run -i datarobot/datarobot-custom-templates:<tag> --max-version-wait 300
docker run -e VERSION_TIMEOUT=300 datarobot/datarobot-custom-templates:<tag>
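A complete run also needs the required token and host. A minimal sketch using the environment variables from the table above (all values are placeholders):

docker run -i \
  -e DR_API_KEY=<ADMIN_API_TOKEN> \
  -e DR_HOST=https://<your-datarobot-host>/ \
  datarobot/datarobot-custom-templates:<tag>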

Image

When a datarobot-custom-templates container is built, it contains all the execution-environment and template information, so running the container always produces the same result. Building the execution environments requires external network access, but all the template information/code is frozen at the time the datarobot-custom-templates container is built.

The image is included in the datarobot-app-charts repository's core-integration-tasks package. In your Helm artifact, this should be located at charts/core-integration-tasks/Chart.yaml.
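One way to confirm which chart version your Helm artifact carries is to inspect that file directly (this assumes the standard Chart.yaml fields; adjust the path to match your artifact layout):

grep -E '^(version|appVersion):' charts/core-integration-tasks/Chart.yaml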

No external network case

This job typically requires an external network connection because the execution environment images are built by the DR application while the job is executing. You can use the --use-prebuilt-images option (or environment variable) to use a set of pre-built images which are included in the datarobot-custom-templates image.
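For example, an installation without external network access might run something like this (a sketch; the token, host, and image tag are placeholders):

docker run -i datarobot/datarobot-custom-templates:<tag> \
  --admin-api-token <ADMIN_API_TOKEN> \
  --dr-host <DR_HOST> \
  --use-prebuilt-images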

Please contact DataRobot support for further assistance related to this topic.

Cronjob

This task is run as a cronjob because it can take a long time to build all the execution environments.

The migrate_custom_templates cronjob is defined in the chart values and is expected to run at the top of every hour (0 * * * *). It attempts to install all missing execution environments and custom templates, and to update any custom templates that need it.

[!WARNING] Local changes will be overwritten by the cronjob

The cronjob can be described using kubectl like this:

kubectl describe cronjob core-integration-tasks-custom-templates -n $NAMESPACE

The pod (while running) can be seen here:

kubectl get pods -n $NAMESPACE -l role=core-integration-tasks-custom-templates
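To trigger the job immediately instead of waiting for the next scheduled run, a one-off job can be created from the cronjob (assuming the cronjob name shown above; the job name itself is arbitrary):

kubectl create job --from=cronjob/core-integration-tasks-custom-templates custom-templates-manual-run -n $NAMESPACE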

Upgrading execution environments

When upgrading DataRobot versions, execution environments provisioned by the Custom Templates job are NOT automatically upgraded to the latest versions. It is necessary to trigger the upgrade of execution environments in order to receive the latest updates to the affected features. This operation upgrades all the execution environments mentioned above.

docker run -i datarobot/datarobot-custom-templates:<tag> \
  --admin-api-token <ADMIN_API_TOKEN> \
  --dr-host <DR_HOST> \
  --use-prebuilt-images \
  --update-exec-envs

This will upload new versions of the execution environments that exist inside the datarobot-custom-templates image. Using the prebuilt images avoids involving the Image Build Service.

This can also be accomplished using kubectl with a pod manifest that looks like the following (updated with the correct image, DR_HOST, and DR_API_KEY):

apiVersion: v1
kind: Pod
metadata:
  name: custom-template-env-upgrade
  namespace: datarobot
spec:
  containers:
    - name: custom-template-env-upgrade
      image: datarobot/datarobot-custom-templates:11.1.438-image
      env:
      - name: USE_PREBUILT_IMAGES
        value: 'true'
      - name: UPDATE_EXEC_ENVS
        value: 'true'
      - name: DR_HOST
        value: http://datarobot-public-api:8004
      - name: DR_API_KEY
        valueFrom:
          secretKeyRef:
            name: ui-admin-credentials
            key: api_key

The biggest difference between the example manifest above and the standard Helm chart is that UPDATE_EXEC_ENVS is set to true. This setting tells the pod to upload new containers for all the execution environments.

The USE_PREBUILT_IMAGES setting of true means that the execution environment containers included in the datarobot-custom-templates image will be uploaded.
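To run and monitor the upgrade pod, assuming the manifest above is saved as custom-template-env-upgrade.yaml:

kubectl apply -f custom-template-env-upgrade.yaml
kubectl logs -f custom-template-env-upgrade -n datarobot
kubectl delete pod custom-template-env-upgrade -n datarobot  # clean up once the pod completes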

Deprecated Execution Environments

In release/11.4, some of the execution environments may be renamed to include [Deprecated] in the name, with new execution environments created to replace them. The "deprecated" execution environments may still be in use, so they are not deleted. However, no new versions (e.g. CVE fixes) will be applied to those environments. Once the "deprecated" environments are no longer used, they can be removed.

Troubleshooting

Failed to build execution environment

In the event of Image Build Service errors or misconfiguration, the symptom on the application side is an execution environment installed by the Custom Templates job (e.g. [DataRobot] Python 3.11 Custom Metrics) that has failed or is constantly stuck in a submitted status that never resolves.

In this case it is possible to retrigger the job with the --update-exec-envs flag, forcing the latest updates to be uploaded as new versions of these environments.

The easiest way to do this is to manually run the job via:

docker run -i datarobot/datarobot-custom-templates:<tag> \
  --admin-api-token <ADMIN_API_TOKEN> \
  --dr-host <DR_HOST> \
  --use-prebuilt-images \
  --update-exec-envs

The above command will update the execution environments, and then point all the templates to the latest successful version of the respective execution environment.

Custom-metric jobs fail for odd reasons (e.g. function not found)

Custom-metric jobs fail with odd errors such as the following:

Traceback (most recent call last):
  File "/opt/code/main.py", line 24, in <module>
    from dmm import log_parameters
ImportError: cannot import name 'log_parameters' from 'dmm' (/usr/local/lib/python3.11/dist-packages/dmm/__init__.py)

This can happen when the templates are updated to use functionality from a newer client library that is NOT in the [DataRobot] Python 3.11 Custom Metrics container. In this case, the execution environment needs to be updated. See the Failed to build execution environment section above.

File is not utf-8 encoded

I see an error saying: datarobot.errors.ClientError: 422 client error: {'message': 'File is not utf-8 encoded.'}

The installation/migration compares all the files in a template to see whether they have changed before deciding whether the template needs to be updated. In cases where binary files are used, the comparison currently ignores those files when determining whether the template needs to be updated.

If the binary file(s) are the only changes to the template and it is important to use the latest binary file(s), use the --force flag to cause all the templates to be updated.

docker run -i datarobot/datarobot-custom-templates:<tag> \
  --admin-api-token <ADMIN_API_TOKEN> \
  --dr-host <DR_HOST> \
  --no-update-exec-envs \
  --use-prebuilt-images \
  --force

Unrecognized flag during migration

usage: migrate_templates.py [-h] [--admin-api-token ADMIN_API_TOKEN] [--dr-host DR_HOST] [--update-exec-envs | --no-update-exec-envs] [--template-types {all,applications,jobs,metrics,models}] [--delete-outdated-templates | --no-delete-outdated-templates] [--use-prebuilt-images | --no-use-prebuilt-images] [--dry-run] [--force] [--sleep-seconds SLEEP_TIME] [--max-version-wait MAX_VERSION_WAIT] [--log-level LEVEL]

migrate_templates.py: error: unrecognized arguments: --use-generic-custom-templates-api

The --use-generic-custom-templates-api and --no-use-generic-custom-templates-api flags were used to control the API endpoint that was used during migration. These flags were added when the API was moving from only custom-metrics (using /api/v2/customMetricsTemplates/) to custom-metrics, custom-jobs, custom-applications, and custom-models (using the generic path /api/v2/customTemplates/). Those flags were deprecated in 11.1, and the container is always run with the equivalent of --use-generic-custom-templates-api.

References to the old flags should be removed.
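A quick way to find lingering references in your deployment configuration (the path shown is a placeholder):

grep -rn 'use-generic-custom-templates-api' /path/to/your/deployment-config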