
Long Running Services (LRS) restore considerations

DataRobot provides a Kubernetes operator known as the Long Running Services (LRS) Operator to fulfill dynamic compute requests for DataRobot Feature Services. These workloads are intended to run for long periods, or indefinitely, after deployment.

DataRobot identifies the LRS workload type using a label on the resource: datarobot-lrs-type=<workload-type>.
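
Because the workload type is stored as a label, a label selector can list LRS resources by type. The following is a minimal sketch rather than a documented DataRobot command: only the label key comes from this page, while the resource kind lrs, the helper name list_lrs_by_type, and the KUBECTL override are assumptions made for illustration.

```shell
# Sketch: list resources that carry the datarobot-lrs-type label.
# ASSUMPTION: the resource kind "lrs" is hypothetical; substitute the kind
# registered by the LRS Operator in your cluster.
# ASSUMPTION: KUBECTL is a hypothetical override for the kubectl binary.
KUBECTL="${KUBECTL:-kubectl}"

list_lrs_by_type() {
  namespace="$1"      # namespace to search
  workload_type="$2"  # optional; empty matches any value of the label
  kind="${3:-lrs}"    # hypothetical resource kind
  if [ -n "$workload_type" ]; then
    selector="datarobot-lrs-type=$workload_type"
  else
    # A bare key selects every resource that has the label at all.
    selector="datarobot-lrs-type"
  fi
  # -L adds the label value as an extra output column.
  "$KUBECTL" -n "$namespace" get "$kind" -l "$selector" -L datarobot-lrs-type
}
```

Calling list_lrs_by_type with only a namespace lists every labeled resource and shows its workload type in a separate column; passing a workload type as the second argument narrows the list to that type.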

Interactive Spark sessions

No restoration process is necessary for Interactive Spark sessions. Once the core DataRobot application is back online, Wrangler will automatically initiate a new LRS session, provided any stale sessions were removed during the backup procedure.

Custom Applications

To restore Custom Applications from Application Sources:

  1. Go to Registry > Applications > Application sources tab
  2. Select the application source from which you want to build the application
  3. Click Build application

To restore Custom Applications from a Docker image:

  1. Go to Registry > Applications page
  2. Click the dropdown next to Add new application source
  3. Select Upload application
  4. Upload the Application Docker Image
  5. Click Create application

To restore Custom Applications from Application Templates:

  1. Go to Registry > Applications page
  2. Click the dropdown next to Add new application source
  3. Select Create new application from template

Alternatively, on the DataRobot homepage, click Explore application templates, then choose either Open in a codespace or Copy repository URL. Follow the README instructions to create the application.

Note

Custom Applications are paused after a period of inactivity. To resume a paused custom application, click Open. A loading screen appears while the Application restarts. The Application data is persisted after the Application restarts.

Custom (Training) Task

Custom Tasks are restored as part of the Custom Models restore and require no extra actions of their own. The restore only affects deployments with a Custom Task. All other actions (training, scoring, insights, etc.) use ephemeral LRSes that are created and destroyed as needed.

Custom Models

A restore procedure for Custom Model LRSes may be needed if something goes wrong during an application upgrade or cluster migration. There are two ways to restore Custom Models: recreate them from the LRS YAML files created according to the backup instructions, or reactivate them from the UI.

Recreating LRSes from YAML LRS definitions

If you backed up Custom Model LRSes into the folder lrs_backup_2025_01_01_12_00 by following the backup instructions, you can restore an individual LRS resource named lrs-xxxxx directly in the Kubernetes cluster by running the following command:

kubectl apply -f lrs_backup_2025_01_01_12_00/lrs-xxxxx.yaml

This recreates the LRS if it doesn't exist, or rolls it back to the backed-up state if its definition has changed since the backup.

Note

This command requires kubectl to be configured in your shell with access to the target cluster.
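
When several LRSes need to be restored, the single-file command above can be repeated for every file in the backup folder. The following is a minimal sketch under stated assumptions: the folder contains one .yaml file per LRS, as in the lrs-xxxxx.yaml example, and the helper name restore_lrs_backup and the KUBECTL override are hypothetical. It validates each manifest with a server-side dry run before applying it.

```shell
# Sketch: re-apply every LRS definition found in a backup folder.
# ASSUMPTION: the folder holds one *.yaml file per LRS.
# ASSUMPTION: restore_lrs_backup and KUBECTL are hypothetical helpers.
KUBECTL="${KUBECTL:-kubectl}"

restore_lrs_backup() {
  backup_dir="$1"
  if [ ! -d "$backup_dir" ]; then
    echo "backup folder not found: $backup_dir" >&2
    return 1
  fi
  failed=0
  for manifest in "$backup_dir"/*.yaml; do
    # Skip the unexpanded pattern when the folder has no YAML files.
    [ -e "$manifest" ] || continue
    # Validate against the API server first; apply only if validation passes.
    if "$KUBECTL" apply --dry-run=server -f "$manifest" >/dev/null; then
      "$KUBECTL" apply -f "$manifest" || failed=1
    else
      echo "skipping invalid manifest: $manifest" >&2
      failed=1
    fi
  done
  # Non-zero when any manifest was skipped or failed to apply.
  return "$failed"
}
```

For example, restore_lrs_backup lrs_backup_2025_01_01_12_00 applies every definition in that folder and returns a non-zero status if any manifest could not be applied.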

Reactivating Custom Models from the UI

The other way is to deactivate and reactivate Custom Models. This is more convenient because it starts Custom Models through the DataRobot API, which applies all the business rules and code needed to do so. It is also safer than recreating LRSes directly from YAML definitions.

Custom Models can be reactivated manually from the UI, which lets you start only the deployments you need. To reactivate all of your deployments, use the following command:

kubectl -n $DR_CORE_NAMESPACE exec deploy/mmapp-app -it -- /entrypoint python3 tools/custom_model/deployments_tool.py --api-url <DATAROBOT_URL> --api-token <DATAROBOT_API_TOKEN> --action activate

Where:

  • DR_CORE_NAMESPACE - the namespace where DataRobot is installed
  • DATAROBOT_URL - the DataRobot app URL (e.g., https://app.datarobot.com)
  • DATAROBOT_API_TOKEN - an API token for a user with the MLOps admin role and access to all target deployments. The token can be retrieved from the API keys and tools section in Settings in the UI.

This script will:

  • Retrieve all deployments visible to the owner of DATAROBOT_API_TOKEN (ensure that the user has the MLOps admin role and access to all target deployments)
  • Send an activation request for each deployment
  • Return without waiting for the activation process to finish (the default behavior); progress is visible in the UI (Deployments section in the Console tab)
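
Because the command depends on three values, it can help to check them before calling kubectl. The following is a minimal sketch of such a wrapper: the function name activate_all_deployments and the KUBECTL override are hypothetical, and reading DATAROBOT_URL and DATAROBOT_API_TOKEN from environment variables is an assumption of this example. The command it runs is the one shown above, unchanged.

```shell
# Sketch: fail early if a required value is missing, then run the documented command.
# ASSUMPTION: activate_all_deployments and KUBECTL are hypothetical helpers.
# ASSUMPTION: DATAROBOT_URL and DATAROBOT_API_TOKEN are set as environment variables.
KUBECTL="${KUBECTL:-kubectl}"

activate_all_deployments() {
  for name in DR_CORE_NAMESPACE DATAROBOT_URL DATAROBOT_API_TOKEN; do
    # Indirectly read the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      return 1
    fi
  done
  "$KUBECTL" -n "$DR_CORE_NAMESPACE" exec deploy/mmapp-app -it -- \
    /entrypoint python3 tools/custom_model/deployments_tool.py \
    --api-url "$DATAROBOT_URL" --api-token "$DATAROBOT_API_TOKEN" \
    --action activate
}
```

Keeping the token in an environment variable rather than typing it inline keeps it out of the shell history, although it is still passed to the tool as a command-line argument.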

Downsides

  • It requires a user with MLOps admin access to all deployments in the environment
  • Custom metrics and monitoring jobs scheduled on the deployments will fail during downtime, although they will catch up after the deployments are reactivated
  • Service stats are cleared when a deployment is deactivated (they are recalculated during reactivation)

Note

The tool is available in DataRobot 11.1.1. If you use an earlier version of DataRobot and want to use the script, contact support to obtain it.