
Restore Mongo to externally managed DB

This operation should be executed from the macOS or GNU/Linux machine where the previously taken MongoDB backup is located.

Please make sure that the application is scaled down before following this restore guide.
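
A generic sketch of scaling down a single workload, assuming the relevant workloads are Deployments; which Deployments to scale down depends on your installation's documented scale-down procedure (note that, for Option 1 below, the mmapp pod must remain available):

# Record the current replica counts so the application can be scaled back up afterwards (hypothetical file name)
kubectl -n <namespace> get deploy -o wide > pre-restore-replicas.txt
# Scale a specific Deployment down to zero; repeat per workload as required
kubectl -n <namespace> scale deploy <deployment-name> --replicas=0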

Prerequisites:

  • The mongorestore utility (version 100.6.0) is installed on the host where the backup is located
  • The kubectl utility (version 1.23) is installed on the host where the backup is located
  • kubectl is configured to access the Kubernetes cluster where the DataRobot application is running; verify this with the kubectl cluster-info command (see the check commands after this list).

  • Please make sure the customer has allocated enough storage to perform this activity.

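To confirm the tooling prerequisites on the backup host, for example:

mongorestore --version
kubectl version --client
kubectl cluster-info
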
Internal note: Because we apply different PCS HA charts for external PCS, where there are no MongoDB and PostgreSQL statefulsets, the provisioned Kubernetes cluster will need enough space for the restore while moving from the internally hosted PCS to the external PCS setup. Please see the [external PCS](installation/external-pcs.md) doc for more details.

If the customer has an internal DB size of 50GB, at least 110GB of available space is needed on the Kubernetes cluster provisioned with the external DB store for the duration of the migration (roughly double the DB size plus extra headroom); a temporary disk mount will also work.
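
A quick, hedged check that the chosen mount point has enough free space before copying the backup (here <mmapp-pod-name>, <namespace> and <AVAILABLE_DIR> are placeholders for the pod, namespace and mount point you intend to use for the restore):

kubectl exec -it <mmapp-pod-name> -n <namespace> -- df -h /<AVAILABLE_DIR>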

First, export the name of the DataRobot application's Kubernetes namespace in the DR_CORE_NAMESPACE variable:

export DR_CORE_NAMESPACE=<namespace> 

There are two ways to get the credentials used for the restoration:

  1. Obtain the MongoDB root user password through Kubernetes secrets:
    export PCS_MONGO_PASSWD=$(kubectl -n $DR_CORE_NAMESPACE get secret pcs-mongo -o jsonpath="{.data.mongodb-root-password}" | base64 -d)
    echo ${PCS_MONGO_PASSWD} 
    

OR

  2. Log in to the mmapp pod and get the host and credential details needed to execute the restore:
    kubectl exec -it <mmapp-pod-name> -n $DR_CORE_NAMESPACE -- /entrypoint bash
    bash-4.4$ env |grep MONGO
    MONGO_HOST=<MONGO_HOSTNAME>
    MONGO_PASSWORD=<MONGO_USER_PASSWORD>
    MONGO_USER=<MONGO_USERNAME> 
    
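Optionally, the connection string used by the restore commands below can be assembled once so credentials are not pasted inline each time. A minimal sketch, assuming the values obtained above; URL-encode the password if it contains special characters, and use the mongodb:// scheme instead of mongodb+srv:// if the external MongoDB is not addressed via a DNS SRV record:

# <MONGO_USERNAME> and <MONGO_HOSTNAME> come from the mmapp pod environment or from the external DB provider
export MONGO_URI="mongodb+srv://<MONGO_USERNAME>:${PCS_MONGO_PASSWD}@<MONGO_HOSTNAME>/"

$MONGO_URI can then be substituted for the inline connection string in the mongorestore commands below.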

Uncompress the backup file if it is in tar format

Define where the backups are stored on the host from which the backup will be restored:

export BACKUP_LOCATION=~/datarobot-backups/ 

Extract the MongoDB backup archive; this will create the $BACKUP_LOCATION/mongodb directory:

cd $BACKUP_LOCATION
tar xf datarobot-mongo-backup-<date>.tar -C $BACKUP_LOCATION 
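
To confirm the extraction succeeded, list the resulting directory; the exact layout depends on how the backup was taken, but you should see per-database dump directories:

ls -lh $BACKUP_LOCATION/mongodb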

Copy backup to pod for restoration

Replace the variable below with the customer’s mount point that has sufficient space for the restore.

Define where the backups are stored on the host from which the backup will be restored:

export BACKUP_LOCATION=~/datarobot-backups/
export RESTORE_LOCATION=/<AVAILABLE_DIR>/datarobot-backups/mongo-backup/ 

The restore process requires you to copy the backup from $BACKUP_LOCATION to the running pod:

kubectl cp $BACKUP_LOCATION $DR_CORE_NAMESPACE/<pod-name>:$RESTORE_LOCATION 

Required step before restore

Log in to the mmapp pod where the backup was copied and remove the admin folder from the backup, because externally managed MongoDB does not support restoring the admin database.

kubectl exec -it <pod-name> -n $DR_CORE_NAMESPACE -- /entrypoint bash
bash-4.4$ cd <RESTORE_LOCATION>
bash-4.4$ ls -ltrh
total 0
drwxr-xr-x    4 user  wheel   128B Sep 18 20:23 celery
drwxr-xr-x   34 user  wheel   1.1K Sep 18 20:24 datasets
drwxr-xr-x    6 user  wheel   192B Sep 18 20:24 admin
drwxr-xr-x    4 user  wheel   128B Sep 18 20:24 config
drwxr-xr-x    6 user  wheel   192B Sep 18 20:24 audit
drwxr-xr-x   26 user  wheel   832B Sep 18 20:24 application_builder
drwxr-xr-x    6 user  wheel   192B Sep 18 20:24 usersecrets
drwxr-xr-x  514 user  wheel    16K Sep 18 20:48 MMApp
bash-4.4$ mkdir /tmp/admin_backup_ignore
bash-4.4$ mv <RESTORE_LOCATION>/admin /tmp/admin_backup_ignore/

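Alternatively, instead of moving the admin directory aside, mongorestore can skip it at restore time via --nsExclude (a sketch; append the flag to the mongorestore command in whichever option you follow below):

mongorestore --uri "mongodb+srv://<MONGO_USERNAME>:<MONGO_USER_PASSWORD>@<MONGO_HOSTNAME>/" --nsExclude="admin.*" <RESTORE_DIRECTORY>
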
Option 1: Copy backup to mmapp pod

Copy the backup to the mmapp pod for restoration, as mentioned above.

Replace the variable below with the customer’s mount point that has sufficient space for the restore.

Define where the backups are stored on the host from which the backup will be restored:

export BACKUP_LOCATION=~/datarobot-backups/
export RESTORE_LOCATION=/<AVAILABLE_DIR>/datarobot-backups/mongo-backup/ 

The restore process requires you to copy the untarred/unzipped backup from $BACKUP_LOCATION to the running mmapp pod:

kubectl cp $BACKUP_LOCATION $DR_CORE_NAMESPACE/<mmapp-pod-name>:$RESTORE_LOCATION 

Restore the Mongo database from $RESTORE_LOCATION inside the mmapp pod:

kubectl exec -it <mmapp-pod-name> -n $DR_CORE_NAMESPACE -- /entrypoint bash
bash-4.4$ cd /opt/datarobot-libs/virtualenvs/datarobot-<release-version>/bin
bash-4.4$ ./mongorestore --uri "mongodb+srv://<MONGO_USERNAME>:<MONGO_USER_PASSWORD>@<MONGO_HOSTNAME>/" --numInsertionWorkersPerCollection=6 -j=6 <RESTORE_DIRECTORY>
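
Optionally, a dry run can be performed first to validate connectivity and the dump layout without writing any data (same placeholders as above; mongorestore's --dryRun flag is used for this):

bash-4.4$ ./mongorestore --uri "mongodb+srv://<MONGO_USERNAME>:<MONGO_USER_PASSWORD>@<MONGO_HOSTNAME>/" --dryRun <RESTORE_DIRECTORY>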

Scale the application back up again after restore

Option 2: Run restore from a temporary pod

Ensure that the pod being provisioned has a mount point with sufficient space available for the restore process.

Adjust the following according to your cloud provider.

Please replace the spec.containers.name value with an appropriate name, and replace the spec.containers.image value according to the cloud provider, using one of the following:

  • AWS: amazon/aws-cli
  • Azure: mcr.microsoft.com/azure-cli
  • GCP: google/cloud-sdk

Minimal configuration for the restore pod; replace the namespace value as needed:

apiVersion: v1
kind: Pod
metadata:
  name: mongo-restore-minimal
  namespace: <NAMESPACE>
spec:
  containers:
  - name: mongo-restore
    image: amazon/aws-cli
    env:
    - name: MONGO_USER
      valueFrom:
        secretKeyRef:
          key: mongodb-root-username
          name: pcs-mongo-username
    - name: MONGO_PASSWORD
      valueFrom:
        secretKeyRef:
          key: mongodb-root-password
          name: pcs-mongo
    envFrom:
    - configMapRef:
        name: datarobot-modeling-envvars
    command:
      - tail
      - -f
      - /dev/null 

Save the manifest above to a file (for example, mongo-restore-minimal.yaml) and apply it in your namespace:

kubectl apply -f mongo-restore-minimal.yaml -n $DR_CORE_NAMESPACE
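
Before copying the backup, it can help to confirm the restore pod is running (a sketch using kubectl wait; the pod name matches the manifest above):

kubectl -n $DR_CORE_NAMESPACE wait --for=condition=Ready pod/mongo-restore-minimal --timeout=120s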

The restore process requires you to copy the backup from $BACKUP_LOCATION to the restore pod provisioned above:

kubectl get pods -A | grep minimal

kubectl cp $BACKUP_LOCATION $DR_CORE_NAMESPACE/<pod-name>:$RESTORE_LOCATION 

Perform the restore from the pod provisioned above. Make sure to install the mongo shell and tools matching the MongoDB version; use the link below to find the right version and OS and follow the guide as indicated there. The example below installs MongoDB 5.0 on Amazon Linux.

[General guide for all OS installs](https://www.mongodb.com/docs/manual/administration/install-community/)

Example install of MongoDB 5.0 Community, which provides the tools necessary for the restore:

kubectl exec -it <pod-name> -n $DR_CORE_NAMESPACE -- bash
bash-4.2$ vi /etc/yum.repos.d/mongodb-org-5.0.repo
bash-4.2$ yum install -y mongodb-org
bash-4.2$ mongorestore --uri "mongodb+srv://<MONGO_USERNAME>:<MONGO_USER_PASSWORD>@<MONGO_HOSTNAME>/" --numInsertionWorkersPerCollection=6 -j=6 <RESTORE_DIRECTORY>
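
The repo file referenced above could look like the following; this is a sketch based on MongoDB's public install guide for Amazon Linux 2 based images, so adjust the baseurl for your OS and architecture:

[mongodb-org-5.0]
name=MongoDB Repository
baseurl=https://repo.mongodb.org/yum/amazon/2/mongodb-org/5.0/x86_64/
gpgcheck=1
enabled=1
gpgkey=https://www.mongodb.com/static/pgp/server-5.0.asc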

Scale the application back up again after restore

Option 3: Temporarily add the IP address of your local system in MongoDB Atlas

Ensure the MongoDB tools are available locally to run mongorestore.

  1. Requires admin access to the MongoDB Atlas UI
  2. Navigate to the project created for the DataRobot Mongo
  3. Click "Network Access"
  4. Click "+ Add IP Address"
  5. Enter the public IP of the host where the backup is located (one way to look it up is shown after this list)
  6. Click "Confirm" once added
  7. Wait for Atlas to plan and apply the access change (takes 1-2 minutes)

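One way to find the public IP of the host where the backup is located (a sketch; any equivalent lookup service works):

curl -s https://ifconfig.me
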
You will now be able to access the Atlas cluster directly from that IP.

Run the Mongo restore; adjust -j (the number of collections restored in parallel) and --numInsertionWorkersPerCollection (the number of insertion workers per collection) as needed:

mongorestore --uri "mongodb+srv://<MONGO_USERNAME>:<MONGO_USER_PASSWORD>@<MONGO_HOSTNAME>/" --numInsertionWorkersPerCollection=6 -j=6 <RESTORE_DIRECTORY>

Scale the application back up again after restore