Restore Mongo to externally managed DB¶
This operation should be executed from the macOS or GNU/Linux machine where the previously taken MongoDB backup is located.
Please make sure that the application is scaled down before following this restore guide.
Prerequisites:¶
- Utility mongorestore of version 100.6.0 is installed on the host where backup is located
- Utility kubectl of version 1.23 is installed on the host where backup is located
- Utility kubectl is configured to access the Kubernetes cluster where the DataRobot application is running; verify this with the kubectl cluster-info command.
- Please make sure the customer has allocated enough storage to perform this activity.
Internal note: Because we apply different PCS HA charts for external PCS, where there are no Mongo and PostgreSQL statefulsets, the provisioned k8s cluster needs enough space for the restore while moving from internally hosted PCS to an external PCS setup. Please see the external PCS doc (installation/external-pcs.md) for more details.
If the customer has an internal DB size of 50GB, we would need at least 110GB of available space (i.e. 50% extra) on the k8s cluster provisioned with the external DB store for the duration of the migration; a temporary disk mount will also work.
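As a rough pre-flight check for the storage requirement described above, the sketch below compares the free space on a filesystem against a required size. The function name check_free_space and its arguments are illustrative and not part of any DataRobot tooling; it assumes a POSIX df whose fourth column reports available 1K blocks.

```shell
# check_free_space DIR REQUIRED_GB
# Succeeds (exit 0) when the filesystem holding DIR has at least REQUIRED_GB
# gibibytes available. Hypothetical helper; adjust to your environment.
check_free_space() {
    cfs_dir=$1
    cfs_required_kb=$(( $2 * 1024 * 1024 ))
    # -P forces the portable one-line-per-filesystem format, -k reports 1K blocks.
    cfs_available_kb=$(df -Pk "$cfs_dir" | awk 'NR == 2 { print $4 }')
    if [ "$cfs_available_kb" -ge "$cfs_required_kb" ]; then
        echo "OK: ${cfs_available_kb} KB available, ${cfs_required_kb} KB required"
    else
        echo "INSUFFICIENT: ${cfs_available_kb} KB available, ${cfs_required_kb} KB required" >&2
        return 1
    fi
}
```

For the 50GB example above, this could be called as check_free_space /path/to/mount 110, where the path is whichever mount will hold the backup during the migration.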
First, export the name of the DataRobot application Kubernetes namespace in the DR_CORE_NAMESPACE variable:
export DR_CORE_NAMESPACE=<namespace>
There are two ways to get the password used for the restoration¶
- Obtain the MongoDB root user password through Kubernetes secrets:
export PCS_MONGO_PASSWD=$(kubectl -n $DR_CORE_NAMESPACE get secret pcs-mongo -o jsonpath="{.data.mongodb-root-password}" | base64 -d)
echo ${PCS_MONGO_PASSWD}
OR
- Log in to the mmapp pod and get the host and credential details needed to execute the restore:
kubectl exec -it <mmapp-pod-name> -n $DR_CORE_NAMESPACE /entrypoint -- bash
bash-4.4$ env | grep MONGO
MONGO_HOST=<MONGO_HOSTNAME>
MONGO_PASSWORD=<MONGO_USER_PASSWORD>
MONGO_USER=<MONGO_USERNAME>
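The credentials gathered here are later embedded in a mongodb+srv:// connection string. MongoDB connection strings require characters such as : / ? # [ ] @ and $ in the username or password to be percent-encoded. The following is a minimal sketch of such an encoder in plain POSIX shell; the function name urlencode is illustrative, and it assumes an ASCII-only input.

```shell
# urlencode STRING
# Prints STRING with every character percent-encoded except the RFC 3986
# unreserved set (letters, digits, '-', '.', '_', '~').
# Assumption: STRING contains only ASCII characters.
urlencode() {
    ue_input=$1
    ue_output=""
    while [ -n "$ue_input" ]; do
        ue_rest=${ue_input#?}               # everything after the first character
        ue_char=${ue_input%"$ue_rest"}      # the first character itself
        case $ue_char in
            [A-Za-z0-9._~-]) ue_output="$ue_output$ue_char" ;;
            # A leading quote makes printf emit the character's numeric code.
            *) ue_output="$ue_output$(printf '%%%02X' "'$ue_char")" ;;
        esac
        ue_input=$ue_rest
    done
    printf '%s\n' "$ue_output"
}
```

For example, the encoded value could be captured with ENCODED_PASSWD=$(urlencode "$PCS_MONGO_PASSWD") and used in place of the raw password when building the connection string.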
Uncompress the backup files if they are in tar format¶
Define where the backups are stored on the host from where the backup will be restored:
export BACKUP_LOCATION=~/datarobot-backups/
Extract the MongoDB backup archive; this creates the $BACKUP_LOCATION/mongodb directory:
cd $BACKUP_LOCATION
tar xf datarobot-mongo-backup-<date>.tar -C $BACKUP_LOCATION
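The extraction step above can be wrapped with a check that the expected mongodb directory was actually created. This is a sketch only; the function name extract_backup is hypothetical, and it assumes the archive contains a top-level mongodb directory as described above.

```shell
# extract_backup ARCHIVE DEST
# Extracts ARCHIVE into DEST and verifies that DEST/mongodb exists afterwards.
extract_backup() {
    eb_archive=$1
    eb_dest=$2
    if [ ! -f "$eb_archive" ]; then
        echo "archive not found: $eb_archive" >&2
        return 1
    fi
    mkdir -p "$eb_dest" || return 1
    tar xf "$eb_archive" -C "$eb_dest" || return 1
    if [ -d "$eb_dest/mongodb" ]; then
        echo "extracted to $eb_dest/mongodb"
    else
        echo "expected directory $eb_dest/mongodb was not created" >&2
        return 1
    fi
}
```

It would be called with the archive path and $BACKUP_LOCATION as the two arguments.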
Copy backup to pod for restoration¶
Replace the
Define where the backups are stored on the host from where the backup will be restored:
export BACKUP_LOCATION=~/datarobot-backups/
export RESTORE_LOCATION=/<AVAILABLE_DIR>/datarobot-backups/mongo-backup/
The restore process requires you to copy the backup from $BACKUP_LOCATION to the running pod:
kubectl cp $BACKUP_LOCATION $DR_CORE_NAMESPACE/<pod-name>:$RESTORE_LOCATION
Required step before restore¶
Log in to the mmapp pod where the backup was copied and remove the admin folder from the backup, because externally managed Mongo does not support restoring the admin database:
kubectl exec -it <pod-name> -n $DR_CORE_NAMESPACE /entrypoint -- bash
bash-4.4$ cd <RESTORE_LOCATION>
bash-4.4$ ls -ltrh
total 0
drwxr-xr-x 4 user wheel 128B Sep 18 20:23 celery
drwxr-xr-x 34 user wheel 1.1K Sep 18 20:24 datasets
drwxr-xr-x 6 user wheel 192B Sep 18 20:24 admin
drwxr-xr-x 4 user wheel 128B Sep 18 20:24 config
drwxr-xr-x 6 user wheel 192B Sep 18 20:24 audit
drwxr-xr-x 26 user wheel 832B Sep 18 20:24 application_builder
drwxr-xr-x 6 user wheel 192B Sep 18 20:24 usersecrets
drwxr-xr-x 514 user wheel 16K Sep 18 20:48 MMApp
bash-4.4$ mkdir /tmp/admin_backup_ignore
bash-4.4$ mv $RESTORE_LOCATION/admin /tmp/admin_backup_ignore/
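The same step can be scripted so that it is safe to re-run. The sketch below moves the admin directory out of the restore directory if it is present and does nothing otherwise; the function name set_aside_admin_db is hypothetical, and the default holding directory mirrors the /tmp/admin_backup_ignore path used above.

```shell
# set_aside_admin_db RESTORE_DIR [HOLD_DIR]
# Moves RESTORE_DIR/admin into HOLD_DIR (default: /tmp/admin_backup_ignore)
# so that the admin database is not included in the restore.
set_aside_admin_db() {
    sa_restore_dir=$1
    sa_hold_dir=${2:-/tmp/admin_backup_ignore}
    if [ ! -d "$sa_restore_dir/admin" ]; then
        echo "no admin directory under $sa_restore_dir; nothing to do"
        return 0
    fi
    mkdir -p "$sa_hold_dir" || return 1
    mv "$sa_restore_dir/admin" "$sa_hold_dir/" || return 1
    echo "moved $sa_restore_dir/admin to $sa_hold_dir/"
}
```

Inside the pod it would be run against the directory the backup was copied to.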
Option 1: Copy backup to mmapp pod¶
This is recommended if the database size is very small (smaller than 30GB)¶
Copy backup to mmapp pod for restoration as mentioned above
Replace the
Define where the backups are stored on the host from where the backup will be restored:
export BACKUP_LOCATION=~/datarobot-backups/
export RESTORE_LOCATION=/<AVAILABLE_DIR>/datarobot-backups/mongo-backup/
The restore process requires you to copy the untarred/unzipped backup from $BACKUP_LOCATION to the running mmapp pod:
kubectl cp $BACKUP_LOCATION $DR_CORE_NAMESPACE/<mmapp-pod-name>:$RESTORE_LOCATION
Restore the Mongo database from the $RESTORE_LOCATION inside the mmapp pod
kubectl exec -it <mmapp-pod-name> -n $DR_CORE_NAMESPACE /entrypoint -- bash
bash-4.4$ cd /opt/datarobot-libs/virtualenvs/datarobot-<release-version>/bin
bash-4.4$ ./mongorestore --uri "mongodb+srv://<MONGO_USERNAME>:<MONGO_USER_PASSWORD>@<MONGO_HOSTNAME>/" --numInsertionWorkersPerCollection=6 -j=6 <RESTORE_DIRECTORY>
Scale the application back up again after restore
Option 2: Run restore from a temporary pod¶
This is recommended if the database size is larger than 50GB¶
Ensure that the pod being provisioned has a mount point with sufficient space available for the restore process.
Adjust the following according to your cloud provider:
- Replace the spec:containers:name value with an appropriate name.
- Replace the spec:containers:image value according to the cloud provider, choosing from the following:
  - AWS: amazon/aws-cli
  - Azure: mcr.microsoft.com/azure-cli
  - GCP: google/cloud-sdk
Minimal configuration for the restore pod; replace the namespace value as needed:
apiVersion: v1
kind: Pod
metadata:
  name: mongo-restore-minimal
  namespace: <NAMESPACE>
spec:
  containers:
    - name: mongo-restore
      image: amazon/aws-cli
      env:
        - name: MONGO_USER
          valueFrom:
            secretKeyRef:
              key: mongodb-root-username
              name: pcs-mongo-username
        - name: MONGO_PASSWORD
          valueFrom:
            secretKeyRef:
              key: mongodb-root-password
              name: pcs-mongo
      envFrom:
        - configMapRef:
            name: datarobot-modeling-envvars
      command:
        - tail
        - -f
        - /dev/null
Apply the above in your namespace
kubectl apply -f pgrestore-minimal.yaml -n $DR_CORE_NAMESPACE
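Before copying data into the new pod, it can help to block until the pod reports Ready. The sketch below uses the standard kubectl wait subcommand; the function name wait_for_restore_pod and the 300s default timeout are illustrative choices, and the default pod name is taken from the manifest above.

```shell
# wait_for_restore_pod [POD_NAME] [TIMEOUT]
# Blocks until the pod is Ready or the timeout expires.
# Assumes DR_CORE_NAMESPACE is exported as described earlier in this guide.
wait_for_restore_pod() {
    wfp_pod=${1:-mongo-restore-minimal}
    wfp_timeout=${2:-300s}
    kubectl wait --for=condition=Ready "pod/$wfp_pod" \
        -n "$DR_CORE_NAMESPACE" --timeout="$wfp_timeout"
}
```

If the pod name in the manifest was changed, pass the new name as the first argument.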
The restore process requires you to copy the backup from $BACKUP_LOCATION to the restore pod provisioned above:
kubectl get pods -A | grep minimal
kubectl cp $BACKUP_LOCATION $DR_CORE_NAMESPACE/<pod-name>:$RESTORE_LOCATION
Perform the restore from the pod provisioned above. Make sure to install the Mongo shell and tools that match your MongoDB version; use the link below to find the right version and OS, and follow the guide as indicated there. The example here is for Amazon with Mongo 5.0.
General guide for all OS installs: https://www.mongodb.com/docs/manual/administration/install-community/
Example install of MongoDB 5.0 Community, which provides the necessary tools for the restore:
kubectl exec -it <pod-name> -n $DR_CORE_NAMESPACE /entrypoint -- bash
bash-4.2$ vi /etc/yum.repos.d/mongodb-org-5.0.repo
bash-4.2$ yum install -y mongodb-org
bash-4.2$ mongorestore --uri "mongodb+srv://<MONGO_USERNAME>:<MONGO_USER_PASSWORD>@<MONGO_HOSTNAME>/" --numInsertionWorkersPerCollection=6 -j=6 <RESTORE_DIRECTORY>
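The transcript above edits /etc/yum.repos.d/mongodb-org-5.0.repo by hand but does not show its contents. The sketch below writes a repo definition non-interactively. The values are an assumption based on MongoDB's published instructions for Amazon Linux 2 on x86_64 and should be checked against the install guide linked above for your OS, architecture, and MongoDB version; the function name write_mongodb_repo is hypothetical.

```shell
# write_mongodb_repo [FILE]
# Writes a yum repository definition for MongoDB 5.0 to FILE
# (default: /etc/yum.repos.d/mongodb-org-5.0.repo).
# Assumption: Amazon Linux 2, x86_64; verify the URLs against the MongoDB guide.
write_mongodb_repo() {
    wmr_file=${1:-/etc/yum.repos.d/mongodb-org-5.0.repo}
    cat > "$wmr_file" <<'EOF'
[mongodb-org-5.0]
name=MongoDB Repository
baseurl=https://repo.mongodb.org/yum/amazon/2/mongodb-org/5.0/x86_64/
gpgcheck=1
enabled=1
gpgkey=https://www.mongodb.org/static/pgp/server-5.0.asc
EOF
}
```

Running write_mongodb_repo with no arguments inside the restore pod replaces the vi step; the yum install step then proceeds as shown.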
Scale the application back up again after restore
Option 3: Add temporary IP address of your local system in Mongo Atlas¶
Ensure Mongo tools are available to perform mongorestore
- Requires admin access to Mongo Atlas UI
- Navigate to the project created for DataRobot Mongo
- Click on "Network Access"
- Click on "+Add IP ADDRESS"
- Enter IP where backup is located
- Click "Confirm" once added
- Wait for Atlas to plan and apply the access request (takes 1-2 minutes)
You will now be able to access the Atlas cluster directly from that IP.
Run mongorestore, adjusting -j (number of collections restored in parallel) and --numInsertionWorkersPerCollection (number of concurrent insertion workers per collection) as needed:
mongorestore --uri "mongodb+srv://<MONGO_USERNAME>:<MONGO_USER_PASSWORD>@<MONGO_HOSTNAME>/" --numInsertionWorkersPerCollection=6 -j=6 <RESTORE_DIRECTORY>
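To make the two tuning knobs easier to adjust between runs, the command can be wrapped in a small function. This is a sketch under stated assumptions: the function name run_mongorestore and the PARALLEL_COLLECTIONS and INSERTION_WORKERS variables are hypothetical, the MONGO_USER, MONGO_PASSWORD, and MONGO_HOST variables are expected to hold the values gathered earlier, and it uses mongorestore's --uri option together with --numParallelCollections, the long form of -j.

```shell
# run_mongorestore DUMP_DIR
# Restores DUMP_DIR using credentials from MONGO_USER, MONGO_PASSWORD and
# MONGO_HOST. Parallelism defaults to 6 for both settings, as in this guide.
run_mongorestore() {
    rm_dump_dir=$1
    if [ -z "$rm_dump_dir" ]; then
        echo "usage: run_mongorestore DUMP_DIR" >&2
        return 1
    fi
    mongorestore \
        --uri "mongodb+srv://${MONGO_USER}:${MONGO_PASSWORD}@${MONGO_HOST}/" \
        --numParallelCollections="${PARALLEL_COLLECTIONS:-6}" \
        --numInsertionWorkersPerCollection="${INSERTION_WORKERS:-6}" \
        "$rm_dump_dir"
}
```

Lowering either value reduces load on the target cluster at the cost of a longer restore.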
Scale the application back up again after restore