Restoring MongoDB to an externally managed database

This operation should be executed from the macOS or GNU/Linux machine where the previously taken MongoDB backup is located.

Important: Ensure the DataRobot application is scaled down (e.g., kubectl scale deployment --all --replicas=0 -n <your-datarobot-namespace>) before following this restore guide.

Prerequisites

Check that the following tools are installed on the host where the backup is located and where the restore will be performed:

  • mongorestore (100.6.0 or a compatible version): used for MongoDB database restore operations. See the MongoDB Database Tools documentation for installation.
  • kubectl (1.23 or later): must be configured to access the Kubernetes cluster where the DataRobot application is running; verify this configuration with the kubectl cluster-info command. See the Kubernetes tools documentation for installation.
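
The prerequisite checks above can be scripted as a quick preflight. This is a minimal sketch; the `require_tool` helper name and the tool list are illustrative:

```shell
# Minimal preflight sketch: confirm required tools are on PATH.
require_tool() {
  command -v "$1" >/dev/null 2>&1
}

for tool in mongorestore kubectl tar; do
  require_tool "$tool" || echo "missing: $tool" >&2
done
```

Resolve any "missing" messages before continuing with the restore.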

Storage

Be sure you have allocated enough storage on the target externally managed MongoDB instance and, if applicable, on any temporary pods or locations used during the restore process.

For context, if migrating an internal MongoDB database of 50GB, you might temporarily need at least 110GB of available space on the Kubernetes cluster (or a temporary mount) if staging data there before pushing to the external DB. This is to accommodate the uncompressed backup and any operational overhead.

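
To make the headroom estimate concrete, a rough calculation can be scripted. The 2.2x multiplier below is an assumption derived from the 50GB-to-110GB example above; adjust it for your environment:

```shell
# Hypothetical backup size; substitute your actual internal MongoDB size.
BACKUP_SIZE_GB=50

# Allow roughly 2.2x headroom for the uncompressed dump plus operational
# overhead, matching the 50GB -> 110GB guidance (integer arithmetic).
REQUIRED_GB=$(( BACKUP_SIZE_GB * 22 / 10 ))
echo "Ensure at least ${REQUIRED_GB}GB free at the staging location"

# Check free space on the staging mount (adjust the path as needed).
df -h "${TMPDIR:-/tmp}"
```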
General preparation steps

  1. Set the DR_CORE_NAMESPACE environment variable to your DataRobot application's Kubernetes namespace. Replace <your-datarobot-namespace> with the actual namespace.

    export DR_CORE_NAMESPACE=<your-datarobot-namespace>
    

Obtain MongoDB credentials

You need the MongoDB credentials for the restore. There are two ways to obtain them:

Option A: From Kubernetes secrets (if restoring from an internal PCS backup)

export PCS_MONGO_PASSWD=$(kubectl -n $DR_CORE_NAMESPACE get secret pcs-mongo -o jsonpath="{.data.mongodb-root-password}" | base64 -d)
echo "MongoDB password for internal PCS retrieved."

Note

These credentials are for the source internal MongoDB. You will use the credentials for your target externally managed MongoDB in the mongorestore command's URI.
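
For reference, the jsonpath output above is base64-encoded, which is why the command pipes through base64 -d. A standalone illustration of the decode step (the encoded value here is made up):

```shell
# Kubernetes stores Secret values base64-encoded; decode before use.
ENCODED="cGFzc3dvcmQxMjM="   # made-up example value, not a real secret
DECODED=$(printf '%s' "$ENCODED" | base64 -d)
echo "$DECODED"
```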

Option B: From a running mmapp-app pod (if applicable for target credentials)

Replace <mmapp-pod-name> with the name of one of your mmapp-app pods.

kubectl exec -it <mmapp-pod-name> -n $DR_CORE_NAMESPACE -- /entrypoint.sh bash

Inside the pod, run:

env | grep MONGO

Example output:

MONGO_HOST=<MONGO_HOSTNAME>
MONGO_PASSWORD=<MONGO_USER_PASSWORD>
MONGO_USER=<MONGO_USERNAME>

Make a note of MONGO_HOST, MONGO_USER, and MONGO_PASSWORD for your target externally managed MongoDB.
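
These values plug into the connection URI used by the mongorestore commands in the options below. A sketch of the URI shape, using placeholder values rather than real credentials:

```shell
# Placeholder example values; substitute the ones noted above.
MONGO_USER="datarobot"
MONGO_PASSWORD="example-password"
MONGO_HOST="cluster0.example.mongodb.net"

# Assemble the URI in the shape the mongorestore commands below expect.
MONGO_URI="mongodb+srv://${MONGO_USER}:${MONGO_PASSWORD}@${MONGO_HOST}/"
echo "$MONGO_URI"
```

If the password contains reserved characters such as @, :, or /, percent-encode them in the URI.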

Uncompress backup files

If your backup is a .tar archive:

  1. Define the location on the host where your MongoDB backup files are stored. This example assumes ~/datarobot-backups/.

    export BACKUP_LOCATION=~/datarobot-backups/
    
  2. Extract the MongoDB backup archive. This command typically creates a mongodb directory (or similar, depending on how the backup was structured) within your $BACKUP_LOCATION. Replace <date> with the actual date or identifier from your backup filename.

    cd $BACKUP_LOCATION
    tar xf datarobot-mongo-backup-<date>.tar -C $BACKUP_LOCATION
    

    Your uncompressed backup data (e.g., BSON files) should now be in a subdirectory like $BACKUP_LOCATION/mongodb/. Let this be your <PATH_TO_UNCOMPRESSED_BACKUP_DATA>.
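
If helpful, capture this path in a variable for the later steps. A sketch; the mongodb subdirectory name is an assumption based on the typical backup layout:

```shell
export BACKUP_LOCATION=~/datarobot-backups/
# Strip any trailing slash before appending the subdirectory name.
PATH_TO_UNCOMPRESSED_BACKUP_DATA="${BACKUP_LOCATION%/}/mongodb"
echo "$PATH_TO_UNCOMPRESSED_BACKUP_DATA"
```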

Prepare backup data (remove admin database)

The admin database from an internal MongoDB backup should not be restored to an externally managed MongoDB service (like Atlas), as these services manage their own admin databases.

  1. If you plan to copy the backup data to a pod for the restore (as in Option 1 or 2 below), perform this step after copying the data to the pod, or ensure the admin directory is excluded from the copy. If restoring directly (Option 3), perform this on your local uncompressed backup data.

    This example assumes the uncompressed backup is in <PATH_TO_UNCOMPRESSED_BACKUP_DATA> which contains subdirectories for each database.

    # Example: If data is local at $BACKUP_LOCATION/mongodb/
    ls -ltrh $BACKUP_LOCATION/mongodb/
    # Output will show directories like 'admin', 'celery', 'datasets', etc.
    
    # Move the admin backup directory aside
    mkdir -p /tmp/admin_backup_ignore
    mv $BACKUP_LOCATION/mongodb/admin /tmp/admin_backup_ignore/
    

    Ensure the admin directory is no longer present in the main backup data path that mongorestore will use.
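
A small guard can confirm the removal before running mongorestore. A sketch; the function name is illustrative:

```shell
# Succeeds only if no 'admin' subdirectory remains in the given dump path.
no_admin_dump() {
  [ ! -d "$1/admin" ]
}

# Example usage (substitute your real dump path):
# no_admin_dump "$BACKUP_LOCATION/mongodb" || { echo "admin dir still present" >&2; exit 1; }
```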

Option 1: Restore by copying backup to an mmapp pod

This option is recommended if the database size is relatively small (e.g., less than 30GB), as it uses an existing application pod.

  1. Define the source and destination paths. Replace <AVAILABLE_DIR_ON_POD> with a path on the mmapp pod that has sufficient space for the uncompressed backup data.

    export LOCAL_BACKUP_PATH=$BACKUP_LOCATION/mongodb # Path to local uncompressed data (excluding admin dir)
    export POD_RESTORE_LOCATION=/<AVAILABLE_DIR_ON_POD>/mongo-backup-data/
    
  2. Copy the uncompressed backup data (ensure the admin database directory has been removed from $LOCAL_BACKUP_PATH) to a running mmapp-app pod. Replace <mmapp-pod-name> with an actual pod name.

    kubectl cp $LOCAL_BACKUP_PATH $DR_CORE_NAMESPACE/<mmapp-pod-name>:$POD_RESTORE_LOCATION
    
  3. Execute mongorestore from within the mmapp-app pod.

    Replace <mmapp-pod-name>, <release-version> (e.g., 10.1.2), <MONGO_USERNAME>, <MONGO_USER_PASSWORD>, and <MONGO_HOSTNAME> (for your external MongoDB) with your actual values. The <MONGO_HOSTNAME> should be the connection string hostname for your Atlas cluster or other external service.

    kubectl exec -it <mmapp-pod-name> -n $DR_CORE_NAMESPACE -- /entrypoint.sh bash
    # Inside the mmapp pod:
    # cd /opt/datarobot-libs/virtualenvs/datarobot-<release-version>/bin
    # ./mongorestore --uri "mongodb+srv://<MONGO_USERNAME>:<MONGO_USER_PASSWORD>@<MONGO_HOSTNAME>/" \
    #   --numInsertionWorkersPerCollection=6 -j=6 --drop <POD_RESTORE_LOCATION_PATH_INSIDE_POD>
    

    (Replace <POD_RESTORE_LOCATION_PATH_INSIDE_POD> with the actual path, e.g., /<AVAILABLE_DIR_ON_POD>/mongo-backup-data/mongodb if kubectl cp created a nested mongodb directory). The --drop option drops collections from the target database before restoring.

  4. Once the restore is complete, scale your DataRobot application deployments back up.

Option 2: Restore using a temporary Kubernetes pod

This option is recommended if the database size is larger (e.g., greater than 50GB), as it uses a dedicated pod for the restore operation.

  1. Ensure the temporary pod you provision will have a mount point with sufficient space for the uncompressed backup data.
  2. Create a pod definition YAML file (e.g., mongo-restore-pod.yaml). Adjust the image based on your cloud provider or preference. The image must have mongorestore tools or allow you to install them. Replace <NAMESPACE> with your DataRobot namespace.

    apiVersion: v1
    kind: Pod
    metadata:
      name: mongo-restore-temp-pod
      namespace: <NAMESPACE>
    spec:
      containers:
      - name: mongo-restore-container
        image: mongo:5.0 # Or another image with MongoDB tools, or a base image where you can install them
        env: # Pass credentials for your external MongoDB
        - name: MONGO_USER
          value: "<YOUR_EXTERNAL_MONGO_USERNAME>"
        - name: MONGO_PASSWORD
          value: "<YOUR_EXTERNAL_MONGO_PASSWORD>"
        - name: MONGO_HOSTNAME # e.g., your Atlas cluster hostname
          value: "<YOUR_EXTERNAL_MONGO_HOSTNAME>"
        # If using secrets for credentials:
        # valueFrom:
        #   secretKeyRef:
        #     key: mongodb-username 
        #     name: external-mongo-creds
        command:
          - tail
          - -f
          - /dev/null
        # volumeMounts: # If you need to mount persistent storage for the backup data
        # - name: backup-data-volume
        #   mountPath: /restore-data
      # volumes: # If using a persistent volume for backup data
      # - name: backup-data-volume
      #   persistentVolumeClaim:
      #     claimName: my-pvc-for-restore-data
    
  3. Apply the pod definition to your namespace:

    kubectl apply -f mongo-restore-pod.yaml -n <NAMESPACE>
    
  4. Copy the uncompressed backup data (ensure the admin database directory has been removed) to the temporary pod. Replace $LOCAL_BACKUP_PATH with the path to your local uncompressed data, <temp-pod-name> (e.g., mongo-restore-temp-pod), and $POD_RESTORE_LOCATION with a path inside the pod (e.g., /restore-data/mongodb).

    # Wait for the pod to be running: kubectl get pods -n <NAMESPACE> | grep mongo-restore-temp-pod
    export LOCAL_BACKUP_PATH=$BACKUP_LOCATION/mongodb 
    export POD_RESTORE_LOCATION=/restore-data/mongodb 
    kubectl cp $LOCAL_BACKUP_PATH <NAMESPACE>/<temp-pod-name>:$POD_RESTORE_LOCATION
    
  5. Perform the restore from within the temporary pod.

    kubectl exec -it <temp-pod-name> -n <NAMESPACE> -- bash
    # Inside the temporary pod:
    # If MongoDB tools are not pre-installed on the image, install them:
    # Example for Debian/Ubuntu based image (like mongo:5.0):
    # apt-get update && apt-get install -y mongodb-database-tools
    # Example for RHEL/CentOS based image (if you used one and yum is available):
    # # Create /etc/yum.repos.d/mongodb-org-5.0.repo (see MongoDB docs for content)
    # # yum install -y mongodb-org-tools
    #
    # Now run mongorestore using the environment variables for credentials and hostname:
    # mongorestore --uri "mongodb+srv://${MONGO_USER}:${MONGO_PASSWORD}@${MONGO_HOSTNAME}/" \
    #   --numInsertionWorkersPerCollection=6 -j=6 --drop ${POD_RESTORE_LOCATION_PATH_INSIDE_POD}
    
    (Replace ${POD_RESTORE_LOCATION_PATH_INSIDE_POD} with the actual path, e.g., /restore-data/mongodb). The --drop option drops collections.

  6. Once the restore is complete, delete the temporary pod:

    kubectl delete pod <temp-pod-name> -n <NAMESPACE>
    
  7. Scale your DataRobot application deployments back up.

Option 3: Restore directly from local system to MongoDB Atlas (via temporary IP allowlist)

This option connects directly from your local machine (where the backup and the mongorestore utility are located) to MongoDB Atlas, using a temporary IP allowlist entry.

Important

After the restore is complete, remove the temporary IP address from the Atlas IP Access List for security.

  1. Ensure the following:
    • MongoDB tools (specifically mongorestore) are available on your local system.
    • You have admin access to the MongoDB Atlas UI for your project.
  2. Navigate to the project created for DataRobot MongoDB in the Atlas UI.
  3. Click on Network Access.
  4. Click + ADD IP ADDRESS.
  5. Enter the public IP address of the machine where your backup is located and from which you will run mongorestore. You can choose "Allow Access From Current IP" if applicable.
  6. Click Confirm. Wait for Atlas to apply the access rule (this typically takes 1-2 minutes).
  7. Run mongorestore from your local machine. The --drop option drops collections from the target database before restoring. Replace:

    • <MONGO_USERNAME>, <MONGO_USER_PASSWORD>, and <MONGO_HOSTNAME> (the hostname portion of your Atlas connection string),
    • <PATH_TO_UNCOMPRESSED_BACKUP_DATA> with the local path to the uncompressed backup data (with the admin directory removed).
    mongorestore --uri "mongodb+srv://<MONGO_USERNAME>:<MONGO_USER_PASSWORD>@<MONGO_HOSTNAME>/" \
      --numInsertionWorkersPerCollection=6 -j=6 --drop <PATH_TO_UNCOMPRESSED_BACKUP_DATA>
    
  8. Scale your DataRobot application deployments back up.