
Restore MongoDB to an externally managed database

Execute this operation from the macOS or GNU/Linux machine where the MongoDB backup is located.

Important

Make sure that the application is scaled down before proceeding.

Storage

Be sure you have allocated enough storage on the target externally managed MongoDB instance and, if applicable, on any temporary pods or locations used during the restore process.

For context, when migrating a 50GB internal MongoDB database, you might temporarily need at least 110GB of available space on the Kubernetes cluster (or a temporary mount) if staging data there before pushing it to the external database. This accommodates the uncompressed backup and any operational overhead.
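As a back-of-the-envelope check, the sizing guidance above can be sketched as follows. The 2.2x multiplier is an assumption derived from the 50GB/110GB example, not a documented constant; adjust it to your environment.

```shell
# Rough staging-space estimate (assumption: uncompressed backup plus
# operational overhead is about 2.2x the source database size).
DB_SIZE_GB=50                            # size of the internal MongoDB database
REQUIRED_GB=$(( DB_SIZE_GB * 22 / 10 ))  # staging space to plan for
echo "Plan for at least ${REQUIRED_GB}GB of free staging space"
```

Compare the result against `df -h` output for the filesystem or volume you intend to stage the backup on.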

Obtain MongoDB credentials

Note

You must fulfill the prerequisites before proceeding.

You need the MongoDB credentials for the restore. There are two ways to obtain them:

Option A: From Kubernetes secrets (if restoring from an internal PCS backup)

export PCS_MONGO_PASSWD=$(kubectl -n $NAMESPACE get secret pcs-mongo -o jsonpath="{.data.mongodb-root-password}" | base64 -d)
echo "MongoDB password for internal PCS retrieved." 

Note

These credentials are for the source internal MongoDB. You will use the credentials for your target externally managed MongoDB in the mongorestore command's URI.

Option B: From a running mmapp pod (if applicable for target credentials)

Replace <mmapp-pod-name> with the name of one of your mmapp-app pods.

kubectl exec -it <mmapp-pod-name> -n $NAMESPACE -- /entrypoint.sh bash 

Inside the pod, run:

env | grep MONGO 

Example output:

MONGO_HOST=<MONGO_HOSTNAME>
MONGO_PASSWORD=<MONGO_USER_PASSWORD>
MONGO_USER=<MONGO_USERNAME> 

Make a note of MONGO_HOST, MONGO_USER, and MONGO_PASSWORD for your target externally managed MongoDB.
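The values you noted can be assembled into the connection URI that the mongorestore commands later in this page expect. A minimal sketch, using placeholder values rather than real credentials:

```shell
# Assemble the mongorestore URI from the noted target credentials.
# These sample values are placeholders for illustration only.
MONGO_USER="restoreuser"
MONGO_PASSWORD="example-password"
MONGO_HOST="cluster0.example.mongodb.net"
MONGO_URI="mongodb+srv://${MONGO_USER}:${MONGO_PASSWORD}@${MONGO_HOST}/"
echo "$MONGO_URI"
```

If the password contains characters that are special in URIs (such as `@`, `:`, or `/`), they must be percent-encoded before being embedded in the connection string.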

Uncompress backup files

If your backup is a .tar archive:

Extract the MongoDB backup archive. This command typically creates a mongodb directory (or similar, depending on how the backup was structured) within your $BACKUP_LOCATION.

cd $BACKUP_LOCATION
tar xf datarobot-mongo-backup-BACKUP_DATE.tar -C $BACKUP_LOCATION 

Note

Replace BACKUP_DATE with the actual date or identifier from your backup filename.

Your uncompressed backup data (e.g., BSON files) should now be in a subdirectory, for example $BACKUP_LOCATION/mongodb/. Use this path as your <PATH_TO_UNCOMPRESSED_BACKUP_DATA>.
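Before extracting, you can list the archive contents with `tar tf` to confirm the top-level directory name. The sketch below builds a throwaway archive purely for illustration; substitute your real datarobot-mongo-backup archive.

```shell
# Build a throwaway archive mimicking the backup layout, then list it.
DEMO_DIR=$(mktemp -d)
mkdir -p "$DEMO_DIR/mongodb/celery"
touch "$DEMO_DIR/mongodb/celery/tasks.bson"
ARCHIVE="$DEMO_DIR/backup.tar"
tar cf "$ARCHIVE" -C "$DEMO_DIR" mongodb
tar tf "$ARCHIVE"          # shows mongodb/ as the top-level directory
```

Knowing the top-level directory name up front tells you what <PATH_TO_UNCOMPRESSED_BACKUP_DATA> will be after extraction.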

Prepare backup data (remove admin database)

The admin database from an internal MongoDB backup should not be restored to an externally managed MongoDB service (like Atlas), as these services manage their own admin databases.

If you plan to copy the backup data to a pod for the restore (as in Option 1 or 2 below), perform this step after copying the data to the pod, or ensure the admin directory is excluded from the copy. If restoring directly (Option 3), perform this on your local uncompressed backup data.

This example assumes the uncompressed backup is in <PATH_TO_UNCOMPRESSED_BACKUP_DATA>, which contains subdirectories for each database.

# Example: If data is local at $BACKUP_LOCATION/mongodb/
ls -ltrh $BACKUP_LOCATION/mongodb/
# Output will show directories like 'admin', 'celery', 'datasets', etc.

# Move the admin backup directory aside
mkdir -p /tmp/admin_backup_ignore
mv $BACKUP_LOCATION/mongodb/admin /tmp/admin_backup_ignore/ 

Ensure the admin directory is no longer present in the main backup data path that mongorestore will use.
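The removal and verification steps above can be exercised against a throwaway directory layout, as sketched below; for the real restore, substitute your actual $BACKUP_LOCATION/mongodb path for BACKUP_DIR.

```shell
# Demonstrate the admin-removal step on a throwaway layout.
BACKUP_DIR=$(mktemp -d)
mkdir -p "$BACKUP_DIR/admin" "$BACKUP_DIR/celery" "$BACKUP_DIR/datasets"
STASH_DIR=$(mktemp -d)              # stands in for /tmp/admin_backup_ignore
mv "$BACKUP_DIR/admin" "$STASH_DIR/"
# mongorestore must not see an admin directory at this point:
[ ! -d "$BACKUP_DIR/admin" ] && echo "admin directory removed"
```

Keeping the moved admin directory (rather than deleting it) preserves the option of inspecting it later if needed.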

Option 1: Restore by copying backup to an mmapp pod

This option is recommended if the database size is relatively small (e.g., less than 30GB), as it uses an existing application pod.

  1. Define the source and destination paths. Replace <AVAILABLE_DIR_ON_POD> with a path on the mmapp pod that has sufficient space for the uncompressed backup data.

    export LOCAL_BACKUP_PATH=$BACKUP_LOCATION/mongodb # Path to local uncompressed data (excluding admin dir)
    export POD_RESTORE_LOCATION=/<AVAILABLE_DIR_ON_POD>/mongo-backup-data/ 
    
  2. Copy the uncompressed backup data (ensure the admin database directory has been removed from $LOCAL_BACKUP_PATH) to a running mmapp-app pod. Replace <mmapp-pod-name> with an actual pod name.

    kubectl cp $LOCAL_BACKUP_PATH ${NAMESPACE}/<mmapp-pod-name>:$POD_RESTORE_LOCATION 
    
  3. Execute mongorestore from within the mmapp-app pod.

    Replace <mmapp-pod-name>, <release-version> (e.g., 10.1.2), <MONGO_USERNAME>, <MONGO_USER_PASSWORD>, and <MONGO_HOSTNAME> (for your external MongoDB) with your actual values. The <MONGO_HOSTNAME> should be the connection string hostname for your Atlas cluster or other external service.

    kubectl exec -it <mmapp-pod-name> -n $NAMESPACE -- /entrypoint.sh bash
    # Inside the mmapp pod:
    # cd /opt/datarobot-libs/virtualenvs/datarobot-<release-version>/bin
    # ./mongorestore --uri "mongodb+srv://<MONGO_USERNAME>:<MONGO_USER_PASSWORD>@<MONGO_HOSTNAME>/" \
    #   --numInsertionWorkersPerCollection=6 -j=6 --drop <POD_RESTORE_LOCATION_PATH_INSIDE_POD> 
    

    Replace <POD_RESTORE_LOCATION_PATH_INSIDE_POD> with the actual path, e.g., /<AVAILABLE_DIR_ON_POD>/mongo-backup-data/mongodb if kubectl cp created a nested mongodb directory. The --drop option drops collections from the target database before restoring.

  4. Once the restore is complete, scale your DataRobot application deployments back up.

Option 2: Restore using a temporary Kubernetes pod

This option is recommended if the database size is larger (e.g., greater than 50GB), as it uses a dedicated pod for the restore operation.

  1. Ensure the temporary pod you provision will have a mount point with sufficient space for the uncompressed backup data.
  2. Create a pod definition YAML file (e.g., mongo-restore-pod.yaml). Adjust the image based on your cloud provider or preference. The image must have mongorestore tools or allow you to install them.

    apiVersion: v1
    kind: Pod
    metadata:
      name: mongo-restore-temp-pod
    spec:
      containers:
      - name: mongo-restore-container
        image: mongo:5.0 # Or another image with MongoDB tools, or a base image where you can install them
        env: # Pass credentials for your external MongoDB
        - name: MONGO_USER
          value: "<YOUR_EXTERNAL_MONGO_USERNAME>"
        - name: MONGO_PASSWORD
          value: "<YOUR_EXTERNAL_MONGO_PASSWORD>"
        - name: MONGO_HOSTNAME # e.g., your Atlas cluster hostname
          value: "<YOUR_EXTERNAL_MONGO_HOSTNAME>"
        # Alternatively, reference a Kubernetes secret instead of a literal value:
        # - name: MONGO_USER
        #   valueFrom:
        #     secretKeyRef:
        #       name: external-mongo-creds
        #       key: mongodb-username
        command:
          - tail
          - -f
          - /dev/null
        # volumeMounts: # If you need to mount persistent storage for the backup data
        # - name: backup-data-volume
        #   mountPath: /restore-data
      # volumes: # If using a persistent volume for backup data
      # - name: backup-data-volume
      #   persistentVolumeClaim:
      #     claimName: my-pvc-for-restore-data 
    
  3. Apply the pod definition to your namespace:

    kubectl -n $NAMESPACE apply -f mongo-restore-pod.yaml 
    
  4. Copy the uncompressed backup data (ensure the admin database directory has been removed) to the temporary pod. Replace $LOCAL_BACKUP_PATH with the path to your local uncompressed data, <temp-pod-name> (e.g., mongo-restore-temp-pod), and $POD_RESTORE_LOCATION with a path inside the pod (e.g., /restore-data/mongodb).

    # Wait for the pod to be running: kubectl get pods -n ${NAMESPACE} | grep mongo-restore-temp-pod
    export LOCAL_BACKUP_PATH=$BACKUP_LOCATION/mongodb 
    export POD_RESTORE_LOCATION=/restore-data/mongodb 
    kubectl cp $LOCAL_BACKUP_PATH ${NAMESPACE}/<temp-pod-name>:$POD_RESTORE_LOCATION 
    
  5. Perform the restore from within the temporary pod.

    kubectl exec -it <temp-pod-name> -n ${NAMESPACE} -- bash
    # Inside the temporary pod:
    # If MongoDB tools are not pre-installed on the image, install them:
    # Example for Debian/Ubuntu based image (like mongo:5.0):
    # apt-get update && apt-get install -y mongodb-database-tools
    # Example for RHEL/CentOS based image (if you used one and yum is available):
    # # Create /etc/yum.repos.d/mongodb-org-5.0.repo (see MongoDB docs for content)
    # # yum install -y mongodb-org-tools
    #
    # Now run mongorestore using the environment variables for credentials and hostname:
    # mongorestore --uri "mongodb+srv://${MONGO_USER}:${MONGO_PASSWORD}@${MONGO_HOSTNAME}/" \
    #   --numInsertionWorkersPerCollection=6 -j=6 --drop ${POD_RESTORE_LOCATION_PATH_INSIDE_POD} 
    

    Replace ${POD_RESTORE_LOCATION_PATH_INSIDE_POD} with the actual path, e.g., /restore-data/mongodb. The --drop option drops collections from the target database before restoring.

  6. Once the restore is complete, delete the temporary pod:

    kubectl -n $NAMESPACE delete pod <temp-pod-name> 
    
  7. Scale your DataRobot application deployments back up.

Option 3: Restore directly from local system to MongoDB Atlas (via temporary IP allowlist)

This option connects directly to MongoDB Atlas from your local machine, where the backup and the mongorestore utility are located. It uses a temporary IP allowlist entry.

Important

After the restore is complete, remove the temporary IP address from the Atlas IP Access List for security.

  1. Ensure the following:
    • MongoDB tools (specifically mongorestore) are available on your local system.
    • You have admin access to the MongoDB Atlas UI for your project.
  2. Navigate to the project created for DataRobot MongoDB in the Atlas UI.
  3. Click on Network Access.
  4. Click + ADD IP ADDRESS.
  5. Enter the public IP address of the machine where your backup is located and from which you will run mongorestore. You can choose "Allow Access From Current IP" if applicable.
  6. Click Confirm. Wait for Atlas to apply the access rule (this typically takes a few minutes).
  7. Run mongorestore from your local machine. The --drop option drops collections from the target database before restoring.

    mongorestore --uri "mongodb+srv://<MONGO_USERNAME>:<MONGO_USER_PASSWORD>@<MONGO_HOSTNAME>/" \
      --numInsertionWorkersPerCollection=6 -j=6 --drop <PATH_TO_UNCOMPRESSED_BACKUP_DATA> 
    

    Note

    • Replace <MONGO_USERNAME>, <MONGO_USER_PASSWORD>, and <MONGO_HOSTNAME> with the values from your Atlas cluster connection string.
    • Replace <PATH_TO_UNCOMPRESSED_BACKUP_DATA> with the local path to the uncompressed backup data, excluding the admin directory.
  8. Scale your DataRobot application deployments back up.
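Because the Atlas access rule from step 6 takes a few minutes to apply, one way to avoid a premature restore attempt is to poll connectivity before running mongorestore. In this sketch, check_conn is a hypothetical stand-in for a real connection test such as `mongosh "$MONGO_URI" --eval 'db.runCommand({ping: 1})' --quiet`.

```shell
# Poll until the Atlas cluster is reachable, up to ~2 minutes.
check_conn() { true; }              # replace with the real connection test
ATTEMPTS=0
until check_conn || [ "$ATTEMPTS" -ge 12 ]; do
  ATTEMPTS=$(( ATTEMPTS + 1 ))
  sleep 10                          # Atlas rules usually apply within minutes
done
check_conn && echo "Atlas cluster reachable"
```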