Backing up MongoDB

This operation can be executed from any macOS or GNU/Linux machine that has enough space to store the backup.

Note

If your DataRobot application is configured to use a managed service (external PCS) such as MongoDB Atlas, do not follow this guide. Instead, refer to your service provider's backup documentation, for example, the MongoDB Atlas documentation on backup procedures.

Prerequisites

Ensure the following tools are installed on the host where the backup will be created:

  • mongodump: Version 100.6.0 or a compatible version.

  • kubectl: Version 1.23 or later.

    • See Kubernetes Tools Documentation.
    • kubectl must be configured to access the Kubernetes cluster where the DataRobot application is running. Verify this configuration with the kubectl cluster-info command.
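
The prerequisites above can be checked before starting. The following is a minimal sketch, not part of the official procedure: the helper name require_tool is hypothetical, and the script only confirms that the tools are on the PATH and prints their versions so you can compare them with the requirements by eye.

```shell
#!/usr/bin/env sh
# Hypothetical helper: report whether a command is available on PATH.
# Returns non-zero when the command is missing.
require_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

missing=0
for tool in mongodump kubectl; do
  require_tool "$tool" || missing=1
done

if [ "$missing" -eq 0 ]; then
  # Print versions for a manual comparison with the documented requirements.
  mongodump --version
  kubectl version --client
  # Confirm that kubectl can reach the cluster.
  kubectl cluster-info || echo "kubectl cannot reach the cluster" >&2
else
  echo "Install the missing tools before starting the backup." >&2
fi
```

The script does not compare version numbers automatically; that check is left to the operator.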

Considerations

The execution time of mongodump grows with the size of the database. For very large databases, a dump can take days, which is impractical. For production environments or large databases, DataRobot strongly recommends using managed services (external PCS) with their native backup solutions.

Backup procedure for internal MongoDB

If you are using internal MongoDB deployed via the pcs-ha charts, use the steps below to create a backup.

  1. Set the DR_CORE_NAMESPACE environment variable to your DataRobot application's Kubernetes namespace. Replace <your-datarobot-namespace> with the actual namespace.

    export DR_CORE_NAMESPACE=<your-datarobot-namespace> 
    
  2. Define the backup location on the host where the backup files will be stored. This example uses ~/datarobot-backups/mongodb.

    export BACKUP_LOCATION=~/datarobot-backups/mongodb
    mkdir -p ${BACKUP_LOCATION} 
    
  3. Define a local port for port-forwarding to the MongoDB service. This example uses port 27018.

    export LOCAL_MONGO_PORT=27018 
    
  4. Obtain the MongoDB root user password from the Kubernetes secret.

    export PCS_MONGO_PASSWD=$(kubectl -n $DR_CORE_NAMESPACE get secret pcs-mongo -o jsonpath="{.data.mongodb-root-password}" | base64 -d)
    echo "MongoDB password retrieved." 
    
  5. Forward the local port to the remote MongoDB service running in Kubernetes. This command runs in the background.

    kubectl -n $DR_CORE_NAMESPACE port-forward svc/pcs-mongo-headless --address 127.0.0.1 $LOCAL_MONGO_PORT:27017 & 
    

    Wait a few seconds for the port-forwarding to establish.

  6. Back up the MongoDB database using mongodump.

    mongodump -vv -u pcs-mongodb -p "$PCS_MONGO_PASSWD" -h 127.0.0.1 --port $LOCAL_MONGO_PORT -o $BACKUP_LOCATION --authenticationDatabase admin 
    
  7. Once the backup is complete, find the process ID (PID) of the kubectl port-forward command.

    ps aux | grep -E "port-forwar[d].*$LOCAL_MONGO_PORT" 
    
  8. Stop the port-forwarding process using its PID. Replace <pid_of_the_kubectl_port-forward> with the actual PID found in the previous step.

    kill <pid_of_the_kubectl_port-forward> 
    

    Confirm that the port-forwarding process has stopped.

  9. Create a compressed tar archive of the backed-up database files and remove the original backup directory after archiving.

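    cd "$(dirname "$BACKUP_LOCATION")" # cd to the parent of the mongodb directory
    ARCHIVE_NAME="datarobot-mongo-backup-$(date +%F).tar.gz" # compute once so both commands use the same name
    # --remove-files is a GNU tar option; it deletes the source files after they are archived
    tar -cvzf "$ARCHIVE_NAME" --remove-files -C "$(dirname "$BACKUP_LOCATION")" "$(basename "$BACKUP_LOCATION")"
    echo "MongoDB backup archived to $(dirname "$BACKUP_LOCATION")/$ARCHIVE_NAME"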
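
Steps 5 through 8 can also be scripted so that the port-forward PID is captured when the process starts, instead of being looked up with ps afterwards, and so that the wait is an explicit check rather than a fixed pause. The following is a rough sketch under some assumptions: the function names wait_for_port, start_port_forward, and stop_port_forward and the PF_PID variable are hypothetical, the /dev/tcp redirection is a bash feature (not available in every shell), and the environment variables from steps 1 through 4 are expected to be exported already.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for managing the kubectl port-forward from steps 5 to 8.

# Poll HOST:PORT once per second until a TCP connection succeeds,
# or give up after TIMEOUT seconds (default 30). Relies on bash's /dev/tcp.
wait_for_port() {
  local host=$1 port=$2 timeout=${3:-30} waited=0
  until (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# Start the port-forward in the background and remember its PID.
start_port_forward() {
  kubectl -n "$DR_CORE_NAMESPACE" port-forward svc/pcs-mongo-headless \
    --address 127.0.0.1 "${LOCAL_MONGO_PORT}:27017" &
  PF_PID=$!
}

# Stop the background process recorded in PF_PID, if it is still running.
stop_port_forward() {
  if [ -n "${PF_PID:-}" ] && kill -0 "$PF_PID" 2>/dev/null; then
    kill "$PF_PID"
    wait "$PF_PID" 2>/dev/null || true
  fi
}

# Example usage, with the variables from steps 1 to 4 exported:
#   start_port_forward
#   trap stop_port_forward EXIT
#   wait_for_port 127.0.0.1 "$LOCAL_MONGO_PORT" 30 || exit 1
#   mongodump -vv -u pcs-mongodb -p "$PCS_MONGO_PASSWD" -h 127.0.0.1 \
#     --port "$LOCAL_MONGO_PORT" -o "$BACKUP_LOCATION" --authenticationDatabase admin
```

Registering stop_port_forward with trap means the port-forward is cleaned up even if mongodump fails partway through, which avoids leaving a stray kubectl process bound to the local port.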