Backing up PostgreSQL
This operation can be executed from any macOS or GNU/Linux machine that has enough space to store the backup.
Note
If your DataRobot application is configured to use managed services (external PCS) for PostgreSQL, do not follow this guide. Instead, refer to the backup and restore documentation provided by your cloud provider, for example, the AWS documentation on backing up and restoring Amazon RDS for PostgreSQL.
Prerequisites
Ensure the following tools are installed on the host where the backup will be created:
- `pg_dump`:
  - DataRobot 11.0: Use version 12 of `pg_dump`.
  - DataRobot 11.1 or newer: Use version 14 of `pg_dump`.
  - See PostgreSQL Downloads.
- `kubectl`: Version 1.23 or later.
  - See Kubernetes Tools Documentation.

  `kubectl` must be configured to access the Kubernetes cluster where the DataRobot application is running. Verify this configuration with the `kubectl cluster-info` command.
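Before proceeding, a quick check that the required tools are on the PATH can save a run that would otherwise fail partway through. A minimal sketch (the `check_tools` helper is illustrative, not part of the DataRobot tooling):

```shell
# Report which of the required client tools are installed on this host.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return $missing
}

check_tools pg_dump psql kubectl || echo "install the missing tools before continuing" >&2
# Then confirm kubectl can reach the cluster:
# kubectl cluster-info
```

Remember to also check the `pg_dump` version against your DataRobot release (`pg_dump --version`).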
Considerations
As the database size increases, so does the execution time of pg_dump; for very large databases a dump can take an impractically long time, potentially spanning days. For production environments or large databases, DataRobot strongly recommends using managed services (external PCS) with their native backup solutions.
Backup procedure for internal PostgreSQL
DataRobot recommends using managed services (external PCS) and scheduling simultaneous backups of managed PostgreSQL, Redis, and MongoDB.
If you are using internal PostgreSQL deployed via the pcs-ha charts, you can use the steps below to create a backup.
- Set the `DR_CORE_NAMESPACE` environment variable to your DataRobot application's Kubernetes namespace. Replace `<your-datarobot-namespace>` with the actual namespace.

  ```
  export DR_CORE_NAMESPACE=<your-datarobot-namespace>
  ```

- Define the backup location on the host where the backup files will be stored. This example uses `~/datarobot-backups/pgsql`.

  ```
  export BACKUP_LOCATION=~/datarobot-backups/pgsql
  mkdir -p $BACKUP_LOCATION
  ```

- Define a local port for port-forwarding to the PostgreSQL service. This example uses port `54321`.

  ```
  export LOCAL_PGSQL_PORT=54321
  ```

- Obtain the PostgreSQL admin user password from the Kubernetes secret.

  ```
  export PGPASSWORD=$(kubectl -n $DR_CORE_NAMESPACE get secret pcs-postgresql -o jsonpath='{.data.postgres-password}' | base64 -d)
  echo "PostgreSQL password retrieved." # Avoid echoing the actual password
  ```

- Forward the local port to the remote PostgreSQL service running in Kubernetes. This command runs in the background.

  ```
  kubectl -n $DR_CORE_NAMESPACE port-forward svc/pcs-postgresql --address 127.0.0.1 $LOCAL_PGSQL_PORT:5432 &
  ```

  Wait a few seconds for the port-forwarding to establish.
- List all databases to be backed up, excluding template and system databases, and create a corresponding directory for each database backup.

  ```
  dbs=$(psql -U postgres -h localhost -p $LOCAL_PGSQL_PORT -t -c "SELECT datname FROM pg_database WHERE datistemplate = false AND datname NOT IN ('postgres', 'repmgr');" | tr -d '\r' | xargs)
  echo "Databases to back up: $dbs"
  cd ${BACKUP_LOCATION}/
  for db_name in $dbs; do mkdir -p $db_name; done
  ```

- Back up each database individually using `pg_dump` in directory format.

  ```
  for db_name in $dbs; do
    echo "Backing up database: $db_name"
    pg_dump -U postgres -h localhost -p $LOCAL_PGSQL_PORT -Fd -j4 "$db_name" -f "$BACKUP_LOCATION/$db_name"
  done
  ```

- Once the backup is complete, find the process ID (PID) of the `kubectl port-forward` command.

  ```
  ps aux | grep -E "port-forwar[d].*$LOCAL_PGSQL_PORT"
  ```

- Stop the port-forwarding process using its PID. Replace `<pid_of_the_kubectl_port-forward>` with the actual PID found in the previous step.

  ```
  kill <pid_of_the_kubectl_port-forward>
  ```

  Confirm that the port-forwarding process has stopped.
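Before archiving the backup, the dumps can be given a quick sanity check. Each directory-format dump written by `pg_dump -Fd` contains a `toc.dat` file that `pg_restore` reads, so the sketch below (the `check_dumps` helper is illustrative, not part of the official procedure) confirms each database directory has the expected layout; for a deeper check, run `pg_restore --list` on each directory.

```shell
# Optional sanity check: every directory-format dump written by pg_dump -Fd
# contains a toc.dat file. This confirms the layout only, not the contents;
# run pg_restore --list on each directory for a deeper verification.
check_dumps() {
  local root=$1 status=0 db_dir
  for db_dir in "$root"/*/; do
    if [ -f "${db_dir}toc.dat" ]; then
      echo "OK: $db_dir"
    else
      echo "missing toc.dat: $db_dir" >&2
      status=1
    fi
  done
  return $status
}

# check_dumps "$BACKUP_LOCATION"
```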