
Backup PostgreSQL

This operation can be executed from any macOS or GNU/Linux machine that has enough free space to store the backup.

Note: If the DataRobot application is configured to use managed services (external PCS), refer to the Backing up and restoring guide for Amazon RDS for PostgreSQL instead of this guide.

Prerequisites

  • The pg_dump utility is installed on the host where the backup will be created:
      • DataRobot 11.0: use version 12 of pg_dump.
      • DataRobot 11.1 or newer: use version 14 of pg_dump.
  • The kubectl utility, version 1.23, is installed on the host where the backup will be created.
  • kubectl is configured to access the Kubernetes cluster where the DataRobot application is running. Verify this with the kubectl cluster-info command.
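The prerequisites above can be checked before starting. The following is a minimal sketch, not part of the official procedure; the helper names pg_dump_major and check_pg_dump are hypothetical, and the parsing assumes that pg_dump --version prints a line of the form "pg_dump (PostgreSQL) 14.9".

```shell
# Extract the major version from a `pg_dump --version` line,
# e.g. "pg_dump (PostgreSQL) 14.9" -> "14".
pg_dump_major() {
  printf '%s\n' "$1" | sed -E 's/^pg_dump \(PostgreSQL\) ([0-9]+).*/\1/'
}

# Hypothetical helper: compare the installed pg_dump major version
# with the one expected for your DataRobot release.
check_pg_dump() {
  expected_major="$1"
  actual_major="$(pg_dump_major "$(pg_dump --version)")"
  if [ "$actual_major" != "$expected_major" ]; then
    echo "pg_dump major version is $actual_major, expected $expected_major" >&2
    return 1
  fi
}

# Usage (12 for DataRobot 11.0, 14 for DataRobot 11.1 or newer):
#   check_pg_dump 14 && kubectl version --client && kubectl cluster-info
```

The functions only define the checks; nothing runs until you call check_pg_dump yourself.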

Considerations

As the database size increases, the execution time of pg_dump also increases. This can reach impractical durations in certain scenarios, potentially spanning days. We recommend using managed services (external PCS).

Create backup

We recommend using managed services (external PCS) and scheduling backups simultaneously for managed Postgres, Redis, and Mongo.

If you are using the pcs-ha charts, you can use the script below to create a backup. Export the name of the DataRobot application's Kubernetes namespace in the DR_CORE_NAMESPACE variable:

export DR_CORE_NAMESPACE=<namespace> 

Define where the backups will be stored on the host where the backup will be created. This example uses ~/datarobot-backups/pgsql, but you can choose a different location:

export BACKUP_LOCATION=~/datarobot-backups/pgsql
mkdir -p "$BACKUP_LOCATION"

The backup process requires forwarding a local port to the remote PostgreSQL service, so define which local port to use. This example uses port 54321, but any free local port works:

export LOCAL_PGSQL_PORT=54321 

Obtain the PostgreSQL admin user password:

export PGPASSWORD=$(kubectl -n $DR_CORE_NAMESPACE get secret pcs-postgresql -o jsonpath='{.data.postgres-password}' | base64 -d)
echo $PGPASSWORD 

Forward the local port to the remote PostgreSQL service deployed in the Kubernetes cluster:

kubectl -n $DR_CORE_NAMESPACE port-forward svc/pcs-postgresql --address 127.0.0.1 $LOCAL_PGSQL_PORT:5432 & 
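Because the port-forward starts in the background, the tunnel may not be ready yet when the next command runs. One way to wait for it is sketched below, assuming the pg_isready utility from the PostgreSQL client tools is installed on the host; the retry helper is hypothetical and not part of the original procedure.

```shell
# Retry a command up to N times, sleeping one second between attempts.
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
retry() {
  attempts="$1"; shift
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Usage: wait up to about 10 seconds for PostgreSQL to accept connections
# through the forwarded port.
#   retry 10 pg_isready -h localhost -p "$LOCAL_PGSQL_PORT"
```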

List the databases to back up:

dbs=$(psql -Upostgres -hlocalhost -p $LOCAL_PGSQL_PORT -t -c "SELECT datname FROM pg_database;" \
| grep -vE 'template|repmgr|postgres' \
| sed 's/\r//g')

cd "$BACKUP_LOCATION" && mkdir -p $dbs
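To see what the filter in the listing step keeps, you can try it on sample input. This is a sketch: filter_dbs is a hypothetical helper that applies the same exclusion pattern, and the database names db_one and db_two in the usage comment are made up for illustration.

```shell
# Read database names on stdin, one per line, as printed by `psql -t`.
# Strip carriage returns and surrounding whitespace, drop empty lines,
# then exclude any name containing template, repmgr, or postgres.
filter_dbs() {
  tr -d '\r' \
    | sed -E 's/^[[:space:]]+//; s/[[:space:]]+$//' \
    | grep -v '^$' \
    | grep -vE 'template|repmgr|postgres'
}

# Usage with made-up names:
#   printf ' template0\n postgres\n db_one\n db_two\n' | filter_dbs
# Usage against the forwarded port:
#   psql -Upostgres -hlocalhost -p "$LOCAL_PGSQL_PORT" -t \
#     -c "SELECT datname FROM pg_database;" | filter_dbs
```

Note that the pattern excludes every database whose name contains one of those words, not only exact matches. The later steps also use $dbs unquoted, which relies on word splitting as in bash or sh; zsh does not split unquoted variables by default.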

Back up the databases one by one:

for db in $dbs; do
  pg_dump -Upostgres -hlocalhost -p$LOCAL_PGSQL_PORT -Fd -j4 "$db" -f "$BACKUP_LOCATION/$db";
done 
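After the loop finishes, you may want a quick sanity check that every database produced a dump. The sketch below relies on the fact that a directory-format dump (-Fd) contains a toc.dat file at its top level; check_dumps is a hypothetical helper, not part of the original procedure.

```shell
# Report which dump directories contain the table-of-contents file.
# Returns 1 if any of the listed databases has no toc.dat.
check_dumps() {
  location="$1"; shift
  rc=0
  for db in "$@"; do
    if [ -f "$location/$db/toc.dat" ]; then
      echo "ok: $db"
    else
      echo "missing toc.dat: $db" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Usage:
#   check_dumps "$BACKUP_LOCATION" $dbs
# For a closer look at one dump, pg_restore can list its contents:
#   pg_restore -l "$BACKUP_LOCATION/<database_name>"
```

This only confirms that each dump wrote its table of contents; it does not prove that the data can be restored.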

Once the backup is complete, find the process ID of the port-forwarding process:

ps aux | grep -E "port-forwar[d].*$LOCAL_PGSQL_PORT" 

Then stop it:

kill <pid_of_the_kubectl_port-forward>
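As an alternative to searching the process list, the shell can record the PID when the port-forward is started. This sketch assumes the port-forward runs in the same shell session; PF_PID and stop_background are hypothetical names.

```shell
# Right after starting the port-forward with `&`, $! holds the PID of
# that background command, so it can be saved for later:
#   kubectl -n "$DR_CORE_NAMESPACE" port-forward svc/pcs-postgresql \
#     --address 127.0.0.1 "$LOCAL_PGSQL_PORT":5432 &
#   PF_PID=$!

# Stop a background process started from this shell and wait for it to exit.
stop_background() {
  pid="$1"
  kill "$pid" 2>/dev/null || true
  wait "$pid" 2>/dev/null || true
}

# Usage once the backup is complete:
#   stop_background "$PF_PID"
```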