Restore Postgres to externally managed DB¶
This operation should be executed from any macOS or GNU/Linux machine where the previously taken PostgreSQL backup is located.
Note
Make sure that the application is scaled down before following this guide.
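The exact scale-down procedure depends on your deployment; as a rough, illustrative sketch using plain kubectl (adjust to your environment and documented procedure):
# record the current replica counts so you can scale back up after the restore
kubectl -n <namespace> get deployments
# scale all application deployments in the namespace down to zero replicas
kubectl -n <namespace> scale deployment --all --replicas=0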
Note
When restoring backed-up PostgreSQL databases, skip the following databases (see the Restore backup section below for more):
- identityresourceservice
- sushihydra
- If the DR version is >= 10.1.0: cnshydra
Prerequisites¶
- Utility pg_restore of the relevant version is installed on the host from which the backup is restored.
- Utility kubectl version 1.23 is installed on the host from which the backup is restored.
- Utility kubectl is configured to access the Kubernetes cluster where the DataRobot application is running. Verify this with the kubectl cluster-info command (see the quick check after this list).
- Make sure the customer has allocated enough storage to perform this activity.
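A quick way to confirm the tool prerequisites, assuming pg_restore and kubectl are on your PATH:
pg_restore --version      # should match the PostgreSQL server major version
kubectl version --client  # expect 1.23
kubectl cluster-info      # confirms access to the cluster running DataRobot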
Note
Because different PCS HA charts are applied for external PCS (which has no Mongo or PostgreSQL statefulsets), the provisioned k8s cluster needs enough space for the restore while moving from an internally hosted PCS to an external PCS setup. See the external PCS doc for more details.
Note
If the customer's internal db size is 50GB, the restore process needs at least 110GB of available space on the k8s cluster provisioned with the external db store for the duration of the migration. A temporary disk mount also works.
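To gauge whether enough space is available, you can compare the backup size against the free space on the mount that will hold it during the restore (the paths below are placeholders):
du -sh <path-to-backup>    # size of the PostgreSQL backup
df -h <restore-mount>      # free space on the mount used during the restore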
First, export the name of the DataRobot application's Kubernetes namespace in the DR_CORE_NAMESPACE variable:
export DR_CORE_NAMESPACE=<namespace>
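You can confirm the namespace exists and, since later commands need <mmapp-pod-name>, look up the mmapp pod now (grepping for "mmapp" is an assumption about the pod name; adjust as needed):
kubectl get namespace "$DR_CORE_NAMESPACE"
kubectl -n $DR_CORE_NAMESPACE get pods | grep mmapp   # note the pod name for later steps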
There are two ways to get the password used for the restoration¶
- Obtain the PostgreSQL admin user password from Kubernetes secrets:
export PGPASSWORD=$(kubectl -n $DR_CORE_NAMESPACE get secret pcs-postgresql -o jsonpath='{.data.postgres-password}' | base64 -d)
echo ${PGPASSWORD}
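Optionally, you can verify the password against the external database; this assumes a psql client is installed locally and <PGSQL_HOST> is reachable from your machine:
psql -h <PGSQL_HOST> -p 5432 -U postgres -c 'SELECT version();'   # uses the exported PGPASSWORD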
OR
- Log in to the mmapp pod and get the host and credential details needed to execute the restore:
kubectl exec -it <mmapp-pod-name> -n $DR_CORE_NAMESPACE -- /entrypoint bash
bash-4.4$ env | grep PG
PGSQL_MODMON_PASSWORD=<PGSQL_USER_PASSWORD>
PGSQL_MODMON_USER=modmon
PGSQL_HOST=<PGSQL_HOST>
PGSQL_MODMON_DB=modmon
PGSQL_PORT=5432
PGSQL_POSTGRES_PASSWORD=<PGSQL_POSTGRES_PASSWORD>
Uncompress backup files if they are in tar or gz format¶
Extract the pgsql backup; this will create the $BACKUP_LOCATION/pgsql directory ($BACKUP_LOCATION is the directory on the host where the backup files are stored):
cd $BACKUP_LOCATION
tar xf datarobot-pgsql-backup-<date>.tar -C $BACKUP_LOCATION
If the backup is in .dat.gz format, gunzip the files so that all files are in .dat format:
cd $BACKUP_LOCATION
gunzip *.gz
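After extraction, the expected layout is one directory per database under $BACKUP_LOCATION/pgsql; a quick check:
ls $BACKUP_LOCATION/pgsql                          # one subdirectory per database
find $BACKUP_LOCATION/pgsql -name '*.gz' | wc -l   # should print 0 once everything is gunzipped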
Option 1: Copy backup to mmapp pod¶
This is recommended if the database size is very small (smaller than 30GB)¶
Replace the <mmapp-pod-name> placeholder in the commands below with the name of the running mmapp pod.
Copy the backup to the mmapp pod for restoration.
Define where the backups are stored on the host from which the backup will be restored, and where they will be placed inside the pod:
export BACKUP_LOCATION=~/datarobot-backups/
export RESTORE_LOCATION=/<AVAILABLE_DIR>/datarobot-backups/pgsql-backup/
The restore process requires you to copy the untarred/unzipped backup from $BACKUP_LOCATION to the running mmapp pod:
kubectl cp $BACKUP_LOCATION $DR_CORE_NAMESPACE/<mmapp-pod-name>:$RESTORE_LOCATION
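You can confirm the copy landed where the restore loop expects it (adjust the path if your layout differs):
kubectl exec <mmapp-pod-name> -n $DR_CORE_NAMESPACE -- ls -la /<AVAILABLE_DIR>/datarobot-backups/pgsql-backup/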
Restore the PostgreSQL databases from $RESTORE_LOCATION inside the mmapp pod. Per the note above, also add ! -name cnshydra to the find exclusions if the DR version is >= 10.1.0:
kubectl exec -it <mmapp-pod-name> -n $DR_CORE_NAMESPACE -- /entrypoint bash
bash-4.4$ export RESTORE_LOCATION=/<AVAILABLE_DIR>/datarobot-backups/pgsql-backup/
bash-4.4$ cd /opt/datarobot-libs/virtualenvs/datarobot-<release-version>/bin
bash-4.4$ for db in $(find $RESTORE_LOCATION/pgsql -mindepth 1 -maxdepth 1 -type d ! -name postgres ! -name sushihydra ! -name identityresourceservice); do
./pg_restore -v -U postgres -h <PGSQL_HOST> -p 5432 -cC -j4 -d postgres "$db";
done
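As an optional sanity check once the loop finishes, you can list the databases on the external server; this assumes a psql client is available in the same bin directory (adjust the path if it lives elsewhere on the pod):
bash-4.4$ PGPASSWORD=$PGSQL_POSTGRES_PASSWORD ./psql -U postgres -h <PGSQL_HOST> -p 5432 -l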
Option 2: Spin up a temporary restore pod¶
This is recommended if the database size is larger (greater than 50GB)¶
Ensure that the pod being provisioned has a mount point with sufficient space available for the restore process.
Here is a minimal pod configuration; replace the namespace value as needed and adjust the following according to your cloud provider:
- Replace the spec:containers:name value with an appropriate name.
- Replace the spec:containers:image value according to the cloud provider:
  - AWS: amazon/aws-cli
  - Azure: mcr.microsoft.com/azurelinux/base/postgres:16.4-1-azl3.0.20240824-amd64
  - GCP: google/cloud-sdk
apiVersion: v1
kind: Pod
metadata:
  name: pgrestore-minimal
  namespace: <NAMESPACE>
spec:
  containers:
  - name: awscli-onprem-minimal
    image: amazon/aws-cli
    env:
    - name: PGSQL_POSTGRES_PASSWORD
      valueFrom:
        secretKeyRef:
          key: postgres-password
          name: pcs-postgresql
    - name: PGSQL_MODMON_PASSWORD
      valueFrom:
        secretKeyRef:
          key: password
          name: pcs-db-modmon
    envFrom:
    - configMapRef:
        name: datarobot-modeling-envvars
    command:
    - tail
    - -f
    - /dev/null
Apply the above in your namespace:
kubectl apply -f pgrestore-minimal.yaml -n $DR_CORE_NAMESPACE
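Before copying the backup, you can wait for the pod to become ready:
kubectl wait --for=condition=Ready pod/pgrestore-minimal -n $DR_CORE_NAMESPACE --timeout=120s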
The restore process requires you to copy the backup from $BACKUP_LOCATION to the restore pod provisioned above:
kubectl get pods -A | grep minimal
kubectl cp $BACKUP_LOCATION $DR_CORE_NAMESPACE/<pod-name>:$RESTORE_LOCATION
Perform the restore from the provisioned pod. Make sure to install the PostgreSQL client libraries and tools for the relevant version; the example below is for an Amazon-provisioned node. Per the note above, also add ! -name cnshydra to the find exclusions if the DR version is >= 10.1.0:
kubectl exec -it <pod-name> -n $DR_CORE_NAMESPACE -- bash
bash-4.4$ amazon-linux-extras install postgresql<VERSION_NUMBER>   # for example: 12, 13
bash-4.4$ for db in $(find $RESTORE_LOCATION/pgsql -mindepth 1 -maxdepth 1 -type d ! -name postgres ! -name sushihydra ! -name identityresourceservice); do
pg_restore -v -U postgres -h <PGSQL_HOST> -p 5432 -cC -j4 -d postgres "$db";
done
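Once the restore completes, you can optionally list the databases from the restore pod to confirm they are present, then delete the temporary pod from your workstation:
# inside the restore pod (uses the client installed above)
bash-4.4$ PGPASSWORD=$PGSQL_POSTGRES_PASSWORD psql -U postgres -h <PGSQL_HOST> -p 5432 -l
# from your workstation, once verified
kubectl delete pod pgrestore-minimal -n $DR_CORE_NAMESPACE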