Restore Postgres to externally managed DB¶
This operation should be executed from the macOS or GNU/Linux machine where the previously taken PostgreSQL backup is located.
ADDITIONAL NOTE: Make sure that the application is scaled down before following this guide.
When restoring backed up postgres DBs, the following databases must be skipped (see Restore backup section below for more):
* identityresourceservice
* sushihydra
If the DataRobot version is 10.1.0 or later, also skip:
* cnshydra
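The skip rules above can be sketched as a small shell helper. This is a minimal sketch, not part of the DataRobot tooling: the function name `list_restorable_dbs` and its arguments are hypothetical, the version is assumed to be passed in by hand, and `sort -V` (GNU coreutils) is assumed to be available. The `postgres` database is also excluded because the restore loops in this guide exclude it.

```shell
# Hypothetical helper: print the database directories that should be restored.
# $1 = directory holding one sub-directory per database (e.g. "$RESTORE_LOCATION/pgsql")
# $2 = DataRobot version, e.g. 10.1.0
list_restorable_dbs() {
    pgsql_dir=$1
    dr_version=$2
    # Databases that are always skipped (taken from the loops in this guide).
    skip="postgres sushihydra identityresourceservice"
    # Version-aware comparison: the smaller of the two versions sorts first.
    lowest=$(printf '%s\n%s\n' "10.1.0" "$dr_version" | sort -V | head -n 1)
    if [ "$lowest" = "10.1.0" ]; then
        skip="$skip cnshydra"   # only skipped on 10.1.0 and later
    fi
    for dir in "$pgsql_dir"/*/; do
        [ -d "$dir" ] || continue
        name=$(basename "$dir")
        case " $skip " in
            *" $name "*) ;;                      # on the skip list
            *) printf '%s\n' "${dir%/}" ;;       # candidate for restore
        esac
    done
}
```

For example, `list_restorable_dbs "$RESTORE_LOCATION/pgsql" 10.1.0` prints one directory path per line, which can be reviewed before any restore is started.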
Prerequisites¶
- The relevant version of the pg_restore utility is installed on the host from which the backup will be restored.
- The kubectl utility, version 1.23, is installed on the host from which the backup will be restored.
- kubectl is configured to access the Kubernetes cluster where the DataRobot application is running; verify this with the kubectl cluster-info command.
- The customer has allocated enough storage to perform this activity.
Internal note: Because different PCS HA charts are applied for external PCS, where there are no Mongo and PostgreSQL statefulsets, the provisioned k8s cluster needs enough space for the restore while moving from internally hosted PCS to an external PCS setup. See the [external pcs](installation/external-pcs.md) doc for more details.
If the customer has an internal DB size of 50GB, at least 110GB of available space is needed (i.e. 50% extra) on the k8s cluster provisioned with the external DB store for the duration of the migration; a temporary disk mount will also work.
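The storage requirement can be checked before starting. The following is a minimal sketch under stated assumptions: the helper name `has_free_space` is hypothetical, the required size is supplied by the operator rather than computed, and `df -Pk` is assumed to report the available space in the fourth column, as it does in POSIX output format.

```shell
# Hypothetical helper: succeed if the filesystem holding $1 has at least $2 GB free.
has_free_space() {
    target_dir=$1
    required_gb=$2
    # df -Pk prints POSIX-format output in 1024-byte blocks; column 4 is "Available".
    avail_kb=$(df -Pk "$target_dir" | awk 'NR == 2 { print $4 }')
    required_kb=$((required_gb * 1024 * 1024))
    [ "$avail_kb" -ge "$required_kb" ]
}

# Example (not executed here):
#   has_free_space /path/to/restore/mount 110 || echo "not enough space" >&2
```

The check only looks at one mount point, so it should be run against the directory that will actually receive the backup files.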
First, export the name of the DataRobot application Kubernetes namespace in the DR_CORE_NAMESPACE variable:
export DR_CORE_NAMESPACE=<namespace>
There are two ways to get the password used for the restoration¶
- Obtain the PostgreSQL admin user password from kubernetes secrets:
export PGPASSWORD=$(kubectl -n $DR_CORE_NAMESPACE get secret pcs-postgresql -o jsonpath='{.data.postgres-password}' | base64 -d)
echo ${PGPASSWORD}
OR
- Log in to the mmapp pod and get the host and credential details needed to run the restore:
kubectl exec -it <mmapp-pod-name> -n $DR_CORE_NAMESPACE /entrypoint -- bash
bash-4.4$ env | grep PG
PGSQL_MODMON_PASSWORD=<PGSQL_USER_PASSWORD>
PGSQL_MODMON_USER=modmon
PGSQL_HOST=<PGSQL_HOST>
PGSQL_MODMON_DB=modmon
PGSQL_PORT=5432
PGSQL_POSTGRES_PASSWORD=<PGSQL_POSTGRES_PASSWORD>
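Tools built on libpq, including pg_restore and psql, read connection defaults from the `PGHOST`, `PGPORT`, `PGUSER`, and `PGPASSWORD` environment variables. As a minimal sketch, the `PGSQL_*` values shown above could be mapped onto them inside the pod. The helper name `export_libpq_env` is hypothetical, and the `postgres` user is assumed because the restore commands in this guide use `-U postgres`.

```shell
# Hypothetical helper: map the PGSQL_* values onto the variables libpq reads.
export_libpq_env() {
    if [ -z "${PGSQL_HOST:-}" ] || [ -z "${PGSQL_POSTGRES_PASSWORD:-}" ]; then
        echo "PGSQL_HOST and PGSQL_POSTGRES_PASSWORD must be set" >&2
        return 1
    fi
    export PGHOST="$PGSQL_HOST"
    export PGPORT="${PGSQL_PORT:-5432}"     # fall back to the default port
    export PGUSER=postgres                  # assumption: admin user used in this guide
    export PGPASSWORD="$PGSQL_POSTGRES_PASSWORD"
}
```

With these variables exported, pg_restore does not prompt for a password, and the password does not need to be echoed to the terminal.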
Uncompress the backup files if they are in tar or gz format¶
Extract the pgsql backup; this creates the $BACKUP_LOCATION/pgsql directory, where $BACKUP_LOCATION is the directory on this host that holds the backup:
cd $BACKUP_LOCATION
tar xf datarobot-pgsql-backup-<date>.tar -C $BACKUP_LOCATION
If the backup is in .dat.gz format, gunzip the files so that all files are in .dat format:
cd $BACKUP_LOCATION
gunzip *.gz
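If the compressed files sit in per-database sub-directories rather than directly in $BACKUP_LOCATION, a recursive variant may be needed. This is a minimal sketch under that assumption; the helper name `gunzip_backup` is hypothetical.

```shell
# Hypothetical helper: decompress every *.gz file below the given directory.
gunzip_backup() {
    backup_dir=$1
    # -type f limits the match to regular files; gunzip replaces each
    # file.dat.gz with file.dat in the same directory.
    find "$backup_dir" -type f -name '*.gz' -exec gunzip {} +
}

# Example (not executed here):
#   gunzip_backup "$BACKUP_LOCATION/pgsql"
```

Pointing the helper at the pgsql sub-directory keeps it from touching any other compressed files stored next to the backup.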
Option 1: Copy backup to mmapp pod¶
This is recommended if the database is small (less than 30GB)¶
Replace the
Copy backup to mmapp pod for restoration
Define where the backups are stored on the host from where the backup will be restored:
export BACKUP_LOCATION=~/datarobot-backups/
export RESTORE_LOCATION=/<AVAILABLE_DIR>/datarobot-backups/pgsql-backup/
The restore process requires you to copy the extracted backup from $BACKUP_LOCATION to the running mmapp pod:
kubectl cp $BACKUP_LOCATION $DR_CORE_NAMESPACE/<mmapp-pod-name>:$RESTORE_LOCATION
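Because the restore loop passes whole directories to pg_restore, the dumps are presumably in pg_dump directory format, where each dump directory contains a `toc.dat` file. Under that assumption, a quick sanity check can be run on the host before the copy or inside the pod after it. The helper name `check_dump_dirs` is hypothetical.

```shell
# Hypothetical helper: report database directories that lack a toc.dat file.
# $1 = directory holding one sub-directory per database.
check_dump_dirs() {
    pgsql_dir=$1
    missing=0
    for dir in "$pgsql_dir"/*/; do
        [ -d "$dir" ] || continue
        if [ ! -f "${dir}toc.dat" ]; then
            echo "missing toc.dat in ${dir%/}" >&2
            missing=$((missing + 1))
        fi
    done
    # Succeed only when every database directory has a table of contents.
    [ "$missing" -eq 0 ]
}
```

A failure here usually points to an incomplete extraction or an interrupted copy rather than a problem with the target database.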
Restore the PostgreSQL databases from $RESTORE_LOCATION inside the mmapp pod:
kubectl exec -it <mmapp-pod-name> -n $DR_CORE_NAMESPACE /entrypoint -- bash
bash-4.4$ export RESTORE_LOCATION=/<AVAILABLE_DIR>/datarobot-backups/pgsql-backup/
bash-4.4$ cd /opt/datarobot-libs/virtualenvs/datarobot-<release-version>/bin
bash-4.4$ for db in $(find $RESTORE_LOCATION/pgsql -mindepth 1 -maxdepth 1 -type d ! -name postgres ! -name sushihydra ! -name identityresourceservice); do
    ./pg_restore -v -U postgres -h <PGSQL_HOST> -p 5432 -cC -j4 -d postgres "$db";
done
Option 2: Spin up a temporary restore pod¶
This is recommended if the database is larger (greater than 50GB)¶
Ensure that the pod being provisioned has a mount point with sufficient space available for the restore process.
Here is a minimal pod configuration. Replace the namespace value as needed, and adjust the following according to your cloud provider:
- Replace spec:containers:name with an appropriate name.
- Replace the spec:containers:image value with the image for your cloud provider:
  - AWS: amazon/aws-cli
  - Azure: mcr.microsoft.com/azurelinux/base/postgres:16.4-1-azl3.0.20240824-amd64
  - GCP: google/cloud-sdk
apiVersion: v1
kind: Pod
metadata:
  name: pgrestore-minimal
  namespace: <NAMESPACE>
spec:
  containers:
  - name: awscli-onprem-minimal
    image: amazon/aws-cli
    env:
    - name: PGSQL_POSTGRES_PASSWORD
      valueFrom:
        secretKeyRef:
          key: postgres-password
          name: pcs-postgresql
    - name: PGSQL_MODMON_PASSWORD
      valueFrom:
        secretKeyRef:
          key: password
          name: pcs-db-modmon
    envFrom:
    - configMapRef:
        name: datarobot-modeling-envvars
    command:
    - tail
    - -f
    - /dev/null
Save the configuration above as pgrestore-minimal.yaml and apply it in your namespace:
kubectl apply -f pgrestore-minimal.yaml -n $DR_CORE_NAMESPACE
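Before copying anything into the pod, it can help to wait until the pod reports Ready. The following is a minimal sketch using `kubectl wait`; the helper name `wait_for_restore_pod` and the 120s default timeout are assumptions, and the pod name `pgrestore-minimal` is taken from the configuration above.

```shell
# Hypothetical helper: block until the restore pod is Ready or the timeout expires.
# $1 = namespace, $2 = optional timeout (defaults to 120s, an arbitrary choice)
wait_for_restore_pod() {
    namespace=$1
    timeout=${2:-120s}
    kubectl wait --for=condition=Ready pod/pgrestore-minimal \
        -n "$namespace" --timeout="$timeout"
}

# Example (not executed here):
#   wait_for_restore_pod "$DR_CORE_NAMESPACE" 300s
```

If the wait times out, `kubectl describe pod pgrestore-minimal -n $DR_CORE_NAMESPACE` typically shows whether the image pull or one of the referenced secrets is the cause.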
The restore process requires you to copy the backup from $BACKUP_LOCATION to the restore pod provisioned above:
kubectl get pods -A | grep minimal
kubectl cp $BACKUP_LOCATION $DR_CORE_NAMESPACE/<pod-name>:$RESTORE_LOCATION
Perform the restore from the pod provisioned above. Make sure to install the PostgreSQL client libraries and tools for the matching version; below is an example for an Amazon-provisioned node:
kubectl exec -it <pod-name> -n $DR_CORE_NAMESPACE /entrypoint -- bash
bash-4.4$ amazon-linux-extras install postgresql<VERSION_NUMBER>   # for example, postgresql12 or postgresql13
bash-4.4$ for db in $(find $RESTORE_LOCATION/pgsql -mindepth 1 -maxdepth 1 -type d ! -name postgres ! -name sushihydra ! -name identityresourceservice); do
    pg_restore -v -U postgres -h <PGSQL_HOST> -p 5432 -cC -j4 -d postgres "$db";
done
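For long restores it can be useful to keep going when one database fails and to collect the failures at the end. The following is a minimal sketch of such a wrapper around the same pg_restore flags used above. The function name `restore_each` and the `PG_RESTORE_CMD` override are hypothetical, and `PGSQL_HOST` is assumed to be set in the environment. pg_restore also exits non-zero when it merely ignored errors, such as "does not exist" messages caused by `-c`, so the resulting list means "review these", not necessarily "these are broken".

```shell
# Hypothetical wrapper: restore each directory passed as an argument and
# keep going when one fails, printing the failures at the end.
# Reads PGSQL_HOST and, optionally, PG_RESTORE_CMD from the environment.
restore_each() {
    failed=""
    for db in "$@"; do
        if ! "${PG_RESTORE_CMD:-pg_restore}" -v -U postgres -h "$PGSQL_HOST" -p 5432 \
                -cC -j4 -d postgres "$db"; then
            failed="$failed $db"     # remember it and continue with the next one
        fi
    done
    if [ -n "$failed" ]; then
        echo "restore reported errors for:$failed" >&2
        return 1
    fi
}

# Example (not executed here):
#   restore_each "$RESTORE_LOCATION"/pgsql/modmon
```

Passing the directories explicitly keeps the skip decision separate from the restore itself, so the list can be reviewed first.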