
Restore Postgres to externally managed DB

Run this operation from the macOS or GNU/Linux machine where the previously taken PostgreSQL backup is located.

ADDITIONAL NOTE: Make sure that the application is scaled down before following this guide.

When restoring backed-up PostgreSQL databases, the following databases must be skipped (see the Restore backup section below for more):

  • identityresourceservice
  • sushihydra

If the DR version is >= 10.1.0, also skip:

  • cnshydra

Prerequisites

  • The pg_restore utility, in the relevant version, is installed on the host from which the backup will be restored.
  • The kubectl utility, version 1.23, is installed on the host from which the backup will be restored.
  • kubectl is configured to access the Kubernetes cluster where the DataRobot application is running; verify this with the kubectl cluster-info command.
  • The customer has allocated enough storage to perform this activity.
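
A quick preflight check for the tools listed above could look like the following. This is a minimal sketch; missing_tools is a hypothetical helper name and not part of any DataRobot tooling.

```shell
# Hypothetical helper: print each named tool that is not found on PATH.
missing_tools() {
  for _tool in "$@"; do
    command -v "$_tool" >/dev/null 2>&1 || printf '%s\n' "$_tool"
  done
}
```

Calling missing_tools pg_restore kubectl prints nothing when both utilities are installed, and otherwise prints the name of each missing one. It does not check versions or cluster access; kubectl cluster-info is still needed for that.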

Internal note: Because different PCS HA charts are applied for external PCS, where there are no Mongo and PostgreSQL statefulsets, the provisioned k8s cluster needs enough space for the restore while moving from internally hosted PCS to an external PCS setup. See the external PCS doc (installation/external-pcs.md) for more details.

If the customer has an internal DB size of 50 GB, at least 110 GB must be available (i.e. 50% extra) on the k8s cluster provisioned with the external DB store for the duration of the migration; a temporary disk mount will also work.
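
The free space on a candidate mount point can be checked before starting. The sketch below is one way to do it under the assumption that df and awk are available where it runs; free_gb and has_free_gb are hypothetical helper names, and the required size is passed in rather than computed.

```shell
# Hypothetical helper: print the free space, in whole GB, of the filesystem holding a path.
free_gb() {
  # df -P gives the stable POSIX layout; column 4 is the available space in 1K blocks.
  df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# Hypothetical helper: succeed only if at least $2 GB are free under path $1.
has_free_gb() {
  [ "$(free_gb "$1")" -ge "$2" ]
}
```

For the 50 GB example above, has_free_gb /<AVAILABLE_DIR> 110 succeeds only when the mount point has at least 110 GB free.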

First, export the name of the DataRobot application's Kubernetes namespace in the DR_CORE_NAMESPACE variable:

export DR_CORE_NAMESPACE=<namespace>

There are two ways to get the password used for the restore:

  1. Obtain the PostgreSQL admin user password from Kubernetes secrets:
    export PGPASSWORD=$(kubectl -n $DR_CORE_NAMESPACE get secret pcs-postgresql -o jsonpath='{.data.postgres-password}' | base64 -d)
    echo ${PGPASSWORD}
    

OR

  2. Log in to the mmapp pod and get the host and credential details needed to execute the restore:
    kubectl exec -it <mmapp-pod-name> -n $DR_CORE_NAMESPACE -- /entrypoint bash
    bash-4.4$ env | grep PG
    PGSQL_MODMON_PASSWORD=<PGSQL_USER_PASSWORD>
    PGSQL_MODMON_USER=modmon
    PGSQL_HOST=<PGSQL_HOST>
    PGSQL_MODMON_DB=modmon
    PGSQL_PORT=5432
    PGSQL_POSTGRES_PASSWORD=<PGSQL_POSTGRES_PASSWORD>
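
pg_restore, like other libpq clients, reads the password from the PGPASSWORD environment variable, so with either method the value has to end up there before the restore runs. The sketch below checks that the connection variables shown above are populated; missing_vars is a hypothetical helper name.

```shell
# Hypothetical helper: print the name of every listed variable that is unset or empty.
missing_vars() {
  for _name in "$@"; do
    # Indirect lookup: copy the value of the variable named in $_name into _value.
    eval "_value=\${$_name:-}"
    [ -n "$_value" ] || printf '%s\n' "$_name"
  done
}
```

Inside the pod, missing_vars PGSQL_HOST PGSQL_PORT PGSQL_POSTGRES_PASSWORD should print nothing. Assuming PGSQL_POSTGRES_PASSWORD holds the password of the postgres user, export PGPASSWORD="$PGSQL_POSTGRES_PASSWORD" then makes it available to pg_restore.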
    

Uncompress the backup files if they are in tar or gz format

Extract the pgsql backup; this will create the $BACKUP_LOCATION/pgsql directory:

cd $BACKUP_LOCATION
tar xf datarobot-pgsql-backup-<date>.tar -C $BACKUP_LOCATION

If the backup is in .dat.gz format, gunzip the files so that all files are in .dat format:

cd $BACKUP_LOCATION
gunzip *.gz
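
The gunzip *.gz glob only matches files directly in $BACKUP_LOCATION. If the .dat.gz files sit in subdirectories (an assumption about the backup layout, which may differ), a recursive variant such as the sketch below can be used instead; gunzip_tree is a hypothetical helper name.

```shell
# Hypothetical helper: decompress every .gz file found anywhere under a directory.
gunzip_tree() {
  find "$1" -type f -name '*.gz' -exec gunzip {} +
}
```

Calling gunzip_tree "$BACKUP_LOCATION" replaces each .gz file in the tree with its uncompressed counterpart and does nothing when no .gz files are found.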

Option 1: Copy backup to mmapp pod

Replace <AVAILABLE_DIR> below with the customer’s mount point that has sufficient space for the restore.

Copy backup to mmapp pod for restoration

Define where the backups are stored on the host from which the backup will be restored, and where they will be copied to inside the pod:

export BACKUP_LOCATION=~/datarobot-backups/
export RESTORE_LOCATION=/<AVAILABLE_DIR>/datarobot-backups/pgsql-backup/

The restore process requires you to copy the untarred/unzipped backup from $BACKUP_LOCATION to the running mmapp pod:

kubectl cp $BACKUP_LOCATION $DR_CORE_NAMESPACE/<mmapp-pod-name>:$RESTORE_LOCATION

Restore the PostgreSQL databases from $RESTORE_LOCATION inside the mmapp pod. If the DR version is >= 10.1.0, also add ! -name cnshydra to the find command:

kubectl exec -it <mmapp-pod-name> -n $DR_CORE_NAMESPACE -- /entrypoint bash
bash-4.4$ export RESTORE_LOCATION=/<AVAILABLE_DIR>/datarobot-backups/pgsql-backup/
bash-4.4$ cd /opt/datarobot-libs/virtualenvs/datarobot-<release-version>/bin
bash-4.4$ for db in $(find $RESTORE_LOCATION/pgsql -mindepth 1 -maxdepth 1 -type d ! -name postgres ! -name sushihydra ! -name identityresourceservice); do
  ./pg_restore -v -U postgres -h <PGSQL_HOST> -p 5432 -cC -j4 -d postgres "$db";
done
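
The skip rules can also be kept in one place so that the extra exclusion for DR >= 10.1.0 is not forgotten. The following is a minimal sketch, not the documented procedure: list_restorable_dbs is a hypothetical helper that prints the database directory names to restore, always skipping postgres, sushihydra and identityresourceservice, plus any extra names passed as arguments.

```shell
# Hypothetical helper: list database directories under $1, minus the ones to skip.
# Any further arguments are treated as additional database names to skip.
list_restorable_dbs() {
  _root="$1"
  shift
  # Space-delimited skip list; the surrounding spaces allow exact-name matching.
  _skip=" postgres sushihydra identityresourceservice $* "
  for _dir in "$_root"/*/; do
    [ -d "$_dir" ] || continue
    _db=$(basename "$_dir")
    case "$_skip" in
      *" $_db "*) continue ;;
    esac
    printf '%s\n' "$_db"
  done
}
```

On DR >= 10.1.0 the call would be list_restorable_dbs "$RESTORE_LOCATION/pgsql" cnshydra, and each printed name can be passed to pg_restore as "$RESTORE_LOCATION/pgsql/<name>" with the same flags as in the loop above.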

Option 2: Spin up a temporary restore pod

Ensure that the pod being provisioned has a mount point with sufficient space available for the restore process.

Here is a minimal pod configuration; replace the namespace value as needed.

Adjust the following according to the cloud provider:

  • Replace spec:containers:name with a suitable name.
  • Replace the spec:containers:image value with the image for the cloud provider:

  • AWS: amazon/aws-cli
  • Azure: mcr.microsoft.com/azurelinux/base/postgres:16.4-1-azl3.0.20240824-amd64
  • GCP: google/cloud-sdk

apiVersion: v1
kind: Pod
metadata:
  name: pgrestore-minimal
  namespace: <NAMESPACE>
spec:
  containers:
  - name: awscli-onprem-minimal
    image: amazon/aws-cli
    env:
    - name: PGSQL_POSTGRES_PASSWORD
      valueFrom:
        secretKeyRef:
          key: postgres-password
          name: pcs-postgresql
    - name: PGSQL_MODMON_PASSWORD
      valueFrom:
        secretKeyRef:
          key: password
          name: pcs-db-modmon
    envFrom:
    - configMapRef:
        name: datarobot-modeling-envvars
    command:
      - tail
      - -f
      - /dev/null

Save the above as pgrestore-minimal.yaml and apply it in your namespace:

kubectl apply -f pgrestore-minimal.yaml -n $DR_CORE_NAMESPACE

The restore process requires you to copy the backup from $BACKUP_LOCATION to the restore pod provisioned above:

kubectl get pods -A | grep minimal

kubectl cp $BACKUP_LOCATION $DR_CORE_NAMESPACE/<pod-name>:$RESTORE_LOCATION

Perform the restore from the pod provisioned above. Make sure to install the PostgreSQL client libraries and tools for the relevant version; below is an example for an Amazon-provisioned node. If the DR version is >= 10.1.0, also add ! -name cnshydra to the find command:

kubectl exec -it <pod-name> -n $DR_CORE_NAMESPACE -- bash
bash-4.4$ export RESTORE_LOCATION=/<AVAILABLE_DIR>/datarobot-backups/pgsql-backup/
bash-4.4$ amazon-linux-extras install postgresql<VERSION_NUMBER>   # <VERSION_NUMBER> is, for example, 12 or 13
bash-4.4$ for db in $(find $RESTORE_LOCATION/pgsql -mindepth 1 -maxdepth 1 -type d ! -name postgres ! -name sushihydra ! -name identityresourceservice); do
  pg_restore -v -U postgres -h <PGSQL_HOST> -p 5432 -cC -j4 -d postgres "$db";
done