
Restore Postgres to externally managed DB

This operation should be executed from any macOS or GNU/Linux machine where the previously taken PostgreSQL backup is located.

Note

Make sure that the application is scaled down before following this guide.

Note

When restoring backed up PostgreSQL databases, skip the following databases (see the Restore backup section below for more):

  • identityresourceservice
  • sushihydra

If the DR version is >= 10.1.0, also skip:

  • cnshydra

Prerequisites

  • The pg_restore utility of the relevant version is installed on the host from which the backup is restored
  • The kubectl utility, version 1.23, is installed on the host from which the backup is restored
  • kubectl is configured to access the Kubernetes cluster where the DataRobot application is running. Verify this with the kubectl cluster-info command (see the sketch after this list).
  • Make sure the customer has allocated enough storage to perform this activity.
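A quick way to confirm the tooling prerequisites is a check like the one below. This is a sketch only; it assumes the tools are already on your PATH, and the reported pg_restore version should correspond to the external PostgreSQL server version.

# Confirm the client tooling and cluster access before starting
pg_restore --version      # should correspond to the external PostgreSQL server version
kubectl version --client
kubectl cluster-info      # must point at the cluster running the DataRobot application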

Note

Since different PCS HA charts are applied for external PCS, where there are no Mongo and PostgreSQL statefulsets, the provisioned k8s cluster needs enough space for the restore while moving from an internally hosted PCS to an external PCS setup. See the external PCS doc for more details.

Note

For example, if the customer has an internal db size of 50GB, the restore process needs at least 110GB of available space (i.e., more than double) on the k8s cluster provisioned with the external db store for the duration of the migration. A temporary disk mount also works.

First, export the name of the DataRobot application's Kubernetes namespace in the DR_CORE_NAMESPACE variable:

export DR_CORE_NAMESPACE=<namespace>

There are two ways to get the password used for the restoration:

  1. Obtain the PostgreSQL admin user password from kubernetes secrets:
    export PGPASSWORD=$(kubectl -n $DR_CORE_NAMESPACE get secret pcs-postgresql -o jsonpath='{.data.postgres-password}' | base64 -d)
    echo ${PGPASSWORD}
    

OR

  1. Log in to the mmapp pod and get the host and credential details needed to execute the restore:
    kubectl exec -it <mmapp-pod-name> -n $DR_CORE_NAMESPACE -- /entrypoint bash
    bash-4.4$ env | grep PG
    PGSQL_MODMON_PASSWORD=<PGSQL_USER_PASSWORD>
    PGSQL_MODMON_USER=modmon
    PGSQL_HOST=<PGSQL_HOST>
    PGSQL_MODMON_DB=modmon
    PGSQL_PORT=5432
    PGSQL_POSTGRES_PASSWORD=<PGSQL_POSTGRES_PASSWORD>
    
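Before starting the restore, it can be worth confirming that the external PostgreSQL host is reachable with the obtained credentials. The following is a sketch that assumes a psql client is available on the host performing the restore; substitute the host and password values obtained above.

export PGPASSWORD=<PGSQL_POSTGRES_PASSWORD>
psql -U postgres -h <PGSQL_HOST> -p 5432 -c 'SELECT version();'   # should print the external server version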

Uncompress backup files if they are in tar or gz format

Extract the pgsql backup; this will create the $BACKUP_LOCATION/pgsql directory:

cd $BACKUP_LOCATION
tar xf datarobot-pgsql-backup-<date>.tar -C $BACKUP_LOCATION

If the backup is in .dat.gz format, gunzip the files so that all files are in .dat format:

cd $BACKUP_LOCATION
gunzip *.gz
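If the .dat.gz files sit inside per-database subdirectories under $BACKUP_LOCATION/pgsql rather than directly in $BACKUP_LOCATION, a recursive variant such as the following can be used instead (a sketch; adjust the path to match your backup layout):

# Decompress every .dat.gz file under the extracted pgsql directory
find "$BACKUP_LOCATION/pgsql" -type f -name '*.dat.gz' -exec gunzip {} +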

Option 1: Copy backup to mmapp pod

Replace the variable below with the customer’s mount point that has sufficient space for the restore.

Copy backup to mmapp pod for restoration

Define where the backups are stored on the host from which the backup will be restored:

export BACKUP_LOCATION=~/datarobot-backups/
export RESTORE_LOCATION=/<AVAILABLE_DIR>/datarobot-backups/pgsql-backup/

The restore process requires you to copy the untarred/unzipped backup from $BACKUP_LOCATION to the running mmapp pod:

kubectl cp $BACKUP_LOCATION $DR_CORE_NAMESPACE/<mmapp-pod-name>:$RESTORE_LOCATION
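Optionally, confirm that the backup landed in the expected location inside the pod before starting the restore. This is a sketch; $RESTORE_LOCATION expands on the host, which works here because it holds the same path used inside the pod.

kubectl exec <mmapp-pod-name> -n $DR_CORE_NAMESPACE -- ls $RESTORE_LOCATION/pgsql   # should list one directory per database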

Restore the PostgreSQL databases from $RESTORE_LOCATION inside the mmapp pod. If the DR version is >= 10.1.0, also add ! -name cnshydra to the find command below (see the note at the top of this guide):

kubectl exec -it <mmapp-pod-name> -n $DR_CORE_NAMESPACE -- /entrypoint bash
bash-4.4$ export RESTORE_LOCATION=/<AVAILABLE_DIR>/datarobot-backups/pgsql-backup/
bash-4.4$ cd /opt/datarobot-libs/virtualenvs/datarobot-<release-version>/bin
bash-4.4$ for db in $(find $RESTORE_LOCATION/pgsql -mindepth 1 -maxdepth 1 -type d ! -name postgres ! -name sushihydra ! -name identityresourceservice); do
  ./pg_restore -v -U postgres -h <PGSQL_HOST> -p 5432 -cC -j4 -d postgres "$db";
done
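After the loop completes, the restored databases can be checked from inside the same pod. This is a sketch and assumes a psql binary is available next to pg_restore in the same bin directory; adjust the path if it lives elsewhere.

bash-4.4$ ./psql -U postgres -h <PGSQL_HOST> -p 5432 -c '\l'   # the restored databases should appear in the list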

Option 2: Spin up a temporary restore pod

Ensure that the pod being provisioned has a mount point with sufficient space available for the restore process.

Here is a minimal pod configuration; replace the namespace value as needed.

Adjust the following according to your cloud provider.

Replace spec:containers:name with an appropriate name. Replace the spec:containers:image value according to the cloud provider, using one of the following:

  • AWS: amazon/aws-cli
  • Azure: mcr.microsoft.com/azurelinux/base/postgres:16.4-1-azl3.0.20240824-amd64
  • GCP: google/cloud-sdk

apiVersion: v1
kind: Pod
metadata:
  name: pgrestore-minimal
  namespace: <NAMESPACE>
spec:
  containers:
  - name: awscli-onprem-minimal
    image: amazon/aws-cli
    env:
    - name: PGSQL_POSTGRES_PASSWORD
      valueFrom:
        secretKeyRef:
          key: postgres-password
          name: pcs-postgresql
    - name: PGSQL_MODMON_PASSWORD
      valueFrom:
        secretKeyRef:
          key: password
          name: pcs-db-modmon
    envFrom:
    - configMapRef:
        name: datarobot-modeling-envvars
    command:
      - tail
      - -f
      - /dev/null

Apply the above in your namespace:

kubectl apply -f pgrestore-minimal.yaml -n $NAMESPACE
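Optionally, wait for the pod to become Ready before copying the backup into it. This is a sketch, assuming the pod name from the manifest above and that the pod was created in the DataRobot namespace:

kubectl wait --for=condition=Ready pod/pgrestore-minimal -n $DR_CORE_NAMESPACE --timeout=120s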

The restore process requires you to copy the backup from $BACKUP_LOCATION to the restore pod provisioned above. Find the pod name, then copy the backup:

kubectl get pods -A | grep minimal

kubectl cp $BACKUP_LOCATION $DR_CORE_NAMESPACE/<pod-name>:$RESTORE_LOCATION

Perform the restore from the pod provisioned above. Make sure to install the PostgreSQL client libraries and tools for the relevant version; below is an example for an Amazon-provisioned node (remember to also exclude cnshydra if the DR version is >= 10.1.0, per the note above):

kubectl exec -it <pod-name> -n $DR_CORE_NAMESPACE -- bash
bash-4.4$ amazon-linux-extras install postgresql<VERSION_NUMBER>   # for example: 12, 13, etc.
bash-4.4$ for db in $(find $RESTORE_LOCATION/pgsql -mindepth 1 -maxdepth 1 -type d ! -name postgres ! -name sushihydra ! -name identityresourceservice); do
  pg_restore -v -U postgres -h <PGSQL_HOST> -p 5432 -cC -j4 -d postgres "$db";
done
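Once the restore has completed, the result can be checked from inside the pod, and the temporary pod can then be removed. This is a sketch, assuming the psql client was installed by the postgresql package above and the pod name from the manifest:

bash-4.4$ psql -U postgres -h <PGSQL_HOST> -p 5432 -c '\l'   # confirm the restored databases are listed
bash-4.4$ exit
kubectl delete pod pgrestore-minimal -n $DR_CORE_NAMESPACE   # remove the temporary restore pod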