Restoring PostgreSQL to an externally managed database

This operation should be executed from the macOS or GNU/Linux machine where the previously taken PostgreSQL backup is located.

Important: Ensure the DataRobot application is scaled down (e.g., kubectl scale deployment --all --replicas=0 -n <your-datarobot-namespace>) before following this guide.
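Because `--replicas=0` discards the original replica counts, it can help to record them before scaling down so the deployments can be returned to their previous sizes after the restore. A minimal sketch; the kubectl commands are shown as comments and assume your real namespace, and `scale_up_cmds` is a hypothetical helper:

```shell
# Record each deployment's size as "name=replicas" lines, then scale down
# (run the commented kubectl commands against your actual cluster):
#   kubectl get deploy -n <your-datarobot-namespace> \
#     -o jsonpath='{range .items[*]}{.metadata.name}={.spec.replicas}{"\n"}{end}' \
#     > replicas-before.txt
#   kubectl scale deployment --all --replicas=0 -n <your-datarobot-namespace>

# Convert the saved lines back into scale-up commands once the restore is done.
scale_up_cmds() {
  while IFS='=' read -r name replicas; do
    echo "kubectl scale deployment $name --replicas=$replicas -n $1"
  done
}
# scale_up_cmds <your-datarobot-namespace> < replicas-before.txt | bash
```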

Note

When restoring backed-up PostgreSQL databases, the following databases must be skipped. The find command in the restore procedures below handles these exclusions.

  • identityresourceservice
  • sushihydra

Additionally, if the DataRobot version involved in the backup/restore is 10.1.0 or later, also skip:

  • cnshydra
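The exclusion pattern used by the restore loops below can be exercised against a mock backup tree. A minimal sketch; `modmon` stands in for any real database directory from your backup:

```shell
# Build a mock backup tree containing one real database (modmon) plus every
# directory the restore loops must skip, then apply the same find filter.
root=$(mktemp -d)
mkdir -p "$root/modmon" "$root/postgres" "$root/sushihydra" \
         "$root/identityresourceservice" "$root/cnshydra"
found=$(find "$root" -mindepth 1 -maxdepth 1 -type d \
  ! -name postgres ! -name sushihydra \
  ! -name identityresourceservice ! -name cnshydra)
echo "$found"   # only the modmon directory survives the filter
rm -r "$root"
```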

Prerequisites

Ensure the following:

  • Tools:

    • pg_restore: A version compatible with your backup and target PostgreSQL server (e.g., 12.x if your backup was from PostgreSQL 12).

    • kubectl: Version 1.23 or later, installed on the host where the backup will be restored.

    • kubectl must be configured to access the Kubernetes cluster where the DataRobot application is running. Verify with kubectl cluster-info.

  • Storage: You have allocated enough storage on the target externally managed PostgreSQL instance. If using a temporary pod for restore (Option 2), ensure that pod has sufficient temporary storage for the uncompressed backup data.
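As a quick compatibility check, the major version can be extracted from the client tool's version banner and compared against the backup's server version. `pg_major` is a hypothetical helper; the `pg_restore` call is shown as a comment to run wherever the tool is installed:

```shell
# Extract the major version number from a PostgreSQL tool's version banner,
# e.g. "pg_restore (PostgreSQL) 12.18" -> 12.
pg_major() { grep -oE '[0-9]+' <<<"$1" | head -n1; }

# Compare against your backup's server version (run where pg_restore exists):
#   pg_major "$(pg_restore --version)"
```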

General preparation steps

  1. Set the DR_CORE_NAMESPACE environment variable to your DataRobot application's Kubernetes namespace. Replace <your-datarobot-namespace> with the actual namespace.

    export DR_CORE_NAMESPACE=<your-datarobot-namespace>
    

Obtain PostgreSQL credentials

You need credentials for your target externally managed PostgreSQL instance. The methods below show how to retrieve the credentials used by an internal PCS deployment, which can be useful if you are migrating or need a reference for the type of credentials DataRobot expects. For your external database, use the credentials provided by your database administrator or cloud provider.

Option A: From Kubernetes secrets (for source internal PCS, if applicable)

This retrieves the password for the internal pcs-postgresql instance.

export PGPASSWORD_INTERNAL_PCS=$(kubectl -n $DR_CORE_NAMESPACE get secret pcs-postgresql -o jsonpath='{.data.postgres-password}' | base64 --decode)
echo "Retrieved internal PCS PostgreSQL password (if applicable)."

Option B: From a running mmapp-pod (shows environment variables DataRobot uses)

Replace <mmapp-pod-name> with the name of one of your mmapp-app pods. This shows how DataRobot services might be configured to connect to PostgreSQL.

kubectl exec -it <mmapp-pod-name> -n $DR_CORE_NAMESPACE -- /entrypoint.sh bash

Inside the pod, run:

env | grep PG

Example output:

PGSQL_MODMON_PASSWORD=<PGSQL_USER_PASSWORD>
PGSQL_MODMON_USER=modmon
PGSQL_HOST=<PGSQL_HOST> # This would be your external PGSQL host
PGSQL_MODMON_DB=modmon
PGSQL_PORT=5432
PGSQL_POSTGRES_PASSWORD=<PGSQL_POSTGRES_PASSWORD> # Admin/superuser for external PGSQL

Ensure you have the correct hostname, port, username, and password for your target externally managed PostgreSQL database, then set the PGPASSWORD environment variable for pg_restore to use:

export PGPASSWORD="<your_external_pgsql_password>"
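Before restoring, a quick connectivity check against the target instance can catch credential or network problems early. `make_pg_uri` is a hypothetical helper, and the hostname in the example is a placeholder:

```shell
# Build a libpq connection URI from user, host, port, and database name.
make_pg_uri() { printf 'postgresql://%s@%s:%s/%s' "$1" "$2" "$3" "$4"; }

make_pg_uri modmon pg.example.com 5432 postgres
# -> postgresql://modmon@pg.example.com:5432/postgres

# psql reads PGPASSWORD from the environment, so a smoke test is simply:
#   psql "$(make_pg_uri <user> <host> 5432 postgres)" -c 'SELECT 1'
```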

Uncompress backup files

  1. Define the location on the host where your PostgreSQL backup files are stored. This example assumes ~/datarobot-backups/.

    export BACKUP_LOCATION=~/datarobot-backups/
    
  2. If your backup is a .tar archive containing the directory structure (e.g., pgsql with subdirectories for each database dump):

    Extract the PostgreSQL backup. This typically creates a pgsql directory within your $BACKUP_LOCATION. Replace <date> with the actual date or identifier from your backup filename.

    cd $BACKUP_LOCATION
    tar xf datarobot-pgsql-backup-<date>.tar -C $BACKUP_LOCATION
    

    Your uncompressed backup data (directories for each database) should now be in a path like $BACKUP_LOCATION/pgsql/.

  3. If your individual database backup files are in .dat.gz format (e.g., from a file-per-database backup), gunzip them:

    cd $BACKUP_LOCATION/pgsql # Or wherever your .dat.gz files are
    gunzip *.gz
    

    Ensure all database backup files are in uncompressed .dat format or in the directory format expected by pg_restore -Fd.
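A quick way to confirm the layout is what `pg_restore -Fd` expects: every directory-format dump contains a `toc.dat` file. `check_dumps` is a hypothetical helper sketching that check:

```shell
# Verify each uncompressed database directory looks like a pg_dump -Fd dump
# (directory-format dumps always contain a toc.dat file).
check_dumps() {
  local root=$1 bad=0 d
  for d in "$root"/*/; do
    [ -f "${d}toc.dat" ] || { echo "missing toc.dat: $d"; bad=1; }
  done
  return "$bad"
}
# check_dumps "$BACKUP_LOCATION/pgsql" && echo "all dump directories look valid"
```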

Option 1: Restore by copying backup to an mmapp pod

This option is recommended if the database size is relatively small (e.g., smaller than 30GB), as it uses an existing application pod.

  1. Define source and destination paths. Replace <AVAILABLE_DIR_ON_POD> with a path on the mmapp pod that has sufficient space for the uncompressed backup data.

    export LOCAL_BACKUP_PGSQL_PATH=$BACKUP_LOCATION/pgsql # Path to local uncompressed pgsql backup directory
    export POD_RESTORE_LOCATION_PGSQL=/<AVAILABLE_DIR_ON_POD>/datarobot-backups/pgsql-backup/
    
  2. Copy the uncompressed backup data (the pgsql directory containing individual database backup directories) to a running mmapp-app pod. Replace <mmapp-pod-name> with an actual pod name.

    kubectl cp $LOCAL_BACKUP_PGSQL_PATH $DR_CORE_NAMESPACE/<mmapp-pod-name>:$POD_RESTORE_LOCATION_PGSQL
    
  3. Execute pg_restore from within the mmapp-app pod for each database.

    Replace <mmapp-pod-name>, <release-version> (e.g., 10.1.2), <PGSQL_EXTERNAL_HOST>, <PGSQL_EXTERNAL_USER> and ensure PGPASSWORD is set in the pod's environment if not using a connection string that includes it.

    kubectl exec -it <mmapp-pod-name> -n $DR_CORE_NAMESPACE -- /entrypoint.sh bash
    # Inside the mmapp pod:
    # export POD_BACKUP_ROOT=/<AVAILABLE_DIR_ON_POD>/datarobot-backups/pgsql-backup/pgsql # Path to the parent of database dump dirs
    # export PGUSER="<YOUR_EXTERNAL_PGSQL_USER>" # User for the external DB
    # export PGPASSWORD="<YOUR_EXTERNAL_PGSQL_PASSWORD>" # Password for the external DB
    # cd /opt/datarobot-libs/virtualenvs/datarobot-<release-version>/bin
    # for db_backup_dir in $(find $POD_BACKUP_ROOT -mindepth 1 -maxdepth 1 -type d ! -name postgres ! -name sushihydra ! -name identityresourceservice ! -name cnshydra ); do # Adjust cnshydra exclusion based on DR version
    #   db_name=$(basename "$db_backup_dir")
    #   echo "Restoring database: $db_name"
    #   ./pg_restore -v -U $PGUSER -h <PGSQL_EXTERNAL_HOST> -p 5432 -cC -j4 -d postgres "$db_backup_dir"
    # done
    

Note

The -d postgres flag connects pg_restore to the postgres maintenance database so that the CREATE DATABASE statements generated by -C can be issued. Ensure the user has database-creation rights, or pre-create the databases and adjust the flags accordingly.
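If the restore user cannot create databases, one option is to pre-create each database (e.g. `psql -d postgres -c 'CREATE DATABASE modmon'`) and drop the `-cC` flags. `restore_cmd` below is a hypothetical helper sketching the two invocation variants:

```shell
# Build the pg_restore command line for one dump directory. The default
# variant drops/recreates the database via -cC against the postgres
# maintenance DB; --no-create targets an already-created database instead.
restore_cmd() {
  local dir=$1 db
  db=$(basename "$dir")
  if [ "${2:-}" = "--no-create" ]; then
    echo "pg_restore -v -j4 -d $db $dir"
  else
    echo "pg_restore -v -cC -j4 -d postgres $dir"
  fi
}

restore_cmd /restore-data/pgsql/modmon --no-create
# -> pg_restore -v -j4 -d modmon /restore-data/pgsql/modmon
```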

  4. Once the restore is complete, scale your DataRobot application deployments back up.

Option 2: Restore using a temporary Kubernetes pod

This option is recommended if the database size is larger (e.g., greater than 50GB).

  1. Ensure the temporary pod you provision will have a mount point with sufficient space for the uncompressed backup data.
  2. Create a pod definition YAML file (e.g., pgsql-restore-pod.yaml). Choose an image that includes PostgreSQL client tools (pg_restore) compatible with your backup and target database, or a base image where you can install them. Replace <NAMESPACE> with your DataRobot namespace.

    apiVersion: v1
    kind: Pod
    metadata:
      name: pgsql-restore-temp-pod
      namespace: <NAMESPACE>
    spec:
      containers:
      - name: pgsql-restore-container
        image: postgres:12 # Or an image matching your pg_restore version, e.g., appropriate Azure/GCP/AWS CLI image
        env: # Pass credentials for your external PostgreSQL
        - name: PGUSER
          value: "<YOUR_EXTERNAL_PGSQL_USER>"
        - name: PGPASSWORD
          value: "<YOUR_EXTERNAL_PGSQL_PASSWORD>"
        - name: PGHOST
          value: "<YOUR_EXTERNAL_PGSQL_HOSTNAME>"
        - name: PGPORT
          value: "5432" # Default, adjust if needed
        command:
          - tail
          - -f
          - /dev/null
        # volumeMounts: # If you need to mount persistent storage for the backup data
        # - name: backup-data-volume
        #   mountPath: /restore-data
      # volumes: # If using a persistent volume for backup data
      # - name: backup-data-volume
      #   persistentVolumeClaim:
      #     claimName: my-pvc-for-restore-data
    
  3. Apply the pod definition to your namespace:

    kubectl apply -f pgsql-restore-pod.yaml -n <NAMESPACE>
    
  4. Copy the uncompressed backup data (the pgsql directory) to the temporary pod. Replace $LOCAL_BACKUP_PGSQL_PATH with the path to your local uncompressed pgsql directory, <temp-pod-name> (e.g., pgsql-restore-temp-pod), and $POD_RESTORE_LOCATION_PGSQL with a path inside the pod (e.g., /restore-data/pgsql).

    # Wait for the pod to be ready: kubectl wait --for=condition=Ready pod/pgsql-restore-temp-pod -n <NAMESPACE> --timeout=300s
    export LOCAL_BACKUP_PGSQL_PATH=$BACKUP_LOCATION/pgsql
    export POD_RESTORE_LOCATION_PGSQL=/restore-data/pgsql
    kubectl cp $LOCAL_BACKUP_PGSQL_PATH <NAMESPACE>/<temp-pod-name>:$POD_RESTORE_LOCATION_PGSQL
    
  5. Perform the restore from within the temporary pod.

    kubectl exec -it <temp-pod-name> -n <NAMESPACE> -- bash
    # Inside the temporary pod:
    # If PostgreSQL client tools are not on the image, install them.
    # Example for Debian/Ubuntu based image (like postgres:12):
    # apt-get update && apt-get install -y postgresql-client
    # Example for RHEL/CentOS based image (if you used amazon/aws-cli and it's Amazon Linux):
    # amazon-linux-extras install postgresql<VERSION_NUMBER_NO_DOT> # e.g., postgresql12
    #
    # Now run pg_restore using the environment variables for credentials and host:
    # export POD_BACKUP_ROOT=/restore-data/pgsql # Path to the parent of database dump dirs
    # for db_backup_dir in $(find $POD_BACKUP_ROOT -mindepth 1 -maxdepth 1 -type d ! -name postgres ! -name sushihydra ! -name identityresourceservice ! -name cnshydra ); do # Adjust cnshydra exclusion
    #   db_name=$(basename "$db_backup_dir")
    #   echo "Restoring database: $db_name"
    #   pg_restore -v -U $PGUSER -h $PGHOST -p $PGPORT -cC -j4 -d postgres "$db_backup_dir"
    # done
    
  6. Once the restore is complete, delete the temporary pod:

    kubectl delete pod <temp-pod-name> -n <NAMESPACE>
    
  7. Scale your DataRobot application deployments back up.