Restoring PostgreSQL to an externally managed database

This operation should be executed from the macOS or GNU/Linux machine where the previously taken PostgreSQL backup is located.

Important: Ensure the DataRobot application is scaled down (e.g., kubectl scale deployment --all --replicas=0 -n <your-datarobot-namespace>) before following this guide.
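Because `--replicas=0` discards the original replica counts, it can help to record them before scaling down so the deployments can be returned to their previous sizes after the restore. A minimal sketch; the kubectl commands are shown as comments and assume your real namespace, and `scale_up_cmds` is a hypothetical helper:

```shell
# Record each deployment's size as "name=replicas" lines, then scale down
# (run the commented kubectl commands against your actual cluster):
#   kubectl get deploy -n <your-datarobot-namespace> \
#     -o jsonpath='{range .items[*]}{.metadata.name}={.spec.replicas}{"\n"}{end}' \
#     > replicas-before.txt
#   kubectl scale deployment --all --replicas=0 -n <your-datarobot-namespace>

# Convert the saved lines back into scale-up commands once the restore is done.
scale_up_cmds() {
  while IFS='=' read -r name replicas; do
    echo "kubectl scale deployment $name --replicas=$replicas -n $1"
  done
}
# scale_up_cmds <your-datarobot-namespace> < replicas-before.txt | bash
```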

Note

When restoring backed-up PostgreSQL databases, the following databases must be skipped. The find command in the restore procedures below handles these exclusions.

  • identityresourceservice
  • sushihydra

Additionally, if the DataRobot version involved in the backup/restore is 10.1.0 or later, also skip:

  • cnshydra
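The exclusion pattern used by the restore loops below can be exercised against a mock backup tree. A minimal sketch; `modmon` stands in for any real database directory from your backup:

```shell
# Build a mock backup tree containing one real database (modmon) plus every
# directory the restore loops must skip, then apply the same find filter.
root=$(mktemp -d)
mkdir -p "$root/modmon" "$root/postgres" "$root/sushihydra" \
         "$root/identityresourceservice" "$root/cnshydra"
found=$(find "$root" -mindepth 1 -maxdepth 1 -type d \
  ! -name postgres ! -name sushihydra \
  ! -name identityresourceservice ! -name cnshydra)
echo "$found"   # only the modmon directory survives the filter
rm -r "$root"
```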

Prerequisites

Ensure the following:

  • Tools:

    • pg_restore: A version compatible with your backup and target PostgreSQL server (e.g., 12.x if your backup was from PostgreSQL 12).

    • kubectl: Version 1.23 or later, installed on the host where the backup will be restored.

    • kubectl must be configured to access the Kubernetes cluster where the DataRobot application is running. Verify with kubectl cluster-info.

  • Storage: You have allocated enough storage on the target externally managed PostgreSQL instance. If using a temporary pod for restore (Option 2), ensure that pod has sufficient temporary storage for the uncompressed backup data.
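As a quick compatibility check, the major version can be extracted from the client tool's version banner and compared against the backup's server version. `pg_major` is a hypothetical helper; the `pg_restore` call is shown as a comment to run wherever the tool is installed:

```shell
# Extract the major version number from a PostgreSQL tool's version banner,
# e.g. "pg_restore (PostgreSQL) 12.18" -> 12.
pg_major() { grep -oE '[0-9]+' <<<"$1" | head -n1; }

# Compare against your backup's server version (run where pg_restore exists):
#   pg_major "$(pg_restore --version)"
```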

General preparation steps

  1. Set the DR_CORE_NAMESPACE environment variable to your DataRobot application's Kubernetes namespace. Replace <your-datarobot-namespace> with the actual namespace.

    export DR_CORE_NAMESPACE=<your-datarobot-namespace>
    

Obtain PostgreSQL credentials

You need credentials for your target externally managed PostgreSQL instance. The methods below show how to retrieve the credentials used by an internal PCS deployment, which can be useful if you are migrating or need a reference for the type of credentials DataRobot expects. For your external database, use the credentials provided by your database administrator or cloud provider.

Option A: From Kubernetes secrets (for source internal PCS, if applicable)

This retrieves the password for the internal pcs-postgresql instance.

export PGPASSWORD_INTERNAL_PCS=$(kubectl -n $DR_CORE_NAMESPACE get secret pcs-postgresql -o jsonpath='{.data.postgres-password}' | base64 --decode)
echo "Retrieved internal PCS PostgreSQL password (if applicable)."

Option B: From a running mmapp-pod (shows environment variables DataRobot uses)

Replace <mmapp-pod-name> with the name of one of your mmapp-app pods. This shows how DataRobot services might be configured to connect to PostgreSQL.

kubectl exec -it <mmapp-pod-name> -n $DR_CORE_NAMESPACE -- /entrypoint.sh bash

Inside the pod, run:

env | grep PG

Example output:

PGSQL_MODMON_PASSWORD=<PGSQL_USER_PASSWORD>
PGSQL_MODMON_USER=modmon
PGSQL_HOST=<PGSQL_HOST> # This would be your external PGSQL host
PGSQL_MODMON_DB=modmon
PGSQL_PORT=5432
PGSQL_POSTGRES_PASSWORD=<PGSQL_POSTGRES_PASSWORD> # Admin/superuser for external PGSQL

Ensure you have the correct hostname, port, username, and password for your target externally managed PostgreSQL database, then set the PGPASSWORD environment variable for pg_restore to use:

export PGPASSWORD="<your_external_pgsql_password>"
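Before restoring, a quick connectivity check against the target instance can catch credential or network problems early. `make_pg_uri` is a hypothetical helper, and the hostname in the example is a placeholder:

```shell
# Build a libpq connection URI from user, host, port, and database name.
make_pg_uri() { printf 'postgresql://%s@%s:%s/%s' "$1" "$2" "$3" "$4"; }

make_pg_uri modmon pg.example.com 5432 postgres
# -> postgresql://modmon@pg.example.com:5432/postgres

# psql reads PGPASSWORD from the environment, so a smoke test is simply:
#   psql "$(make_pg_uri <user> <host> 5432 postgres)" -c 'SELECT 1'
```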

Uncompress backup files

  1. Define the location on the host where your PostgreSQL backup files are stored. This example assumes ~/datarobot-backups/.

    export BACKUP_LOCATION=~/datarobot-backups/
    
  2. If your backup is a .tar archive containing the directory structure (e.g., pgsql with subdirectories for each database dump):

    Extract the PostgreSQL backup. This typically creates a pgsql directory within your $BACKUP_LOCATION. Replace <date> with the actual date or identifier from your backup filename.

    cd $BACKUP_LOCATION
    tar xf datarobot-pgsql-backup-<date>.tar -C $BACKUP_LOCATION
    

    Your uncompressed backup data (directories for each database) should now be in a path like $BACKUP_LOCATION/pgsql/.

  3. If your individual database backup files are in .dat.gz format (e.g., from a file-per-database backup), gunzip them:

    cd $BACKUP_LOCATION/pgsql # Or wherever your .dat.gz files are
    gunzip *.gz
    

    Ensure all database backup files are in uncompressed .dat format or in the directory format expected by pg_restore -Fd.
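A quick way to confirm the layout is what `pg_restore -Fd` expects: every directory-format dump contains a `toc.dat` file. `check_dumps` is a hypothetical helper sketching that check:

```shell
# Verify each uncompressed database directory looks like a pg_dump -Fd dump
# (directory-format dumps always contain a toc.dat file).
check_dumps() {
  local root=$1 bad=0 d
  for d in "$root"/*/; do
    [ -f "${d}toc.dat" ] || { echo "missing toc.dat: $d"; bad=1; }
  done
  return "$bad"
}
# check_dumps "$BACKUP_LOCATION/pgsql" && echo "all dump directories look valid"
```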

Option 1: Restore by copying backup to an mmapp pod

This option is recommended if the database size is relatively small (e.g., smaller than 30GB), as it uses an existing application pod.

  1. Define source and destination paths. Replace <AVAILABLE_DIR_ON_POD> with a path on the mmapp pod that has sufficient space for the uncompressed backup data.

    export LOCAL_BACKUP_PGSQL_PATH=$BACKUP_LOCATION/pgsql # Path to local uncompressed pgsql backup directory
    export POD_RESTORE_LOCATION_PGSQL=/<AVAILABLE_DIR_ON_POD>/datarobot-backups/pgsql-backup/
    
  2. Copy the uncompressed backup data (the pgsql directory containing individual database backup directories) to a running mmapp-app pod. Replace <mmapp-pod-name> with an actual pod name.

    kubectl cp $LOCAL_BACKUP_PGSQL_PATH $DR_CORE_NAMESPACE/<mmapp-pod-name>:$POD_RESTORE_LOCATION_PGSQL
    
  3. Execute pg_restore from within the mmapp-app pod for each database.

    Replace <mmapp-pod-name>, <release-version> (e.g., 10.1.2), <PGSQL_EXTERNAL_HOST>, <PGSQL_EXTERNAL_USER> and ensure PGPASSWORD is set in the pod's environment if not using a connection string that includes it.

    kubectl exec -it <mmapp-pod-name> -n $DR_CORE_NAMESPACE -- /entrypoint.sh bash
    # Inside the mmapp pod:
    # export POD_BACKUP_ROOT=/<AVAILABLE_DIR_ON_POD>/datarobot-backups/pgsql-backup/pgsql # Path to the parent of database dump dirs
    # export PGUSER="<YOUR_EXTERNAL_PGSQL_USER>" # User for the external DB
    # export PGPASSWORD="<YOUR_EXTERNAL_PGSQL_PASSWORD>" # Password for the external DB
    # cd /opt/datarobot-libs/virtualenvs/datarobot-<release-version>/bin
    # for db_backup_dir in $(find $POD_BACKUP_ROOT -mindepth 1 -maxdepth 1 -type d ! -name postgres ! -name sushihydra ! -name identityresourceservice ! -name cnshydra ); do # Adjust cnshydra exclusion based on DR version
    #   db_name=$(basename "$db_backup_dir")
    #   echo "Restoring database: $db_name"
    #   ./pg_restore -v -U $PGUSER -h <PGSQL_EXTERNAL_HOST> -p 5432 -cC -j4 -d postgres "$db_backup_dir"
    # done
    

Note

The -d postgres flag connects pg_restore to the postgres maintenance database so that the CREATE DATABASE statements generated by -C can be issued. Ensure the user has database-creation rights, or pre-create the databases and adjust the flags accordingly.
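If the restore user cannot create databases, one option is to pre-create each database (e.g. `psql -d postgres -c 'CREATE DATABASE modmon'`) and drop the `-cC` flags. `restore_cmd` below is a hypothetical helper sketching the two invocation variants:

```shell
# Build the pg_restore command line for one dump directory. The default
# variant drops/recreates the database via -cC against the postgres
# maintenance DB; --no-create targets an already-created database instead.
restore_cmd() {
  local dir=$1 db
  db=$(basename "$dir")
  if [ "${2:-}" = "--no-create" ]; then
    echo "pg_restore -v -j4 -d $db $dir"
  else
    echo "pg_restore -v -cC -j4 -d postgres $dir"
  fi
}

restore_cmd /restore-data/pgsql/modmon --no-create
# -> pg_restore -v -j4 -d modmon /restore-data/pgsql/modmon
```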

  4. Once the restore is complete, scale your DataRobot application deployments back up.

Option 2: Restore using a temporary Kubernetes pod

This option is recommended if the database size is larger (e.g., greater than 50GB).

  1. Ensure the temporary pod you provision will have a mount point with sufficient space for the uncompressed backup data.
  2. Create a pod definition YAML file (e.g., pgsql-restore-pod.yaml). Choose an image that includes PostgreSQL client tools (pg_restore) compatible with your backup and target database, or a base image where you can install them. Replace <NAMESPACE> with your DataRobot namespace.

    apiVersion: v1
    kind: Pod
    metadata:
      name: pgsql-restore-temp-pod
      namespace: <NAMESPACE>
    spec:
      containers:
      - name: pgsql-restore-container
        image: postgres:12 # Or an image matching your pg_restore version, e.g., appropriate Azure/GCP/AWS CLI image
        env: # Pass credentials for your external PostgreSQL
        - name: PGUSER
          value: "<YOUR_EXTERNAL_PGSQL_USER>"
        - name: PGPASSWORD
          value: "<YOUR_EXTERNAL_PGSQL_PASSWORD>"
        - name: PGHOST
          value: "<YOUR_EXTERNAL_PGSQL_HOSTNAME>"
        - name: PGPORT
          value: "5432" # Default, adjust if needed
        command:
          - tail
          - -f
          - /dev/null
        # volumeMounts: # If you need to mount persistent storage for the backup data
        # - name: backup-data-volume
        #   mountPath: /restore-data
      # volumes: # If using a persistent volume for backup data
      # - name: backup-data-volume
      #   persistentVolumeClaim:
      #     claimName: my-pvc-for-restore-data
    
  3. Apply the pod definition to your namespace:

    kubectl apply -f pgsql-restore-pod.yaml -n <NAMESPACE>
    
  4. Copy the uncompressed backup data (the pgsql directory) to the temporary pod. Replace $LOCAL_BACKUP_PGSQL_PATH with the path to your local uncompressed pgsql directory, <temp-pod-name> (e.g., pgsql-restore-temp-pod), and $POD_RESTORE_LOCATION_PGSQL with a path inside the pod (e.g., /restore-data/pgsql).

    # Wait for the pod to be ready: kubectl wait --for=condition=Ready pod/pgsql-restore-temp-pod -n <NAMESPACE> --timeout=300s
    export LOCAL_BACKUP_PGSQL_PATH=$BACKUP_LOCATION/pgsql
    export POD_RESTORE_LOCATION_PGSQL=/restore-data/pgsql
    kubectl cp $LOCAL_BACKUP_PGSQL_PATH <NAMESPACE>/<temp-pod-name>:$POD_RESTORE_LOCATION_PGSQL
    
  5. Perform the restore from within the temporary pod.

    kubectl exec -it <temp-pod-name> -n <NAMESPACE> -- bash
    # Inside the temporary pod:
    # If PostgreSQL client tools are not on the image, install them.
    # Example for Debian/Ubuntu based image (like postgres:12):
    # apt-get update && apt-get install -y postgresql-client
    # Example for RHEL/CentOS based image (if you used amazon/aws-cli and it's Amazon Linux):
    # amazon-linux-extras install postgresql<VERSION_NUMBER_NO_DOT> # e.g., postgresql12
    #
    # Now run pg_restore using the environment variables for credentials and host:
    # export POD_BACKUP_ROOT=/restore-data/pgsql # Path to the parent of database dump dirs
    # for db_backup_dir in $(find $POD_BACKUP_ROOT -mindepth 1 -maxdepth 1 -type d ! -name postgres ! -name sushihydra ! -name identityresourceservice ! -name cnshydra ); do # Adjust cnshydra exclusion
    #   db_name=$(basename "$db_backup_dir")
    #   echo "Restoring database: $db_name"
    #   pg_restore -v -U $PGUSER -h $PGHOST -p $PGPORT -cC -j4 -d postgres "$db_backup_dir"
    # done
    
  6. Once the restore is complete, delete the temporary pod:

    kubectl delete pod <temp-pod-name> -n <NAMESPACE>
    
  7. Scale your DataRobot application deployments back up.