Restoring PostgreSQL to an externally managed database¶
This operation should be executed from the macOS or GNU/Linux machine where the previously taken PostgreSQL backup is located.
Important: Ensure the DataRobot application is scaled down (e.g., `kubectl scale deployment --all --replicas=0 -n <your-datarobot-namespace>`) before following this guide.
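If you plan to scale the deployments back to their original sizes after the restore, one approach is to record the current replica counts before scaling down. A minimal sketch, assuming the application workloads are Deployments (the `replicas.txt` file name is arbitrary):

```bash
# Record each deployment's current replica count so it can be restored later.
kubectl get deployments -n <your-datarobot-namespace> \
  -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.replicas}{"\n"}{end}' > replicas.txt

# Scale everything down before the restore.
kubectl scale deployment --all --replicas=0 -n <your-datarobot-namespace>
```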
Note
When restoring backed-up PostgreSQL databases, the following databases must be skipped. The `find` command in the restore procedures below handles these exclusions.

- identityresourceservice
- sushihydra

Additionally, if the DataRobot version involved in the backup/restore is >= 10.1.0, also skip:

- cnshydra
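For reference, a minimal sketch of how these exclusions appear as `find` filters in the restore loops below (the `postgres` maintenance database is excluded as well; replace `<path-to-uncompressed-backup>` with wherever your dumps live, one subdirectory per database):

```bash
# List only the database dump directories that should be restored.
# Drop "! -name cnshydra" if the DataRobot version is below 10.1.0.
find <path-to-uncompressed-backup>/pgsql -mindepth 1 -maxdepth 1 -type d \
  ! -name postgres ! -name sushihydra ! -name identityresourceservice ! -name cnshydra
```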
Prerequisites¶
Ensure the following:
- Tools:

    - `pg_restore`: A version compatible with your backup and target PostgreSQL server (e.g., 12.x if your backup was from PostgreSQL 12). See PostgreSQL Downloads.
    - `kubectl`: Version 1.23 or later, installed on the host where the backup will be restored. `kubectl` must be configured to access the Kubernetes cluster where the DataRobot application is running. Verify with `kubectl cluster-info` (see the snippet after this list).

- Storage: You have allocated enough storage on the target externally managed PostgreSQL instance. If using a temporary pod for restore (Option 2), ensure that pod has sufficient temporary storage for the uncompressed backup data.
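Before proceeding, you can confirm the tool versions and cluster access from the restore host:

```bash
pg_restore --version      # Should match the PostgreSQL major version of your backup
kubectl version --client  # 1.23 or later
kubectl cluster-info      # Confirms kubectl can reach the DataRobot cluster
```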
General preparation steps¶
- Set the `DR_CORE_NAMESPACE` environment variable to your DataRobot application's Kubernetes namespace. Replace `<your-datarobot-namespace>` with the actual namespace.

    ```bash
    export DR_CORE_NAMESPACE=<your-datarobot-namespace>
    ```
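A quick check that the variable points at a real namespace:

```bash
kubectl get namespace $DR_CORE_NAMESPACE   # Should return the namespace, not an error
```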
Obtain PostgreSQL credentials¶
You need credentials for your target externally managed PostgreSQL instance. The methods below show how to get credentials if you were previously using internal PCS, which might be useful if you are migrating or need a reference for the type of credentials required by DataRobot. For your external database, use the credentials provided by your database administrator or cloud provider.
Option A: From Kubernetes secrets (for source internal PCS, if applicable)

This retrieves the password for the internal `pcs-postgresql` instance:

```bash
export PGPASSWORD_INTERNAL_PCS=$(kubectl -n $DR_CORE_NAMESPACE get secret pcs-postgresql -o jsonpath='{.data.postgres-password}' | base64 -d)
echo "Retrieved internal PCS PostgreSQL password (if applicable)."
```
Option B: From a running mmapp pod (shows environment variables DataRobot uses)

Replace `<mmapp-pod-name>` with the name of one of your `mmapp-app` pods. This shows how DataRobot services might be configured to connect to PostgreSQL.

```bash
kubectl exec -it <mmapp-pod-name> -n $DR_CORE_NAMESPACE -- /entrypoint.sh bash
```

Inside the pod, run:

```bash
env | grep PG
```

Example output:

```
PGSQL_MODMON_PASSWORD=<PGSQL_USER_PASSWORD>
PGSQL_MODMON_USER=modmon
PGSQL_HOST=<PGSQL_HOST>                            # This would be your external PGSQL host
PGSQL_MODMON_DB=modmon
PGSQL_PORT=5432
PGSQL_POSTGRES_PASSWORD=<PGSQL_POSTGRES_PASSWORD>  # Admin/superuser for external PGSQL
```
Ensure you have the correct hostname, port, username, and password for your target externally managed PostgreSQL database. Set the `PGPASSWORD` environment variable for `pg_restore` to use:

```bash
export PGPASSWORD="<your_external_pgsql_password>"
```
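Before running any restore, it is worth confirming that this host can reach the external database with these credentials. A minimal check, assuming `psql` is installed alongside `pg_restore` (replace the placeholders with your external host and user):

```bash
psql -h <PGSQL_EXTERNAL_HOST> -p 5432 -U <PGSQL_EXTERNAL_USER> -d postgres -c 'SELECT version();'
```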
Uncompress backup files¶
- Define the location on the host where your PostgreSQL backup files are stored. This example assumes `~/datarobot-backups/`.

    ```bash
    export BACKUP_LOCATION=~/datarobot-backups/
    ```

- If your backup is a `.tar` archive containing the directory structure (e.g., `pgsql` with subdirectories for each database dump), extract the PostgreSQL backup. This typically creates a `pgsql` directory within your `$BACKUP_LOCATION`. Replace `<date>` with the actual date or identifier from your backup filename.

    ```bash
    cd $BACKUP_LOCATION
    tar xf datarobot-pgsql-backup-<date>.tar -C $BACKUP_LOCATION
    ```

    Your uncompressed backup data (directories for each database) should now be in a path like `$BACKUP_LOCATION/pgsql/`.

- If your individual database backup files are in `.dat.gz` format (e.g., from a file-per-database backup), `gunzip` them:

    ```bash
    cd $BACKUP_LOCATION/pgsql  # Or wherever your .dat.gz files are
    gunzip *.gz
    ```

    Ensure all database backup files are in uncompressed `.dat` format or in the directory format expected by `pg_restore -Fd` (a quick check follows this list).
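You can sanity-check the uncompressed layout before copying it anywhere; each database should appear as its own directory:

```bash
find $BACKUP_LOCATION/pgsql -mindepth 1 -maxdepth 1 -type d   # One directory per database
du -sh $BACKUP_LOCATION/pgsql                                 # Total uncompressed size
```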
Option 1: Restore by copying backup to an mmapp pod¶
This option is recommended if the database size is relatively small (e.g., smaller than 30GB), as it uses an existing application pod.
- Define source and destination paths. Replace `<AVAILABLE_DIR_ON_POD>` with a path on the `mmapp` pod that has sufficient space for the uncompressed backup data.

    ```bash
    export LOCAL_BACKUP_PGSQL_PATH=$BACKUP_LOCATION/pgsql  # Path to the local uncompressed pgsql backup directory
    export POD_RESTORE_LOCATION_PGSQL=/<AVAILABLE_DIR_ON_POD>/datarobot-backups/pgsql-backup/
    ```

- Copy the uncompressed backup data (the `pgsql` directory containing the individual database backup directories) to a running `mmapp-app` pod. Replace `<mmapp-pod-name>` with an actual pod name.

    ```bash
    kubectl cp $LOCAL_BACKUP_PGSQL_PATH $DR_CORE_NAMESPACE/<mmapp-pod-name>:$POD_RESTORE_LOCATION_PGSQL
    ```

- Execute `pg_restore` from within the `mmapp-app` pod for each database. Replace `<mmapp-pod-name>`, `<release-version>` (e.g., `10.1.2`), `<PGSQL_EXTERNAL_HOST>`, and `<PGSQL_EXTERNAL_USER>`, and ensure `PGPASSWORD` is set in the pod's environment if not using a connection string that includes it.

    ```bash
    kubectl exec -it <mmapp-pod-name> -n $DR_CORE_NAMESPACE -- /entrypoint.sh bash

    # Inside the mmapp pod:
    # export POD_BACKUP_ROOT=/<AVAILABLE_DIR_ON_POD>/datarobot-backups/pgsql-backup/pgsql  # Parent of the database dump dirs
    # export PGUSER="<YOUR_EXTERNAL_PGSQL_USER>"          # User for the external DB
    # export PGPASSWORD="<YOUR_EXTERNAL_PGSQL_PASSWORD>"  # Password for the external DB
    # cd /opt/datarobot-libs/virtualenvs/datarobot-<release-version>/bin
    #
    # Restore every database dump directory except the excluded ones
    # (adjust the cnshydra exclusion based on your DataRobot version):
    # for db_backup_dir in $(find $POD_BACKUP_ROOT -mindepth 1 -maxdepth 1 -type d ! -name postgres ! -name sushihydra ! -name identityresourceservice ! -name cnshydra); do
    #     db_name=$(basename "$db_backup_dir")
    #     echo "Restoring database: $db_name"
    #     ./pg_restore -v -U $PGUSER -h <PGSQL_EXTERNAL_HOST> -p 5432 -cC -j4 -d postgres "$db_backup_dir"
    # done
    ```
Note
The `-d postgres` option connects to the `postgres` maintenance database so that the `CREATE DATABASE` commands implied by `-C` can be issued. Ensure the user has database-creation rights, or pre-create the databases and adjust the flags, as shown in the sketch below.
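If the restore user cannot create databases, a minimal sketch of the pre-create alternative (the `<PGSQL_ADMIN_USER>` placeholder is illustrative; run the `CREATE DATABASE` step as a sufficiently privileged user, then drop `-cC` from the restore loop):

```bash
# Pre-create the target database as an administrator:
psql -h <PGSQL_EXTERNAL_HOST> -p 5432 -U <PGSQL_ADMIN_USER> -d postgres -c "CREATE DATABASE $db_name"

# Then restore directly into it; without -C, -d names the target database:
pg_restore -v -U $PGUSER -h <PGSQL_EXTERNAL_HOST> -p 5432 -j4 -d "$db_name" "$db_backup_dir"
```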
- Once the restore is complete, scale your DataRobot application deployments back up.
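If you recorded replica counts before scaling down (see the sketch at the top of this guide), a minimal way to restore them:

```bash
# Restore each deployment to its saved replica count.
while read -r name replicas; do
  kubectl scale deployment "$name" --replicas="$replicas" -n $DR_CORE_NAMESPACE
done < replicas.txt
```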
Option 2: Restore using a temporary Kubernetes pod¶
This option is recommended if the database size is larger (e.g., greater than 50GB).
- Ensure the temporary pod you provision will have a mount point with sufficient space for the uncompressed backup data.
- Create a pod definition YAML file (e.g., `pgsql-restore-pod.yaml`). Choose an image that includes PostgreSQL client tools (`pg_restore`) compatible with your backup and target database, or a base image where you can install them. Replace `<NAMESPACE>` with your DataRobot namespace.

    ```yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: pgsql-restore-temp-pod
      namespace: <NAMESPACE>
    spec:
      containers:
        - name: pgsql-restore-container
          image: postgres:12  # Or an image matching your pg_restore version, e.g., an appropriate Azure/GCP/AWS CLI image
          env:  # Pass credentials for your external PostgreSQL
            - name: PGUSER
              value: "<YOUR_EXTERNAL_PGSQL_USER>"
            - name: PGPASSWORD
              value: "<YOUR_EXTERNAL_PGSQL_PASSWORD>"
            - name: PGHOST
              value: "<YOUR_EXTERNAL_PGSQL_HOSTNAME>"
            - name: PGPORT
              value: "5432"  # Default, adjust if needed
          command:
            - tail
            - -f
            - /dev/null
          # volumeMounts:  # If you need to mount persistent storage for the backup data
          #   - name: backup-data-volume
          #     mountPath: /restore-data
      # volumes:  # If using a persistent volume for backup data
      #   - name: backup-data-volume
      #     persistentVolumeClaim:
      #       claimName: my-pvc-for-restore-data
    ```
- Apply the pod definition to your namespace:

    ```bash
    kubectl apply -f pgsql-restore-pod.yaml -n <NAMESPACE>
    ```
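    Instead of polling with `kubectl get pods` (shown in the next step), you can block until the pod is ready:

    ```bash
    kubectl wait --for=condition=Ready pod/pgsql-restore-temp-pod -n <NAMESPACE> --timeout=120s
    ```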
- Copy the uncompressed backup data (the `pgsql` directory) to the temporary pod. Replace `$LOCAL_BACKUP_PGSQL_PATH` with the path to your local uncompressed `pgsql` directory, `<temp-pod-name>` (e.g., `pgsql-restore-temp-pod`), and `$POD_RESTORE_LOCATION_PGSQL` with a path inside the pod (e.g., `/restore-data/pgsql`).

    ```bash
    # Wait for the pod to be running:
    kubectl get pods -n <NAMESPACE> | grep pgsql-restore-temp-pod

    export LOCAL_BACKUP_PGSQL_PATH=$BACKUP_LOCATION/pgsql
    export POD_RESTORE_LOCATION_PGSQL=/restore-data/pgsql
    kubectl cp $LOCAL_BACKUP_PGSQL_PATH <NAMESPACE>/<temp-pod-name>:$POD_RESTORE_LOCATION_PGSQL
    ```
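    To confirm the copy completed, compare sizes on the host and in the pod:

    ```bash
    du -sh $LOCAL_BACKUP_PGSQL_PATH                                                     # Local size
    kubectl exec <temp-pod-name> -n <NAMESPACE> -- du -sh $POD_RESTORE_LOCATION_PGSQL   # Size in the pod
    ```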
- Perform the restore from within the temporary pod.

    ```bash
    kubectl exec -it <temp-pod-name> -n <NAMESPACE> -- bash

    # Inside the temporary pod:
    # If PostgreSQL client tools are not on the image, install them.
    # Example for a Debian/Ubuntu-based image (like postgres:12):
    #   apt-get update && apt-get install -y postgresql-client
    # Example for a RHEL/CentOS-based image (e.g., if you used amazon/aws-cli, which is Amazon Linux):
    #   amazon-linux-extras install postgresql<VERSION_NUMBER_NO_DOT>  # e.g., postgresql12
    #
    # Now run pg_restore using the environment variables for credentials and host:
    # export POD_BACKUP_ROOT=/restore-data/pgsql  # Parent of the database dump dirs
    # for db_backup_dir in $(find $POD_BACKUP_ROOT -mindepth 1 -maxdepth 1 -type d ! -name postgres ! -name sushihydra ! -name identityresourceservice ! -name cnshydra); do  # Adjust the cnshydra exclusion
    #     db_name=$(basename "$db_backup_dir")
    #     echo "Restoring database: $db_name"
    #     pg_restore -v -U $PGUSER -h $PGHOST -p $PGPORT -cC -j4 -d postgres "$db_backup_dir"
    # done
    ```
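    After the loop finishes, a quick verification from inside the temporary pod (which already has `PGHOST`, `PGPORT`, `PGUSER`, and `PGPASSWORD` in its environment) is to list the databases on the external instance:

    ```bash
    psql -d postgres -c '\l'   # Each restored database should appear in the list
    ```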
- Once the restore is complete, delete the temporary pod:

    ```bash
    kubectl delete pod <temp-pod-name> -n <NAMESPACE>
    ```

- Scale your DataRobot application deployments back up, as in Option 1.