
Elasticsearch Restore

Note: If the DataRobot application is configured to use managed services (external PCS), refer to the Restoring snapshots guide from Amazon instead of this guide.

Prerequisites:

  • The kubectl utility, version 1.23, is installed on the host from which the restore will be performed
  • kubectl is configured to access the Kubernetes cluster where the DataRobot application is running; verify this with the kubectl cluster-info command.
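
For example, a minimal pre-flight check might look like the following (DR_CORE_NAMESPACE is assumed to be the namespace where DataRobot runs, e.g. dr-app as used later in this guide):

# Confirm the kubectl client version and cluster connectivity
kubectl version --client
kubectl cluster-info

# Confirm the Elasticsearch stateful set is present in the DataRobot namespace
export DR_CORE_NAMESPACE=dr-app
kubectl -n $DR_CORE_NAMESPACE get sts pcs-elasticsearch-master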

Manage Elasticsearch

Due to specific security settings, we suggest using the curl utility inside the Elasticsearch containers for all of the following operations. For example, attach to a container in the pcs-elasticsearch-master stateful set:

kubectl -n $DR_CORE_NAMESPACE exec -it sts/pcs-elasticsearch-master -- /bin/bash

Then retrieve the cluster information:

curl -k -u elastic:$ELASTICSEARCH_PASSWORD "https://localhost:9200/" 

  • the -k option is required to allow an insecure connection
  • the -u elastic:$ELASTICSEARCH_PASSWORD option is required because unauthenticated access is not allowed.
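
Once inside the container, the same curl pattern works for any Elasticsearch API call. For example, to see which snapshot repositories are currently registered on the cluster before attempting a restore (an illustrative check, not a required step):

# List the snapshot repositories registered on the cluster
curl -k -u elastic:$ELASTICSEARCH_PASSWORD "https://localhost:9200/_cat/repositories?v"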

Restore from filesystem snapshot

Starting with version 10.2, an additional mount point is configured to store backups locally. If the backup is stored on the local filesystem in Elasticsearch, follow the steps below.

Look for the snapshot under the /snapshots directory on the pcs-elasticsearch-master node:

$ ls /snapshots/
index-0  index.latest  meta-f7MUCz9BR5GFlNd0Voct0g.dat  snap-f7MUCz9BR5GFlNd0Voct0g.dat 
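
To determine the exact <snapshot-name> to pass to the restore call below, you can list the snapshots registered in the local repository (named snapshots in the restore command); this is a standard snapshot API call run inside the container:

# List all snapshots stored in the local filesystem repository "snapshots"
curl -k -u elastic:$ELASTICSEARCH_PASSWORD "https://localhost:9200/_cat/snapshots/snapshots?v"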

Run the restore:

curl -k -X POST -u elastic:$ELASTICSEARCH_PASSWORD "https://localhost:9200/_snapshot/snapshots/<snapshot-name>/_restore" -H "Content-Type: application/json" -d '{
  "indices": "*",
  "ignore_unavailable": true,
  "include_global_state": true
}' 
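
The restore request returns immediately while shards are recovered in the background. As a hedged example, you can watch the in-progress shard recoveries with the standard recovery API:

# Show only shard recoveries that are still in progress
curl -k -u elastic:$ELASTICSEARCH_PASSWORD "https://localhost:9200/_cat/recovery?v&active_only=true"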

Register a snapshot repository

You can configure Elasticsearch to read snapshots from different external locations: AWS S3 bucket, Azure Blob Storage, shared NFS volume, etc.

Shared filesystem (NFS) repository

The Elasticsearch distribution delivered with the DataRobot application allows you to keep snapshots on an NFS volume. See Snapshot and restore operations for more information on using this method. Note that this method requires an NFS server that is continuously available on your network.
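
For illustration only, registering a shared filesystem repository uses the standard snapshot repository API; the repository name nfs_repository and the mount path /mnt/es-snapshots below are hypothetical, and the path must also be listed under path.repo in elasticsearch.yml:

# Register an NFS-backed repository (name and path are examples only)
curl -k -X PUT -u elastic:$ELASTICSEARCH_PASSWORD "https://localhost:9200/_snapshot/nfs_repository" -H "Content-Type: application/json" -d '{
  "type": "fs",
  "settings": {
    "location": "/mnt/es-snapshots"
  }
}'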

Other repository types

Elasticsearch can also store snapshots in S3, Google Cloud Storage, or Azure Blob Storage. If your snapshots are stored in one of these locations, refer to the appropriate section of the Register a snapshot repository guide.
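
For reference, an S3 repository is registered with the same API; the sketch below assumes the repository_name used later in this guide and a hypothetical bucket, with credentials supplied via the Elasticsearch keystore or an instance role as described in the linked guide:

# Register an S3 repository (bucket and base_path are examples only)
curl -k -X PUT -u elastic:$ELASTICSEARCH_PASSWORD "https://localhost:9200/_snapshot/repository_name" -H "Content-Type: application/json" -d '{
  "type": "s3",
  "settings": {
    "bucket": "my-elasticsearch-snapshots",
    "base_path": "elasticsearch"
  }
}'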

Registering snapshot repository

Refer to the backup guide for how to configure the snapshot repository; this restore guide assumes that the backup snapshot is accessible from the same repository.
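
As a quick sanity check that the repository used for backups is reachable from this cluster, you can verify it and list its snapshots (repository_name is the name registered per the backup guide):

# Verify that all nodes can access the registered repository
curl -k -X POST -u elastic:$ELASTICSEARCH_PASSWORD "https://localhost:9200/_snapshot/repository_name/_verify"

# List the snapshots available in that repository
curl -k -u elastic:$ELASTICSEARCH_PASSWORD "https://localhost:9200/_snapshot/repository_name/_all?pretty"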

Restore data from snapshot

Snapshots can be restored manually according to the Restore a snapshot guide; follow the Restore Entire Cluster section.

If the snapshot is available in an S3 repository, follow the steps below:

  1. Get a list of all indices
export DR_CORE_NAMESPACE=dr-app
kubectl -n $DR_CORE_NAMESPACE exec -it pcs-elasticsearch-master-0 -- /bin/bash
curl -k -X GET -u elastic:$ELASTICSEARCH_PASSWORD "https://localhost:9200/_cat/indices" 
  2. Remove all existing indices if restoring onto an existing cluster with data. Deleting all indices with a single command will not work and will throw an error, so delete each index from step 1 individually:

    curl -k -XDELETE -u elastic:$ELASTICSEARCH_PASSWORD "https://localhost:9200/<index-name-from-step-1>" 
    

  3. Once the indices are deleted, run the restore using the backup snapshot in the S3 repository:

curl -k -X POST -u elastic:$ELASTICSEARCH_PASSWORD "https://localhost:9200/_snapshot/repository_name/<snapshot-name>/_restore" -H "Content-Type: application/json" -d '{
  "indices": "*",
  "ignore_unavailable": true,
  "include_global_state": true
}'
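
After the restore request is accepted, a minimal way to confirm completion is to check that the indices are back and that cluster health returns to green (or at least yellow) once shard recovery finishes:

# Confirm the restored indices are present
curl -k -X GET -u elastic:$ELASTICSEARCH_PASSWORD "https://localhost:9200/_cat/indices?v"

# Confirm cluster health after the restore
curl -k -u elastic:$ELASTICSEARCH_PASSWORD "https://localhost:9200/_cluster/health?pretty"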