
Elasticsearch Restore

Note: If the DataRobot application is configured to use managed services (external PCS), refer to the Restoring snapshots guide from Amazon instead of this guide.

Prerequisites:

  • The kubectl utility, version 1.23, is installed on the host from which the restore will be performed
  • kubectl is configured to access the Kubernetes cluster where the DataRobot application is running; verify this with the kubectl cluster-info command.
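
For example, a minimal pre-flight check might look like the following (DR_CORE_NAMESPACE is assumed to be the namespace where DataRobot runs, e.g. dr-app as used later in this guide):

# Confirm the kubectl client version and cluster connectivity
kubectl version --client
kubectl cluster-info

# Confirm the Elasticsearch stateful set is present in the DataRobot namespace
export DR_CORE_NAMESPACE=dr-app
kubectl -n $DR_CORE_NAMESPACE get sts pcs-elasticsearch-master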

Manage Elasticsearch

Due to specific security settings, we suggest using the curl utility inside the Elasticsearch containers for all of the following operations. For example, attach to a container in the pcs-elasticsearch-master stateful set:

kubectl -n $DR_CORE_NAMESPACE exec -it sts/pcs-elasticsearch-master -- /bin/bash

Then retrieve the cluster information:

curl -k -u elastic:$ELASTICSEARCH_PASSWORD "https://localhost:9200/" 

  • the -k option is required to allow an insecure connection
  • the -u elastic:$ELASTICSEARCH_PASSWORD option is required because unauthenticated access is not allowed.
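
Once inside the container, the same curl pattern works for any Elasticsearch API call. For example, to see which snapshot repositories are currently registered on the cluster before attempting a restore (an illustrative check, not a required step):

# List the snapshot repositories registered on the cluster
curl -k -u elastic:$ELASTICSEARCH_PASSWORD "https://localhost:9200/_cat/repositories?v"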

Restore from filesystem snapshot

Starting with version 10.2, an additional mount point is configured to store backups locally. If the backup is stored on the local filesystem in Elasticsearch, follow the steps below.

Look for the snapshot under the /snapshots directory on the pcs-elasticsearch-master node:

$ ls /snapshots/
index-0  index.latest  meta-f7MUCz9BR5GFlNd0Voct0g.dat  snap-f7MUCz9BR5GFlNd0Voct0g.dat 
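
To determine the exact <snapshot-name> to pass to the restore call below, you can list the snapshots registered in the local repository (named snapshots in the restore command); this is a standard snapshot API call run inside the container:

# List all snapshots stored in the local filesystem repository "snapshots"
curl -k -u elastic:$ELASTICSEARCH_PASSWORD "https://localhost:9200/_cat/snapshots/snapshots?v"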

Run the restore:

curl -k -X POST -u elastic:$ELASTICSEARCH_PASSWORD "https://localhost:9200/_snapshot/snapshots/<snapshot-name>/_restore" -H "Content-Type: application/json" -d '{
  "indices": "*",
  "ignore_unavailable": true,
  "include_global_state": true
}' 
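
The restore request returns immediately while shards are recovered in the background. As a hedged example, you can watch the in-progress shard recoveries with the standard recovery API:

# Show only shard recoveries that are still in progress
curl -k -u elastic:$ELASTICSEARCH_PASSWORD "https://localhost:9200/_cat/recovery?v&active_only=true"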

Register a snapshot repository

You can configure Elasticsearch to read snapshots from different external locations: AWS S3 bucket, Azure Blob Storage, shared NFS volume, etc.

Shared filesystem (NFS) repository

The Elasticsearch distribution delivered with the DataRobot application allows you to keep snapshots on an NFS volume. See Snapshot and restore operations for more information on using this method. Note that this method requires an NFS server that is continuously available on your network.
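
For illustration only, registering a shared filesystem repository uses the standard snapshot repository API; the repository name nfs_repository and the mount path /mnt/es-snapshots below are hypothetical, and the path must also be listed under path.repo in elasticsearch.yml:

# Register an NFS-backed repository (name and path are examples only)
curl -k -X PUT -u elastic:$ELASTICSEARCH_PASSWORD "https://localhost:9200/_snapshot/nfs_repository" -H "Content-Type: application/json" -d '{
  "type": "fs",
  "settings": {
    "location": "/mnt/es-snapshots"
  }
}'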

Other repository types

Elasticsearch can also store snapshots in S3, Google Cloud Storage, or Azure Blob Storage. If your snapshots are stored in one of these locations, refer to the appropriate section of the Register a snapshot repository guide.
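
For reference, an S3 repository is registered with the same API; the sketch below assumes the repository_name used later in this guide and a hypothetical bucket, with credentials supplied via the Elasticsearch keystore or an instance role as described in the linked guide:

# Register an S3 repository (bucket and base_path are examples only)
curl -k -X PUT -u elastic:$ELASTICSEARCH_PASSWORD "https://localhost:9200/_snapshot/repository_name" -H "Content-Type: application/json" -d '{
  "type": "s3",
  "settings": {
    "bucket": "my-elasticsearch-snapshots",
    "base_path": "elasticsearch"
  }
}'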

Registering snapshot repository

Refer to the backup guide for how to configure the snapshot repository; this restore guide assumes that the backup snapshot is accessible from the same repository.
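
As a quick sanity check that the repository used for backups is reachable from this cluster, you can verify it and list its snapshots (repository_name is the name registered per the backup guide):

# Verify that all nodes can access the registered repository
curl -k -X POST -u elastic:$ELASTICSEARCH_PASSWORD "https://localhost:9200/_snapshot/repository_name/_verify"

# List the snapshots available in that repository
curl -k -u elastic:$ELASTICSEARCH_PASSWORD "https://localhost:9200/_snapshot/repository_name/_all?pretty"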

Restore data from snapshot

Snapshots can be restored manually according to the Restore a snapshot guide; follow the Restore Entire Cluster section.

If the snapshot is available in an S3 repository, follow the steps below:

  1. Get a list of all indices
export DR_CORE_NAMESPACE=dr-app
kubectl -n $DR_CORE_NAMESPACE exec -it pcs-elasticsearch-master-0 -- /bin/bash
curl -k -X GET -u elastic:$ELASTICSEARCH_PASSWORD "https://localhost:9200/_cat/indices" 
  2. Remove all existing indices if restoring onto an existing cluster with data. Deleting all indices with a single command will not work and will throw an error, so delete each index from step 1 individually:

    curl -k -XDELETE -u elastic:$ELASTICSEARCH_PASSWORD "https://localhost:9200/<index-name-from-step-1>" 
    

  3. Once the indices are deleted, run the restore using the backup snapshot in the S3 repository:

curl -k -X POST -u elastic:$ELASTICSEARCH_PASSWORD "https://localhost:9200/_snapshot/repository_name/<snapshot-name>/_restore" -H "Content-Type: application/json" -d '{
  "indices": "*",
  "ignore_unavailable": true,
  "include_global_state": true
}'
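
After the restore request is accepted, a minimal way to confirm completion is to check that the indices are back and that cluster health returns to green (or at least yellow) once shard recovery finishes:

# Confirm the restored indices are present
curl -k -X GET -u elastic:$ELASTICSEARCH_PASSWORD "https://localhost:9200/_cat/indices?v"

# Confirm cluster health after the restore
curl -k -u elastic:$ELASTICSEARCH_PASSWORD "https://localhost:9200/_cluster/health?pretty"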