Back up Elasticsearch

Manage Elasticsearch

Note

You must fulfill the prerequisites before proceeding.

Due to specific security settings, DataRobot recommends running the curl utility from inside the Elasticsearch containers for the following operations. For example, attach to a container in the pcs-elasticsearch-master stateful set:

kubectl -n $NAMESPACE exec -it sts/pcs-elasticsearch-master -- /bin/bash 

Then retrieve the cluster information:

curl -k -u elastic:$ELASTICSEARCH_PASSWORD "https://localhost:9200/" 

Note

The -k flag is required to allow a connection with the self-signed certificate, and -u elastic:$ELASTICSEARCH_PASSWORD is required because unauthenticated access is not allowed.

Local file system backup

Starting with version 10.2, an additional mount point is configured for storing backups locally. To take a snapshot under the /snapshots directory, first register a filesystem repository, then create the snapshot:

$ curl -X PUT -k -u elastic:$ELASTICSEARCH_PASSWORD "https://localhost:9200/_snapshot/dr_repository?pretty" \
 -H 'Content-Type: application/json' -d'
{
  "type": "fs",
  "settings": {
    "location": "/snapshots"
  }
}
'

$ curl -k -X PUT -u elastic:$ELASTICSEARCH_PASSWORD \
    "https://localhost:9200/_snapshot/dr_repository/dr_snapshot?wait_for_completion=true&pretty"


$ ls /snapshots/
index-0  index.latest  meta-f7MUCz9BR5GFlNd0Voct0g.dat  snap-f7MUCz9BR5GFlNd0Voct0g.dat 
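To confirm that the snapshot completed, you can list the snapshots stored in the repository (the repository name follows the example above); the state field should report SUCCESS:

```shell
curl -k -u elastic:$ELASTICSEARCH_PASSWORD \
    "https://localhost:9200/_snapshot/dr_repository/_all?pretty"
```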

Register a snapshot repository

Elasticsearch can store snapshots in external locations such as an AWS S3 bucket, Azure Blob Storage, or a shared NFS volume.

Shared filesystem (NFS) repository

The Elasticsearch distribution delivered with the DataRobot application allows you to configure Elasticsearch to store snapshots on an NFS volume. See Snapshot and restore operations for more information on using this method. Note that this method requires an NFS server that is continuously available on your network.

Other repository types

Elasticsearch can also store snapshots on S3, Google Cloud Storage, or Azure Blob Storage. If you prefer one of these methods, refer to the appropriate section of the Register a snapshot repository guide.

Example for adding AWS S3 repository

Follow the official guide to configure either an S3 IAM Role or Service Account that has access to store backups.

If you have an AWS IAM role assigned to an IAM user, follow the steps below to add the access key ID and secret key to the Elasticsearch keystore used by the S3 repository:

kubectl -n $NAMESPACE exec -it pcs-elasticsearch-master-0 -- /bin/bash
/opt/bitnami/elasticsearch/bin/elasticsearch-keystore add s3.client.default.access_key 
/opt/bitnami/elasticsearch/bin/elasticsearch-keystore add s3.client.default.secret_key 

When prompted, input your AWS Access Key ID and Secret Access Key.

You can also display these values to confirm that the correct AWS Access Key ID and Secret Access Key are set:

/opt/bitnami/elasticsearch/bin/elasticsearch-keystore show s3.client.default.access_key
/opt/bitnami/elasticsearch/bin/elasticsearch-keystore show s3.client.default.secret_key 

After adding the credentials, reload the secure settings across all Elasticsearch nodes to ensure they are applied:

curl -k -X POST -u elastic:$ELASTICSEARCH_PASSWORD \
    -H "Content-Type: application/json" \
    "https://localhost:9200/_nodes/reload_secure_settings" \
    -d '{"secure_settings_password": ""}' 

Assuming the cluster nodes can access the S3 bucket "dr_repository_bucket" without additional credentials, the following command, executed from any DataRobot container (see Manage Elasticsearch above), creates a snapshot repository backed by that bucket:

curl -k -X PUT -u elastic:$ELASTICSEARCH_PASSWORD -H "Content-Type: application/json" \
"https://localhost:9200/_snapshot/dr_repository?pretty" -d'
{
  "type": "s3",
  "settings": {
    "bucket": "dr_repository_bucket"
  }
}
' 
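After registering the repository, you can ask Elasticsearch to verify that all nodes can write to the bucket (the repository name follows the example above):

```shell
curl -k -X POST -u elastic:$ELASTICSEARCH_PASSWORD \
    "https://localhost:9200/_snapshot/dr_repository/_verify?pretty"
```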

If the customer has provisioned a service account that has access to S3, follow the steps below:

Note

Because the following are manual edits, they are overwritten during upgrades. This is acceptable; these steps are only necessary when preparing for a backup and subsequent restoration.

Edit the PCS helm chart to add this init script under the Elasticsearch block:

  initScripts:
    setup_s3_access.sh: |
      #!/bin/sh
      mkdir -p /opt/bitnami/elasticsearch/config/repository-s3
      chown {{ .Values.master.containerSecurityContext.runAsUser }}:{{ .Values.master.podSecurityContext.fsGroup }} /opt/bitnami/elasticsearch/config/repository-s3
      ln -svf $AWS_WEB_IDENTITY_TOKEN_FILE /opt/bitnami/elasticsearch/config/repository-s3/aws-web-identity-token-file
      export SIZE_OF_SECRETS_FILE=$(wc -c /opt/bitnami/elasticsearch/config/repository-s3/aws-web-identity-token-file | awk '{print $1}')
      info "S3 access setup successfully for snapshot and restore. Size of secrets file $SIZE_OF_SECRETS_FILE" 

Here is a sample elasticsearch block showing where the init script should be placed. Do not copy any other values from this example:

elasticsearch:
  coordinating:
    replicaCount: 0
  data:
    replicaCount: 3
  fullnameOverride: pcs-elasticsearch
  image:
    registry: docker.io
    repository: bitnami/elasticsearch
    tag: 8.12.2-debian-12-r1
  ingest:
    replicaCount: 0
  initScripts:
    setup_s3_access.sh: |
      #!/bin/sh
      mkdir -p /opt/bitnami/elasticsearch/config/repository-s3
      chown {{ .Values.master.containerSecurityContext.runAsUser }}:{{ .Values.master.podSecurityContext.fsGroup }} /opt/bitnami/elasticsearch/config/repository-s3
      ln -svf $AWS_WEB_IDENTITY_TOKEN_FILE /opt/bitnami/elasticsearch/config/repository-s3/aws-web-identity-token-file
      export SIZE_OF_SECRETS_FILE=$(wc -c /opt/bitnami/elasticsearch/config/repository-s3/aws-web-identity-token-file | awk '{print $1}')
      info "S3 access setup successfully for snapshot and restore. Size of secrets file $SIZE_OF_SECRETS_FILE"
  master:
    containerSecurityContext:
      seccompProfile: null
    masterOnly: false
    persistence:
      size: 20Gi
    replicaCount: 3
    resources:
      limits:
        cpu: 2000m
        memory: 3Gi
      requests:
        cpu: 250m
        memory: 512Mi
    serviceAccount:
      create: true
      name: pcs-elasticsearch-sa
  security:
    enabled: true
    existingSecret: pcs-elasticsearch
    tls:
      autoGenerated: true
  sysctlImage:
    enabled: true
    tag: 12-debian-12-r18
extraObjects: [] 

Run helm upgrade on PCS:

helm upgrade pcs datarobot-pcs-ha-10.1.0.tgz -n $NAMESPACE -f <updated-values-with-initscript.yaml> 

Update the pcs-elasticsearch-master statefulset to mount the service account token that allows access to S3:

  1. under spec:containers:env:

            - name: AWS_ROLE_ARN
              value: arn:aws:iam::<account-number>:role/<irsa-role-defined-for-cluster>
            - name: AWS_WEB_IDENTITY_TOKEN_FILE
              value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token 
    
  2. under spec:containers:volumeMounts:

            - mountPath: /var/run/secrets/eks.amazonaws.com/serviceaccount
              name: aws-iam-token 
    
  3. under volumes:

          - name: aws-iam-token
            projected:
              defaultMode: 420
              sources:
              - serviceAccountToken:
                  audience: sts.amazonaws.com
                  expirationSeconds: 86400
                  path: token 
    

Apply the above modified values:

kubectl apply -f <above-updated-config.yaml> -n $NAMESPACE 
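You can confirm that the token is mounted at the expected path (the pod name and path follow the examples above):

```shell
kubectl -n $NAMESPACE exec pcs-elasticsearch-master-0 -- \
    ls -l /var/run/secrets/eks.amazonaws.com/serviceaccount/token
```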

This mounts the service account token at the path Elasticsearch expects, so you can now register the snapshot repository:

curl -k -X PUT -u elastic:$ELASTICSEARCH_PASSWORD -H "Content-Type: application/json" \
"https://localhost:9200/_snapshot/es_backup?pretty" -d '{
  "type": "s3",
  "settings": {
    "bucket": "<bucket_name_in_s3>",
    "region": "us-east-1",
    "base_path": "<any_sub_folders_in_s3>"
  }
}' 

Example for adding GCP repository

Follow the official guide to configure a Service Account that has access to store backups. Also ensure that the customer has created a new key for the service account.

Your JSON credentials file is the service account key downloaded from the Google Cloud console.
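For reference, a Google Cloud service account key file follows this standard shape; the values below are redacted placeholders, and your file will contain real values generated by Google Cloud:

```json
{
  "type": "service_account",
  "project_id": "<project-id>",
  "private_key_id": "<key-id>",
  "private_key": "-----BEGIN PRIVATE KEY-----\n<redacted>\n-----END PRIVATE KEY-----\n",
  "client_email": "<service-account-name>@<project-id>.iam.gserviceaccount.com",
  "client_id": "<client-id>",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token"
}
```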

Set up the Elasticsearch keystore with the credentials file.

Client settings are needed to establish connectivity between Elasticsearch and Google Cloud Storage. The default client name looked up by a gcs repository is default.

Copy the credentials file into the Elasticsearch master pod:

kubectl cp </path/to/local/service-account.json> $NAMESPACE/pcs-elasticsearch-master-0:/tmp/service-account.json 

Exec into the Elasticsearch master pod to set up the keystore:

kubectl exec -it pcs-elasticsearch-master-0 -n $NAMESPACE -- bash
I have no name!@pcs-elasticsearch-master-0:$ cd /opt/bitnami/elasticsearch/bin
I have no name!@pcs-elasticsearch-master-0:/opt/bitnami/elasticsearch/bin$ ./elasticsearch-keystore add-file gcs.client.default.credentials_file /tmp/service-account.json

With the keystore set up, register the repository using the default client name:

curl -k -X PUT -u elastic:$ELASTICSEARCH_PASSWORD -H "Content-Type: application/json" \
"https://localhost:9200/_snapshot/es_backup?pretty" -d '{
  "type": "gcs",
  "settings": {
    "bucket": "<google_cloud_storage_name>",
    "client": "default",
    "base_path": "<any_sub_folders_in_gcs>"
  }
}' 

Example for adding Azure repository

Follow the official guide to configure Azure credentials that have access to store backups.

Get the account name and key for the Azure Blob Storage account. By default, Azure repositories use a client named default.

kubectl exec -it pcs-elasticsearch-master-0 -n $NAMESPACE -- bash
I have no name!@pcs-elasticsearch-master-0:$ cd /opt/bitnami/elasticsearch/bin
I have no name!@pcs-elasticsearch-master-0:/opt/bitnami/elasticsearch/bin$ ./elasticsearch-keystore add azure.client.default.account
I have no name!@pcs-elasticsearch-master-0:/opt/bitnami/elasticsearch/bin$ ./elasticsearch-keystore add azure.client.default.key

Once the keys are added, you can register the snapshot repository:

curl -k -X PUT -u elastic:$ELASTICSEARCH_PASSWORD -H "Content-Type: application/json" \
"https://localhost:9200/_snapshot/es_backup?pretty" -d '{
  "type": "azure",
  "settings": {
    "container": "<azure_blob_storage_name>",
    "client": "default",
    "base_path": "<any_sub_folders_in_azure>"
  }
}' 

Manually create a snapshot

Tip

Instead of manually creating the snapshot, you can automate this operation with Snapshot lifecycle management (SLM).
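As a sketch, a minimal SLM policy that takes a daily snapshot into the dr_repository repository could look like this; the policy name, cron schedule, and retention settings are illustrative assumptions to adapt to your environment:

```shell
curl -k -X PUT -u elastic:$ELASTICSEARCH_PASSWORD -H "Content-Type: application/json" \
"https://localhost:9200/_slm/policy/daily-dr-snapshots?pretty" -d '{
  "schedule": "0 30 1 * * ?",
  "name": "<dr-snapshot-{now/d}>",
  "repository": "dr_repository",
  "retention": {
    "expire_after": "30d",
    "min_count": 5,
    "max_count": 50
  }
}'
```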

Once the snapshot repository is registered, you can manually create a snapshot from the container. The following example creates a snapshot named dr_snapshot in the snapshot repository named dr_repository:

curl -k -X PUT -u elastic:$ELASTICSEARCH_PASSWORD \
    "https://localhost:9200/_snapshot/dr_repository/dr_snapshot?wait_for_completion=true&pretty" 

Note that wait_for_completion is optional; if omitted, the request returns immediately and the snapshot runs in the background.
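When you omit wait_for_completion, you can poll the progress of the running snapshot with the status API (the repository and snapshot names follow the example above):

```shell
curl -k -u elastic:$ELASTICSEARCH_PASSWORD \
    "https://localhost:9200/_snapshot/dr_repository/dr_snapshot/_status?pretty"
```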