Back up Elasticsearch

Manage Elasticsearch

Note

You must fulfill the prerequisites before proceeding.

Due to specific security settings, DataRobot recommends running the curl utility from inside the Elasticsearch containers for the following operations. For example, attach to a container in the pcs-elasticsearch-master stateful set:

kubectl -n $NAMESPACE exec -it sts/pcs-elasticsearch-master -- /bin/bash 

Then retrieve the cluster information:

curl -k -u elastic:$ELASTICSEARCH_PASSWORD "https://localhost:9200/" 

Note

The -k flag is required to allow a connection with the self-signed certificate, and -u elastic:$ELASTICSEARCH_PASSWORD is required because unauthenticated access is not allowed.

Local file system backup

Starting with version 10.2, an additional mount point is configured for storing backups locally. To take a snapshot under the /snapshots directory, first register a filesystem repository, then create the snapshot:

$ curl -X PUT -k -u elastic:$ELASTICSEARCH_PASSWORD "https://localhost:9200/_snapshot/dr_repository?pretty" \
 -H 'Content-Type: application/json' -d'
{
  "type": "fs",
  "settings": {
    "location": "/snapshots"
  }
}
'

$ curl -k -X PUT -u elastic:$ELASTICSEARCH_PASSWORD \
    "https://localhost:9200/_snapshot/dr_repository/dr_snapshot?wait_for_completion=true&pretty"


$ ls /snapshots/
index-0  index.latest  meta-f7MUCz9BR5GFlNd0Voct0g.dat  snap-f7MUCz9BR5GFlNd0Voct0g.dat 
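To confirm that the snapshot completed, you can list the snapshots stored in the repository (the repository name follows the example above); the state field should report SUCCESS:

```shell
curl -k -u elastic:$ELASTICSEARCH_PASSWORD \
    "https://localhost:9200/_snapshot/dr_repository/_all?pretty"
```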

Register a snapshot repository

Elasticsearch can store snapshots in external locations such as an AWS S3 bucket, Azure Blob Storage, or a shared NFS volume.

Shared filesystem (NFS) repository

The Elasticsearch distribution delivered with the DataRobot application allows you to configure Elasticsearch to store snapshots on an NFS volume. See Snapshot and restore operations for more information on using this method. Note that this method requires an NFS server that is continuously available on your network.

Other repository types

Elasticsearch can also store snapshots on S3, Google Cloud Storage, or Azure Blob Storage. If you prefer one of these methods, refer to the appropriate section of the Register a snapshot repository guide.

Example for adding AWS S3 repository

Follow the official guide to configure either an S3 IAM Role or Service Account that has access to store backups.

If you have an AWS IAM role assigned to an IAM user, follow the steps below to add the access key ID and secret key to the Elasticsearch keystore used by the S3 repository:

kubectl -n $NAMESPACE exec -it pcs-elasticsearch-master-0 -- /bin/bash
/opt/bitnami/elasticsearch/bin/elasticsearch-keystore add s3.client.default.access_key 
/opt/bitnami/elasticsearch/bin/elasticsearch-keystore add s3.client.default.secret_key 

When prompted, input your AWS Access Key ID and Secret Access Key.

You can also display these values to confirm that the correct AWS Access Key ID and Secret Access Key are set:

/opt/bitnami/elasticsearch/bin/elasticsearch-keystore show s3.client.default.access_key
/opt/bitnami/elasticsearch/bin/elasticsearch-keystore show s3.client.default.secret_key 

After adding the credentials, reload the secure settings across all Elasticsearch nodes to ensure they are applied:

curl -k -X POST -u elastic:$ELASTICSEARCH_PASSWORD \
    -H "Content-Type: application/json" \
    "https://localhost:9200/_nodes/reload_secure_settings" \
    -d '{"secure_settings_password": ""}' 

Assuming the cluster nodes can access the S3 bucket "dr_repository_bucket" without additional credentials, the following command, executed from any DataRobot container (see Manage Elasticsearch above), creates a snapshot repository backed by that bucket:

curl -k -X PUT -u elastic:$ELASTICSEARCH_PASSWORD -H "Content-Type: application/json" \
"https://localhost:9200/_snapshot/dr_repository?pretty" -d'
{
  "type": "s3",
  "settings": {
    "bucket": "dr_repository_bucket"
  }
}
' 
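After registering the repository, you can ask Elasticsearch to verify that all nodes can write to the bucket (the repository name follows the example above):

```shell
curl -k -X POST -u elastic:$ELASTICSEARCH_PASSWORD \
    "https://localhost:9200/_snapshot/dr_repository/_verify?pretty"
```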

If the customer has provisioned a service account that has access to S3, follow the steps below:

Note

Because the following are manual edits, they are overwritten during upgrades. This is acceptable; these steps are only necessary when preparing for a backup and subsequent restoration.

Edit the PCS helm chart to add this init script under the Elasticsearch block:

  initScripts:
    setup_s3_access.sh: |
      #!/bin/sh
      mkdir -p /opt/bitnami/elasticsearch/config/repository-s3
      chown {{ .Values.master.containerSecurityContext.runAsUser }}:{{ .Values.master.podSecurityContext.fsGroup }} /opt/bitnami/elasticsearch/config/repository-s3
      ln -svf $AWS_WEB_IDENTITY_TOKEN_FILE /opt/bitnami/elasticsearch/config/repository-s3/aws-web-identity-token-file
      export SIZE_OF_SECRETS_FILE=$(wc -c /opt/bitnami/elasticsearch/config/repository-s3/aws-web-identity-token-file | awk '{print $1}')
      info "S3 access setup successfully for snapshot and restore. Size of secrets file $SIZE_OF_SECRETS_FILE" 

Here is a sample elasticsearch block showing where the init script should be placed. Do not copy any other values from this example:

elasticsearch:
  coordinating:
    replicaCount: 0
  data:
    replicaCount: 3
  fullnameOverride: pcs-elasticsearch
  image:
    registry: docker.io
    repository: bitnami/elasticsearch
    tag: 8.12.2-debian-12-r1
  ingest:
    replicaCount: 0
  initScripts:
    setup_s3_access.sh: |
      #!/bin/sh
      mkdir -p /opt/bitnami/elasticsearch/config/repository-s3
      chown {{ .Values.master.containerSecurityContext.runAsUser }}:{{ .Values.master.podSecurityContext.fsGroup }} /opt/bitnami/elasticsearch/config/repository-s3
      ln -svf $AWS_WEB_IDENTITY_TOKEN_FILE /opt/bitnami/elasticsearch/config/repository-s3/aws-web-identity-token-file
      export SIZE_OF_SECRETS_FILE=$(wc -c /opt/bitnami/elasticsearch/config/repository-s3/aws-web-identity-token-file | awk '{print $1}')
      info "S3 access setup successfully for snapshot and restore. Size of secrets file $SIZE_OF_SECRETS_FILE"
  master:
    containerSecurityContext:
      seccompProfile: null
    masterOnly: false
    persistence:
      size: 20Gi
    replicaCount: 3
    resources:
      limits:
        cpu: 2000m
        memory: 3Gi
      requests:
        cpu: 250m
        memory: 512Mi
    serviceAccount:
      create: true
      name: pcs-elasticsearch-sa
  security:
    enabled: true
    existingSecret: pcs-elasticsearch
    tls:
      autoGenerated: true
  sysctlImage:
    enabled: true
    tag: 12-debian-12-r18
extraObjects: [] 

Run helm upgrade on PCS:

helm upgrade pcs datarobot-pcs-ha-10.1.0.tgz -n $NAMESPACE -f <updated-values-with-initscript.yaml> 

Update the pcs-elasticsearch-master statefulset to mount the service account token that allows access to S3:

  1. under spec:containers:env:

            - name: AWS_ROLE_ARN
              value: arn:aws:iam::<account-number>:role/<irsa-role-defined-for-cluster>
            - name: AWS_WEB_IDENTITY_TOKEN_FILE
              value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token 
    
  2. under spec:containers:volumeMounts:

            - mountPath: /var/run/secrets/eks.amazonaws.com/serviceaccount
              name: aws-iam-token 
    
  3. under volumes:

          - name: aws-iam-token
            projected:
              defaultMode: 420
              sources:
              - serviceAccountToken:
                  audience: sts.amazonaws.com
                  expirationSeconds: 86400
                  path: token 
    

Apply the above modified values:

kubectl apply -f <above-updated-config.yaml> -n $NAMESPACE 
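You can confirm that the token is mounted at the expected path (the pod name and path follow the examples above):

```shell
kubectl -n $NAMESPACE exec pcs-elasticsearch-master-0 -- \
    ls -l /var/run/secrets/eks.amazonaws.com/serviceaccount/token
```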

This mounts the service account token at the path Elasticsearch expects, so you can now register the snapshot repository:

curl -k -X PUT -u elastic:$ELASTICSEARCH_PASSWORD -H "Content-Type: application/json" \
"https://localhost:9200/_snapshot/es_backup?pretty" -d '{
  "type": "s3",
  "settings": {
    "bucket": "<bucket_name_in_s3>",
    "region": "us-east-1",
    "base_path": "<any_sub_folders_in_s3>"
  }
}' 

Example for adding GCP repository

Follow the official guide to configure a Service Account that has access to store backups. Also ensure that the customer has created a new key for the service account.

Your JSON credentials file is the service account key downloaded from the Google Cloud console.
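For reference, a Google Cloud service account key file follows this standard shape; the values below are redacted placeholders, and your file will contain real values generated by Google Cloud:

```json
{
  "type": "service_account",
  "project_id": "<project-id>",
  "private_key_id": "<key-id>",
  "private_key": "-----BEGIN PRIVATE KEY-----\n<redacted>\n-----END PRIVATE KEY-----\n",
  "client_email": "<service-account-name>@<project-id>.iam.gserviceaccount.com",
  "client_id": "<client-id>",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token"
}
```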

Set up the Elasticsearch keystore with the credentials file.

Client settings are needed to establish connectivity between Elasticsearch and Google Cloud Storage. The default client name looked up by a gcs repository is default.

Copy the credentials file into the Elasticsearch master pod:

kubectl cp </path/to/local/service-account.json> $NAMESPACE/pcs-elasticsearch-master-0:/tmp/service-account.json 

Exec into the Elasticsearch master pod to set up the keystore:

kubectl exec -it pcs-elasticsearch-master-0 -n $NAMESPACE -- bash
I have no name!@pcs-elasticsearch-master-0:$ cd /opt/bitnami/elasticsearch/bin
I have no name!@pcs-elasticsearch-master-0:/opt/bitnami/elasticsearch/bin$ ./elasticsearch-keystore add-file gcs.client.default.credentials_file /tmp/service-account.json

With the keystore set up, register the repository using the default client name:

curl -k -X PUT -u elastic:$ELASTICSEARCH_PASSWORD -H "Content-Type: application/json" \
"https://localhost:9200/_snapshot/es_backup?pretty" -d '{
  "type": "gcs",
  "settings": {
    "bucket": "<google_cloud_storage_name>",
    "client": "default",
    "base_path": "<any_sub_folders_in_gcs>"
  }
}' 

Example for adding Azure repository

Follow the official guide to configure Azure credentials that have access to store backups.

Get the account name and key for the Azure Blob Storage account. By default, Azure repositories use a client named default.

kubectl exec -it pcs-elasticsearch-master-0 -n $NAMESPACE -- bash
I have no name!@pcs-elasticsearch-master-0:$ cd /opt/bitnami/elasticsearch/bin
I have no name!@pcs-elasticsearch-master-0:/opt/bitnami/elasticsearch/bin$ ./elasticsearch-keystore add azure.client.default.account
I have no name!@pcs-elasticsearch-master-0:/opt/bitnami/elasticsearch/bin$ ./elasticsearch-keystore add azure.client.default.key

Once the keys are added, you can register the snapshot repository:

curl -k -X PUT -u elastic:$ELASTICSEARCH_PASSWORD -H "Content-Type: application/json" \
"https://localhost:9200/_snapshot/es_backup?pretty" -d '{
  "type": "azure",
  "settings": {
    "container": "<azure_blob_storage_name>",
    "client": "default",
    "base_path": "<any_sub_folders_in_azure>"
  }
}' 

Manually create a snapshot

Tip

Instead of manually creating the snapshot, you can automate this operation with Snapshot lifecycle management (SLM).
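As a sketch, a minimal SLM policy that takes a daily snapshot into the dr_repository repository could look like this; the policy name, cron schedule, and retention settings are illustrative assumptions to adapt to your environment:

```shell
curl -k -X PUT -u elastic:$ELASTICSEARCH_PASSWORD -H "Content-Type: application/json" \
"https://localhost:9200/_slm/policy/daily-dr-snapshots?pretty" -d '{
  "schedule": "0 30 1 * * ?",
  "name": "<dr-snapshot-{now/d}>",
  "repository": "dr_repository",
  "retention": {
    "expire_after": "30d",
    "min_count": 5,
    "max_count": 50
  }
}'
```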

Once the snapshot repository is registered, you can manually create a snapshot from the container. The following example creates a snapshot named dr_snapshot in the snapshot repository named dr_repository:

curl -k -X PUT -u elastic:$ELASTICSEARCH_PASSWORD \
    "https://localhost:9200/_snapshot/dr_repository/dr_snapshot?wait_for_completion=true&pretty" 

Note that wait_for_completion is optional; if omitted, the request returns immediately and the snapshot runs in the background.
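When you omit wait_for_completion, you can poll the progress of the running snapshot with the status API (the repository and snapshot names follow the example above):

```shell
curl -k -u elastic:$ELASTICSEARCH_PASSWORD \
    "https://localhost:9200/_snapshot/dr_repository/dr_snapshot/_status?pretty"
```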