How to Back Up Elasticsearch Data from a Production Environment to an S3 Bucket

Harsh Vardhan Jul 5th, 2021


Before proceeding further, let’s understand what Elasticsearch is and why we need to back up its data to an S3 bucket.

What is Elasticsearch?

Elasticsearch is an open-source search and analytics engine that exposes a RESTful web service. It is built on Apache Lucene and released under the Apache license. It allows us to store, search, and analyze large volumes of data very quickly and in near real time.

Why do we need to back up data to an S3 bucket?

Suppose you have Elasticsearch running in your production environment, say on Kubernetes, holding a large amount of data. If for some reason your Elasticsearch cluster goes down, all the data that was in it will be lost, and you will be left with nothing. To guard against this scenario, we need a backup of our Elasticsearch data in an S3 bucket, so that if the data is lost for any reason, we can restore it to our Elasticsearch cluster.

Before proceeding further, we need to set up a few things:

  1. Create an IAM user with programmatic access to S3, and save its access key and secret key (a minimal example policy is sketched after this list).
  2. The repository-s3 plugin for Elasticsearch has to be installed.
  3. The elasticsearch-keystore tool has to be used to store the access key and secret key of the user we created in the first step.
  4. An elasticsearch.yml file which sets the default S3 endpoint.
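
For step 1, the exact permissions depend on your setup, but a minimal policy along the lines of the one below is what the repository-s3 documentation recommends (yourbucket is a placeholder; double-check the action list against the current docs):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:ListBucketMultipartUploads",
        "s3:ListBucketVersions"
      ],
      "Resource": ["arn:aws:s3:::yourbucket"]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Resource": ["arn:aws:s3:::yourbucket/*"]
    }
  ]
}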

If you already know how to install a plugin in Elasticsearch, that’s fantastic; if not, this blog will walk you through the steps to install the plugin.

Since we are talking about a production environment, say Kubernetes, we need our own custom Docker image with all the necessary setup and plugins, so that we can deploy the same image for Elasticsearch on Kubernetes. Let’s proceed and look at how this can be achieved.

We will now create a Dockerfile consisting of all the steps we just mentioned above. You can copy the Dockerfile below and update it according to your requirements.

FROM docker.elastic.co/elasticsearch/elasticsearch:7.7.1

# Install the S3 repository plugin ("yes" answers the install prompt)
RUN yes | /usr/share/elasticsearch/bin/elasticsearch-plugin install repository-s3

# Copy our custom configuration into the image
COPY elasticsearch.yml /usr/share/elasticsearch/config/elasticsearch.yml

# Create the keystore and store the IAM user's credentials in it
RUN /usr/share/elasticsearch/bin/elasticsearch-keystore create && \
    echo "ACCESS-KEY" | /usr/share/elasticsearch/bin/elasticsearch-keystore add --stdin s3.client.default.access_key && \
    echo "SECRET-KEY" | /usr/share/elasticsearch/bin/elasticsearch-keystore add --stdin s3.client.default.secret_key

Remember to replace ACCESS-KEY and SECRET-KEY with the credentials of the IAM user with S3 permissions that we created earlier. The elasticsearch.yml file that we copy into our Docker image is shown below:

cluster.name: "docker-cluster"
network.host: 0.0.0.0
s3.client.default.endpoint: s3.amazonaws.com

Now we build a Docker image from this Dockerfile. Execute the command below (note the trailing dot, which sets the build context):

sudo docker build -t <name-you-want-to-tag-your-image> -f /path/to/Dockerfile .

Now push this image to your Docker registry by executing:

sudo docker push <image-name>
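
Note that docker push only works if the image tag includes your registry or Docker Hub namespace. As a purely hypothetical example, with a Docker Hub account named yourusername:

# tag the build with a registry-qualified name, then push it
sudo docker build -t yourusername/custom-elasticsearch:7.7.1 -f /path/to/Dockerfile .
sudo docker push yourusername/custom-elasticsearch:7.7.1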

Congratulations! You now have your own custom Elasticsearch image with all the plugins required for this operation.

Now deploy Elasticsearch on your production or Kubernetes environment using the custom image we created above, and wait for the cluster to stabilize.
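
As a rough illustration, the relevant part of a Kubernetes manifest might look like the sketch below. This is a minimal single-node example with hypothetical names (elasticsearch, yourregistry/custom-elasticsearch:7.7.1), not a production-grade spec:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch
spec:
  serviceName: elasticsearch
  replicas: 1
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
        - name: elasticsearch
          image: yourregistry/custom-elasticsearch:7.7.1   # the custom image we pushed above
          ports:
            - containerPort: 9200
          env:
            - name: discovery.type
              value: single-node   # fine for a demo; use proper discovery settings in production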

Next, check your Elasticsearch cluster; querying the root endpoint should return a response like the one shown below. If not, go back and check whether you missed a step along the way.
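
Assuming the cluster is reachable on localhost:9200 (on Kubernetes you might first port-forward to your Elasticsearch service; the service name here is a placeholder):

kubectl port-forward svc/elasticsearch 9200:9200
curl "localhost:9200/?pretty"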

{
  "name" : "es01",
  "cluster_name" : "es-docker-cluster",
  "cluster_uuid" : "A9AICCuxTi2lITqr2OJS2w",
  "version" : {
    "number" : "7.6.2",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "ef48eb35cf30adf4db14086e8aabd07ef6fb113f",
    "build_date" : "2020-03-26T06:34:37.794943Z",
    "build_snapshot" : false,
    "lucene_version" : "8.4.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

Now you can register a snapshot repository where your Elasticsearch snapshots will be stored.

curl -X PUT "localhost:9200/_snapshot/yourcustom_s3_repo?pretty" -H 'Content-Type: application/json' -d'
{
  "type": "s3",
  "settings": {
    "bucket": "<yourbucket>"
  }
}
'

If everything goes fine, you will get the response below.

{
  "acknowledged" : true
}
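
As an aside, if you want the snapshot files stored under a prefix inside the bucket rather than at its root, the repository-s3 plugin also supports a base_path setting. For example (the path here is just an illustration):

curl -X PUT "localhost:9200/_snapshot/yourcustom_s3_repo?pretty" -H 'Content-Type: application/json' -d'
{
  "type": "s3",
  "settings": {
    "bucket": "<yourbucket>",
    "base_path": "elasticsearch/snapshots"
  }
}
'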

Now you are all set to start snapshotting your Elasticsearch indices; first, check out the repository configuration:

curl "localhost:9200/_snapshot?pretty"

You should receive a response like this:

{
  "yourcustom_s3_repo" : {
    "type" : "s3",
    "settings" : {
      "bucket" : "yourbucket"
    }
  }
}

Moving forward, our next step is to create a snapshot via curl. Execute the command below:

curl -X PUT "localhost:9200/_snapshot/yourcustom_s3_repo/snapshot_1?wait_for_completion=true&pretty"

The wait_for_completion parameter added to the above command makes the request block until the snapshot has been completely created. The response you get after creating the snapshot will be similar to the one below:

{
  "snapshot" : {
    "snapshot" : "snapshot_1",
    "uuid" : "WiNVFShuRzmNBmnkxoC20A",
    "version_id" : 7060299,
    "version" : "7.6.2",
    "indices" : [ ],
    "include_global_state" : true,
    "state" : "SUCCESS",
    "start_time" : "2020-05-30T11:44:43.972Z",
    "start_time_in_millis" : 1590839083972,
    "end_time" : "2020-05-30T11:44:44.173Z",
    "end_time_in_millis" : 1590839084173,
    "duration_in_millis" : 201,
    "failures" : [ ],
    "shards" : {
      "total" : 0,
      "failed" : 0,
      "successful" : 0
    }
  }
}
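
If you would rather not keep the request open, you can drop the wait_for_completion parameter and poll the snapshot’s progress instead; a quick sketch using the snapshot status API:

curl "localhost:9200/_snapshot/yourcustom_s3_repo/snapshot_1/_status?pretty"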

Either way, a state of SUCCESS tells us that the snapshot has been created; if we check our S3 bucket, we will find the Elasticsearch snapshot files there.
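
And since the whole point of these backups is recovery: to load the data from a snapshot back into a cluster that has the same repository registered, you can use the snapshot restore API, roughly like this (close any existing indices that would clash, or restore them under different names):

curl -X POST "localhost:9200/_snapshot/yourcustom_s3_repo/snapshot_1/_restore?pretty"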

This is how you can back up Elasticsearch data to S3.

Harsh Vardhan
Harsh Vardhan is the DevOps lead at Enveu, where he heads the infrastructure and deployment team. He works in the DevOps space with various cloud tools and has experience with Amazon Web Services.
