Elasticsearch for .NET Developers: What Is Elasticsearch and Why Use It?

What is Elasticsearch?

Elasticsearch is an open source, NoSQL, distributed, and highly scalable full-text search database built on top of Lucene, a search engine library written in Java. It provides extremely fast searches and makes data discovery easy.

It’s a NoSQL database because it stores documents rather than rows in tables and columns, and it does not use standard SQL (Structured Query Language) as its query language. It doesn’t even require a schema ahead of time to store data; it was built from the ground up to store documents, lots and lots of documents.
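To make the "no schema ahead of time" idea concrete, here is a minimal Python sketch. It is only an illustration, not how Elasticsearch is implemented: the function `guess_field_types` and the sample documents are made up for this example, and the real type detection rules are more involved than this.

```python
import json


def guess_field_types(document):
    """Guess a type name for each top-level field from its JSON value.

    Simplified illustration only: the names mirror common Elasticsearch
    field types, but the real detection rules are more involved.
    """
    guesses = {}
    for field, value in document.items():
        if isinstance(value, bool):  # bool must be checked before int in Python
            guesses[field] = "boolean"
        elif isinstance(value, int):
            guesses[field] = "long"
        elif isinstance(value, float):
            guesses[field] = "float"
        elif isinstance(value, str):
            guesses[field] = "text"
        elif isinstance(value, dict):
            guesses[field] = "object"
        else:
            guesses[field] = "unknown"
    return guesses


# Two documents with different fields; no table definition is needed up front.
book = {"title": "Learning Search", "pages": 320, "in_stock": True}
author = {"name": "Jane Doe", "rating": 4.5, "address": {"city": "Springfield"}}

for doc in (book, author):
    print(json.dumps(doc), "->", guess_field_types(doc))
```

The point of the sketch is that each document carries its own structure, so a store can work out field types from the values it receives instead of requiring a table definition first.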

Its distributed nature and ease of use make it very easy to get started and to scale as your data grows.

What makes Elasticsearch different from other document stores out there?
  • Elasticsearch is a search engine and not just a key-value store.
  • It’s a very powerful analytical engine. Many queries that you would usually run as batch jobs can be executed in near real time.
  • It supports features such as autocomplete, geo-location-based filters, and multilevel aggregations.

 

Key Features in Elasticsearch

It has the following main features:

  1. REST-based: It exposes REST API endpoints for CRUD operations, along with a number of search and analysis capabilities on your data, all over a RESTful interface.
  2. Distributed: Elasticsearch is distributed in nature and is designed to scale horizontally rather than vertically. You can start with a single-node Elasticsearch cluster and scale that cluster to hundreds or thousands of nodes without worrying about the internal complexities that normally come with distributed computing and search.
  3. High availability: The data replication feature keeps multiple copies of documents in the cluster to provide higher availability and avoid data loss in the case of system failure.
  4. Powerful query DSL: Elasticsearch offers a powerful query DSL (domain-specific language) for defining the queries to execute against it. The query DSL differs from both Lucene query syntax and SQL, and it is designed to make queries easy to write and read.
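As a rough illustration of the REST interface and the query DSL working together, the Python sketch below builds a search request without sending it. The helper name `build_search_request`, the index name `products`, the field `title`, and the address `http://localhost:9200` are all assumptions made for this example; adjust them to your own cluster.

```python
import json
import urllib.request


def build_search_request(base_url, index, field, text, size=10):
    """Build (but do not send) a search request with a JSON query DSL body."""
    query = {
        "query": {"match": {field: text}},  # full-text match on a single field
        "size": size,                        # maximum number of hits to return
    }
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical node address, index name, and field name.
request = build_search_request("http://localhost:9200", "products", "title", "wireless mouse")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would send it, assuming a node is actually reachable at that address. The query itself is plain JSON, which is why any language with an HTTP client, including .NET, can talk to Elasticsearch.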

 

Elasticsearch Common Terms

A few terms come up repeatedly with Elasticsearch, and understanding them is crucial to understanding how it works.

  • Node : A single instance of Elasticsearch running on a machine.
  • Cluster : A group of one or more nodes (Elasticsearch instances) that are connected to each other and share the same cluster name.
  • Document : A document is a JSON object that contains the actual data as key-value pairs. In the Elasticsearch world, it is all about searching for documents and analyzing them.
  • Index : A logical namespace under which Elasticsearch stores data. An index is similar to a database in the SQL world: documents are written to it and read from it.
  • Type : A type in Elasticsearch represents a class of similar documents. A type consists of a name and a mapping, including data types and the Lucene configurations for each field. A type is similar to a database table, and in older versions an index could contain more than one type. Note that mapping types were deprecated starting with Elasticsearch 6.x and removed in later releases, so this concept applies mainly to older clusters.
  • Mapping : A mapping defines the structure and field types of the documents. Elasticsearch can automatically detect a field's type by looking at its value, but in most cases we will want to configure the mappings ourselves.
  • Shard : An index is a collection of one or more shards. Elasticsearch spreads index data across multiple physical Lucene indices called shards. Each shard can be placed on a different server, so your data can be spread among the cluster nodes. When you query an index that is built from multiple shards, Elasticsearch sends the query to each relevant shard and merges the results so that your application doesn’t need to know about the shards. By spreading the data that belongs to an index across multiple shards, Elasticsearch can store more information than a single server can hold.
  • Replica : Shard replicas are used to achieve high availability. A replica is an exact copy of a primary shard, and each primary shard can have zero or more replicas. When a primary shard is lost, the cluster promotes one of its replicas to be the new primary shard.
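The terms above can be tied together in a small Python sketch. It is a toy model under stated assumptions, not Elasticsearch's actual implementation: `build_index_body` assembles the kind of JSON body typically sent when creating an index (shard and replica settings plus an explicit mapping, in the typeless format of recent versions), `pick_shard` imitates routing a document to a primary shard by hashing its ID, and `search_all_shards` imitates querying every shard and merging the results. The function names, the sample fields, the CRC32 hash, and the term-count scoring are all stand-ins chosen for illustration.

```python
import json
import zlib


def build_index_body(shards, replicas, properties):
    """Assemble a create-index body: shard/replica settings plus an explicit mapping."""
    return {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        },
        "mappings": {"properties": properties},
    }


def pick_shard(doc_id, number_of_shards):
    """Route a document to a primary shard by hashing its ID.

    CRC32 is a stand-in; Elasticsearch uses a different hash internally.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


def search_all_shards(shards, term, size):
    """Scatter-gather: query every shard, then merge into one ranked list of IDs."""
    hits = []
    for shard in shards:
        for doc in shard:
            # Toy relevance score: how often the term occurs in the title.
            score = doc["title"].lower().split().count(term.lower())
            if score > 0:
                hits.append((score, doc["id"]))
    # Highest score first; ties broken by ID so the order is deterministic.
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return [doc_id for _, doc_id in hits[:size]]


body = build_index_body(3, 1, {"title": {"type": "text"}, "price": {"type": "float"}})
print(json.dumps(body, indent=2))

# Distribute a few sample documents over three in-memory "shards".
sample_docs = [
    {"id": "1", "title": "Elasticsearch basics"},
    {"id": "2", "title": "Cooking at home"},
    {"id": "3", "title": "Scaling Elasticsearch clusters"},
]
shard_lists = [[] for _ in range(3)]
for doc in sample_docs:
    shard_lists[pick_shard(doc["id"], 3)].append(doc)

print(search_all_shards(shard_lists, "elasticsearch", 10))
```

The caller of `search_all_shards` never sees which shard a hit came from, which mirrors how an application queries an index without knowing how its data is distributed.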

 


Happy Learning 🙂

Rahul is a data geek, technology enthusiast, passionate writer, and thinker with a passion for computer programming. He loves to explore technology and finds ultimate joy in writing about trending technology, geek stuff, and web development.