Large-scale distributed search using Elasticsearch
Introduction
Search engines are essential tools for managing and mining large volumes of text. The amount of text data has grown enormously in recent years, and managing and mining it has become a key challenge. Open source frameworks and libraries such as Lucene, Solr, Sphinx and Elasticsearch exist to tackle this challenge.
This session will cover how Elasticsearch (a distributed, scalable and highly available search server based on Lucene) can be integrated with a Drupal site to offer large-scale distributed search, which is a key demand of enterprise Drupal sites.
Topics
This session will cover the following topics:
- Introduction to text retrieval
- Difference between text retrieval and database retrieval
- Introduction to Elasticsearch
- Elasticsearch and Drupal integration
- Clustering in Elasticsearch: We will touch on Elasticsearch concepts such as clusters, nodes and shards (primary and replica). This section will show how clustering can be used to build a distributed search cluster with failover, and how horizontal scaling (adding more nodes to the cluster) makes search more scalable.
- Integrating Elasticsearch with Drupal using the Elasticsearch Connector and Search API modules. This will be the fun part: we will start killing nodes in the cluster, watch search keep working, and see how the remaining nodes readjust among themselves to provide failover :)
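To make the failover idea concrete, here is a minimal sketch in Python of how primary and replica shards might behave when nodes are killed. It is a toy model under simplifying assumptions (one replica per shard, a naive least-loaded placement rule) and is not Elasticsearch's actual allocation algorithm; the class name ToyCluster, its methods, and the node names are all hypothetical. The green/yellow/red labels follow the usual meaning of Elasticsearch cluster health: green when every primary and replica is assigned, yellow when all primaries are assigned but some replica is not, red when some primary is unassigned.

```python
class ToyCluster:
    """Toy model of shard placement and failover (hypothetical, for illustration)."""

    def __init__(self, nodes, num_shards=3):
        self.nodes = list(nodes)
        # shard id -> {"primary": node name or None, "replica": node name or None}
        self.shards = {}
        for shard_id in range(num_shards):
            # Spread primaries round-robin over the starting nodes.
            primary = self.nodes[shard_id % len(self.nodes)]
            self.shards[shard_id] = {"primary": primary, "replica": None}
        self._assign_replicas()

    def _load(self, node):
        """Number of shard copies (primary or replica) currently on a node."""
        return sum(
            1
            for copies in self.shards.values()
            for holder in copies.values()
            if holder == node
        )

    def _assign_replicas(self):
        """Place missing replicas, never on the node that holds the primary."""
        for copies in self.shards.values():
            if copies["primary"] is None or copies["replica"] is not None:
                continue
            candidates = [n for n in self.nodes if n != copies["primary"]]
            if candidates:
                copies["replica"] = min(candidates, key=self._load)

    def kill(self, node):
        """Remove a node, promote surviving replicas, then re-place replicas."""
        self.nodes.remove(node)
        for copies in self.shards.values():
            if copies["primary"] == node:
                # Promote the replica (may be None, meaning the shard is lost).
                copies["primary"] = copies["replica"]
                copies["replica"] = None
            elif copies["replica"] == node:
                copies["replica"] = None
        self._assign_replicas()

    def health(self):
        """Return 'green', 'yellow' or 'red' for the whole toy cluster."""
        if any(c["primary"] is None for c in self.shards.values()):
            return "red"
        if any(c["replica"] is None for c in self.shards.values()):
            return "yellow"
        return "green"


if __name__ == "__main__":
    cluster = ToyCluster(["node-1", "node-2", "node-3"])
    print(cluster.health())
    cluster.kill("node-2")
    print(cluster.health())
    cluster.kill("node-1")
    print(cluster.health())
```

In this model, losing one of three nodes leaves every shard with a primary and a replica on the two survivors. With a single node left, all primaries are still available (so search keeps working) but replicas have nowhere to go, which is the situation the live demo is meant to show on a real cluster.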
Tools, Frameworks and Modules Used
- Drupal
- Elasticsearch
- Vagrant and VirtualBox: These will be used to create multiple virtual machines, each hosting an Elasticsearch node; the nodes talk to each other to form a cluster.
- Elasticsearch Connector module, Search API module
Session Format
The session will be a mix of presentation and live demo using the tools listed above.
By the end of the session, participants will have seen how easy it is to configure clustering in Elasticsearch and to integrate it with a Drupal site using the modules mentioned above.