https://www.softwaretestinghelp.com/elasticsearch-interview-questions/
Overview Of ElasticSearch
Elasticsearch is an open-source, RESTful, scalable, document-based search engine built on the Apache Lucene library. It stores, retrieves, and manages textual, numerical, geospatial, structured, and unstructured data in the form of JSON documents, using a CRUD REST API or ingestion tools such as Logstash.
You can use Kibana, an open-source visualization tool, with Elasticsearch to visualize your data and build interactive dashboards for Analysis.
Elasticsearch stores data as JSON documents and indexes them with Apache Lucene for faster searching. Because of this indexing, users can search text across JSON documents quickly, typically in far less than 10 seconds.
List Of Most Frequently Asked ElasticSearch Interview Questions
Q #1) Briefly explain what Elasticsearch is.
Answer: Elasticsearch is a search engine and data store built on Apache Lucene that stores, retrieves, and manages document-oriented and semi-structured data. It provides real-time search and analytics for structured or unstructured text, numerical data, and geospatial data.
Q #2) Can you state the stable Elasticsearch version currently available for download?
Answer: At the time of writing, the latest stable version of Elasticsearch was 7.5.0. Check the official download page for the current release.
Q #3) To install Elasticsearch, what software is required as a prerequisite?
Answer: JDK 8 (Java version 1.8.0) or later is the recommended prerequisite for running Elasticsearch on your machine. Elasticsearch 7.x distributions also ship with a bundled OpenJDK.
Q #4) Can you please give step by step procedures to start an Elasticsearch server?
Answer: The server can be started from the command line.
Following steps explain the process:
- Click on the Windows Start icon present at the bottom-left part of the desktop screen.
- Type command or cmd in the Windows Start menu and press Enter to open a command prompt.
- Change directory to the bin folder inside the folder where Elasticsearch was installed.
- Type elasticsearch.bat and press Enter to start the Elasticsearch server.
This starts Elasticsearch in the command prompt window. Next, open a browser, enter http://localhost:9200, and press Enter. The page should display the Elasticsearch cluster name and other metadata about the node.
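The browser check can also be scripted. The following is a minimal Python sketch, assuming a node is listening on http://localhost:9200 without authentication; the function names and the sample response values are made up for illustration.

```python
import json
import urllib.request


def parse_cluster_info(body: str) -> dict:
    """Pick a few fields out of the JSON returned by the root endpoint."""
    info = json.loads(body)
    return {
        "cluster_name": info.get("cluster_name"),
        "node_name": info.get("name"),
        "version": info.get("version", {}).get("number"),
    }


def fetch_cluster_info(base_url: str = "http://localhost:9200") -> dict:
    """Call GET / on a running node (needs a live server, so not called here)."""
    with urllib.request.urlopen(base_url, timeout=5) as response:
        return parse_cluster_info(response.read().decode("utf-8"))


# Illustrative response body; the values are invented for this example.
sample = (
    '{"name": "node-1", "cluster_name": "elasticsearch", '
    '"version": {"number": "7.5.0"}}'
)
print(parse_cluster_info(sample))
```

Calling fetch_cluster_info() against a running server should return the same three fields taken from the live response.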
Q #5) Name 10 companies that use Elasticsearch as the search engine and database for their applications.
Answer:
The following are some of the companies that use Elasticsearch along with Logstash and Kibana:
- Uber
- Instacart
- Slack
- Shopify
- Stack Overflow
- DigitalOcean
- Udemy
- 9GAG
- Wikipedia
- Netflix
- Accenture
- Fujitsu
Q #6) Please explain Elasticsearch Cluster?
Answer: A cluster is a group of one or more connected node instances that together are responsible for distributing tasks, searching, and indexing across all the nodes.
Node and Shards:
Q #7) What is a Node in Elasticsearch?
Answer: A node is an instance of Elasticsearch. Different node types are Data nodes, Master nodes, Client nodes and Ingest nodes.
These are explained as follows:
- Data nodes hold data and perform operations such as CRUD (Create/Read/Update/Delete), search, and aggregations on that data.
- Master nodes handle cluster-wide configuration and management, such as adding and removing nodes.
- Client nodes forward cluster requests to the master node and data-related requests to data nodes.
- Ingest nodes pre-process documents before indexing.
Q #8) What is an index in an Elasticsearch cluster?
Answer: An Elasticsearch cluster can contain multiple indices. An index is comparable to a database in a relational system. In older versions, an index contained multiple types (tables), each type contained multiple documents (records/rows), and each document contained properties (columns).
Q #9) What is a Type in Elasticsearch?
Answer: A type is comparable to a table in a relational database. Types (tables) hold multiple documents (rows), and each document has properties (columns). Note that mapping types are deprecated in Elasticsearch 7.x, where each index holds a single document type, _doc.
Q #10) Can you please define Mapping in an Elasticsearch?
Answer: A mapping is the schema of the documents stored in an index. It defines how a document and its fields are indexed and stored by Lucene.
Q #11) What is a Document with respect to Elasticsearch?
Answer: A document is a JSON document that is stored in Elasticsearch. It is equivalent to a row in a relational database table.
Q #12) Can you explain SHARDS with regards to Elasticsearch?
Answer: As the number of documents grows, the disk capacity and processing power of a single node become insufficient and responses to client requests are delayed. To handle this, the indexed data is divided into smaller chunks called shards. Shards can be spread across nodes, which improves the fetching of results during a search.
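How a document ends up on a particular shard can be sketched as a hash of a routing value (by default the document ID) taken modulo the number of primary shards. The Python sketch below is a simplification: it uses CRC32 as a stand-in hash, whereas Elasticsearch uses its own Murmur3-based function, and the function name is hypothetical.

```python
import zlib


def pick_shard(routing_value: str, number_of_primary_shards: int) -> int:
    """Map a routing value to a shard number.

    CRC32 is only a stand-in here; the real hash function differs,
    so actual shard numbers will not match this sketch.
    """
    hashed = zlib.crc32(routing_value.encode("utf-8"))
    return hashed % number_of_primary_shards


# Distribute six hypothetical document IDs over three primary shards.
placement = {}
for doc_id in ["1", "2", "3", "4", "5", "6"]:
    placement.setdefault(pick_shard(doc_id, 3), []).append(doc_id)

print(placement)
```

Because the shard number depends on the number of primary shards, that number cannot be changed freely after an index has been created without re-distributing the documents.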
Q #13) Can you define REPLICA and what is the advantage of creating a replica?
Answer: A replica is an exact copy of the Shard, used to increase query throughput or achieve high availability during extreme load conditions. These replicas help to efficiently manage requests.
Q #14) Please explain the procedure to add or create an index in Elasticsearch Cluster?
Answer: To add a new index, use the create index API. The request can specify the configuration settings of the index, the field mappings in the index, and index aliases.
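As a rough illustration, a create index request for Elasticsearch 7.x might be assembled as in the Python sketch below. The index name, field names, alias name, and the helper function are assumptions made for this example; only the settings, mappings, and aliases sections come from the answer above.

```python
import json


def build_create_index_request(index_name: str, shards: int = 1, replicas: int = 1):
    """Return the HTTP method, path, and JSON body for a create index call."""
    body = {
        # Configuration settings of the index
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        },
        # Field mappings (hypothetical fields)
        "mappings": {
            "properties": {
                "title": {"type": "text"},
                "published": {"type": "date"},
            }
        },
        # Index aliases (hypothetical alias name)
        "aliases": {index_name + "_alias": {}},
    }
    return "PUT", "/" + index_name, body


method, path, body = build_create_index_request("articles", shards=3, replicas=1)
print(method, path)
print(json.dumps(body, indent=2))
```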
Q #15) What is the syntax or code to delete an index in Elasticsearch?
Answer: You can delete an existing index using the following syntax:
DELETE /<index_name>
_all or * can be used to remove/delete all the indices
Q #16) What is the syntax or code to list all indexes of a Cluster in Elasticsearch?
Answer: You can get the list of indices present in the cluster using the following syntax:
GET /_cat/indices?v
To retrieve information about a single index, use GET /<index_name>, for example GET /.kibana.
Q #17) Can you tell me the syntax or code to add a Mapping in an Index?
Answer: You can add a mapping to an existing index using the following syntax (Elasticsearch 7.x; the request body lists the new field properties):
PUT /<index_name>/_mapping
Q #18) What is the syntax or code to retrieve a document by ID in Elasticsearch?
Answer: GET API retrieves the specified JSON document from an index.
Syntax:
GET <index_name>/_doc/<_id>
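The index and document calls from the last few questions can be collected in a small Python sketch. The helper names are hypothetical, and the paths for listing indices and adding a mapping are the Elasticsearch 7.x forms as best understood here (GET /_cat/indices?v and PUT /<index_name>/_mapping), so treat them as assumptions to check against your version.

```python
from urllib.parse import quote


def delete_index(index_name: str):
    """DELETE /<index_name> removes an index."""
    return "DELETE", "/" + quote(index_name, safe="")


def list_indices():
    """GET /_cat/indices?v lists the indices in the cluster."""
    return "GET", "/_cat/indices?v"


def put_mapping(index_name: str):
    """PUT /<index_name>/_mapping adds fields to an existing mapping."""
    return "PUT", "/" + quote(index_name, safe="") + "/_mapping"


def get_document(index_name: str, doc_id: str):
    """GET /<index_name>/_doc/<_id> retrieves one document by ID."""
    return "GET", "/" + quote(index_name, safe="") + "/_doc/" + quote(doc_id, safe="")


for request in (
    list_indices(),
    put_mapping("articles"),
    get_document("articles", "42"),
    delete_index("articles"),
):
    print(*request)
```

The quote call percent-encodes characters such as spaces or slashes so that an unusual document ID cannot change the shape of the path.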
Q #19) Please explain relevancy and scoring in Elasticsearch?
Answer: Suppose you search the internet for "Apple". The results could be about the fruit or about the company named Apple. You may want to buy the fruit online, look up a recipe that uses it, or read about the health benefits of eating apples.
Alternatively, you may want to visit Apple.com to see the company's latest product range, or check Apple Inc.'s stock price and how the company has performed on NASDAQ over the last 6 months, 1 year, or 5 years.
Similarly, when you search for a document (a record) in Elasticsearch, you are interested in getting the information most relevant to what you are looking for. The relevance of each document to the query is calculated by the Lucene scoring algorithm and expressed as a score.
Lucene scores an indexed document based on how frequently the search term appears in that document, how often the term appears across the whole index, and how the query is constructed from its various parameters.
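The two frequency signals described above can be sketched with a BM25-style formula, the similarity that recent Elasticsearch versions use by default. This Python sketch is a textbook simplification, not Lucene's exact implementation; the tokenization (lowercase plus whitespace split) and the sample documents are assumptions for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_terms, documents, k1=1.2, b=0.75):
    """Score each document against the query with a simplified BM25."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query_terms:
            term = term.lower()
            tf = term_freq[term]
            if tf == 0:
                continue
            # Rare terms across the index get a higher weight.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Frequent terms in this document raise the score, damped by length.
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "apple pie recipe with fresh apple",
    "apple stock price on nasdaq",
    "banana bread recipe",
]
for doc, score in zip(docs, bm25_scores(["apple", "recipe"], docs)):
    print(round(score, 3), doc)
```

The first document matches both query terms and mentions "apple" twice, so it receives the highest score of the three.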
Q #20) What are the various possible ways in which we can perform a search in Elasticsearch?
Answer:
Mentioned below are the various possible ways in which we can perform a search in Elasticsearch:
- Applying the Search API across multiple types and multiple indices: with the Search API, we can search for an entity across multiple types and indices in a single request.
- Search request using a Uniform Resource Identifier: we can pass the search parameters along with the URI (Uniform Resource Identifier) of the request.
- Search using Query DSL (Domain Specific Language) within the body: the query is expressed in the DSL as a JSON request body.
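The difference between a URI search and a Query DSL search can be sketched in Python as follows. The index names, the field name, and the helper functions are hypothetical; the sketch only builds the requests and does not send them.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit


def uri_search(indices, query_string):
    """Search with the query passed as the q parameter of the URI."""
    path = "/" + ",".join(indices) + "/_search"
    return "GET", path + "?" + urlencode({"q": query_string})


def dsl_search(indices, field, text):
    """Search with a Query DSL match query in the JSON request body."""
    path = "/" + ",".join(indices) + "/_search"
    body = {"query": {"match": {field: text}}}
    return "GET", path, json.dumps(body)


# One request can target several indices at once.
method, url = uri_search(["logs-a", "logs-b"], "status:error")
print(method, url)
print(parse_qs(urlsplit(url).query))

print(*dsl_search(["logs-a", "logs-b"], "message", "connection timeout"))
```

The URI form is handy for quick checks, while the body form can express nested and combined queries that do not fit comfortably in a query string.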
Q #21) What are the various types of queries that Elasticsearch supports?
Answer: Queries are mainly divided into two types: full-text (match) queries and term-based queries.
Full-text queries include basic match, match phrase, multi-match, match phrase prefix, common terms, query string, and simple query string.
Term-based queries include term, exists, type, terms set, range, prefix, ids, wildcard, regexp, and fuzzy.
Q #22) Can you compare between Term-based queries and Full-text queries?
Answer: Full-text queries analyze the query string before matching, so they suit analyzed text such as the body of an email. Written in the Elasticsearch Query DSL (Domain Specific Language) and sent in the HTTP request body, they are clear and detailed in their intent, and over time they are simpler to tune.
Term-based queries look up exact values in the inverted index, a hash-map-like data structure, without analyzing the search term. They suit structured values such as keywords, numbers, and dates.
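The practical consequence of this difference can be shown with a toy inverted index in Python. This is a simplified model, assuming an analyzer that only lowercases and splits on whitespace; the function names and documents are made up for the example.

```python
from collections import defaultdict


def analyze(text):
    """Toy analyzer: lowercase, then split on whitespace."""
    return text.lower().split()


def build_inverted_index(documents):
    """Map each analyzed token to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index


def term_query(index, value):
    """Term-based lookup: the value is used exactly as given."""
    return set(index.get(value, set()))


def match_query(index, text):
    """Full-text lookup: analyze the query text, then OR the tokens."""
    hits = set()
    for token in analyze(text):
        hits |= index.get(token, set())
    return hits


docs = {1: "Quick brown fox", 2: "Lazy dog", 3: "quick lazy cat"}
index = build_inverted_index(docs)

print(term_query(index, "Quick"))       # no hits: the index holds "quick"
print(term_query(index, "quick"))
print(match_query(index, "Quick Dog"))
```

The term lookup for "Quick" finds nothing because the indexed tokens were lowercased, whereas the match lookup lowercases the query first and therefore finds the documents.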
Q #23) Please explain the working of aggregation in Elasticsearch?
Answer: Aggregations summarize the data matched by a search query. Metrics aggregations, for example, include average, minimum, maximum, sum, and stats; which one to use depends on the purpose.
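As an illustration, a stats metrics aggregation request and the figures it summarizes can be sketched in Python. The field name and helper names are hypothetical, and local_stats only imitates on a plain list what the stats aggregation computes over matching documents.

```python
def build_stats_aggregation(field):
    """Request body for a stats aggregation; size 0 skips the search hits."""
    return {
        "size": 0,
        "aggs": {field + "_stats": {"stats": {"field": field}}},
    }


def local_stats(values):
    """Compute count, min, max, avg, and sum for a list of numbers."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": sum(values) / len(values),
        "sum": sum(values),
    }


print(build_stats_aggregation("price"))
print(local_stats([10, 20, 30]))
```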
Q #24) Can you tell me data storage functionality in Elasticsearch?
Answer: Elasticsearch is a search engine that is also used for storage. It stores and searches complex data structures that are serialized as JSON documents and indexed.
Q #25) What is an Elasticsearch Analyzer?
Answer: Analyzers are used for text analysis and can be either built-in or custom. An analyzer consists of zero or more character filters, exactly one tokenizer, and zero or more token filters.
- Character filters pre-process the stream of characters before tokenization: stripping out HTML tags, searching the string for keys and replacing them with the related values defined in a mapping char filter, or replacing characters that match a specific pattern.
- The tokenizer breaks the stream of characters into tokens. For example, the whitespace tokenizer splits the stream whenever it encounters whitespace between characters.
- Token filters modify these tokens: converting them to lower case, removing stop words like ‘a’, ‘an’, ‘the’, or replacing tokens with equivalent synonyms defined by the filter.
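The three stages can be sketched as a small Python pipeline. This is a toy model, not how Elasticsearch implements analysis: the HTML stripping is a naive regular expression, the stop word list holds only the three words mentioned above, and all function names are hypothetical.

```python
import re

STOP_WORDS = {"a", "an", "the"}


def html_strip(text):
    """Character filter: replace anything that looks like a tag with a space."""
    return re.sub(r"<[^>]+>", " ", text)


def whitespace_tokenizer(text):
    """Tokenizer: split the character stream on whitespace."""
    return text.split()


def lowercase_filter(tokens):
    """Token filter: convert every token to lower case."""
    return [token.lower() for token in tokens]


def stop_filter(tokens):
    """Token filter: drop stop words."""
    return [token for token in tokens if token not in STOP_WORDS]


def analyze(text):
    """Run character filters, then one tokenizer, then token filters."""
    for char_filter in (html_strip,):
        text = char_filter(text)
    tokens = whitespace_tokenizer(text)
    for token_filter in (lowercase_filter, stop_filter):
        tokens = token_filter(tokens)
    return tokens


print(analyze("<p>The Quick brown fox jumped over a fence</p>"))
```

The order matters: the stop filter runs after the lowercase filter, so "The" is lowercased first and then removed.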
Q #26) Can you list various types of analyzers in Elasticsearch?
Answer: Types of Elasticsearch Analyzer are Built-in and Custom.
Built-in analyzers are further classified as below:
- Standard Analyzer: This analyzer combines the standard tokenizer, which breaks the text into tokens at word boundaries and honors the configured maximum token length, the lowercase token filter, which converts tokens to lower case, and a stop token filter for stop words such as ‘a’, ‘an’, ‘the’, which is disabled by default.
- Simple Analyzer: This analyzer breaks the text into tokens whenever it comes across a character that is not a letter, such as a number or a special character. It converts all tokens to lower case.
- Whitespace Analyzer: This analyzer breaks the text into tokens whenever it comes across whitespace. It retains the case of the tokens as they were in the input stream.
- Stop Analyzer: This analyzer is similar to the simple analyzer, but in addition it removes stop words such as ‘a’, ‘an’, ‘the’ from the text. The complete list of English stop words is given in the Elasticsearch documentation.
- Keyword Analyzer: This analyzer returns the entire input as a single token, unchanged. It can be turned into a custom analyzer by adding filters to it.
- Pattern Analyzer: This analyzer breaks the text into tokens based on a regular expression. The regular expression matches the separators in the stream of text, not the tokens themselves.
- Language Analyzer: These analyzers are used to analyze text in a specific language. Plug-ins add support for further languages, including Stempel, Ukrainian Analysis, Kuromoji for Japanese, Nori for Korean, and the Phonetic plug-in. There are additional analyzers and plug-ins for Indian languages as well as other Asian languages (for example, Japanese, Vietnamese, Tibetan).
- Fingerprint Analyzer: The fingerprint analyzer converts the text to lower case, removes extended characters, sorts the tokens, removes duplicates, and concatenates them into a single token.
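The behaviour of a few of these built-in analyzers can be approximated in Python to make the differences concrete. These are rough imitations under simplifying assumptions (for instance, the fingerprint sketch tokenizes with a plain word regular expression), and the function names are hypothetical.

```python
import re
import unicodedata


def whitespace_analyzer(text):
    """Split on whitespace only; case is retained."""
    return text.split()


def simple_analyzer(text):
    """Split at any non-letter character and lowercase the tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())


def keyword_analyzer(text):
    """Return the whole input as one token."""
    return [text]


def fingerprint_analyzer(text):
    """Lowercase, fold accents, sort, deduplicate, join into one token."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    tokens = re.findall(r"\w+", folded)
    return [" ".join(sorted(set(tokens)))]


sample = "The 2 QUICK Brown-Foxes"
print(whitespace_analyzer(sample))
print(simple_analyzer(sample))
print(keyword_analyzer(sample))
print(fingerprint_analyzer("Yes yes, Gödel said this sentence is consistent and."))
```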
Q #27) How can Elasticsearch Tokenizer be used?
Answer: A tokenizer accepts a stream of characters, breaks it into individual tokens, and outputs a collection (array) of these tokens. Tokenizers are mainly grouped into word-oriented, partial word, and structured text tokenizers.
Q #28) How do Filters work in an Elasticsearch?
Answer: Token filters receive tokens from the tokenizer and can modify them (for example, lowercasing), remove them (stop words), or add new ones (synonyms). Filters used in a query work differently: they compare indexed values with the search condition and return a Boolean result, true or false, for each document.
The comparison can check whether a value matches the search condition or does not match it, matches one of several specified values or none of them, falls within or outside a given range, or whether the field exists at all.
Q #29) How does an ingest node in Elasticsearch function?
Answer: An ingest node pre-processes documents before indexing. It does this with a pipeline, a series of processors that modify the document in sequence: for example, one processor removes one or more fields and the next renames a field. This helps normalize the documents, which in turn can make indexing and searching more efficient.
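A pipeline of this kind, with a remove processor followed by a rename processor, can be sketched in Python together with a local simulation of its effect. The field names and the simulate function are hypothetical; the simulation only imitates the two processors and is not the ingest node's own logic.

```python
# Pipeline definition in the shape used by the ingest API.
pipeline = {
    "description": "hypothetical clean-up pipeline",
    "processors": [
        {"remove": {"field": "debug_info"}},
        {"rename": {"field": "hostname", "target_field": "host_name"}},
    ],
}


def simulate(pipeline, document):
    """Apply each processor in order to a copy of the document."""
    doc = dict(document)
    for processor in pipeline["processors"]:
        kind, options = next(iter(processor.items()))
        if kind == "remove":
            # Raises KeyError if the field is missing (simplified failure mode).
            doc.pop(options["field"])
        elif kind == "rename":
            doc[options["target_field"]] = doc.pop(options["field"])
        else:
            raise ValueError("unsupported processor in this sketch: " + kind)
    return doc


source = {"hostname": "web-01", "debug_info": "x=1", "message": "started"}
print(simulate(pipeline, source))
```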
Q #30) Differentiate between Master node and Master eligible node in Elasticsearch?
Answer: The master node is responsible for cluster-wide actions such as creating and deleting indices and keeping track of the nodes that form the cluster. It also decides which shards are allocated to which nodes, which keeps the Elasticsearch cluster healthy and stable.
Master-eligible nodes, by contrast, are the nodes that can be elected to become the master node.
Q #31) What are functionalities of attributes such as enabled, index and store in Elasticsearch?
Answer:
The enabled attribute is used when a field should be kept in the stored document but skipped during indexing. This is done with the "enabled": false syntax, which can be applied to the top-level mapping as well as to object fields.
The index attribute decides whether and how a string is indexed. Older versions of Elasticsearch accepted three values:
- ‘analyzed’, in which the string is analyzed before it is indexed as a full-text field.
- ‘not_analyzed’, which indexes the string as is to make it searchable, without analyzing it.
- ‘no’, where the string is not indexed at all and is therefore not searchable.
In current versions, index is a Boolean (true or false), and the choice between analyzed and non-analyzed strings is made with the text and keyword field types.
The store attribute controls whether a field value is stored separately from the document. Even when store is set to false, Elasticsearch keeps the original document on disk, so the value can still be retrieved.
Q #32) How is a character filter utilized in an Elasticsearch analyzer?
Answer: Character filters in an Elasticsearch analyzer are optional. They manipulate the input stream of characters before tokenization, for example by replacing each occurrence of a key with the value mapped to it.
The mapping character filter accepts either a mappings parameter or a mappings_path parameter. mappings is an array of keys and their corresponding values, whereas mappings_path is the path, relative to the config directory, to a file that contains the mappings.
Q #33) Please explain about NRT with regards to Elasticsearch?
Answer: Elasticsearch is a Near Real-Time (NRT) search platform: there is a slight latency (delay), normally about one second, between the time you index a document and the time it becomes searchable.
Q #34) What are the advantages of REST API with regards to Elasticsearch?
Answer: A REST API lets systems communicate over the Hypertext Transfer Protocol (HTTP), exchanging requests and data in formats such as XML and JSON.
REST is stateless and separates the user interface from the server and data storage, which makes the user interface more portable across platforms. It also improves scalability, because components can be implemented independently, and so applications become more flexible to work with.
A REST API is platform and language independent, apart from the fact that the data exchange format will be XML or JSON.
Q #35) While installing Elasticsearch, please explain different packages and their importance?
Answer: Elasticsearch installation includes the following packages:
- Linux and macOS platforms use the tar.gz archives.
- Windows uses the .zip archive.
- Debian and Ubuntu-based systems use the deb package.
- Red Hat, CentOS, openSUSE, and SLES use the rpm package.
- Windows 64-bit systems can also use the MSI package.
- Docker images for running Elasticsearch as Docker containers can be downloaded from the Elastic Docker Registry.
- X-Pack API packages are installed along with Elasticsearch and help to get information on the license, security, migration, and machine learning activities involved in Elasticsearch.
Q #36) What are configuration management tools that are supported by Elasticsearch?
Answer: Ansible, Chef, Puppet, and Salt Stack are configuration management tools that are supported by Elasticsearch and used by DevOps teams.
Q #37) Can you please explain the functionality and importance of the installation of X-Pack for Elasticsearch?
Answer: X-Pack is an extension that gets installed along with Elasticsearch. Various functionalities of X-Pack are security (Role-based access, Privileges/Permissions, Roles and User security), monitoring, reporting, alerting and many more.
Q #38) Can you list X-Pack API types?
Answer: X-Pack API types are listed as below:
(i) Info API: It provides general information on the installed X-Pack features, such as build info, license info, and features info.
GET /_xpack
(ii) Graph Explore API: The Explore API helps to retrieve and summarize information about the documents and terms in Elasticsearch indices.
(iii) Licensing APIs: These APIs help to manage licenses: get the trial status, start a trial, get the basic status, start basic, update the license, and delete the license.
GET /_license
(iv) Machine learning APIs: These APIs perform calendar tasks (create a calendar, add and delete jobs, add and delete scheduled events, get the calendar, get scheduled events, delete the calendar), filter tasks (create, update, get, and delete a filter), and data feed tasks (create, update, start, stop, preview, and delete a data feed, get data feed info/statistics).
Job tasks (create, update, open, close, and delete a job, add a job to or delete it from a calendar, get job info/statistics) and various other tasks related to model snapshots, results, file structure, and expired data are also included in the machine learning APIs.
(v) Security APIs: These APIs are used to perform X-Pack security activities, such as authentication, clearing the cache, and privilege and SSL certificate related activities.
(vi) Watcher APIs: These APIs help to watch for new documents added to Elasticsearch.
(vii) Rollup APIs: These APIs, which summarize (roll up) historical data, were introduced as an experimental feature that may be changed or removed from Elasticsearch in the future.
(viii) Migration APIs: These APIs upgrade X-Pack indices from the previous version to the latest version.
Q #39) Can you list X-Pack commands?
Answer: X-Pack commands are listed below:
- certgen
- migrate
- setup-passwords
- syskeygen
- users
Q #40) What is the functionality of cat API in Elasticsearch?
Answer: cat API commands give an overview of the Elasticsearch cluster and its health, including information on aliases, allocation, indices, and node attributes, to name a few. The cat commands accept query string parameters (such as v, which adds column headers) and return compact, column-aligned text rather than a JSON document.
Q #41) What are the cat commands from cat API used in Elasticsearch?
Answer:
The cat commands from the cat API are listed below:
(i) Aliases – GET _cat/aliases?v – This command displays the mapping of aliases to indices, along with routing and filtering information.
(ii) Allocation – GET _cat/allocation?v – This command displays the disk space allocated for indices as well as the shard count on each node.
(iii) Count – GET _cat/count?v – This command shows how many documents are present in the Elasticsearch cluster.
(iv) Fielddata – GET _cat/fielddata?v – It displays the amount of memory used by each field on each node.
(v) Health – GET _cat/health?v – It displays the cluster status, the number of nodes it has, and similar figures used to analyze cluster health.
(vi) Indices – GET _cat/indices?v – It gives information on the number of shards, documents, and deleted documents in each index, and the store size of all the shards including their replicas.
(vii) Master – GET _cat/master?v – It displays information about the node that has been elected master.
(viii) Node attributes – GET _cat/nodeattrs?v – It displays custom node attributes.
(ix) Nodes – GET _cat/nodes?v – It displays information related to each node, such as roles and load metrics.
(x) Pending tasks – GET _cat/pending_tasks?v – It displays pending tasks along with details such as task priority and time in queue.
(xi) Plugins – GET _cat/plugins?v – It displays information related to installed plugins, such as names, versions, and components.
(xii) Recovery – GET _cat/recovery?v – It displays completed as well as ongoing recoveries of indices and shards.
(xiii) Repositories – GET _cat/repositories?v – It displays an overview of the snapshot repositories and their types.
(xiv) Segments – GET _cat/segments?v – It displays Lucene-level segment information for each index.
(xv) Shards – GET _cat/shards?v – It displays the state as well as the distribution of primary and replica shards.
(xvi) Snapshots – GET _cat/snapshots?v – It displays an overview of the snapshots that belong to a repository.
(xvii) Tasks – GET _cat/tasks?v – It displays all tasks that are running on the cluster and their progress.
(xviii) Templates – GET _cat/templates?v – It gives information on the index templates, which supply index settings and field mappings when new indices are created.
(xix) Thread pool – GET _cat/thread_pool?v – It displays the node-wise status of the different thread pools, with counts of active, queued, and rejected tasks.
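Because the cat commands return column-aligned text with a header row when ?v is used, the output is easy to post-process. The Python sketch below parses such text into dictionaries; the sample output is abridged and invented for illustration, and the parser assumes that no column is empty and no value contains spaces.

```python
def parse_cat_output(text):
    """Turn header-plus-rows text from a cat command into a list of dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    headers = lines[0].split()
    return [dict(zip(headers, line.split())) for line in lines[1:]]


# Made-up, abridged output in the style of GET _cat/indices?v.
sample = """\
health status index   pri rep docs.count
green  open   .kibana 1   0   12
yellow open   logs    5   1   3400
"""

for row in parse_cat_output(sample):
    print(row["index"], row["health"], row["docs.count"])
```

For anything beyond quick inspection, a JSON-returning API is usually a more robust choice than parsing aligned text.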
Q #42) Can you explain the Explore API in Elasticsearch?
Answer: The Explore API helps to fetch and summarize information about the documents and terms in an index. Its requests and responses involve figures such as the maximum number of vertices, the number of shards or partitions, and document counts.
Q #43) How can the Migration API be used in Elasticsearch?
Answer: The Migration API is applied after Elasticsearch has been upgraded to a newer version. With this API, X-Pack indices are updated to match the newer version of the Elasticsearch cluster.
Q #44) How does the Search API function in Elasticsearch?
Answer: The Search API looks up data in an index; a routing parameter can direct the search to particular shards.
Q #45) Can you please list field data type majorly available concerning Elasticsearch?
Answer: Enlisted below are the data types for the document fields:
- String data type which includes text and keyword such as email addresses, zip codes, hostnames.
- Numeric data type like byte, short, integer, long, float, double, half_float, scaled_float.
- Date, date nanoseconds, Boolean, and binary (a Base64-encoded string; in Base64, for example, the six-bit value 000000 encodes the character ‘A’ and 011010 encodes ‘a’)
- Range (integer_range, long_range, double_range, float_range, date_range)
- Complex data types that include object (Example: single JSON object) and Nested (array of JSON objects)
- Geo datatypes include latitude/longitude which is geo-points and geo-shape which include shapes like a polygon.
- Specialized datatypes, Arrays (values in the array should have same data type)
Q #46) Explain in detail about ELK Stack and its contents?
Answer: Enterprises, large or small, nowadays collect information in the form of reports, data, customer follow-ups, historical and current orders, and customer reviews, from both online and offline logs. It is essential to store and analyze these logs, because they provide valuable feedback for the business.
Maintaining these logs calls for an inexpensive log analysis tool. The ELK Stack is a collection of tools: Elasticsearch for search and analysis, Logstash for collection and transformation, and Kibana for visualization and data management, complemented by Beats for collecting and shipping logs and X-Pack for monitoring and reporting.
Q #47) Where and how is Kibana useful with Elasticsearch?
Answer: Kibana is part of the ELK Stack log analysis solution. It is an open-source visualization tool that presents ever-increasing volumes of logs in various chart formats such as line, pie, bar, and coordinate maps.
Q #48) How can Logstash be used with Elasticsearch?
Answer: Logstash is an open-source, server-side ETL engine that comes with the ELK Stack. It collects and processes data from a large variety of sources.
Q #49) How can Beats be used with Elasticsearch?
Answer: Beats is an open-source tool that transports data either straight to Elasticsearch or through Logstash, where the data can be processed or filtered before being viewed in Kibana. The types of data transported include audit data, log files, cloud data, network traffic, and Windows event logs.
Q #50) How is Elastic Stack Reporting used?
Answer: The Reporting API helps to retrieve data in PDF format, PNG image format, or CSV spreadsheet format, which can then be shared or saved as needed.
Q #51) Can you please list use cases related to ELK log analytics?
Answer: Use cases for which ELK log analytics has been applied successfully are listed below:
- Compliance
- E-commerce Search solution
- Fraud detection
- Market Intelligence
- Risk management
- Security analysis
Conclusion
Elasticsearch is an open-source, RESTful, scalable, document-based search engine built on the Apache Lucene library. It stores, retrieves, and manages textual, numerical, geospatial, structured, and unstructured data in the form of JSON documents using a CRUD REST API.
This article has covered interview questions on the main areas of Elasticsearch and the ELK Stack, including analyzers, filters, token filters, and the APIs used in Elasticsearch, with a technical answer to each question.
We hope you have found the answers to the most frequently asked interview questions. Practice, refer and revise these Elasticsearch Interview questions and answers to perform confidently in the technical interview.
Best of luck with the interview!!