Elasticsearch
I. What Is Elasticsearch
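Elasticsearch is a distributed, free and open source search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured data. It is built on Apache Lucene and was first released in 2010 by Elasticsearch N.V. (now Elastic). Known for its simple REST APIs, distributed nature, speed, and scalability, Elasticsearch is the central component of the Elastic Stack, a set of free and open source tools for data ingestion, enrichment, storage, analysis, and visualization. The Elastic Stack is commonly called the ELK Stack (after Elasticsearch, Logstash, and Kibana); it now also includes a rich collection of lightweight shipping agents, known collectively as Beats, that send data to Elasticsearch.
Elasticsearch is built on Lucene. Lucene by itself is only a library: to use it you have to write code that calls its API. Elasticsearch wraps Lucene and exposes a REST API, so it works out of the box.
Official documentation: https://www.elastic.co/guide/index.html
Installation, using Elasticsearch 7.9.3 as an example: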
- Download the Elasticsearch 7.9.3 Windows zip file from the Elasticsearch download page.
- Extract the contents of the zip file to a directory on your computer, for example, C:\Program Files.
- Open a command prompt as an Administrator and navigate to the directory that contains the extracted files, for example: cd C:\Program Files\elasticsearch-7.9.3
- Start Elasticsearch: bin\elasticsearch.bat
To check that the Elasticsearch service started correctly, open http://127.0.0.1:9200 in a browser.
Note: 9200 is the HTTP port of the RESTful API (the one a browser talks to); 9300 is the port used for communication between the nodes of an Elasticsearch cluster.
A response like the following means the node started normally.

    {
      "name" : "QtI5dUu",
      "cluster_name" : "elasticsearch",
      "cluster_uuid" : "DMXhqzzjTGqEtDlkaMOzlA",
      "version" : {
        "number" : "7.9.3",
        "build_flavor" : "default",
        "build_type" : "tar",
        "build_hash" : "00d8bc1",
        "build_date" : "2018-06-06T16:48:02.249996Z",
        "build_snapshot" : false,
        "lucene_version" : "7.3.1",
        "minimum_wire_compatibility_version" : "5.6.0",
        "minimum_index_compatibility_version" : "5.0.0"
      },
      "tagline" : "You Know, for Search"
    }

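II. What Elasticsearch Is Used For
- Application search
- Website search
- Enterprise search
- Log processing and analytics
- Infrastructure metrics and container monitoring
- Application performance monitoring
- Geospatial data analysis and visualization
- Security analytics
- Business analytics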
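III. How Elasticsearch Works
Raw data flows into Elasticsearch from a variety of sources, including logs, system metrics, and web applications. Data ingestion is the process by which this raw data is parsed, normalized, and enriched before it is indexed in Elasticsearch. Once the data is indexed, users can run complex queries against it and use aggregations to retrieve complex summaries of it.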
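IV. Basic Concepts
1. Index
As a verb: comparable to an insert in MySQL.
As a noun: comparable to a database in MySQL.
2. Type
Within an index you can define one or more types. A type is similar to a table in MySQL: data of the same kind is stored together.
Note: in Elasticsearch 6.x an index can contain only one type. In 7.x, types are deprecated, and they are removed in 8.0.
3. Document
A document is a single piece of data, in JSON format, stored under a particular type in a particular index. It is comparable to one row of a table in MySQL.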
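4. The inverted index
Elasticsearch uses a structure called an inverted index, which is designed for fast full-text search. An inverted index consists of a list of all the unique words that appear in any document and, for each word, a list of the documents in which it appears.
For example, suppose we have two documents, each with a content field containing the following:
- The quick brown fox jumped over the lazy dog
- Quick brown foxes leap over lazy dogs in summer
To create an inverted index, we first split the content field of each document into separate words (which we call terms, or tokens), create a sorted list of all the unique terms, and then list the documents in which each term appears. The result looks like this:

    Term      Doc_1  Doc_2
    -------------------------
    Quick   |       |  X
    The     |   X   |
    brown   |   X   |  X
    dog     |   X   |
    dogs    |       |  X
    fox     |   X   |
    foxes   |       |  X
    in      |       |  X
    jumped  |   X   |
    lazy    |   X   |  X
    leap    |       |  X
    over    |   X   |  X
    quick   |   X   |
    summer  |       |  X
    the     |   X   |
    ------------------------

Now, if we want to search for quick brown, we only need to find the documents in which each term appears:

    Term      Doc_1  Doc_2
    -------------------------
    brown   |   X   |  X
    quick   |   X   |
    ------------------------
    Total   |   2   |  1

Both documents match, but the first document matches better than the second. If we apply a naive similarity algorithm that just counts the number of matching terms, we can say that the first document is more relevant to our query than the second.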
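The construction described above can be sketched in a few lines of Java. This is only an illustration, not how Lucene actually stores its index: the class and method names (InvertedIndexSketch, build, matchCounts) are made up for this example, tokenization is a plain whitespace split, and "relevance" is just the count of matching terms.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class InvertedIndexSketch {

    /** Maps each distinct term to the sorted set of document numbers that contain it. */
    static Map<String, Set<Integer>> build(List<String> docs) {
        Map<String, Set<Integer>> index = new TreeMap<>();
        for (int i = 0; i < docs.size(); i++) {
            int docId = i + 1; // Doc_1, Doc_2, ...
            for (String term : docs.get(i).split("\\s+")) {
                index.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
        return index;
    }

    /** Counts, per document, how many of the query terms it contains. */
    static Map<Integer, Integer> matchCounts(Map<String, Set<Integer>> index, String query) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (String term : query.split("\\s+")) {
            for (int docId : index.getOrDefault(term, Collections.emptySet())) {
                counts.merge(docId, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "The quick brown fox jumped over the lazy dog",
                "Quick brown foxes leap over lazy dogs in summer");
        Map<String, Set<Integer>> index = build(docs);
        // One line per term, in the same order as the table above.
        index.forEach((term, ids) -> System.out.println(term + " -> " + ids));
        System.out.println(matchCounts(index, "quick brown")); // {1=2, 2=1}
    }
}
```

Because the terms are stored exactly as they appear in the text, Quick and quick end up as two separate entries, just as in the table.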
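But there are a few problems with our current inverted index:
- Quick and quick appear as separate terms, while the user probably thinks of them as the same word.
- fox and foxes are very similar, as are dog and dogs; they share the same root word.
- jumped and leap do not share a root word, but they are close in meaning. They are synonyms.
With the preceding index, a search for +Quick +fox would not match any document. (Remember, the + prefix means that the word must be present.) Only a document that contains both Quick and fox satisfies the query, but the first document contains quick fox and the second document contains Quick foxes.
Our users could reasonably expect both documents to match the query. We can do better.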
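If we normalize the terms into a standard form, we can find documents whose terms are not exactly the same as the ones the user searched for but are similar enough to be relevant. For example:
- Quick can be lowercased to quick.
- foxes can be stemmed, that is, reduced to its root form, to become fox. Similarly, dogs can be stemmed to dog.
- jumped and leap are synonyms and can be indexed as the same term, jump.
Now the index looks like this:

    Term      Doc_1  Doc_2
    -------------------------
    brown   |   X   |  X
    dog     |   X   |  X
    fox     |   X   |  X
    in      |       |  X
    jump    |   X   |  X
    lazy    |   X   |  X
    over    |   X   |  X
    quick   |   X   |  X
    summer  |       |  X
    the     |   X   |  X
    ------------------------

But we are not there yet. Our search for +Quick +fox would still fail, because Quick no longer exists in the index. However, if we apply to the query string the same normalization rules that we applied to the content field, the query becomes +quick +fox, and both documents match!
V. Basic Retrieval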
1. _cat
- GET /_cat/nodes: list all nodes
- GET /_cat/health: show the health of the cluster
- GET /_cat/master: show the master node
- GET /_cat/indices: list all indices
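These endpoints can be called from a browser or any HTTP client. As a rough sketch, the four calls could be issued from plain Java like this. It assumes a node listening on http://127.0.0.1:9200 as in the installation step; the class and method names are made up for the example, and the ?v query parameter only asks Elasticsearch to include column headers in the text output.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class CatApiSketch {

    /** Builds a _cat URL; "v" turns on the header row of the plain-text table. */
    static String catUrl(String baseUrl, String resource) {
        return baseUrl + "/_cat/" + resource + "?v";
    }

    /** Sends a GET request and returns the response body as text. */
    static String get(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(2000);
        conn.setReadTimeout(2000);
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            StringBuilder body = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                body.append(line).append('\n');
            }
            return body.toString();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        String base = "http://127.0.0.1:9200"; // assumed local node
        for (String resource : new String[] {"nodes", "health", "master", "indices"}) {
            try {
                System.out.println(get(catUrl(base, resource)));
            } catch (IOException e) {
                // For example, nothing is listening on port 9200.
                System.out.println("request failed: " + e.getMessage());
            }
        }
    }
}
```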
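2. PUT/POST: create or update data

    PUT/POST localhost:9200/xujian/book/1
    {
      "bookName": "xujian",
      "price": 30
    }

PUT can both create and update a document, and the id must always be specified.
POST creates a document. Without an id, Elasticsearch generates one automatically; with an id, it behaves like PUT: the document is created if it does not exist, otherwise it is overwritten and its version number is incremented.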
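The difference between the two verbs is easiest to see in the request paths. The sketch below sends the same JSON body once with PUT and an explicit id, and once with POST and no id. It assumes a node at http://localhost:9200 and reuses the xujian index and book type from the example above; IndexDocumentSketch, documentPath and send are names invented for this illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class IndexDocumentSketch {

    /** Path of a document; with a null id the path stops at the type, which is only valid for POST. */
    static String documentPath(String index, String type, String id) {
        String path = "/" + index + "/" + type;
        return id == null ? path : path + "/" + id;
    }

    /** Sends a JSON body with the given method (PUT or POST) and returns the HTTP status code. */
    static int send(String method, String url, String json) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod(method);
        conn.setConnectTimeout(2000);
        conn.setReadTimeout(2000);
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(json.getBytes(StandardCharsets.UTF_8));
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        String base = "http://localhost:9200"; // assumed local node
        String json = "{\"bookName\":\"xujian\",\"price\":30}";
        try {
            // PUT needs an explicit id: creates document 1, or replaces it if it already exists.
            System.out.println(send("PUT", base + documentPath("xujian", "book", "1"), json));
            // POST without an id lets Elasticsearch generate the id.
            System.out.println(send("POST", base + documentPath("xujian", "book", null), json));
        } catch (IOException e) {
            System.out.println("request failed: " + e.getMessage());
        }
    }
}
```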
3. GET: retrieve a document

    GET localhost:9200/xujian/book/1
    {
      "_index": "xujian",      // the index the document is in
      "_type": "book",         // the type the document is in
      "_id": "1",              // the document id
      "_version": 2,           // version number
      "_seq_no": 1,            // concurrency-control field, incremented on every write; used for optimistic locking
      "_primary_term": 1,      // changes when the primary shard is reassigned, for example after a restart
      "found": true,
      "_source": {             // the actual content
        "bookName": "xujian",
        "price": 30
      }
    }

4. DELETE: delete a document or an index

    DELETE localhost:9200/xujian/book/1/
    {
      "_index": "xujian",
      "_type": "book",
      "_id": "1",
      "_version": 6,
      "result": "deleted",
      "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
      },
      "_seq_no": 5,
      "_primary_term": 1
    }

    DELETE localhost:9200/xujian/
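The _version and _seq_no fields returned by the calls in this section are what make optimistic locking possible: a writer states which sequence number it last saw, and the write is rejected if the document has changed since then. The toy in-memory store below sketches that idea. It is not Elasticsearch code and ignores _primary_term entirely; SeqNoStoreSketch and its methods are invented for the illustration. In Elasticsearch itself the same check is requested with the if_seq_no and if_primary_term parameters of a write request.

```java
import java.util.HashMap;
import java.util.Map;

class SeqNoStoreSketch {

    /** One stored document together with its version and sequence number. */
    static final class Entry {
        final String source;
        final long version;
        final long seqNo;

        Entry(String source, long version, long seqNo) {
            this.source = source;
            this.version = version;
            this.seqNo = seqNo;
        }
    }

    private final Map<String, Entry> docs = new HashMap<>();
    private long nextSeqNo = 0; // one counter for the whole store, standing in for a single shard

    /** Unconditional write: creates the document or replaces it, incrementing its version. */
    Entry put(String id, String source) {
        Entry old = docs.get(id);
        long version = old == null ? 1 : old.version + 1;
        Entry entry = new Entry(source, version, nextSeqNo++);
        docs.put(id, entry);
        return entry;
    }

    /** Conditional write: succeeds only if the document still has the sequence number the caller read. */
    Entry putIfSeqNo(String id, String source, long expectedSeqNo) {
        Entry current = docs.get(id);
        if (current == null || current.seqNo != expectedSeqNo) {
            throw new IllegalStateException("version conflict for id " + id);
        }
        return put(id, source);
    }

    Entry get(String id) {
        return docs.get(id);
    }

    public static void main(String[] args) {
        SeqNoStoreSketch store = new SeqNoStoreSketch();
        store.put("1", "{\"price\":30}");
        long seen = store.get("1").seqNo;                  // two clients both read seqNo 0
        store.putIfSeqNo("1", "{\"price\":35}", seen);     // the first writer wins
        try {
            store.putIfSeqNo("1", "{\"price\":40}", seen); // the second writer holds a stale seqNo
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());            // version conflict for id 1
        }
        Entry latest = store.get("1");
        System.out.println(latest.source + " v" + latest.version); // {"price":35} v2
    }
}
```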
VI. Integrating Elasticsearch with Spring Boot
1. Create a new Spring Boot Maven project.
2. Add the dependencies to the POM file

    <?xml version="1.0" encoding="UTF-8"?>
    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>

        <groupId>org.example</groupId>
        <artifactId>springBoot-elasticsearch</artifactId>
        <version>1.0-SNAPSHOT</version>

        <parent>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-parent</artifactId>
            <version>2.3.4.RELEASE</version>
            <relativePath/> <!-- lookup parent from repository -->
        </parent>

        <dependencies>
            <dependency>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-starter</artifactId>
            </dependency>
            <dependency>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-starter-web</artifactId>
            </dependency>
            <dependency>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
            </dependency>
            <dependency>
                <groupId>org.projectlombok</groupId>
                <artifactId>lombok</artifactId>
                <optional>true</optional>
            </dependency>
            <dependency>
                <groupId>com.alibaba</groupId>
                <artifactId>fastjson</artifactId>
                <version>1.2.74</version>
            </dependency>
            <dependency>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-starter-thymeleaf</artifactId>
            </dependency>
            <dependency>
                <groupId>org.jsoup</groupId>
                <artifactId>jsoup</artifactId>
                <version>1.10.2</version>
            </dependency>
            <dependency>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-starter-test</artifactId>
                <scope>test</scope>
                <exclusions>
                    <exclusion>
                        <groupId>org.junit.vintage</groupId>
                        <artifactId>junit-vintage-engine</artifactId>
                    </exclusion>
                </exclusions>
            </dependency>
            <dependency>
                <groupId>junit</groupId>
                <artifactId>junit</artifactId>
                <scope>test</scope>
            </dependency>
        </dependencies>
    </project>

3. Edit the application.yml configuration file

    server:
      port: 9090
    spring:
      thymeleaf:
        cache: false
        prefix: classpath:/templates/
      elasticsearch:
        rest:
          uris: http://localhost:9200

4. Create the ES document and mapping
Start by creating a Java object, then declare the mapping of its fields with annotations. Spring provides the annotations @Document, @Id, and @Field: @Document is applied to the class, while @Id and @Field are applied to member variables; @Id marks the field that serves as the document id (the primary key).

    package com.es.elsaticsearch.entity;

    import org.springframework.data.annotation.Id;
    import org.springframework.data.elasticsearch.annotations.Document;

    // Maps this class to the "jy_book" index.
    @Document(indexName = "jy_book")
    public class Book {

        // Used as the Elasticsearch document id.
        @Id
        private Integer id;
        private String bookName;
        private String author;

        public Integer getId() {
            return id;
        }

        public void setId(Integer id) {
            this.id = id;
        }

        public String getBookName() {
            return bookName;
        }

        public void setBookName(String bookName) {
            this.bookName = bookName;
        }

        public String getAuthor() {
            return author;
        }

        public void setAuthor(String author) {
            this.author = author;
        }

        @Override
        public String toString() {
            return "Book{" +
                    "id=" + id +
                    ", bookName='" + bookName + '\'' +
                    ", author='" + author + '\'' +
                    '}';
        }
    }

5. Create a repository interface that extends ElasticsearchRepository

    package com.es.elsaticsearch.repository;

    import com.es.elsaticsearch.entity.Book;
    import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;

    import java.util.List;

    public interface BookRepository extends ElasticsearchRepository<Book, Integer> {

        // Derived query: Spring Data builds the search from the method name.
        List<Book> findByBookNameLike(String bookName);
    }

6. Create the test class
The test class below also uses the project's User and ContentService classes, which are not shown in this post.

    package com.es.elsaticsearch;

    import com.alibaba.fastjson.JSON;
    import com.alibaba.fastjson.JSONArray;
    import com.alibaba.fastjson.JSONObject;
    import com.es.elsaticsearch.entity.Book;
    import com.es.elsaticsearch.entity.User;
    import com.es.elsaticsearch.repository.BookRepository;
    import com.es.elsaticsearch.service.ContentService;
    import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
    import org.elasticsearch.action.bulk.BulkRequest;
    import org.elasticsearch.action.bulk.BulkResponse;
    import org.elasticsearch.action.delete.DeleteRequest;
    import org.elasticsearch.action.delete.DeleteResponse;
    import org.elasticsearch.action.get.GetRequest;
    import org.elasticsearch.action.get.GetResponse;
    import org.elasticsearch.action.index.IndexRequest;
    import org.elasticsearch.action.index.IndexResponse;
    import org.elasticsearch.action.search.SearchRequest;
    import org.elasticsearch.action.search.SearchResponse;
    import org.elasticsearch.action.support.master.AcknowledgedResponse;
    import org.elasticsearch.action.update.UpdateRequest;
    import org.elasticsearch.action.update.UpdateResponse;
    import org.elasticsearch.client.RequestOptions;
    import org.elasticsearch.client.RestHighLevelClient;
    import org.elasticsearch.client.indices.CreateIndexRequest;
    import org.elasticsearch.client.indices.CreateIndexResponse;
    import org.elasticsearch.client.indices.GetIndexRequest;
    import org.elasticsearch.common.unit.TimeValue;
    import org.elasticsearch.common.xcontent.XContentType;
    import org.elasticsearch.index.query.MatchAllQueryBuilder;
    import org.elasticsearch.index.query.QueryBuilders;
    import org.elasticsearch.search.SearchHit;
    import org.elasticsearch.search.builder.SearchSourceBuilder;
    import org.elasticsearch.search.fetch.subphase.FetchSourceContext;
    import org.junit.Test;
    import org.junit.runner.RunWith;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.boot.test.context.SpringBootTest;
    import org.springframework.test.context.junit4.SpringRunner;

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.TimeUnit;
    import java.util.stream.Collectors;

    @RunWith(SpringRunner.class)
    @SpringBootTest
    public class ElsaticsearchApplicationTests {

        // Logger for test output
        private static final Logger logger = LoggerFactory.getLogger(ElsaticsearchApplicationTests.class);

        /**
         * Approach 1: operate on ES through RestHighLevelClient
         */
        @Autowired
        private RestHighLevelClient restHighLevelClient;

        /**
         * Approach 2: operate on ES through an ElasticsearchRepository sub-interface
         */
        @Autowired
        private BookRepository bookRepository;

        @Autowired
        private ContentService contentService;

        @Test
        public void test02() {
            Book book = new Book();
            book.setId(1);
            book.setBookName("红楼梦");
            book.setAuthor("曹雪芹");
            this.bookRepository.save(book);

            // Prints only titles that match "游"; the book saved above is not one of them.
            List<Book> bookList = bookRepository.findByBookNameLike("游");
            for (Book b : bookList) {
                System.out.println(b.getBookName());
            }
        }

        // Create an index
        @Test
        public void testCreateIndex() throws IOException {
            CreateIndexRequest createIndexRequest = new CreateIndexRequest("dong");
            CreateIndexResponse response = restHighLevelClient.indices().create(createIndexRequest, RequestOptions.DEFAULT);
            System.out.println(response);
        }

        /**
         * Check whether an index exists
         *
         * @throws IOException
         */
        @Test
        public void testExistIndex() throws IOException {
            GetIndexRequest request = new GetIndexRequest("ywb");
            boolean exists = restHighLevelClient.indices().exists(request, RequestOptions.DEFAULT);
            System.out.println(exists);
        }

        /**
         * Delete an index
         */
        @Test
        public void deleteIndex() throws IOException {
            DeleteIndexRequest deleteIndexRequest = new DeleteIndexRequest("ywb");
            AcknowledgedResponse delete = restHighLevelClient.indices().delete(deleteIndexRequest, RequestOptions.DEFAULT);
            System.out.println(delete.isAcknowledged());
        }

        /**
         * Add a document
         *
         * @throws IOException
         */
        @Test
        public void createDocument() throws IOException {
            User user = new User("ywb", 18);
            IndexRequest request = new IndexRequest("ywb");
            request.id("1");
            request.timeout(TimeValue.timeValueSeconds(1));
            // Put the data into the request as JSON
            request.source(JSON.toJSONString(user), XContentType.JSON);
            // The client sends the request
            IndexResponse index = restHighLevelClient.index(request, RequestOptions.DEFAULT);
            System.out.println(index.toString());
            // Status of the operation, as returned by the equivalent REST command
            System.out.println(index.status());
        }

        // Check whether a document exists
        @Test
        public void testIsExist() throws IOException {
            GetRequest getRequest = new GetRequest("ywb", "1");
            // Do not fetch _source or stored fields; only existence matters here
            getRequest.fetchSourceContext(new FetchSourceContext(false));
            getRequest.storedFields("_none_");
            boolean exists = restHighLevelClient.exists(getRequest, RequestOptions.DEFAULT);
            System.out.println(exists);
        }

        // Get a document
        @Test
        public void testGetDocument() throws IOException {
            GetRequest getRequest = new GetRequest("ywb", "1");
            GetResponse response = restHighLevelClient.get(getRequest, RequestOptions.DEFAULT);
            // Print the document content
            System.out.println(response.getSourceAsString());
            System.out.println(response);
        }

        // Update a document
        @Test
        public void testUpdateDocument() throws IOException {
            UpdateRequest request = new UpdateRequest("ywb", "1");
            request.timeout("1s");
            User user = new User("ywb java", 19);
            request.doc(JSON.toJSONString(user), XContentType.JSON);
            UpdateResponse update = restHighLevelClient.update(request, RequestOptions.DEFAULT);
            System.out.println(update);
            System.out.println(update.status());
        }

        // Delete a document
        @Test
        public void testDeleteDocument() throws IOException {
            DeleteRequest request = new DeleteRequest("ywb", "1");
            request.timeout("10s");
            DeleteResponse response = restHighLevelClient.delete(request, RequestOptions.DEFAULT);
            System.out.println(response.status());
        }

        // Bulk insert
        @Test
        public void testBulkRequest() throws IOException {
            BulkRequest bulkRequest = new BulkRequest();
            bulkRequest.timeout("10s");
            ArrayList<User> users = new ArrayList<>();
            users.add(new User("zhangsan", 1));
            users.add(new User("lishi", 12));
            users.add(new User("wangwu", 13));
            users.add(new User("zhaoliu", 14));
            users.add(new User("tianqi", 15));
            for (int i = 0; i < users.size(); i++) {
                bulkRequest.add(
                        new IndexRequest("ywb")
                                // ids "1".."5"; "" + i + 1 would concatenate to "01", "11", ...
                                .id(String.valueOf(i + 1))
                                .source(JSON.toJSONString(users.get(i)), XContentType.JSON)
                );
            }
            BulkResponse bulk = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
            System.out.println(bulk);
        }

        /**
         * Search request
         * Building the query conditions
         *
         * @throws IOException
         */
        @Test
        public void testSearch() throws IOException {
            SearchRequest searchRequest = new SearchRequest("dong");
            // Build the search conditions
            SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
            // Match all documents
            MatchAllQueryBuilder matchAllQueryBuilder = QueryBuilders.matchAllQuery();
            // TermQueryBuilder queryBuilder = QueryBuilders.termQuery("name","zhangsan");
            searchSourceBuilder.query(matchAllQueryBuilder);
            searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
            searchRequest.source(searchSourceBuilder);
            SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
            System.out.println(JSON.toJSONString(searchResponse.getHits()));
            System.out.println("=======");
            for (SearchHit hit : searchResponse.getHits().getHits()) {
                System.out.println(hit.getSourceAsMap());
            }
        }

        @Test
        public void search() {
            SearchRequest searchRequest = new SearchRequest("ywb");
            SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
            MatchAllQueryBuilder matchAllQueryBuilder = new MatchAllQueryBuilder();
            String writeableName = matchAllQueryBuilder.getWriteableName();
            logger.info(writeableName);
        }

        @Test
        public void test() throws IOException {
            contentService.parseContent("程序员");
        }

        @Test
        public void testSearchContent() throws IOException {
            List<Map<String, Object>> java = contentService.searchPage("python", 1, 15);
            for (Map<String, Object> stringObjectMap : java) {
                for (Map.Entry<String, Object> stringObjectEntry : stringObjectMap.entrySet()) {
                    System.out.println(stringObjectEntry);
                }
            }
        }

        @Test
        public void stream() {
            ArrayList<User> users = new ArrayList<>();
            users.add(new User("张三", 18));
            users.add(new User("李四", 19));
            users.add(new User("王五", 20));
            users.add(new User("赵六", 21));
            users.add(new User("田七", 22));
            users.stream().filter((u) -> u.getAge() > 18).forEach(System.out::println);
            long count = users.stream().filter((u) -> u.getName().equals("张三")).count();
            System.out.println(count);
            List<Integer> collect = users.stream().map(User::getAge).collect(Collectors.toList());
            Map<String, User> collect1 = users.stream().collect(Collectors.toMap(User::getName, v -> v, (o, n) -> n));
            for (Map.Entry<String, User> stringUserEntry : collect1.entrySet()) {
                System.out.println("key:" + stringUserEntry.getKey() + "," + "value:" + stringUserEntry.getValue());
            }
        }

        @Test
        public void testString() {
            List<String> list1 = new ArrayList<>();
            list1.add("a");
            list1.add("b");
            list1.add("c");
            list1.add("d");
            list1.add("e");
            List<String> list2 = new ArrayList<>();
            list2.add("a");
            list2.add("b");
            list2.add("c");
            list2.add("d");
            list1.addAll(list2);
            System.out.println("添加到list1");
            for (String string : list1) {
                System.out.println(string);
            }
            list1.removeAll(list2);
            System.out.println("去除重复");
            for (String string : list1) {
                System.out.println(string);
            }
        }

        @Test
        public void testRemove() {
            ArrayList<User> users = new ArrayList<>();
            users.add(new User(1, "张三", 12));
            users.add(new User(2, "李四", 13));
            users.add(new User(3, "王五", 14));
            users.add(new User(4, "赵六", 15));
            System.out.println(users);
            Object o = JSONObject.toJSON(users);
            System.out.println(o);
            String s = JSONObject.toJSONString(users);
            JSONArray objects = JSON.parseArray(s);
            List<User> users1 = objects.toJavaList(User.class);
            System.out.println(s);
            // ArrayList<User> list = new ArrayList<>();
            // list.add(new User(1,"张三1",12));
            // list.add(new User(2,"李四1",13));
            //
            // boolean b = users.removeAll(list);
            //
            // for (User user : users) {
            //     System.out.println(user.getName());
            // }
            // System.out.println("=========");
            // boolean b1 = users.addAll(list);
            // for (User user : users) {
            //     System.out.println(user);
            // }
        }

        @Test
        public void testObject() {
            boolean equals = new User(1, "张三", 12).equals(new User(1, "张三", 12));
            System.out.println(equals);
        }
    }

At this point, the basic integration of Spring Boot with Elasticsearch is complete.
VII. Source Code
Git project: springBoot
VIII. References