Elasticsearch

I. What is Elasticsearch

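  Elasticsearch is a distributed, free and open source search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured data. Elasticsearch is built on Apache Lucene and was first released in 2010 by Elasticsearch N.V. (now known as Elastic). Known for its simple REST APIs, distributed nature, speed, and scalability, Elasticsearch is the central component of the Elastic Stack, a set of free and open source tools for data ingestion, enrichment, storage, analysis, and visualization. The Elastic Stack is commonly referred to as the ELK Stack (after Elasticsearch, Logstash, and Kibana); it now also includes a rich collection of lightweight shipping agents, collectively known as Beats, for sending data to Elasticsearch.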

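Elasticsearch is built on top of Lucene. Lucene itself is a library and cannot be used as a standalone service: you have to write code that calls its API. Elasticsearch wraps Lucene and exposes its functionality through a REST API, so it works out of the box.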

Official documentation: https://www.elastic.co/guide/index.html

Installation, using Elasticsearch 7.9.3 as an example

  1. Download the Elasticsearch 7.9.3 Windows zip file from the Elasticsearch download page.
  2. Extract the contents of the zip file to a directory on your computer, for example, C:\Program Files.
  3. Open a command prompt as an Administrator and navigate to the directory that contains the extracted files, for example: cd C:\Program Files\elasticsearch-7.9.3  
  4. Start Elasticsearch: bin\elasticsearch.bat

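  Open http://127.0.0.1:9200 to check whether the Elasticsearch service started correctly.

  Note: 9200 is the HTTP REST port (the one the browser uses), while 9300 is the transport port used for communication between the nodes of an Elasticsearch cluster.

  A response like the following means the service started correctly. The node name, cluster UUID, and build details vary by installation and version.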

{
  "name" : "QtI5dUu",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "DMXhqzzjTGqEtDlkaMOzlA",
  "version" : {
    "number" : "7.9.3",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "00d8bc1",
    "build_date" : "2018-06-06T16:48:02.249996Z",
    "build_snapshot" : false,
    "lucene_version" : "7.3.1",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

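II. What Elasticsearch is used for

  • Application search
  • Website search
  • Enterprise search
  • Logging and log analytics
  • Infrastructure metrics and container monitoring
  • Application performance monitoring
  • Geospatial data analysis and visualization
  • Security analytics
  • Business analytics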

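III. How Elasticsearch works

Raw data flows into Elasticsearch from a variety of sources, including logs, system metrics, and web applications. Data ingestion is the process by which this raw data is parsed, normalized, and enriched before it is indexed in Elasticsearch. Once the data is indexed, users can run complex queries against it and use aggregations to retrieve complex summaries of their data.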
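The parse, normalize, and enrich steps can be pictured with a small plain-Java sketch. This is only an illustration, not how Logstash, Beats, or an Elasticsearch ingest pipeline is implemented: the log line format (`<timestamp> <LEVEL> <message>`), the class name `IngestSketch`, and the field names are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical ingest step: turns one raw log line into a document ready for indexing.
final class IngestSketch {

    // Assumes lines of the form "<timestamp> <LEVEL> <message>".
    static Map<String, Object> ingest(String rawLine, String sourceHost) {
        String[] parts = rawLine.trim().split("\\s+", 3);
        if (parts.length < 3) {
            throw new IllegalArgumentException("unexpected log line: " + rawLine);
        }
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("@timestamp", parts[0]);                      // parse: split the line into fields
        doc.put("level", parts[1].toLowerCase(Locale.ROOT));  // normalize: one spelling per log level
        doc.put("message", parts[2]);
        doc.put("host", sourceHost);                          // enrich: add context not in the raw line
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(ingest("2021-08-21T17:13:00Z ERROR disk full", "web-01"));
    }
}
```

The resulting map is the kind of JSON-like document that would then be sent to Elasticsearch for indexing.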

IV. Basic concepts

  1. Index

As a verb: comparable to INSERT in MySQL.

As a noun: comparable to a Database in MySQL.

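  2. Type

  Within an index you can define one or more types. A type is similar to a Table in MySQL: data of the same kind is stored together.

  Note: in Elasticsearch 6.x an index can contain only one type. In 7.x, types are deprecated in favor of the typeless _doc endpoint, and they are removed in 8.0.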

  3. Document

  A piece of data stored under a given index and type, in JSON format. It is comparable to a row in a MySQL table.

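4. Inverted index

Elasticsearch uses a structure called an inverted index, which is designed to allow very fast full-text searches. An inverted index consists of a list of all the unique words that appear in any document and, for each word, a list of the documents in which it appears.

For example, suppose we have two documents, each with a content field containing the following:

  1. The quick brown fox jumped over the lazy dog
  2. Quick brown foxes leap over lazy dogs in summer

To create an inverted index, we first split the content field of each document into separate words (which we call terms, or tokens), create a sorted list of all the unique terms, and then list the documents in which each term appears. The result looks like this: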

Term      Doc_1  Doc_2
-------------------------
Quick   |       |  X
The     |   X   |
brown   |   X   |  X
dog     |   X   |
dogs    |       |  X
fox     |   X   |
foxes   |       |  X
in      |       |  X
jumped  |   X   |
lazy    |   X   |  X
leap    |       |  X
over    |   X   |  X
quick   |   X   |
summer  |       |  X
the     |   X   |
------------------------

Now, if we want to search for quick brown, we just need to find the documents in which each term appears:

Term      Doc_1  Doc_2
-------------------------
brown   |   X   |  X
quick   |   X   |
------------------------
Total   |   2   |  1

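Both documents match, but the first document matches better than the second. If we apply a naive similarity algorithm that just counts the number of matching terms, we can say that the first document is more relevant to our query than the second.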
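To make the table and the match count concrete, here is a minimal sketch in plain Java. It is a toy: the class name `InvertedIndexDemo` and the whitespace-only tokenization are assumptions for illustration, and Lucene's real on-disk structures and scoring are far more involved.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy inverted index: term -> ids of the documents (1-based) that contain it.
final class InvertedIndexDemo {

    // Split each document on whitespace and record which documents contain each term.
    static Map<String, Set<Integer>> build(List<String> docs) {
        Map<String, Set<Integer>> index = new TreeMap<>();
        for (int i = 0; i < docs.size(); i++) {
            for (String term : docs.get(i).split("\\s+")) {
                index.computeIfAbsent(term, t -> new TreeSet<>()).add(i + 1);
            }
        }
        return index;
    }

    // Naive similarity: for each document, count how many query terms it contains.
    static Map<Integer, Integer> matchCounts(Map<String, Set<Integer>> index, String query) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (String term : query.split("\\s+")) {
            for (int docId : index.getOrDefault(term, Set.of())) {
                counts.merge(docId, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "The quick brown fox jumped over the lazy dog",
                "Quick brown foxes leap over lazy dogs in summer");
        Map<String, Set<Integer>> index = build(docs);
        System.out.println(index.get("brown"));                 // prints [1, 2]
        System.out.println(matchCounts(index, "quick brown"));  // prints {1=2, 2=1}
    }
}
```

Because the terms are kept exactly as written, Quick and quick end up as separate entries, which is the weakness discussed next.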

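But there are a few problems with our current inverted index:

  • Quick and quick appear as separate terms, while the user probably thinks of them as the same word.
  • fox and foxes are very similar, as are dog and dogs; they share the same root word.
  • jumped and leap do not share a root word, but they are close in meaning. They are synonyms.

With the previous index, a search for +Quick +fox would not match any documents. (Remember, a preceding + means that the word must be present.) Only a document containing both Quick and fox satisfies the query, but the first document contains quick fox and the second contains Quick foxes.

Our users could reasonably expect both documents to match the query. We can do better.

If we normalize the terms into a standard form, we can find documents that contain terms that are not exactly the same as what the user searched for, but are similar enough to still be relevant. For example:

  • Quick can be lowercased to become quick.
  • foxes can be stemmed (reduced to its root form) to become fox. Similarly, dogs can be stemmed to dog.
  • jumped and leap are synonyms and can be indexed as the single term jump.

Now the index looks like this: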

Term      Doc_1  Doc_2
-------------------------
brown   |   X   |  X
dog     |   X   |  X
fox     |   X   |  X
in      |       |  X
jump    |   X   |  X
lazy    |   X   |  X
over    |   X   |  X
quick   |   X   |  X
summer  |       |  X
the     |   X   |
------------------------

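But we are not there yet. Our search for +Quick +fox would still fail, because the exact term Quick is no longer in our index. However, if we apply to the query string the same normalization rules that we use on the content field, it becomes a query for +quick +fox, which matches both documents!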
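The idea of applying the same normalization at index time and at query time can be sketched as follows. The two lookup tables are hand-written stand-ins for a real stemmer and synonym filter, and the names `NormalizedIndexDemo`, `normalize`, and `searchAll` are invented for this example; Elasticsearch itself does this work through its analyzers.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy index that normalizes terms before storing and before searching.
final class NormalizedIndexDemo {

    // Illustration-only tables; a real analyzer uses stemming algorithms and synonym dictionaries.
    private static final Map<String, String> STEMS = Map.of("foxes", "fox", "dogs", "dog");
    private static final Map<String, String> SYNONYMS = Map.of("jumped", "jump", "leap", "jump");

    // Lowercase, then stem, then map synonyms onto one shared term.
    static String normalize(String token) {
        String t = token.toLowerCase(Locale.ROOT);
        t = STEMS.getOrDefault(t, t);
        return SYNONYMS.getOrDefault(t, t);
    }

    static Map<String, Set<Integer>> build(List<String> docs) {
        Map<String, Set<Integer>> index = new TreeMap<>();
        for (int i = 0; i < docs.size(); i++) {
            for (String token : docs.get(i).split("\\s+")) {
                index.computeIfAbsent(normalize(token), t -> new TreeSet<>()).add(i + 1);
            }
        }
        return index;
    }

    // Equivalent of "+a +b": every query term must be present, so intersect the document sets.
    static Set<Integer> searchAll(Map<String, Set<Integer>> index, String query) {
        Set<Integer> result = null;
        for (String raw : query.split("\\s+")) {
            Set<Integer> docs = index.getOrDefault(normalize(raw), Set.of());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? Set.of() : result;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> index = build(List.of(
                "The quick brown fox jumped over the lazy dog",
                "Quick brown foxes leap over lazy dogs in summer"));
        System.out.println(searchAll(index, "Quick fox"));  // prints [1, 2]
    }
}
```

Without the call to `normalize` inside `searchAll`, the query term Quick would find nothing, which is exactly the failure described above.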

V. Basic operations

  1. _cat

  • GET /_cat/nodes: list all nodes
  • GET /_cat/health: show cluster health
  • GET /_cat/master: show the master node
  • GET /_cat/indices: list all indices
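The same four endpoints can also be called from code. The sketch below uses the JDK's built-in `java.net.http.HttpClient` (Java 11 or later) and assumes a node is listening on http://127.0.0.1:9200 without authentication; the class and method names are made up for the example. The `?v` parameter asks the _cat API to include column headers.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Calls a few _cat endpoints and prints the plain-text tables they return.
final class CatApiDemo {

    // Build the URI for one _cat endpoint, e.g. http://127.0.0.1:9200/_cat/health?v
    static URI catUri(String baseUrl, String endpoint) {
        return URI.create(baseUrl + "/_cat/" + endpoint + "?v");
    }

    // Send a GET request and return the response body as text.
    static String fetch(HttpClient client, URI uri) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        for (String endpoint : new String[] {"nodes", "health", "master", "indices"}) {
            System.out.println(fetch(client, catUri("http://127.0.0.1:9200", endpoint)));
        }
    }
}
```

Running `main` requires the Elasticsearch node to be up; if it is not, `fetch` throws a connection exception.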

  2. PUT/POST: create and update data

put/post localhost:9200/xujian/book/1
{
    "bookName":"xujian",
    "price":30
}

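  PUT can both create and update a document, and the id must be specified.

  POST creates a document: if no id is given, one is generated automatically; if an id is given, the existing document with that id is updated and its version number is incremented.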
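The difference between the two can be mimicked with a small in-memory model. This is a simulation only, with invented names (`PutPostDemo`, `indexWithId`, `indexWithGeneratedId`); it does not talk to Elasticsearch, and the random UUID merely stands in for the id format Elasticsearch generates.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.UUID;

// In-memory stand-in for one index, tracking the source and version of each document.
final class PutPostDemo {
    private final Map<String, String> sources = new HashMap<>();
    private final Map<String, Long> versions = new HashMap<>();

    // Like PUT (or POST) with an explicit id: create if absent, otherwise replace and bump the version.
    String indexWithId(String id, String source) {
        Objects.requireNonNull(id, "an id is required");
        boolean existed = sources.containsKey(id);
        sources.put(id, source);
        versions.merge(id, 1L, Long::sum);  // version starts at 1 and grows by 1 per write
        return existed ? "updated" : "created";
    }

    // Like POST without an id: the store picks a fresh id and returns it.
    String indexWithGeneratedId(String source) {
        String id = UUID.randomUUID().toString();
        indexWithId(id, source);
        return id;
    }

    long version(String id) {
        return versions.getOrDefault(id, 0L);
    }

    public static void main(String[] args) {
        PutPostDemo index = new PutPostDemo();
        System.out.println(index.indexWithId("1", "{\"bookName\":\"xujian\",\"price\":30}"));  // prints created
        System.out.println(index.indexWithId("1", "{\"bookName\":\"xujian\",\"price\":35}"));  // prints updated
        System.out.println(index.version("1"));                                                // prints 2
    }
}
```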

  3. GET: retrieve a document

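GET localhost:9200/xujian/book/1
{
    "_index": "xujian", // which index the document is in
    "_type": "book",    // which type the document is in
    "_id": "1",         // document id
    "_version": 2,      // version number
    "_seq_no": 1,       // concurrency-control field, assigned from a counter that increases with every write to the shard; used for optimistic locking
    "_primary_term": 1, // changes when the primary shard is reassigned, for example after a restart
    "found": true,
    "_source": {        // the actual document content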
        "bookName": "xujian",
        "price": 30
    }
}
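The optimistic locking mentioned for _seq_no and _primary_term works as a compare-and-set: a write carries the values the client last saw (in Elasticsearch, the if_seq_no and if_primary_term request parameters) and is rejected with a version conflict if the document has changed in the meantime. The sketch below simulates that rule for a single document in memory. The class `OptimisticLockDemo` is hypothetical, and real sequence numbers are assigned per shard rather than per document.

```java
// Single-document simulation of the if_seq_no / if_primary_term check.
final class OptimisticLockDemo {
    private String source;
    private long seqNo;                  // starts at 0 for the initial write in this simulation
    private final long primaryTerm = 1;  // stays fixed here; in a cluster it changes when the primary moves

    OptimisticLockDemo(String initialSource) {
        this.source = initialSource;
    }

    synchronized long seqNo() { return seqNo; }
    synchronized long primaryTerm() { return primaryTerm; }
    synchronized String source() { return source; }

    // Apply the update only if the caller saw the latest state.
    // Returns false on a conflict (Elasticsearch would answer with HTTP 409).
    synchronized boolean updateIf(long ifSeqNo, long ifPrimaryTerm, String newSource) {
        if (ifSeqNo != seqNo || ifPrimaryTerm != primaryTerm) {
            return false;
        }
        source = newSource;
        seqNo++;
        return true;
    }

    public static void main(String[] args) {
        OptimisticLockDemo doc = new OptimisticLockDemo("{\"price\":30}");
        long seenSeqNo = doc.seqNo();      // two writers read the same state
        long seenTerm = doc.primaryTerm();
        System.out.println(doc.updateIf(seenSeqNo, seenTerm, "{\"price\":35}"));  // prints true
        System.out.println(doc.updateIf(seenSeqNo, seenTerm, "{\"price\":40}"));  // prints false (stale read)
    }
}
```

The second writer has to re-read the document and retry with the new sequence number, which is what keeps concurrent updates from silently overwriting each other.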

  4. DELETE: delete a document or an index

DELETE  localhost:9200/xujian/book/1/
{
    "_index": "xujian",
    "_type": "book",
    "_id": "1",
    "_version": 6,
    "result": "deleted",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "_seq_no": 5,
    "_primary_term": 1
}
DELETE localhost:9200/xujian/

VI. Integrating Elasticsearch with Spring Boot

  1. Create a new Spring Boot Maven project.

  2. Add the dependencies to pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.example</groupId>
    <artifactId>springBoot-elasticsearch</artifactId>
    <version>1.0-SNAPSHOT</version>

    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.3.4.RELEASE</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <optional>true</optional>
        </dependency>
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.74</version>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-thymeleaf</artifactId>
        </dependency>

        <dependency>
            <groupId>org.jsoup</groupId>
            <artifactId>jsoup</artifactId>
            <version>1.10.2</version>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
            <!-- junit-vintage-engine is kept (not excluded) so that the JUnit 4 tests below are discovered by Maven -->
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>
</project>

  3. Edit the application.yml configuration file

server:
  port: 9090

spring:
  thymeleaf:
    cache: false
    prefix: classpath:/templates/
  elasticsearch:
    rest:
      uris: http://localhost:9200

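  4. Create the ES document and mapping

  First create a Java class, then declare the field mappings with annotations. Spring Data provides @Document, @Id, and @Field: @Document is applied to the class, while @Id and @Field are applied to member variables; @Id marks the field that serves as the document id.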

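package com.es.elsaticsearch.entity;

import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;

// Maps this class to documents in the "jy_book" index
@Document(indexName = "jy_book")
public class Book {
    // Used as the document id
    @Id
    private Integer id;
    // Full-text fields
    @Field(type = FieldType.Text)
    private String bookName;
    @Field(type = FieldType.Text)
    private String author;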

    public Integer getId() {
        return id;
    }

    public void setId(Integer id) {
        this.id = id;
    }

    public String getBookName() {
        return bookName;
    }

    public void setBookName(String bookName) {
        this.bookName = bookName;
    }

    public String getAuthor() {
        return author;
    }

    public void setAuthor(String author) {
        this.author = author;
    }

    @Override
    public String toString() {
        return "Book{" +
                "id=" + id +
                ", bookName='" + bookName + '\'' +
                ", author='" + author + '\'' +
                '}';
    }
}

  5. Create a repository interface that extends ElasticsearchRepository.

package com.es.elsaticsearch.repository;

import com.es.elsaticsearch.entity.Book;
import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;

import java.util.List;

public interface BookRepository extends ElasticsearchRepository<Book,Integer> {

    List<Book> findByBookNameLike(String bookName);
}

  6. Create the test class

package com.es.elsaticsearch;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;
import com.es.elsaticsearch.entity.Book;
import com.es.elsaticsearch.entity.User;
import com.es.elsaticsearch.repository.BookRepository;
import com.es.elsaticsearch.service.ContentService;
import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.delete.DeleteResponse;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.support.master.AcknowledgedResponse;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.action.update.UpdateResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.elasticsearch.client.indices.GetIndexRequest;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.MatchAllQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.FetchSourceContext;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringRunner;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

@RunWith(SpringRunner.class)
@SpringBootTest
public class ElsaticsearchApplicationTests {

    // Logger for test output
    private static final Logger logger = LoggerFactory.getLogger(ElsaticsearchApplicationTests.class);
    /**
     * Approach 1: operate on ES through RestHighLevelClient
     */
    @Autowired
    private RestHighLevelClient restHighLevelClient;
    /**
     * Approach 2: operate on ES through a sub-interface of ElasticsearchRepository
     */
    @Autowired
    private BookRepository bookRepository;
    @Autowired
    private ContentService contentService;

    @Test
    public void test02() {
        Book book = new Book();
        book.setId(1);
        book.setBookName("红楼梦");
        book.setAuthor("曹雪芹");
        this.bookRepository.save(book);
        // Search with a character that occurs in the title saved above, so the query can return it
        List<Book> bookList = bookRepository.findByBookNameLike("红");
        for (Book b : bookList) {
            System.out.println(b.getBookName());
        }
    }

    // Create an index
    @Test
    public void testCreateIndex() throws IOException {
        CreateIndexRequest createIndexRequest = new CreateIndexRequest("dong");
        CreateIndexResponse response = restHighLevelClient.indices().create(createIndexRequest, RequestOptions.DEFAULT);
        System.out.println(response);
    }

    /**
     * Check whether an index exists
     *
     * @throws IOException
     */
    @Test
    public void testExistIndex() throws IOException {
        GetIndexRequest request = new GetIndexRequest("ywb");
        boolean exists = restHighLevelClient.indices().exists(request, RequestOptions.DEFAULT);
        System.out.println(exists);
    }

    /**
     * Delete an index
     */
    @Test
    public void deleteIndex() throws IOException {
        DeleteIndexRequest deleteIndexRequest = new DeleteIndexRequest("ywb");
        AcknowledgedResponse delete = restHighLevelClient.indices().delete(deleteIndexRequest, RequestOptions.DEFAULT);
        System.out.println(delete.isAcknowledged());
    }

    /**
     * Add a document
     *
     * @throws IOException
     */
    @Test
    public void createDocument() throws IOException {
        User user = new User("ywb", 18);
        IndexRequest request = new IndexRequest("ywb");
        request.id("1");
        // Request timeout of one second; request.timeout("1s") is the equivalent string form
        request.timeout(TimeValue.timeValueSeconds(1));
        // Put the data into the request as JSON
        request.source(JSON.toJSONString(user), XContentType.JSON);
        // The client sends the request
        IndexResponse index = restHighLevelClient.index(request, RequestOptions.DEFAULT);
        System.out.println(index.toString());
        // Status that corresponds to the result of the command
        System.out.println(index.status());
    }

    // Check whether a document exists
    @Test
    public void testIsExist() throws IOException {
        GetRequest getRequest = new GetRequest("ywb", "1");
        // Do not fetch _source or stored fields; only existence matters here
        getRequest.fetchSourceContext(new FetchSourceContext(false));
        getRequest.storedFields("_none_");
        boolean exists = restHighLevelClient.exists(getRequest, RequestOptions.DEFAULT);
        System.out.println(exists);
    }

    // Get a document
    @Test
    public void testGetDocument() throws IOException {
        GetRequest getRequest = new GetRequest("ywb", "1");
        GetResponse response = restHighLevelClient.get(getRequest, RequestOptions.DEFAULT);
        // Print the document source
        System.out.println(response.getSourceAsString());
        System.out.println(response);
    }

    // Update a document
    @Test
    public void testUpdateDocument() throws IOException {
        UpdateRequest request = new UpdateRequest("ywb", "1");
        request.timeout("1s");
        User user = new User("ywb java", 19);
        request.doc(JSON.toJSONString(user), XContentType.JSON);
        UpdateResponse update = restHighLevelClient.update(request, RequestOptions.DEFAULT);
        System.out.println(update);
        System.out.println(update.status());
    }

    // Delete a document
    @Test
    public void testDeleteDocument() throws IOException {
        DeleteRequest request = new DeleteRequest("ywb", "1");
        request.timeout("10s");
        DeleteResponse response = restHighLevelClient.delete(request, RequestOptions.DEFAULT);
        System.out.println(response.status());
    }

    // Bulk-insert documents
    @Test
    public void testBulkRequest() throws IOException {
        BulkRequest bulkRequest = new BulkRequest();
        bulkRequest.timeout("10s");
        ArrayList<User> users = new ArrayList<>();
        users.add(new User("zhangsan", 1));
        users.add(new User("lishi", 12));
        users.add(new User("wangwu", 13));
        users.add(new User("zhaoliu", 14));
        users.add(new User("tianqi", 15));
        for (int i = 0; i < users.size(); i++) {
            bulkRequest.add(
                    new IndexRequest("ywb")
                            // Compute i + 1 first: "" + i + 1 would concatenate and yield ids "01", "11", ...
                            .id(String.valueOf(i + 1))
                            .source(JSON.toJSONString(users.get(i)), XContentType.JSON)
            );
        }
        BulkResponse bulk = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
        System.out.println(bulk);
    }

    /**
     * Search request
     * built with query builders
     *
     * @throws IOException
     */
    @Test
    public void testSearch() throws IOException {
        SearchRequest searchRequest = new SearchRequest("dong");
        // Build the search source
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        // Match all documents
        MatchAllQueryBuilder matchAllQueryBuilder = QueryBuilders.matchAllQuery();
        // Alternative: an exact term query (needs an import of TermQueryBuilder)
//        TermQueryBuilder queryBuilder = QueryBuilders.termQuery("name","zhangsan");
        searchSourceBuilder.query(matchAllQueryBuilder);
        searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
        searchRequest.source(searchSourceBuilder);
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        System.out.println(JSON.toJSONString(searchResponse.getHits()));
        System.out.println("=======");
        for (SearchHit hit : searchResponse.getHits().getHits()) {
            System.out.println(hit.getSourceAsMap());
        }
    }

    @Test
    public void search() {
        SearchRequest searchRequest = new SearchRequest("ywb");
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        MatchAllQueryBuilder matchAllQueryBuilder = new MatchAllQueryBuilder();
        String writeableName = matchAllQueryBuilder.getWriteableName();
        logger.info(writeableName);
    }

    @Test
    public void test() throws IOException {
        contentService.parseContent("程序员");
    }

    @Test
    public void testSearchContent() throws IOException {
        List<Map<String, Object>> java = contentService.searchPage("python", 1, 15);
        for (Map<String, Object> stringObjectMap : java) {
            for (Map.Entry<String, Object> stringObjectEntry : stringObjectMap.entrySet()) {
                System.out.println(stringObjectEntry);
            }
        }
    }

    @Test
    public void stream() {
        ArrayList<User> users = new ArrayList<>();
        users.add(new User("张三", 18));
        users.add(new User("李四", 19));
        users.add(new User("王五", 20));
        users.add(new User("赵六", 21));
        users.add(new User("田七", 22));

        users.stream().filter((u) -> u.getAge() > 18).forEach(System.out::println);
        long count = users.stream().filter((u) -> u.getName().equals("张三")).count();
        System.out.println(count);
        List<Integer> collect = users.stream().map(User::getAge).collect(Collectors.toList());
        Map<String, User> collect1 = users.stream().collect(Collectors.toMap(User::getName, v -> v, (o, n) -> n));
        for (Map.Entry<String, User> stringUserEntry : collect1.entrySet()) {
            System.out.println("key:" + stringUserEntry.getKey() + "," + "value:" + stringUserEntry.getValue());
        }
    }

    @Test
    public void testString() {
        List<String> list1 = new ArrayList<>();
        list1.add("a");
        list1.add("b");
        list1.add("c");
        list1.add("d");
        list1.add("e");


        List<String> list2 = new ArrayList<>();
        list2.add("a");
        list2.add("b");
        list2.add("c");
        list2.add("d");

        list1.addAll(list2);
        System.out.println("After adding list2 to list1");
        for (String string : list1) {
            System.out.println(string);
        }
        list1.removeAll(list2);
        System.out.println("After removing every element that also appears in list2");
        for (String string : list1) {
            System.out.println(string);
        }
    }

    @Test
    public void testRemove() {
        ArrayList<User> users = new ArrayList<>();
        users.add(new User(1, "张三", 12));
        users.add(new User(2, "李四", 13));
        users.add(new User(3, "王五", 14));
        users.add(new User(4, "赵六", 15));
        System.out.println(users);
        Object o = JSONObject.toJSON(users);
        System.out.println(o);
        String s = JSONObject.toJSONString(users);
        JSONArray objects = JSON.parseArray(s);
        List<User> users1 = objects.toJavaList(User.class);
        System.out.println(s);

//        ArrayList<User> list = new ArrayList<>();
//        list.add(new User(1,"张三1",12));
//        list.add(new User(2,"李四1",13));
//
//        boolean b = users.removeAll(list);
//
//        for (User user : users) {
//            System.out.println(user.getName());
//        }
//        System.out.println("=========");
//        boolean b1 = users.addAll(list);
//        for (User user : users) {
//            System.out.println(user);
//        }
    }

    @Test
    public void testObject() {
        boolean equals = new User(1, "张三", 12).equals(new User(1, "张三", 12));
        System.out.println(equals);
    }
}

  That largely completes the Spring Boot integration with Elasticsearch.

VII. Source code

  Git repository: springBoot

VIII. References

  1、https://www.elastic.co/guide/

  2、https://www.bilibili.com/video/BV1hh411D7sb

posted @ 2021-08-21 17:13  温布利往事