Microservices: Elasticsearch, a Distributed Search Engine

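What is Elasticsearch

Elasticsearch is a powerful open-source search engine that helps us quickly find what we need in massive amounts of data.

Elasticsearch, together with Kibana, Logstash, and Beats, makes up the Elastic Stack (ELK). It is widely used for log data analysis, real-time monitoring, and similar scenarios.

Elasticsearch is the core of the Elastic Stack and is responsible for storing, searching, and analyzing data.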

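Forward index and inverted index

Traditional databases (such as MySQL) use a forward index, that is, an index built on the record id, so a lookup goes from the id to the record's content.

Elasticsearch uses an inverted index, which is built around two concepts:

- Document: each piece of data is a document
- Term: a word obtained by splitting a document's text according to its meaning

An inverted index consists of two parts:

Term Dictionary: records all terms and the mapping between each term and its posting list. The terms themselves are indexed to speed up lookups and insertions.

Posting List: records, for each term, the ids of the documents that contain it, how often the term occurs, where it occurs in each document, and similar information.

- Document id: used to fetch the document quickly
- Term frequency (TF): the number of times the term occurs in the document, used for scoring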
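The structure described above can be sketched in a few lines of Java. This is only a toy model, not how Elasticsearch or Lucene implement it: the class name `InvertedIndexSketch`, the whitespace tokenizer, and the sample documents are all assumptions made for illustration, and the sketch assumes each document is added exactly once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy inverted index: term -> posting list of (docId, term frequency, positions). */
class InvertedIndexSketch {

    /** One posting: which document the term occurs in, and at which positions. */
    static final class Posting {
        final int docId;
        final List<Integer> positions = new ArrayList<>();

        Posting(int docId) {
            this.docId = docId;
        }

        /** Term frequency is the number of recorded positions. */
        int termFrequency() {
            return positions.size();
        }
    }

    // A sorted map stands in for the term dictionary (hypothetical simplification).
    private final Map<String, List<Posting>> dictionary = new TreeMap<>();

    /** Splits the text on whitespace; a real analyzer such as IK is far more involved. */
    void addDocument(int docId, String text) {
        String[] tokens = text.toLowerCase().trim().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            if (tokens[pos].isEmpty()) {
                continue; // empty input produces a single empty token
            }
            List<Posting> postings =
                    dictionary.computeIfAbsent(tokens[pos], t -> new ArrayList<>());
            Posting last = postings.isEmpty() ? null : postings.get(postings.size() - 1);
            if (last == null || last.docId != docId) {
                last = new Posting(docId); // first occurrence of this term in this document
                postings.add(last);
            }
            last.positions.add(pos);
        }
    }

    /** Looks the term up in the dictionary and returns its posting list. */
    List<Posting> search(String term) {
        return dictionary.getOrDefault(term.toLowerCase(), List.of());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.addDocument(1, "xiaomi phone");
        index.addDocument(2, "huawei phone phone case");
        for (Posting p : index.search("phone")) {
            System.out.println("doc " + p.docId + " tf=" + p.termFrequency()
                    + " positions=" + p.positions);
        }
    }
}
```

A search for a term is a single dictionary lookup followed by a walk over the posting list, which is why it does not need to scan every document.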

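Documents

Elasticsearch is document-oriented: a document can be a product record from a database or the details of an order. Document data is serialized to JSON before it is stored in Elasticsearch.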
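To make "serialized to JSON" concrete, here is a minimal sketch that turns one flat record into a JSON document string. The class `DocumentJsonSketch`, the field names, and the values are hypothetical; a real project would use a JSON library, and this hand-rolled version only handles flat records with string, number, boolean, or null values and does not escape control characters.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Renders one flat record (for example a database row) as a JSON document string. */
class DocumentJsonSketch {

    static String toJson(Map<String, Object> fields) {
        StringJoiner json = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, Object> entry : fields.entrySet()) {
            json.add(quote(entry.getKey()) + ":" + render(entry.getValue()));
        }
        return json.toString();
    }

    /** Numbers, booleans and null are written bare; everything else becomes a string. */
    private static String render(Object value) {
        if (value == null || value instanceof Number || value instanceof Boolean) {
            return String.valueOf(value);
        }
        return quote(value.toString());
    }

    /** Escapes backslashes and double quotes only (simplification). */
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        Map<String, Object> product = new LinkedHashMap<>();
        product.put("id", 1);
        product.put("title", "Xiaomi phone");
        product.put("price", 3499);
        System.out.println(toJson(product));
    }
}
```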

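Architecture

Installing Elasticsearch

1.1. Create a network

Because we also need to deploy a Kibana container, the es and Kibana containers must be able to reach each other. Create a network first:

docker network create es-net

1.2. Pull the image. Note that the es version must match the version of the IK Chinese analyzer plugin, otherwise you will run into problems.

docker pull elasticsearch:7.12.1

1.3. Run the container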

docker run -d \
    --name es \
    -e "ES_JAVA_OPTS=-Xms512m -Xmx512m" \
    -e "discovery.type=single-node" \
    -v es-data:/usr/share/elasticsearch/data \
    -v es-plugins:/usr/share/elasticsearch/plugins \
    --privileged \
    --network es-net \
    -p 9200:9200 \
    -p 9300:9300 \
elasticsearch:7.12.1

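Explanation of the options (`cluster.name`, `http.host`, and the `es-logs` volume are optional settings that the command above does not use):

- `-e "cluster.name=es-docker-cluster"`: sets the cluster name

- `-e "http.host=0.0.0.0"`: the address to listen on; allows access from outside the host

- `-e "ES_JAVA_OPTS=-Xms512m -Xmx512m"`: JVM heap size; going below 512 MB is not recommended

- `-e "discovery.type=single-node"`: single-node (non-cluster) mode

- `-v es-data:/usr/share/elasticsearch/data`: mounts a volume bound to the es data directory

- `-v es-logs:/usr/share/elasticsearch/logs`: mounts a volume bound to the es log directory

- `-v es-plugins:/usr/share/elasticsearch/plugins`: mounts a volume bound to the es plugin directory

- `--privileged`: runs the container with extended privileges, here so that it can access the mounted volumes

- `--network es-net`: joins the network named es-net

- `-p 9200:9200`, `-p 9300:9300`: port mappings; 9200 is the HTTP port and 9300 is used for communication between nodes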

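2. Deploy Kibana

Kibana provides a visual interface for Elasticsearch, which makes it easier to learn.

docker pull kibana:7.12.1   # the version should match es

Run the following docker command to deploy Kibana: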

docker run -d \
--name kibana \
-e ELASTICSEARCH_HOSTS=http://es:9200 \
--network=es-net \
-p 5601:5601  \
kibana:7.12.1

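- `--network es-net`: joins the network named es-net, the same network as Elasticsearch

- `-e ELASTICSEARCH_HOSTS=http://es:9200`: sets the Elasticsearch address; because Kibana is on the same network as Elasticsearch, it can reach Elasticsearch directly by container name

- `-p 5601:5601`: port mapping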

3. Install the IK analyzer

# Enter the container (it was started with --name es)
docker exec -it es /bin/bash

# Download and install the plugin online
./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.12.1/elasticsearch-analysis-ik-7.12.1.zip

# Exit the container
exit
# Restart the container
docker restart es

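Index operations

Mapping properties

Creating an index

Document operations

Operating on indexes with RestClient

What is RestClient

Elastic provides official clients in various languages for working with ES. In essence, these clients assemble DSL statements and send them to ES over HTTP. Official documentation: https://www.elastic.co/guide/en/elasticsearch/client/index.html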
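The idea that a client "assembles DSL and sends it over HTTP" can be illustrated with the JDK's own `java.net.http` API (Java 11+). This is a rough sketch rather than what RestHighLevelClient does internally: the class `DslOverHttpSketch`, the `hotel` index, and the `name` field are assumptions, the request is only built and never sent, and the field and text are assumed to contain no characters that need JSON escaping.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Builds the HTTP request a client would send for a simple match query. */
class DslOverHttpSketch {

    /** Assembles the DSL body; inputs are assumed not to need JSON escaping. */
    static String matchQuery(String field, String text) {
        return "{\"query\":{\"match\":{\"" + field + "\":\"" + text + "\"}}}";
    }

    /** Wraps the DSL in a POST to <baseUrl>/<index>/_search without sending it. */
    static HttpRequest buildSearchRequest(String baseUrl, String index, String dsl) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/" + index + "/_search"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(dsl))
                .build();
    }

    public static void main(String[] args) {
        String dsl = matchQuery("name", "hotel");
        HttpRequest request = buildSearchRequest("http://192.168.150.101:9200", "hotel", dsl);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(dsl);
    }
}
```

Passing the request to `HttpClient.send` would execute it against a running node; the official clients add connection handling, serialization, and response parsing on top of this basic pattern.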

Initializing the Java RestClient

1. Add the RestHighLevelClient dependency for es:

<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
</dependency>

2. The ES version that Spring Boot manages by default (7.6.2 here) differs from our server version, so we need to override it:

<properties>
    <java.version>1.8</java.version>
    <elasticsearch.version>7.12.1</elasticsearch.version> 
</properties>

3. Initialize the RestHighLevelClient:

RestHighLevelClient client = new RestHighLevelClient(RestClient.builder(
        HttpHost.create("http://192.168.150.101:9200")
));

4. Create the index

The code for creating the index is as follows:

@Test
void testCreateHotelIndex() throws IOException {
    // 1. Create the request object
    CreateIndexRequest request = new CreateIndexRequest("hotel");
    // 2. Request body: MAPPING_TEMPLATE is a static String constant holding the DSL that creates the index
    request.source(MAPPING_TEMPLATE, XContentType.JSON);
    // 3. Send the request
    client.indices().create(request, RequestOptions.DEFAULT);
}

5. Delete the index and check whether it exists

The code for deleting the index is as follows:

@Test
void testDeleteHotelIndex() throws IOException {
    // 1. Create the request object
    DeleteIndexRequest request = new DeleteIndexRequest("hotel");
    // 2. Send the request
    client.indices().delete(request, RequestOptions.DEFAULT);
}

Checking whether the index exists:

@Test
void testExistsHotelIndex() throws IOException {
    // 1. Create the request object
    GetIndexRequest request = new GetIndexRequest("hotel");
    // 2. Send the request
    boolean exists = client.indices().exists(request, RequestOptions.DEFAULT);
    // 3. Print the result
    System.out.println(exists);
}

Operating on documents with RestClient

Step 1: Initialize the Java RestClient

public class ElasticsearchDocumentTest {

    // The client shared by all tests
    private RestHighLevelClient client;

    @BeforeEach
    void setUp() {
        client = new RestHighLevelClient(RestClient.builder(
                HttpHost.create("http://192.168.150.101:9200")
        ));
    }

    @AfterEach
    void tearDown() throws IOException {
        client.close();
    }
}

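Step 2: Add hotel data to the index

// Query the hotel data first, then write it to the index, which builds the inverted index for it.
// A fixed JSON document is used here for brevity, and "indexName" is a placeholder for the target index.
@Test
void testIndexDocument() throws IOException {
    // 1. Create the request object
    IndexRequest request = new IndexRequest("indexName").id("1");
    // 2. Prepare the JSON document
    request.source("{\"name\": \"Jack\", \"age\": 21}", XContentType.JSON);
    // 3. Send the request
    client.index(request, RequestOptions.DEFAULT);
}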

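Step 3: Query hotel data by id

A document fetched by id comes back as JSON and needs to be deserialized into a Java object:

@Test
void testGetDocumentById() throws IOException {
    // 1. Create the request object
    GetRequest request = new GetRequest("indexName", "1");
    // 2. Send the request and get the response
    GetResponse response = client.get(request, RequestOptions.DEFAULT);
    // 3. Parse the result (deserialize the JSON into a Java object here if needed)
    String json = response.getSourceAsString();

    System.out.println(json);
}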

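Step 4: Update hotel data by id

There are two ways to update a document:

Option one: full update. Writing a document with the same id again deletes the old document and adds the new one.

Option two: partial update. Only some of the fields are updated. The example demonstrates option two:

@Test
void testUpdateDocumentById() throws IOException {
    // 1. Create the request object
    UpdateRequest request = new UpdateRequest("indexName", "1");
    // 2. Prepare the parameters; every two arguments form one key/value pair
    request.doc(
            "age", 18,
            "name", "Rose"
    );
    // 3. Update the document
    client.update(request, RequestOptions.DEFAULT);
}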
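The difference between the two update styles can be modeled with plain maps. The sketch below is only an illustration of the semantics, not the Elasticsearch implementation: the class `UpdateSemanticsSketch` and its method names are hypothetical, and the merge here is shallow, whereas a real partial update also merges nested objects.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Models full versus partial document updates with plain maps. */
class UpdateSemanticsSketch {

    /** Full update: the stored document is discarded and replaced by the incoming one. */
    static Map<String, Object> fullUpdate(Map<String, Object> stored,
                                          Map<String, Object> incoming) {
        return new LinkedHashMap<>(incoming);
    }

    /** Partial update: only the supplied fields change; other fields are kept (shallow merge). */
    static Map<String, Object> partialUpdate(Map<String, Object> stored,
                                             Map<String, Object> changes) {
        Map<String, Object> merged = new LinkedHashMap<>(stored);
        merged.putAll(changes);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new LinkedHashMap<>();
        stored.put("name", "Jack");
        stored.put("age", 21);

        Map<String, Object> changes = new LinkedHashMap<>();
        changes.put("age", 18);

        System.out.println("full:    " + fullUpdate(stored, changes));
        System.out.println("partial: " + partialUpdate(stored, changes));
    }
}
```

With a full update the `name` field disappears because the new document does not contain it, while the partial update keeps it and changes only `age`.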

Step 5: Delete a document by id

The code for deleting a document is as follows:

@Test
void testDeleteDocumentById() throws IOException {
    // 1. Create the request object
    DeleteRequest request = new DeleteRequest("indexName", "1");
    // 2. Delete the document
    client.delete(request, RequestOptions.DEFAULT);
}

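Bulk-importing hotel data into ES with the Java RestClient

Requirement: query the hotel data in bulk, then import it into the index in bulk.

Approach:

- Query the hotel data with mybatis-plus

- Convert each queried hotel record (Hotel) into the document type (HotelDoc)

- Use the Bulk API of the Java RestClient to add the documents in one batch, as in the sample code below

@Test
void testBulkRequest() throws IOException {
    // Query all hotel data through the mybatis-plus service
    List<Hotel> hotels = service.list();
    // Create the bulk request
    BulkRequest request = new BulkRequest();
    // Add one index request per hotel
    for (Hotel hotel : hotels) {
        // Convert the record to the document type
        HotelDoc hotelDoc = new HotelDoc(hotel);
        // Create the request that adds this document
        request.add(new IndexRequest("hotel")
                .id(hotelDoc.getId().toString())
                .source(JSON.toJSONString(hotelDoc), XContentType.JSON));
    }
    // Send the request
    client.bulk(request, RequestOptions.DEFAULT);
}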
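For reference, the body of a bulk request is newline-delimited JSON: one action line followed by one source line per document, each terminated by a newline. The sketch below assembles such a body by hand to show what the batch looks like on the wire. The class `BulkBodySketch` and the sample documents are hypothetical, each source is assumed to be already serialized as single-line JSON, and the ids and index name are assumed not to need escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Assembles the newline-delimited body of a bulk indexing request. */
class BulkBodySketch {

    /**
     * @param index    target index name
     * @param docsById document id -> single-line JSON source
     */
    static String buildBulkBody(String index, Map<String, String> docsById) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> doc : docsById.entrySet()) {
            // Action line: which operation to run and on which document
            body.append("{\"index\":{\"_index\":\"").append(index)
                .append("\",\"_id\":\"").append(doc.getKey()).append("\"}}\n");
            // Source line: the document itself; the last line also ends with a newline
            body.append(doc.getValue()).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("1", "{\"name\":\"Hotel A\"}");
        docs.put("2", "{\"name\":\"Hotel B\"}");
        System.out.print(buildBulkBody("hotel", docs));
    }
}
```

Sending many documents in one such request avoids one HTTP round trip per document, which is the reason to prefer a bulk request for imports.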