ElasticSearch-环境准备

前言

收集大量的日志信息之后，把这些日志存放在哪里？才能对其日志内容进行搜素呢？MySQL？

1.MySQL海量数据下全文检索效率低

如果MySQL里存储了1000W条这样的数据，每条记录的details字段有128个字。

用户想要查询details字段包含“ajax”这个关键词的记录。

select * from tb_log where log_details like "%ajax%";

使用like模糊查询且左边有通配符，会导致索引失效；

每次执行这条SQL语句，都需要逐一查询logtable中每条记录，最头痛的是找到这条记录之后，每次还要对这条记录中details字段里的文本内容进行全文扫描。

判断这个当前记录中的details字段是的内容否包含 “ajax”？有可能会查询 10000w*128次.

所以想要支持搜素details字段的Text内容的情况下，把海量的日志信息存在MySQL中是不太合理的。

2.MySQL无法实现分词查询功能

如果用户不明确搜索目的，虽然在电商平台上搜索了1个关键词 ‘平板电视’，但也想看看平板电脑、液晶电视等商品；

select * from tb_good where good_name like "%平板电视%";

以上SQL语句只能查询到商品名称包含“平板电视“4个字连在一起的平板电视商品；

无法按照自然语言习对关键词进行分词处理，导致用户无法获得丰富的查询结果，例如：把商品名称包含“电视“和“平板”的商品也查询出来；

ES不是银弹只做搜索功能，否则搭建ES维护起来也复杂；

一、Elasticsearch简介

Elasticsearch是一个开源的高扩展的分布式全文检索引擎;

Elasticsearch是一个基于Lucene的分布式、高性能、可伸缩的搜素和分析系统，它对外提供了RESTful web API。

ElasticSearch和MySQL相比有以下区别

Mysql数据操作具备事务性，而ElasticSearch没有
MySQL支持外键，而ElasticSearch不支持
Mysql采用B+树索引，而ElasticSearch采用倒排索引

1.倒排索引

ElasticSearch之所以支持全文检索，1大核心原因是ES在存储数据的时候可以对数据进行分词并构建倒排索引，这是一种典型的空间换时间思想；

倒排索引是1种用于全文搜索的数据结构

倒排索引不是由记录来确定属性值，而是由属性值来确定记录的位置，因而称为倒排索引(inverted index)。

倒排索引将文档中的每1个单词映射到包含该单词的所有文档的列表中

倒排索引在文本搜索和信息检索中广泛应用，如搜索引擎、网站搜索、文本分类等场景中。

2.全文检索

全文检索：把用户输入的关键词也进行分词，利用倒排索引，快速锁定关键词出现在那些文档。

说白了就是根据value查询key（根据文档中内容关键字，找到该该关键字所在的文档的）而非根据key查询value。

3.Lucene

Lucene是apache软件基金会4 jakarta项目组的一个java子项目，是一个开放源代码的全文检索引擎JAR包。帮助我我们实现了以上的需求。

lucene实现倒排索引之后，那么海量的数据如何分布式存储？如何高可用？集群节点之间如何管理？这是Elasticsearch实现的功能。

常说的ELK是Elasticsearch（全文搜素）+Logstash（内容收集）+Kibana（内容展示）三大开源框架首字母大写简称。

4.分片

分片是Elasticsearch中数据存储的基本单位。

Elasticsearch是1个分布式搜索引擎，它允许将1个索引分解成多个部分，每个部分都存储在不同的节点上。

这种分布式的存储方式使得Elasticsearch能够处理大量的数据，同时保持高效的查询性能。

1个分片默认可以存储最大约20亿个文档

索引的分片完成分配后，由于索引的路由机制，将不能重新修改分片数量。

5.分片副本

副本是对分片的复制，用于提供高可用性和故障恢复能力。

每个主分片都可以有1个或多个副本分片。

如果主分片所在的节点发生故障，副本分片可以用于提供服务，从而确保数据的高可靠性。

副本还允许查询请求在多个节点上并行处理，从而提高查询性能。

默认情况下，Elasticsearch为每个索引创建5个主分片，并为每个主分片创建1个副本分片。

6.分片和分片副本总结

分片和副本在Elasticsearch中扮演着不同的角色：

分片负责数据的分布存储
副本则负责数据的复制以提供高可用性和查询性能

用户可以根据自己的需求调整分片和副本的数量，以优化存储和查询性能。

7.ES底层写入流程

二、Elasticsearch安装

基于docker安装ElasticSearch与kibana客户端；

1.安装ElasticSearch

1.1.下载ElasticSearch7.10.1镜像

注意ES的版本要和客户端的依赖包版本保存一致；

docker pull elasticsearch:7.10.1

1.2.创建ElasticSearch配置和数据存储目录

mkdir -p /mydata/elasticsearch/{config,data,plugins}

1.3.配置文件配置ip信息

echo "http.host: 0.0.0.0" > /mydata/elasticsearch/config/elasticsearch.yml

1.4.配置目录权限

 chmod -R 775 /mydata/elasticsearch/

1.5.启动容器并配置端口映射与目录映射

docker run --name elasticsearch -p 9200:9200 -p 9300:9300 \
-e "discovery.type=single-node" \
-e ES_JAVA_OPTS="-Xms256m -Xmx1024m" \
-v /mydata/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
-v /mydata/elasticsearch/data:/usr/share/elasticsearch/data \
-v /mydata/elasticsearch/plugins:/usr/share/elasticsearch/plugins \
-d elasticsearch:7.10.1

1.6.确定是否启动成功

2.安装Kibana

Kibana是1个ElasticSearch的客户端，可以通过可视化界面完成对于ElasticSearch的各种操作。

2.1.下载Kibana7.10.1镜像

docker pull kibana:7.10.1

2.2.创建配置文件夹

[root@itcast ~]# mkdir -p  /mydata/kibana/config
[root@itcast ~]# cd  /mydata/kibana/config
[root@itcast config]# touch kibana.yml
[root@itcast config]#

2.3.创建并修改配置文件

## ** THIS IS AN AUTO-GENERATED FILE **
## Default Kibana configuration for docker target
##根据自己实际IP修改elasticsearch地址
server.name: kibana
server.host: "0"
elasticsearch.hosts: [ "http://192.168.56.18:9200" ]
xpack.monitoring.ui.container.elasticsearch.enabled: true
#开启中文配置
i18n.locale: "zh-CN"

2.4.启动容器

docker run -d \
>   --name=kibana \
>   --restart=always \
>   -p 5601:5601 \
>   -v /mydata/kibana/config/kibana.yml:/usr/share/kibana/config/kibana.yml \
>   kibana:7.10.1

2.5.访问效果如下

3.安装DockerPortainer

Docker容器在修改完配置之后很容易出现故障导致启动失败；

此时就需要1个图形化的管理工具，对容器进行可视化操作，也方便查看日志，快速定位问题所在；

3.1.下载portainer镜像

docker pull portainer/portainer

3.2.启动portainer容器

portainer也是1个容器但是这个容器是用来管理其他容器的；

docker pull portainer/portainer
2.2 启动portainer
#创建文件存储文件
docker volume create portainer_data
#创建并启动容器 --restart=always   开机自动启动`
docker run -d -p 9000:9000 --name=portainer --restart=always
 -v /var/run/docker.sock:/var/run/docker.sock -v portainer_data:/data portainer/portainer

3.3.管理容器

最后确保以下3个容器都已经正常运行；

三、Elasticsearch使用

ES在使用时，会涉及到5个核心概念：

索引（Index）
映射（Mapping）
域（Field）
文档（Document）
倒排索引

在老版本ElasticSearch中还有1个概念Type，用于进行数据分类，但是在ES7开始已经将Type移除;

ElasticSearch	Mysql
索引（Index）	表（Table）
映射（Mapping）	表结构
域（Field）	字段列（Column）
文档（Document）	一条数据（Row）

1.索引（Index）

索引相当于关系型数据库中的一张表，1个index（索引）中包含若干个document（文档）；

1.1.添加索引

先添加索引在指定索引的映射

#先创建索引
PUT student
#再补充索引中的映射
PUT student/_mapping
{
  "properties":{
      "username":{
        "type":"text"
      },
    "age":{"type":"integer"},
    "birthday":{"type":"date","format":"yyyy-MM-dd" }
  }
  
}

添加索引并指定映射

PUT student
 {
  "mappings":{
  "properties":{
    "username":{"type":"text"},
    "age":{"type":"integer"},
    "birthday":{"type":"date","format":"yyyy-MM-dd" }
               }
          }
   
 }

1.2.查询索引

GET person

1.3.查询多个索引

PUT person1
GET person,person1

1.4. 查询所有索引信息

GET _all

1.5.删除索引

DELETE person1

2.别名机制

由于倒排索引的缘故，在ES中无法删除字段，也无法修改字段类型;

所有我们一般对外暴露index（索引）的别名，而不是真正的索引名称；

如果在创建索引的时候或者在映射中新增字段，指定了错误的映射，有2种解决方案；

2.1.索引没有投入使用

如果索引中没有导入数据，删除索引再重新指定映射；

2.2.索引已经投入使用

新创建1个正确的索引
把之前错误索引中数据导入到新建正确索引中
使用别名指向新的正确索引；

#模拟错误：新增了1个student1索引
PUT student1
PUT student1/_mapping
{
  "properties":{
    "birtday":{"type":"text"}
  }
  
}

#创建别名student--》student1
POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "student1",
        "alias": "student"
      }
    }
  ]
  
}

#发现指定错了映射，birtday字段应该为date类型
GET student1



#开始修改，新增1个索引指定正确的映射student2，设置birtday字段为date类型
PUT student2
PUT student2/_mapping
{
  "properties":{
    "birthday":{"type":"date","format":"yyyy-MM-dd" }
  }
  
}

#删除之前指定的别名 student--》student1
POST _aliases
{
  "actions": [
    {
      "remove": {
        "index": "student1",
        "alias": "student"
      }
    }
  ]
}

#创建新的别名student--》student2
POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "student2",
        "alias": "student"
      }
    }
  ]
  
}

#查看新增的索引student2确定索引也是正确的
GET student2
#删除student1
DELETE student1;
使用Java程序把之前的数据填充到student2索引中

3.域（Field）

域（Field）相当于数据表中的字段列；

创建完索引之后需要在索引中设置域的相关信息，如：域的名称，域的数据类型等。这个过程称为映射（Mapping）；

ES中的域（字段）支持以下几种数据类型

3.1.字符串

text：会进行分词，如华为手机，会分成：华为，手机。 被分出来的每一个词，称为term（词条）
keyword：不会进行分词，如华为手机，只有一个词条，即华为手机。

3.2.数值

long：带符号64位整数
integer：带符号32位整数
short：带符号16位整数
byte：带符号8位整数
double：双精度64位浮点数
float：单精度32位浮点数
half_float：半精度16位浮点数

3.3.布尔：

boolean

3.4.二进制：

binary

3.5.日期：

date

3.6.范围类型：

integer_range
float_range
long_range
double_range
date_range

3.7.数组

3.8.对象

4.文档（Document）

ES中最小的数据单元，代表索引中的一条数据，通常是使用json的数据格式表示的；

4.1.添加文档

4.1.1.添加文档，手动设置id

POST person/_doc/1
{
  "name":"张三",
  "age":18,
  "address":"北京"
}

4.1.2.添加文档，自动生成id

POST person/_doc
{
  "name":"李四",
  "age":20,
  "address":"北京"
}

4.2.查询文档

4.2.1.根据id查询文档

GET person/_doc/1

4.2.2.查询所有文档

GET person/_search

4.2.3.查询条件

GET logstash-2022.12.12/_search
{
  "_source": { 
    "includes":["@timestamp","message","stream"]
  },
  "sort": [
    {
      "@timestamp": {
        "order": "desc"
      }
    }
  ],
  "from": 1,
  "size": 3
}

4.3.修改文档

PUT person/_doc/1
{
  "name": "张三丰",
  "age": 180,
  "address": "武当山"
}

4.4.删除文档

DELETE person/_doc/1

四、SpringBoot操作ElasticSearch

使用restHighLevelClient操作ES的流程如下：

1.构建查询请求

2.处理ES响应的查询结果集

1.SpringBoot整合ElasticSearch

1.1.建立Maven工程，并引入相关坐标

<dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
        </dependency>
        <!--引入es的坐标-->
        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>elasticsearch-rest-high-level-client</artifactId>
            <version>7.10.1</version>
        </dependency>
        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>elasticsearch-rest-client</artifactId>
            <version>7.10.1</version>
        </dependency>
        <dependency>
            <groupId>org.elasticsearch</groupId>
            <artifactId>elasticsearch</artifactId>
            <version>7.10.1</version>
        </dependency>

        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>

pom.xml

1.2.application.yml

elasticsearch:
  host: 192.168.56.18
  port: 9200

1.3.创建启动类EsApplication

package com.zhanggen.es.demo;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class EsApplication {
    public static void main(String[] args) {
        SpringApplication.run(EsApplication.class,args);
    }
}

EsApplication.java

1.4.创建es配置类

package com.zhanggen.es.demo.config;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@ConfigurationProperties(prefix="elasticsearch")
public class ElasticSearchConfig {

    private String host;

    private int port;


    public String getHost() {
        return host;
    }

    public void setHost(String host) {
        this.host = host;
    }

    public int getPort() {
        return port;
    }

    public void setPort(int port) {
        this.port = port;
    }

    @Bean
    public RestHighLevelClient restHighLevelClient(){
        RestClientBuilder builder = RestClient.builder(new HttpHost(host, port, "http"));
        builder.setRequestConfigCallback(requestConfigBuilder ->{
            requestConfigBuilder.setConnectionRequestTimeout(500000);
            requestConfigBuilder.setSocketTimeout(500000);
            requestConfigBuilder.setConnectTimeout(500000);
            return requestConfigBuilder;
        });
        return new RestHighLevelClient(builder);
    }
}

ElasticSearchConfig.java

1.5.创建测试类ESTest

package com.zhanggen.es.demo;

import org.elasticsearch.client.RestHighLevelClient;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringRunner;

@SpringBootTest
@RunWith(SpringRunner.class)
public class ESTest {
    @Autowired
    private RestHighLevelClient restHighLevelClient;


}

ESTest.java

2.操作索引

对ES中索引的增、删、查操作

package com.itheima.es.demo;

import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.action.support.master.AcknowledgedResponse;
import org.elasticsearch.client.IndicesClient;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.elasticsearch.client.indices.GetIndexRequest;
import org.elasticsearch.client.indices.GetIndexResponse;
import org.elasticsearch.cluster.metadata.MappingMetadata;
import org.elasticsearch.common.xcontent.XContentType;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringRunner;

import java.io.IOException;
import java.util.Map;

@SpringBootTest
@RunWith(SpringRunner.class)
public class ESTest {

    //注入操作ElasticSearch的客户端
    @Autowired
    private RestHighLevelClient restHighLevelClient;

    //测试创建索引
    @Test
    public void createIndexTest() throws IOException {
        IndicesClient indicesClient = restHighLevelClient.indices();
        //设置索引名称
        CreateIndexRequest createIndexRequest=new CreateIndexRequest("student");
        //定义索引的映射(结构)
        String mappingInfo = "{\n" +
                "  \"properties\":{\n" +
                "    \"name\":{\n" +
                "      \"type\":\"keyword\"\n" +
                "    },\n" +
                "    \"age\":{\n" +
                "      \"type\":\"integer\"\n" +
                "    },\n" +
                "    \"address\":{\n" +
                "      \"type\":\"text\"\n" +
                "    }\n" +
                "  }\n" +
                "}";
        createIndexRequest.mapping(mappingInfo, XContentType.JSON);
        CreateIndexResponse response = indicesClient.create(createIndexRequest, RequestOptions.DEFAULT);
        System.out.println(response.isAcknowledged());

    }

    //测试查询索引
    @Test
    public  void findIndexTest() throws IOException {
        IndicesClient indicesClient = restHighLevelClient.indices();
        GetIndexRequest getIndexRequest = new GetIndexRequest("student");
        GetIndexResponse response = indicesClient.get(getIndexRequest, RequestOptions.DEFAULT);
        Map<String, MappingMetadata> mappings = response.getMappings();
        for (String key : mappings.keySet()) {
            System.out.println(key+"==="+mappings.get(key).getSourceAsMap());
        }
    }
    //删除索引
    @Test
    public void delIndex() throws IOException {
        IndicesClient indicesClient = restHighLevelClient.indices();
        DeleteIndexRequest delIndexRequest = new DeleteIndexRequest("student");
        AcknowledgedResponse response = indicesClient.delete(delIndexRequest, RequestOptions.DEFAULT);
        System.out.println(response.isAcknowledged());
    }

}