Integrating Spring Boot 3.x with Elasticsearch 8.x

Elasticsearch

Docker deployment

Container orchestration (docker-compose.yml)

version: "3.1"
# Service configuration
services:
  elasticsearch:
    container_name: elasticsearch-8.8.1
    image: docker.elastic.co/elasticsearch/elasticsearch:8.8.1
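    # Gives the container root-level privileges on the host (insecure); can be removed
    privileged: true
    # ulimits caps the resources available to processes in the container; memlock -1 lets Elasticsearch lock its memory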
    ulimits:
      memlock:
        soft: -1
        hard: -1
    environment:
      - "ES_JAVA_OPTS=-Xms1024m -Xmx1024m"
      - "http.host=0.0.0.0"
      - "node.name=elastic01"
      - "cluster.name=cluster_elasticsearch"
      - "discovery.type=single-node"
    ports:
      - "9200:9200"
      - "9300:9300"
    volumes:
      # - ./elasticsearch/config:/usr/share/elasticsearch/config
      - ./elasticsearch/data:/usr/share/elasticsearch/data
      # - ./elasticsearch/plugin/xxx:/usr/share/elasticsearch/plugins/xxx
    networks: 
      - elastic_net
  kibana:
    container_name: kibana-8.8.1
    image: docker.elastic.co/kibana/kibana:8.8.1
    ports:
      - "5601:5601"
    # volumes:
    #   - ./kibana/config:/usr/share/kibana/config
    networks:
      - elastic_net
# Network configuration
networks:
  elastic_net:
    driver: bridge

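---Deployment---

Start the services: docker-compose up -d
Then create the local configuration directories:

# 1. Create the local directory that will hold the Kibana config
# 2. Copy the Elasticsearch config out of the running container
# 3. Copy the Kibana config out of the running container
mkdir kibana
docker cp elasticsearch-8.8.1:/usr/share/elasticsearch/config ./elasticsearch
docker cp kibana-8.8.1:/usr/share/kibana/config ./kibana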

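Elasticsearch configuration (config/elasticsearch.yml)

# Node name
node.name: "elastic01"
# Cluster name
cluster.name: "cluster_elasticsearch"
# Bind address (0.0.0.0 listens on all interfaces)
network.host: 0.0.0.0
# Start in single-node mode
discovery.type: single-node


# Enable CORS
http.cors.enabled: true
# Allow requests from any origin (restrict this outside local development)
http.cors.allow-origin: "*"
# Lock the process memory to prevent swapping; the official docs recommend true
bootstrap.memory_lock: true

# Security settings: disable TLS on the HTTP and transport layers (local development only)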
xpack.security.http.ssl:
  enabled: false
xpack.security.transport.ssl:
  enabled: false

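Kibana configuration (kibana.yml)

# Chinese UI locale
i18n.locale: zh-CN

Uncomment the volume mounts in docker-compose.yml

- ./elasticsearch/config:/usr/share/elasticsearch/config
- ./kibana/config:/usr/share/kibana/config
# Recreate the containers; docker compose restart does not pick up changes made to docker-compose.yml
docker compose up -d
# URLs:
Elasticsearch: http://localhost:9200
Kibana: http://localhost:5601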

Reset the elastic password

docker exec -it elasticsearch-8.8.1 /usr/share/elasticsearch/bin/elasticsearch-reset-password -u elastic

Reset the kibana_system password

docker exec -it elasticsearch-8.8.1 /usr/share/elasticsearch/bin/elasticsearch-reset-password -u kibana_system

Add to the Kibana configuration (kibana.yml)

elasticsearch.username: kibana_system
# Use the password printed by the reset command above
elasticsearch.password: Ux*7RMHX0ErLEn4=RMmx

Get the verification code

docker exec -it kibana-8.8.1 /usr/share/kibana/bin/kibana-verification-code

Log in to Kibana

Username: elastic
Password: the elastic password printed by the reset command

Basic operations

ES 8.x: URL structure

/<index>/_doc/<document ID>

Index

# List indices
GET /_cat/indices?v=true&pretty

# List shards
GET /_cat/shards?v=true&pretty

# Create an index
PUT /<index_name>
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}

# Check whether an index exists
HEAD /<index_name>

# Get an index
GET /<index_name>

# Update index settings
PUT /<index_name>/_settings
{
  "settings": {
    "number_of_replicas": 2
  }
}

# Delete an index
DELETE /<index_name>

Document

# Get a document
GET /xdclass_shop/_doc/1
# Index a document with an explicit ID
PUT /xdclass_shop/_doc/1
{
  "id":5555,
  "title":"content111",
  "pv":144
}
# Index a document without an ID (one is generated automatically)
POST /xdclass_shop/_doc
{
  "id":123,
  "title":"content222",
  "pv":244
}

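# Update by re-indexing (PUT or POST both work; the ID is required). The stored document is replaced as a whole, not merged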
PUT /xdclass_shop/_doc/1
{
  "id":999,
  "title":"content111v2",
  "pv":999,
  "uv":55
}

POST /xdclass_shop/_doc/1
{
  "id":999,
  "title":"content222v2",
  "pv":999,
  "uv":559
}

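# Search
GET /xdclass_shop/_search

# Response fields
#   took: how long the request took, in milliseconds.
#   timed_out: whether the search timed out.
#   hits: the search results.
#   hits.total: the number of matching documents.
#   hits.max_score: the highest relevance score among the matches (1.0 when the search has no conditions).
#   hits.hits: the array of matching documents.

# Delete a document
DELETE /xdclass_shop/_doc/1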

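Mapping and common field types

What is a mapping
- Similar to a table schema in a relational database
- Defines the name and data type (string, number, boolean, and so on) of each field in an index

View the field types of an index

# GET /<index>/_mapping
GET /my_index/_mapping

Dynamic mapping
- Automatically detects and defines field types when documents are indexed
- When a document introduces a new field, Elasticsearch inspects the value and infers a type for it
- Commonly inferred types include text, keyword, date and the numeric types
- Dynamic mapping is convenient, but an inferred type can be wrong or inconsistent with what later documents contain
- If a field needs a specific type, define it with an explicit mapping before indexing data, because the type of an existing field cannot be changed afterwards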
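To make the inference rules concrete, here is a minimal Python sketch of how dynamic mapping picks a type from a JSON value. It is a simplification, not Elasticsearch's implementation: the function name `infer_dynamic_type` is hypothetical, and date detection is reduced to a single `yyyy-MM-dd` pattern.

```python
from datetime import datetime


def infer_dynamic_type(value):
    """Roughly mimic the default dynamic mapping rules for one JSON value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # An array takes the type of its first non-null element.
        for item in value:
            if item is not None:
                return infer_dynamic_type(item)
        return None  # nothing to infer, no field is added
    if isinstance(value, str):
        try:
            datetime.strptime(value, "%Y-%m-%d")  # simplified date detection
            return "date"
        except ValueError:
            return "text + keyword sub-field"
    return None  # null values add no field


doc = {"id": 5555, "title": "content111", "rating": 4.5,
       "is_published": True, "publish_date": "2025-01-01"}
inferred = {field: infer_dynamic_type(value) for field, value in doc.items()}
print(inferred)
```

Note that a numeric `id` is inferred as `long`; if it should be matched as an exact identifier, declare it as `keyword` explicitly.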
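Common Elasticsearch data types
- String types: since ES 5.x the old string type has been split into text and keyword
  - text: for full-text search; the value is analyzed (tokenized) before it is indexed
  - keyword: for exact matching; the value is not analyzed; suited to filtering, sorting and aggregations.
- Numeric types: integers (long, integer, short, byte) and floating point (double, float).
- date: dates and timestamps.
- boolean: true or false.
- binary: binary data, supplied as a Base64-encoded string.
- Arrays: there is no dedicated array type; any field can hold zero or more values of the same type.
- object: nested JSON structures (use nested when the elements of an array of objects must be queried independently)
Defining field types with an explicit mapping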

PUT /my_index
{
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "title": {
        "type": "text"
      },
      "price": {
        "type": "float"
      }
    }
  }
}

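Most frequently used types
- text
  - Intended for full-text search over content such as articles and news.
  - The value is analyzed: it is split into terms (tokens), and the terms are indexed
  - The terms go into an inverted index, which makes search flexible and fast.
  - At query time matches are found per term and a relevance score is computed so that the best matches rank first.
- keyword
  - Intended for exact matching and aggregations on values that should not be tokenized, such as IDs, tags and status codes.
  - The whole value is indexed as a single term
  - A search therefore matches only the exact value; individual words inside the value cannot be searched.
  - Well suited to filtering, sorting and fast aggregations on exact values.
- Summary
  - Choose between text and keyword by how the field will be queried:
  - For full-text search with relevance scoring on analyzed terms, use text.
  - For exact matching, sorting or aggregations where no tokenization is wanted, use keyword.

Hands-on example

Create an index and insert a document (run DELETE /my_index first if the index from the previous section still exists)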

PUT /my_index
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      },
      "tags": {
        "type": "keyword"
      },
      "publish_date": {
        "type": "date"
      },
      "rating": {
        "type": "float"
      },
      "is_published": {
        "type": "boolean"
      },
      "author": {
        "properties": {
          "name": {
            "type": "text"
          },
          "age": {
            "type": "integer"
          }
        }
      },
      "comments": {
        "type": "nested",
        "properties": {
          "user": {
            "type": "keyword"
          },
          "message": {
            "type": "text"
          }
        }
      }
    }
  }
}

POST /my_index/_doc/1
{
  "title": "小滴课堂最近上线了新课 Elasticsearch Introduction",
  "tags": ["search", "big data", "distributed system", "小滴课堂"],
  "publish_date": "2025-01-01",
  "rating": 4.5,
  "is_published": true,
  "author": {
    "name": "John Doe",
    "age": 30
  },
  "comments": [
    {
      "user": "Alice",
      "message": "Great article!"
    },
    {
      "user": "Bob",
      "message": "Very informative."
    }
  ]
}

Query a text field (analyzed)

GET /my_index/_search
{
  "query": {
    "match": {
      "title": "Elasticsearch"
    }
  }
}

Query a keyword field (not analyzed)

GET /my_index/_search
{
  "query": {
    "match": {
      "tags": "big data"
    }
  }
}

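Analyzers

What is tokenization in a search engine
- In Elasticsearch 8.x, tokenization is the process of splitting text into individual words or terms (tokens)
- It is a key step both when indexing and when querying: text is split into terms, and the inverted index built from them is what makes search efficient.
- Example
```
Take two product titles:

"Apple iPhone 12 Pro Max 256GB"

"Samsung Galaxy S21 Ultra 128GB"

With the default standard analyzer the titles are split into the following tokens (the analyzer also lowercases them):

Title 1: ["apple", "iphone", "12", "pro", "max", "256gb"]

Title 2: ["samsung", "galaxy", "s21", "ultra", "128gb"]

The analyzer splits on whitespace and punctuation. At search time the query is analyzed the same way and its tokens are compared with the tokens of the titles.

For example
  A search for "iPhone 12" is analyzed into ["iphone", "12"], which are then looked up among the indexed tokens.
  Title 1 contains both tokens and matches the query. Title 2 contains neither.
```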
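- Analysis rules define how text is split and normalized
- Elasticsearch processes text with an analyzer, which is assembled from character filters, a tokenizer and token filters
- An analyzer contains zero or more character filters, exactly one tokenizer, and zero or more token filters
  - The overall process:
    - Tokenization:
      - The text is first split into tokens, which may be words, numbers, and so on.
      - This is done by the tokenizer according to its rules.
    - Filtering:
      - The tokens are then passed through the token filters.
      - Filters transform them: lowercasing, stop-word removal, stemming, synonym expansion, and so on.
    - Inverted indexing:
      - The resulting terms are stored in an inverted index.
      - An inverted index maps each term to the documents that contain it, so the documents for a given term can be found quickly.
    - Query matching:
      - The query text is analyzed as well.
      - Elasticsearch looks up the query terms in the inverted index and computes a relevance score.
- Common analyzers include standard, simple, whitespace and IK; custom analyzers can also be defined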
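The four steps above (tokenize, filter, build the inverted index, match the query) can be sketched in a few lines of Python. This is a toy model under stated assumptions: `analyze`, `build_inverted_index` and `search` are hypothetical names, the analyzer only splits on non-alphanumeric characters and lowercases, and the "score" is simply the number of matching query tokens rather than Elasticsearch's real relevance scoring.

```python
import re
from collections import defaultdict


def analyze(text):
    """Toy analyzer: split on non-alphanumeric characters, then lowercase."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


def build_inverted_index(docs):
    """Map every token to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """OR semantics: a document matches if it contains any query token."""
    scores = defaultdict(int)
    for token in analyze(query):
        for doc_id in index.get(token, ()):
            scores[doc_id] += 1  # naive score: number of matching tokens
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


docs = {
    1: "Apple iPhone 12 Pro Max 256GB",
    2: "Samsung Galaxy S21 Ultra 128GB",
}
index = build_inverted_index(docs)
hits = search(index, "iPhone 12")
print(hits)
```

Because the query goes through the same `analyze` function as the documents, "iPhone" and "iphone" end up as the same term, which is the reason index-time and query-time analysis normally use the same analyzer.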
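Rules of the default standard analyzer
- Punctuation:
  - Most punctuation is dropped, and a hyphenated word is split into two terms.
  - An apostrophe inside a word is kept: "Let's go!" becomes "let's", "go".
- Lowercasing:
  - All terms are converted to lowercase.
  - For example, "Hello World" becomes "hello", "world".
- Stop words:
  - Stop words are common words that carry little meaning in a search, such as "a", "an" and "the".
  - The standard analyzer can remove them, but stop-word removal is disabled by default; they are indexed unless the stopwords parameter is configured.
- Stemming:
  - The standard analyzer does not stem words.
  - Reducing running to run, swimming to swim or jumped to jump requires a stemmer token filter in a custom analyzer, or a language analyzer such as english.
- Word separation:
  - Text is split into terms (tokens) on word boundaries as defined by Unicode Text Segmentation (UAX #29), which in practice means whitespace and most punctuation.
How to inspect the result of analysis?
- Use the analyze API to analyze a piece of text and view the resulting tokens; the basic syntax is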
```
GET /_analyze
{
  "analyzer": "<analyzer name>",
  "text": "<text to analyze>"
}
```
- Examples
```
# title is a text field
POST /my_index/_analyze
{
  "field": "title",
  "text": "This is some text to analyze"
}

# title is a text field
POST /my_index/_analyze
{
  "field": "title",
  "text": "今天我在小滴课堂学习架构大课"
}

# tags is a keyword field
POST /my_index/_analyze
{
  "field": "tags",
  "text": "This is some text to analyze"
}

# tags is a keyword field
POST /my_index/_analyze
{
  "field": "tags",
  "text": ["This is","小滴课堂","Spring Boot" ]
}
```
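  - Each token object in the response contains
    - token: the term produced by analysis
    - start_offset: the character offset at which the token starts in the original text
    - end_offset: the character offset at which it ends
    - type: the token type assigned by the tokenizer
      - <ALPHANUM> is the type the standard tokenizer assigns to tokens made of letters and digits; it is a label on the token, not a field data type
      - Other values include <NUM> for numbers and <IDEOGRAPHIC> for CJK characters; a keyword field produces a single token of type word
    - position: the ordinal position of the token in the token stream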
IK Chinese analyzer

Background

The default standard analyzer handles Chinese poorly, for example

# title is a text field
POST /my_index/_analyze
{
  "field": "title",
  "text": "我今天去小滴课堂学习spring cloud项目大课"
}

# Result (abbreviated): every Chinese character becomes a separate token
{
  "tokens": [
    {
      "token": "我",
      "start_offset": 0,
      "end_offset": 1,
      "type": "<IDEOGRAPHIC>",
      "position": 0
    },
    {
      "token": "今",
    },
    {
      "token": "天",
    },
    ......
    {
      "token": "spring",
      "start_offset": 10,
      "end_offset": 16,
      "type": "<ALPHANUM>",
      "position": 10
    },
    {
      "token": "cloud",
      "start_offset": 17,
      "end_offset": 22,
      "type": "<ALPHANUM>",
      "position": 11
    },
    {
      "token": "项",
      "start_offset": 22,
      "end_offset": 23,
      "type": "<IDEOGRAPHIC>",
      "position": 12
    },
    {
      "token": "目",
      "start_offset": 23,
      "end_offset": 24,
      "type": "<IDEOGRAPHIC>",
      "position": 13
    },
  ]
}

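What is the IK analyzer
- An open-source Chinese analyzer written in Java that splits Chinese text into words (terms)
- Designed around the characteristics of Chinese, where words are not separated by spaces
- Links
  - All releases: https://github.com/medcl/elasticsearch-analysis-ik/releases
  - Version used here: https://github.com/medcl/elasticsearch-analysis-ik/releases/tag/v8.8.1
- Note: the IK plugin version must match the Elasticsearch version
- Features
  - Efficient and flexible
    - Dictionary-based segmentation keeps analysis fast.
    - Coarse- and fine-grained segmentation are both available and can be chosen per field.
  - Accuracy
    - Sentences are segmented using dictionaries plus rules.
    - Each token carries a coarse type label (for example CN_WORD or ENGLISH); this is not full part-of-speech tagging.
  - Extensible dictionaries, including remote ones
    - External dictionaries can be configured and loaded to extend the vocabulary
    - Custom dictionaries improve accuracy and coverage for domain-specific terms.
  - Compatibility and integration
    - Built on Lucene and packaged as an Elasticsearch plugin, so it integrates directly.
    - Configuration options are provided with the plugin, which keeps integration simple.
- Installation
  - Unzip the release into the local directory used by the plugin volume in docker-compose.yml (./elasticsearch/plugin/<name> in the file above)
  - Uncomment the plugin volume mount in docker-compose.yml
  - Recreate the Elasticsearch container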
```
docker compose up -d
```

IK offers two granularities

ik_smart: the coarsest-grained segmentation
ik_max_word (commonly used): the finest-grained segmentation
Hands-on example

GET /_analyze
{
  "text":"今天星期一,我今天去小滴课堂学习spring cloud项目大课",
  "analyzer":"ik_smart"
}


GET /_analyze
{
  "text":"今天星期一,我今天去小滴课堂学习spring cloud项目大课",
  "analyzer":"ik_max_word"
}

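Query DSL syntax and usage

What is Query DSL
- Query DSL (Domain-Specific Language) is the JSON-based language Elasticsearch uses to express search queries
- It plays the role that SQL plays for a relational database
- Queries are written as structured JSON and can express advanced search and filtering

Basic syntax

GET /<index>/_search
{
  "query": {
    "<query type>": {

    }
  }
}

Common Query DSL queries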
- match query: full-text search; the query text is analyzed and matched against the terms of the given field
```
{
  "query": {
    "match": {
      "title": "elasticsearch"
    }
  }
}
```
- term query: exact match on a single term of the given field; the query value is not analyzed.
```
{
  "query": {
    "term": {
      "category": "books"
    }
  }
}
```
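Summary
- Query DSL offers many more query and filter types to cover different search needs.
- Combine them according to the business requirement and the data structure to build complex searches

Data preparation

Create the index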

PUT /xdclass_shop_v1
{
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "title": {
        "type": "keyword"
      },
      "summary": {
        "type": "text"
      },
      "price": {
        "type": "float"
      }
    }
  }
}
# Bulk-load the data
PUT /xdclass_shop_v1/_bulk
{ "index": { "_index": "xdclass_shop_v1" } }
{ "id": "1", "title": "Spring Boot","summary":"this is a summary Spring Boot video", "price": 9.99 }
{ "index": { "_index": "xdclass_shop_v1" } }
{ "id": "2", "title": "java","summary":"this is a summary java video", "price": 19.99 }
{ "index": { "_index": "xdclass_shop_v1" } }
{ "id": "3", "title": "Spring Cloud","summary":"this is a summary Spring Cloud video", "price": 29.99 }
{ "index": { "_index": "xdclass_shop_v1" } }
{ "id": "4", "title": "Spring_Boot", "summary":"this is a summary Spring_Boot video","price": 59.99 }
{ "index": { "_index": "xdclass_shop_v1" } }
{ "id": "5", "title": "SpringBoot","summary":"this is a summary SpringBoot video", "price": 0.99 }

match

Match all documents (match_all)
- The simplest query; it matches every document in the index
```
GET /xdclass_shop_v1/_search
{
  "query": {
    "match_all": {}
  }
}
```

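Conditional queries

match analyzes the query text and searches with the resulting terms; multiple terms are combined with OR
The terms are compared with the indexed terms of each document; the better the match, the higher the score and the rank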

GET /xdclass_shop_v1/_search
{
  "query": {
    "match": {
      "summary": "Spring"
    }
  }
}

# Multiple terms
GET /xdclass_shop_v1/_search
{
  "query": {
    "match": {
      "summary": "Spring Java"
    }
  }
}

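Exact term query
- A term query does not analyze the query value; it is compared directly with the indexed terms
- match also works on a keyword field, but it adds an unnecessary analysis step; on a text field the indexed terms are lowercased tokens, so term is normally reserved for keyword fields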
```
# title is a keyword field, so it is indexed without analysis
GET /xdclass_shop_v1/_search
{
  "query": {
    "term": {
      "title": {
        "value": "Spring Boot"
      }
    }
  }
}
```

Selecting the returned fields

Returning every field wastes resources when only a few are needed; use _source to list the fields to return

GET /xdclass_shop_v1/_search
{
  "_source": ["price", "title"],
  "query": {
    "term": {
      "title": {
        "value": "Spring Boot"
      }
    }
  }
}

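Summary
- match analyzes the query text and matches on the resulting terms; term looks up the value exactly as given
- Use match for full-text search on text fields and term for exact lookups on keyword fields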
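The difference between the two queries is easiest to see with a small simulation. The Python sketch below is an illustration only, using three of the sample documents; `standard_like_analyze`, `term_query` and `match_query` are hypothetical helpers that imitate analysis with a regular expression and ignore scoring.

```python
import re

docs = [
    {"id": "1", "title": "Spring Boot", "summary": "this is a summary Spring Boot video"},
    {"id": "2", "title": "java", "summary": "this is a summary java video"},
    {"id": "3", "title": "Spring Cloud", "summary": "this is a summary Spring Cloud video"},
]


def standard_like_analyze(text):
    """Rough stand-in for the standard analyzer: split into words, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def indexed_terms(doc, field, is_text_field):
    """text fields are indexed as analyzed tokens, keyword fields as one term."""
    value = doc[field]
    return standard_like_analyze(value) if is_text_field else [value]


def term_query(field, value, is_text_field):
    """term: the query value is compared as-is with the indexed terms."""
    return [d["id"] for d in docs if value in indexed_terms(d, field, is_text_field)]


def match_query(field, text):
    """match on a text field: analyze the query, then OR the resulting terms."""
    wanted = set(standard_like_analyze(text))
    return [d["id"] for d in docs if wanted & set(indexed_terms(d, field, True))]


keyword_hits = term_query("title", "Spring Boot", is_text_field=False)
text_term_hits = term_query("summary", "Spring", is_text_field=True)
match_hits = match_query("summary", "Spring Java")
print(keyword_hits, text_term_hits, match_hits)
```

The term query on the text field finds nothing because the indexed term is the lowercased `spring`, while the query value `Spring` is used unchanged.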

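Bool, range, pagination and sorting

Common Query DSL examples

range query
- Matches documents whose field value falls within a range, for example products within a price band
- Operators
  - gte: greater than or equal to
  - gt: greater than
  - lte: less than or equal to
  - lt: less than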

GET /xdclass_shop_v1/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 5,
        "lte": 100
      }
    }
  }
}

Pagination
- Use the from and size parameters to page through results
- from is the number of documents to skip; size is the number of documents to return

GET /xdclass_shop_v1/_search
{
  "size": 10,
  "from": 0,
  "query": {
    "match_all": {}
  }
}
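Applications usually think in page numbers rather than offsets. A minimal Python sketch of the conversion, assuming 1-based page numbers; `page_request` is a hypothetical helper, and the guard reflects the default `index.max_result_window` of 10,000, beyond which `from` + `size` requests are rejected and `search_after` is the usual alternative.

```python
def page_request(page, size=10, max_result_window=10_000):
    """Build a search body for a 1-based page number."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    offset = (page - 1) * size  # documents to skip
    if offset + size > max_result_window:
        raise ValueError("from + size exceeds index.max_result_window; consider search_after")
    return {"from": offset, "size": size, "query": {"match_all": {}}}


third_page = page_request(page=3, size=10)
print(third_page)
```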

Sorting results
- Use sort with asc or desc to order the results

GET /xdclass_shop_v1/_search
{
  "size": 10,
  "from": 0,
  "sort": [
    {
      "price": "asc"
    }
  ],
  "query": {
    "match_all": {}
  }
}

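bool query

Combines several query clauses with boolean logic (AND, OR, NOT) to build complex queries
Syntax
- must: clauses that must all match; they contribute to the score
- must_not: clauses that must not match; a document matching any of them is excluded
- should: optional clauses; if the bool query has no must or filter clause, at least one should clause has to match, otherwise they only raise the score
- filter: clauses that must match but do not affect the score

{
  "query": {
    "bool": {
      "must": [
        // clauses that must match
      ],
      "must_not": [
        // clauses that must not match
      ],
      "should": [
        // optional clauses
      ],
      "filter": [
        // filter clauses (no scoring)
      ]
    }
  }
}

Hands-on example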

GET /xdclass_shop_v1/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "summary": "Cloud" }},
        { "range": { "price": { "gte": 5 }}}
      ]
    }
  }
}
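When queries are assembled in application code, a small builder keeps the clause types apart. The Python sketch below is one possible shape, not a library API: `bool_query` is a hypothetical helper. It rebuilds the example above with the price condition moved into `filter`, a common variation when a condition should restrict the results without influencing the score.

```python
import json


def bool_query(must=None, filters=None, should=None, must_not=None):
    """Assemble a bool query body, leaving out clause types that are empty."""
    clauses = {"must": must, "filter": filters, "should": should, "must_not": must_not}
    return {"query": {"bool": {name: value for name, value in clauses.items() if value}}}


body = bool_query(
    must=[{"match": {"summary": "Cloud"}}],       # scored full-text condition
    filters=[{"range": {"price": {"gte": 5}}}],   # unscored restriction
)
print(json.dumps(body, indent=2))
```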

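Filtering with filter

filter context
- Restricts the results to documents that satisfy a condition, without affecting relevance scores
- Elasticsearch can cache frequently used filters, which improves performance; typical uses are numeric ranges, date ranges, boolean conditions and existence checks.
- Syntax
  - filter takes one clause or an array of clauses, such as term or range, or a nested bool query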
```
{
  "query": {
    "bool": {
      "filter": {
        // filter clause
      }
    }
  }
}
```

Preparing data for the examples

Create the index

PUT /product
{
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 0
  },
  "mappings": {
   "properties": {
    "product_id": {
      "type": "integer"
    },
    "product_name": {
      "type": "text"
    },
    "category": {
      "type": "keyword"
    },
    "price": {
      "type": "float"
    },
    "availability": {
      "type": "boolean"
    }
  } }  
}

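Load the data

A single POST request inserts the documents in bulk
Each document takes two lines: an index action line with the document metadata (here _id, the unique identifier of the document), followed by the document source; product_id is an ordinary field
After loading, verify with GET /product/_search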

POST /product/_bulk
{ "index": { "_id": "1" } }
{ "product_id": 1, "product_name": "Product 1", "category": "books", "price": 19.99, "availability": true }
{ "index": { "_id": "2" } }
{ "product_id": 2, "product_name": "Product 2", "category": "electronics", "price": 29.99, "availability": true }
{ "index": { "_id": "3" } }
{ "product_id": 3, "product_name": "Product 3", "category": "books", "price": 9.99, "availability": false }
{ "index": { "_id": "4" } }
{ "product_id": 4, "product_name": "Product 4", "category": "electronics", "price": 49.99, "availability": true }
{ "index": { "_id": "5" } }
{ "product_id": 5, "product_name": "Product 5", "category": "fashion", "price": 39.99, "availability": true }

Example 1: use a term filter to find products whose category is books:

GET /product/_search
{
  "query": {
    "bool": {
      "filter": {
        "term": {
          "category": "books"
        }
      }
    }
  }
}

Example 2: use a range filter to find products priced between 30 and 50:

GET /product/_search
{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "price": {
            "gte": 30,
            "lte": 50
          }
        }
      }
    }
  }
}

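Summary
- Filter clauses skip scoring and can be cached, so they are usually cheaper than scoring query clauses
- A bool query combines conditions, filters and query clauses as the use case requires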

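Multi-field matching and phrase search

Multi-field matching

When a search has to cover several fields, use multi_match
It extends match to run the same text query against multiple fields

Syntax

GET /index/_search
{
  "query": {
    "multi_match": {
      "query": "<text to search for>",
      "fields": ["<field 1>", "<field 2>", ...]
    }
  }
}

# query: the text to match.
# fields: an array of the fields to search.

Phrase matching

match_phrase is a full-text query for exact phrase search
Unlike match, it requires the terms to appear in the same order and at adjacent positions

Syntax

GET /index/_search
{
  "query": {
    "match_phrase": {
      "field_name": {
        "query": "<phrase to search for>"
      }
    }
  }
}

# field_name: the field to search.
# query: the phrase to search for.

Data preparation

# Create the index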
PUT /product_v2
{
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "product_name": {
        "type": "text"
      },
      "description": {
        "type": "text"
      },
      "category": {
        "type": "keyword"
      }
    }
  }
}
# Bulk-insert the data
POST /product_v2/_bulk
{ "index": { "_index": "product_v2", "_id": "1" } }
{ "product_name": "iPhone 12", "description": "The latest iPhone model from Apple", "category": "electronics" }
{ "index": { "_index": "product_v2", "_id": "2" } }
{ "product_name": "Samsung Galaxy S21", "description": "High-performance Android smartphone", "category": "electronics" }
{ "index": { "_index": "product_v2", "_id": "3" } }
{ "product_name": "MacBook Pro", "description": "Powerful laptop for professionals", "category": "electronics" }
{ "index": { "_index": "product_v2", "_id": "4" } }
{ "product_name": "Harry Potter and the Philosopher's Stone", "description": "Fantasy novel by J.K. Rowling", "category": "books" }
{ "index": { "_index": "product_v2", "_id": "5" } }
{ "product_name": "The Great Gatsby", "description": "Classic novel by F. Scott Fitzgerald", "category": "books" }

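Multi-field search example
- Runs a multi_match query over the product_name and description fields
- The query text is "iPhone"; a document matches if either field matches (OR), and by default (type best_fields) the score of the best-matching field is used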
```
GET /product_v2/_search
{
  "query": {
    "multi_match": {
      "query": "iPhone",
      "fields": ["product_name", "description"]
    }
  }
}
```

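Phrase search example

A match_phrase query on the description field with the phrase "classic novel".
Elasticsearch returns only the documents whose description contains classic and novel next to each other and in that order

# match_phrase: phrase search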
GET /product_v2/_search
{
  "query": {
    "match_phrase": {
      "description": "classic novel"
    }
  }
}

# match: the query is analyzed and either term may match, in any position
GET /product_v2/_search
{
  "query": {
    "match": {
      "description": "classic novel"
    }
  }
}
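The contrast between the two queries above comes down to token positions. A minimal Python sketch, using two of the sample descriptions; `tokens`, `match_phrase` and `match_any` are hypothetical helpers, analysis is reduced to a regular expression plus lowercasing, and the phrase check assumes the default slop of 0.

```python
import re

descriptions = {
    4: "Fantasy novel by J.K. Rowling",
    5: "Classic novel by F. Scott Fitzgerald",
}


def tokens(text):
    """Lowercased word tokens; the list index doubles as the token position."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def match_phrase(text, phrase):
    """True if the phrase tokens occur consecutively and in order."""
    doc_tokens, wanted = tokens(text), tokens(phrase)
    if not wanted:
        return False
    width = len(wanted)
    return any(doc_tokens[i:i + width] == wanted
               for i in range(len(doc_tokens) - width + 1))


def match_any(text, query):
    """match with the default OR operator: any shared token is enough."""
    return bool(set(tokens(text)) & set(tokens(query)))


phrase_hits = [doc_id for doc_id, text in descriptions.items()
               if match_phrase(text, "classic novel")]
match_hits = [doc_id for doc_id, text in descriptions.items()
              if match_any(text, "classic novel")]
print(phrase_hits, match_hits)
```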

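fuzzy query

What is fuzzy matching
- The fuzzy query tolerates spelling mistakes by matching terms that are close to the search term
- Closeness is measured by edit distance (Levenshtein distance), up to a configurable maximum
- Aside: edit distance
  - The number of single-character changes needed to turn one term into another.
  - For example
    - changing a character (box→fox)
    - removing a character (black→lack)
    - inserting a character (sic→sick)
    - transposing two adjacent characters (dgo→dog)
- fuzzy is a simple answer to misspellings, but it has a high CPU cost and low precision
- The request looks like match, but fuzzy is a term-level query: the search value is not analyzed
- Basic syntax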
```
GET /index/_search
{
  "query": {
    "fuzzy": {
      "field_name": {
        "value": "<term to search for>",
        "fuzziness": "<fuzziness>"
      }
    }
  }
}
```
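- Parameters
  - field_name: the field to search.
  - value: the term to search for
  - fuzziness: the maximum edit distance; common values are
    - 0, 1, 2
      - The maximum number of edits allowed; a lower number is stricter, a higher number is looser
      - The limit applies to each term individually, not to the query as a whole
    - AUTO: Elasticsearch chooses the distance from the length of the term
      - length 0 to 2: fuzziness 0 (the term must match exactly)
      - length 3 to 5: fuzziness 1
      - length greater than 5: fuzziness 2
- Examples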
```
# Fuzziness 2: looser matching
GET /xdclass_shop_v1/_search
{
  "query": {
    "fuzzy": {
      "summary": {
        "value": "clo",
        "fuzziness": "2"
      }
    }
  }
}

# Fuzziness 1: stricter matching
GET /xdclass_shop_v1/_search
{
  "query": {
    "fuzzy": {
      "summary": {
        "value": "clo",
        "fuzziness": "1"
      }
    }
  }
}

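# AUTO fuzziness with one misspelled word; the value is not analyzed, so the capital S also counts as an edit against the indexed term spring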
GET /xdclass_shop_v1/_search
{
  "query": {
    "fuzzy": {
      "summary": {
        "value": "Sprina",
        "fuzziness": "auto"
      }
    }
  }
}
```
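The edit distances behind these three requests can be checked by hand with a short Python sketch. It is illustrative only: `edit_distance` computes the optimal-string-alignment distance (insert, delete, substitute, transpose adjacent characters) with plain dynamic programming, which is not how Lucene evaluates fuzzy queries internally, and `auto_fuzziness` is a hypothetical helper encoding the AUTO length thresholds listed above.

```python
def edit_distance(a, b):
    """Edits needed to turn a into b: insert, delete, substitute, transpose."""
    rows, cols = len(a) + 1, len(b) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        dist[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                dist[i][j] = min(dist[i][j], dist[i - 2][j - 2] + 1)  # transposition
    return dist[-1][-1]


def auto_fuzziness(term):
    """Maximum edit distance chosen by AUTO for a term of this length."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


for query_term, indexed_term in [("clo", "cloud"), ("Sprina", "spring")]:
    print(query_term, indexed_term, edit_distance(query_term, indexed_term),
          auto_fuzziness(query_term))
```

"clo" is two insertions away from "cloud", which is why the request with fuzziness 2 can match it and the request with fuzziness 1 cannot.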

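Search highlighting

Requirement
- Search pages show the matched keywords in a different color so users can spot them at a glance
How Elasticsearch highlights matches
- The highlight section of a search request marks the parts of a field that matched the query
- Matched text is wrapped in tags, <em> by default; other HTML tags can be configured
- Basic usage: list the fields to highlight under highlight; more than one field is allowed

Hands-on example

Environment and data preparation

# Create the index (requires the IK plugin)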
PUT /xdclass_high_light_test
{
  "mappings": {
    "properties": {
      "title": {
          "type": "text",
          "analyzer": "ik_max_word"
        },
        "content": {
          "type": "text",
          "analyzer": "ik_max_word"
        }
     }
  }, 
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 0
  }
}
# Insert the data
PUT /xdclass_high_light_test/_doc/1
{
  "title": "小滴课堂2028年最新好看的电影推荐",
  "content": "每年都有新电影上线，2028年最新好看的电影有不少，小滴课堂上线了《架构大课》，《低代码平台》，《老王往事》精彩电影"
}


PUT /xdclass_high_light_test/_doc/2
{
  "title": "写下你认为好看的电影有哪些",
  "content": "每个人都看看很多电影，说下你近10年看过比较好的电影，比如《架构大课》，《海量数据项目大课》，《冰冰和老王的故事》"
}

Highlighting with a single condition

GET /xdclass_high_light_test/_search 
{
  "query": {
    "match": {
      "content": "电影"
    }
  },
  "highlight": {
    "fields": {
      "content": {}
    }
  }
}

Highlighting with combined conditions; list the fields to highlight under highlight

GET /xdclass_high_light_test/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": "课堂"
          }
        },
        {
          "match": {
            "content": "老王"
          }
        }
      ]
    }
  },
  "highlight": {
    "fields": {
      "title": {},
      "content": {}
    }
  }
}

The highlight section accepts extra parameters that change the highlight markup
- pre_tags: the opening tag
- post_tags: the closing tag
- fields: the fields to highlight

Hands-on example

GET /xdclass_high_light_test/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": "课堂"
          }
        },
        {
          "match": {
            "content": "老王"
          }
        }
      ]
    }
  },
  "highlight": {
    "pre_tags": "<font color='yellow'>",
    "post_tags": "</font>",
    "fields": [{"title":{}},{"content":{}}]
  } 
}

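Aggregations

What is an aggregation

Aggregations compute statistics over large sets of documents, comparable to GROUP BY, SUM, AVG and MAX in MySQL
They are one of the most powerful features of Elasticsearch: group, filter, compute and summarize to analyze a data set
Pie charts, bar charts, line charts and gauges on dashboards are typical consumers of aggregation results

Term 1: an aggregation that computes a value such as max, min, sum or average over a set of documents is called a metric aggregation

Basic syntax

GET /index/_search
{
  "size": 0,
  "aggs": {
    "aggregation_name": {
      "aggregation_type": {
        "field": "field_name"
        // optional parameters
      }
    }
    // more aggregations can be added here
  }
}

# Explanation
index: the index to aggregate over.
size: 0 returns only the aggregation results; the hits array in the response stays empty
aggs: the container for the aggregations.

aggregation_name: a name of your choice for the aggregation.
aggregation_type: the aggregation type, for example terms, avg or sum.
field: the field to aggregate on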

Term 2: grouping documents (like GROUP BY) and then computing metrics per group is called bucketing, or bucket aggregation

Basic syntax (an overview for now; details follow later)

GET /index/_search
{
  "size": 0,
  "aggs": {
    "aggregation_name": {
      "bucket_type": {
        "bucket_option_name": "bucket_option_value",
        ...
      },
      "aggs": {
        "sub_aggregation_name": {
          "sub_aggregation_type": {
            "sub_aggregation_option_name": "sub_aggregation_option_value",
            ...
          }
        }
      }
    }
  }
}
# Explanation
index: the index to aggregate over.
aggregation_name: a name of your choice for the aggregation.
bucket_type: the bucket aggregation type (terms, date_histogram, range, and so on).
bucket_option_name and bucket_option_value: the name and value of an option of that bucket aggregation.

sub_aggregation_name: a name for the sub-aggregation; note that the inner aggs is a sibling of bucket_type, not nested inside it.
sub_aggregation_type: the sub-aggregation type (sum, avg, max, min, and so on).
sub_aggregation_option_name and sub_aggregation_option_value: the name and value of an option of the sub-aggregation

Common aggregations and where they are used
- Metric aggregations:
  - avg: the average of a field.
  - sum: the sum of a field.
  - min: the smallest value of a field.
  - max: the largest value of a field.
- Bucket aggregations:
  - terms: groups documents into buckets by field value.
  - date_histogram: creates buckets by time interval on a date field.
  - range: creates buckets from ranges of field values.
- Also: nested (sub-)aggregations, filter aggregations, and more

Hands-on example

# Create the index
PUT /sales
{
  "mappings": {
    "properties": {
      "product": {
        "type": "keyword"
      },
      "sales": {
        "type": "integer"
      }
    }
  }
}

# Bulk-insert the data
POST /sales/_bulk
{"index": {}}
{"product": "iPhone", "sales": 4}
{"index": {}}
{"product": "Samsung", "sales": 60}
{"index": {}}
{"product": "iPhone", "sales": 100}
{"index": {}}
{"product": "Samsung", "sales": 80}
{"index": {}}
{"product": "小滴手机", "sales": 50}
{"index": {}}
{"product": "小滴手机", "sales": 5000}
{"index": {}}
{"product": "小滴手机", "sales": 200}

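Run an aggregation that groups the documents by product name (product)

# aggs: the aggregations; product_group: a name of your choice
# terms: group by value; field: the field to group on
GET /sales/_search
{
  "aggs": {
    "product_group": {
      "terms": {
        "field": "product"
      }
    }
  }
}

Compute the total sales of each group with a terms aggregation and a sum sub-aggregation

The result contains each product name together with its total sales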

GET /sales/_search
{
  "size": 0,
  "aggs": {
    "product_sales": {
      "terms": {
        "field": "product"
      },
      "aggs": {
        "total_sales": {
          "sum": {
            "field": "sales"
          }
        }
      }
    }
  }
}
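To see what the request above should return for the sample data, the same grouping can be reproduced in plain Python. This is a hand calculation for checking the result, not a client call; `terms_with_sum` is a hypothetical helper, and the ordering by descending `doc_count` mirrors the default bucket order of the terms aggregation.

```python
from collections import defaultdict

sales = [
    ("iPhone", 4), ("Samsung", 60), ("iPhone", 100), ("Samsung", 80),
    ("小滴手机", 50), ("小滴手机", 5000), ("小滴手机", 200),
]


def terms_with_sum(rows):
    """Group rows by product and sum the sales of each group."""
    buckets = defaultdict(lambda: {"doc_count": 0, "total_sales": 0})
    for product, amount in rows:
        buckets[product]["doc_count"] += 1
        buckets[product]["total_sales"] += amount
    # Largest buckets first, ties broken by key.
    return sorted(buckets.items(), key=lambda item: (-item[1]["doc_count"], item[0]))


result = terms_with_sum(sales)
for product, bucket in result:
    print(product, bucket)
```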

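Metric aggregations

What is a metric aggregation
- An aggregation that computes a value such as max, min, sum or average over a set of documents
- For example max, min, avg and sum

Hands-on examples

max example:

Data: a sales index for an online shop with product name and price fields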

POST /sales_v1/_doc
{ "product_name": "手机", "price": 1000 }

POST /sales_v1/_doc
{ "product_name": "电视", "price": 1500 }

POST /sales_v1/_doc
{ "product_name": "小滴课堂老王的黑丝", "price": 4500 }

Goal: use a max aggregation to get the highest product price.

GET /sales_v1/_search
{
  "size": 0,
  "aggs": {
    "max_price": {
      "max": {
        "field": "price"
      }
    }
  }
}

min example:

Data: an exam-score index with student name and score fields.

POST /exam_scores/_doc
{ "student_name": "小滴课堂-大钊", "score" : 80 }

POST /exam_scores/_doc
{ "student_name": "老王", "score" : 90 }

POST /exam_scores/_doc
{ "student_name": "小滴课堂-D哥", "score" : 40 }

Goal: use a min aggregation to get the lowest exam score.

GET /exam_scores/_search
{
  "size": 0,
  "aggs": {
    "min_score": {
      "min": {
        "field": "score"
      }
    }
  }
}

avg example:
- Data (same as above): an exam-score index with student name and score fields.
- Use an avg aggregation to compute the average exam score
```
GET /exam_scores/_search
{
  "size": 0,
  "aggs": {
    "avg_score": {
      "avg": {
        "field": "score"
      }
    }
  }
}
```

sum example:

Data: a sales index for an online shop with product name and sales count fields.

POST /sales_order/_doc
{ "product_name": "手机", "sales_count" : 100 }

POST /sales_order/_doc
{ "product_name": "电视", "sales_count" : 50 }

POST /sales_order/_doc
{ "product_name": "小滴课堂永久会员", "sales_count" : 999 }

Goal: use a sum aggregation to compute the total number of items sold.

GET /sales_order/_search
{
  "size": 0,
  "aggs": {
    "total_sales": {
      "sum": {
        "field": "sales_count"
      }
    }
  }
}

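Bucket aggregation syntax and terms

What is a bucket aggregation

Grouping documents (like GROUP BY) and then computing metrics per group is called bucketing, or bucket aggregation
Basic syntax

GET /index/_search
{
  "size": 0,
  "aggs": {
    "aggregation_name": {
      "bucket_type": {
        "bucket_option_name": "bucket_option_value",
        ...
      },
      "aggs": {
        "sub_aggregation_name": {
          "sub_aggregation_type": {
            "sub_aggregation_option_name": "sub_aggregation_option_value",
            ...
          }
        }
      }
    }
  }
}
# Explanation
index: the index to aggregate over.
aggregation_name: a name of your choice for the aggregation.
bucket_type: the bucket aggregation type (terms, date_histogram, range, and so on).
bucket_option_name and bucket_option_value: the name and value of an option of that bucket aggregation.

sub_aggregation_name: a name for the sub-aggregation; the inner aggs is a sibling of bucket_type, not nested inside it.
sub_aggregation_type: the sub-aggregation type (sum, avg, max, min, and so on).
sub_aggregation_option_name and sub_aggregation_option_value: the name and value of an option of the sub-aggregation

Hands-on example

terms bucket aggregation example:

Data: a book-sales index for an online bookshop with book title and sales count fields.

# Create the index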
PUT /book_sales
{
  "mappings": {
    "properties": {
      "book_title": {
          "type": "keyword"
        },
        "sales_count": {
          "type": "integer"
        }
     }
  }, 
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 0
  }
}

# Bulk-insert the data
POST /book_sales/_bulk
{ "index": {} }
{ "book_title": "Elasticsearch in Action", "sales_count" : 100 }
{ "index": {} }
{ "book_title": "小滴课堂微服务最佳实践", "sales_count" : 50 }
{ "index": {} }
{ "book_title": "海量数据项目大课", "sales_count" : 80 }
{ "index": {} }
{ "book_title": "小滴课堂面试宝典", "sales_count" : 120 }
{ "index": {} }
{ "book_title": "数据结构与算法之美", "sales_count" : 90 }
{ "index": {} }
{ "book_title": "Python编程快速上手", "sales_count" : 70 }
{ "index": {} }
{ "book_title": "小滴课堂面试宝典", "sales_count" : 110 }
{ "index": {} }
{ "book_title": "小滴课堂Java核心技术", "sales_count" : 200 }
{ "index": {} }
{ "book_title": "深入理解计算机系统", "sales_count" : 150 }
{ "index": {} }
{ "book_title": "小滴课堂Java核心技术", "sales_count" : 80 }

Goal: use a terms aggregation to bucket the records by book title and sum the sales count within each bucket.

GET /book_sales/_search
{
  "size": 0,
  "aggs": {
    "book_buckets": {
      "terms": {
        "field": "book_title",
        "size": 10
      },
      "aggs": {
        "total_sales": {
          "sum": {
            "field": "sales_count"
          }
        }
      }
    }
  }
}

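Date Histogram bucket aggregation

Bucket aggregation: date_histogram

Buckets documents by fixed time intervals on a date field; further aggregations can then be computed for each interval

Basic syntax

GET /index/_search
{
  "size": 0,
  "aggs": {
    "date_histogram_name": {
      "date_histogram": {
        "field": "date_field_name",
        "calendar_interval": "interval_expression"
      },
      "aggs": {
        "sub_aggregation": {
          "sub_aggregation_type": {}
        }
      }
    }
  }
}

# Explanation
index: the index to aggregate over.
date_histogram_name: a name of your choice for the date_histogram aggregation.
date_field_name: the date field to aggregate on.
interval_expression: the bucket width. calendar_interval accepts a single calendar unit (minute, hour, day, week, month, quarter, year, or 1m, 1h, 1d, 1w, 1M, 1q, 1y); for multiples such as 7d or 12h use fixed_interval instead. The older interval parameter is no longer accepted in 8.x.
sub_aggregation: the name of a sub-aggregation computed within each date bucket.
sub_aggregation_type: the type of that sub-aggregation; any valid aggregation type.

Data: an order index for an online shop with order date and order amount fields.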

POST /order_history/_bulk
{ "index": {} }
{ "order_date": "2025-01-01", "amount" : 100 ,"book_title": "小滴课堂Java核心技术"}
{ "index": {} }
{ "order_date": "2025-02-05", "amount" : 150, "book_title": "小滴课堂面试宝典" }
{ "index": {} }
{ "order_date": "2025-03-02", "amount" : 500 ,"book_title": "小滴课堂Java核心技术"}
{ "index": {} }
{ "order_date": "2025-05-02", "amount" : 250 , "book_title": "小滴课堂面试宝典"}
{ "index": {} }
{ "order_date": "2025-05-05", "amount" : 10 ,"book_title": "小滴课堂微服务最佳实践"}
{ "index": {} }
{ "order_date": "2025-02-18", "amount" : 290 , "book_title": "小滴课堂微服务最佳实践"}

Goal: use a date_histogram aggregation to bucket the orders by month and sum the order amount within each bucket.

GET /order_history/_search
{
  "size": 0,
  "aggs": {
    "sales_per_month": {
      "date_histogram": {
        "field": "order_date",
        "calendar_interval": "month",
        "format": "yyyy-MM"
      },
      "aggs": {
        "total_sales": {
          "sum": {
            "field": "amount"
          }
        }
      }
    }
  }
}
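The monthly totals for the sample orders can be worked out with a short Python sketch. It is a hand calculation for checking the response, with `monthly_totals` as a hypothetical helper; it assumes the default behavior in which months without orders still appear as empty buckets between the first and the last bucket.

```python
orders = [
    ("2025-01-01", 100), ("2025-02-05", 150), ("2025-03-02", 500),
    ("2025-05-02", 250), ("2025-05-05", 10), ("2025-02-18", 290),
]


def monthly_totals(rows):
    """Sum amounts per calendar month, filling gaps with empty buckets."""
    totals = {}
    for day, amount in rows:
        month_key = day[:7]  # "2025-02-05" -> "2025-02"
        totals[month_key] = totals.get(month_key, 0) + amount
    year, month = (int(part) for part in min(totals).split("-"))
    last = max(totals)
    buckets = {}
    while True:
        key = f"{year:04d}-{month:02d}"
        buckets[key] = totals.get(key, 0)
        if key == last:
            return buckets
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


totals_by_month = monthly_totals(orders)
print(totals_by_month)
```

April has no orders, so its bucket is present with a total of 0.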

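Range bucket aggregation

Bucket aggregation: range

Divides the values of a field into ranges and assigns each document to the bucket of the range it falls in; sub-aggregations can then be computed for each range.

Syntax

GET /index/_search
{
  "size": 0,
  "aggs": {
    "range_name": {
      "range": {
        "field": "field_name",
        "ranges": [
          { "key": "range_key_1", "from": from_value_1, "to": to_value_1 },
          { "key": "range_key_2", "from": from_value_2, "to": to_value_2 },
          ...
        ]
      },
      "aggs": {
        "sub_aggregation": {
          "sub_aggregation_type": {}
        }
      }
    }
  }
}

# Explanation
index: the index to aggregate over.
range_name: a name of your choice for the range aggregation.
field_name: the field to aggregate on.
ranges: the array of ranges, each defined by key, from and to.
key: a label for the range (optional).
from: the lower bound of the range (inclusive).
to: the upper bound of the range (exclusive).
sub_aggregation: the name of a sub-aggregation computed within each range.
sub_aggregation_type: the type of that sub-aggregation; any valid aggregation type.

Data: a product index for an online shop with product name and price fields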

POST /product_v4/_bulk
{ "index": {} }
{ "product_name": "小滴课堂永久会员", "price" : 2000 }
{ "index": {} }
{ "product_name": "JVM专题课程", "price" : 200 }
{ "index": {} }
{ "product_name": "SpringBoot3.X最佳实践", "price" : 300 }
{ "index": {} }
{ "product_name": "高并发项目大课", "price" : 1500 }
{ "index": {} }
{ "product_name": "海量数据项目大课", "price" : 4120 }
{ "index": {} }
{ "product_name": "监控告警Prometheus最佳实践", "price" : 180 }
{ "index": {} }
{ "product_name": "全栈工程师学习路线", "price" : 250 }
{ "index": {} }
{ "product_name": "自动化测试平台大课", "price" : 4770 }
{ "index": {} }
{ "product_name": "小滴课堂-老王分手最佳实践", "price" : 400 }
{ "index": {} }
{ "product_name": "小滴课堂-大钊会所按摩往事", "price" : 150 }

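Example: use a range aggregation to bucket the products by price range. The doc_count of each bucket gives the number of products in it, and the total_price sub-aggregation sums their prices.

If key is omitted, Elasticsearch generates one from the bounds, for example "*-100.0", "100.0-200.0" and "200.0-*".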

GET /product_v4/_search
{
  "size": 0,
  "aggs": {
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [
          { "to": 100 },
          { "from": 100, "to": 200 },
          { "from": 200 }
        ]
      },
      "aggs":{
        "total_price":{
          "sum":{
           "field":"price"
          }
        }
      }
    }
  }
}
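
The inclusive from and exclusive to bounds are easy to get wrong, so the following plain-Java sketch replays the query above against the ten sample prices. It does not call Elasticsearch; PriceRangeSketch, Range and Bucket are hypothetical names used only for illustration, and the key format simply mimics the default keys.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of any Elasticsearch client. */
final class PriceRangeSketch {

    /** A range whose bounds may be null, meaning open-ended on that side. */
    record Range(Double from, Double to) {
        /** from is inclusive, to is exclusive. */
        boolean contains(double value) {
            return (from == null || value >= from) && (to == null || value < to);
        }

        /** Mimics the default bucket key, e.g. "100.0-200.0" or "200.0-*". */
        String key() {
            return (from == null ? "*" : from.toString()) + "-" + (to == null ? "*" : to.toString());
        }
    }

    /** One result bucket: key, number of documents, and the sum of their prices. */
    record Bucket(String key, long docCount, double totalPrice) {}

    static List<Bucket> aggregate(List<Range> ranges, double[] prices) {
        List<Bucket> buckets = new ArrayList<>();
        for (Range range : ranges) {
            long count = 0;
            double sum = 0;
            for (double price : prices) {
                if (range.contains(price)) {
                    count++;
                    sum += price;
                }
            }
            buckets.add(new Bucket(range.key(), count, sum));
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Prices from the _bulk request above
        double[] prices = {2000, 200, 300, 1500, 4120, 180, 250, 4770, 400, 150};
        List<Range> ranges = List.of(
                new Range(null, 100.0), new Range(100.0, 200.0), new Range(200.0, null));
        aggregate(ranges, prices).forEach(System.out::println);
    }
}
```

Note that the product priced at exactly 200 lands in the last bucket, not in 100.0-200.0, because to is exclusive. For the sample data the middle bucket holds 2 products (sum 330) and the last bucket holds 8 (sum 13540).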

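Integrating SpringBoot3.X with ES

Elasticsearch is a search engine that runs as a server and exposes an HTTP RESTful API,
so programs written in many different languages can use its search features.
Elastic has released several Java clients for ES over time:
- TransportClient, used with older versions of ES (deprecated in 7.0, removed in 8.0)
- Java Low Level REST Client (still maintained)
  - A thin REST client that talks to Elasticsearch by sending raw HTTP requests.
  - You build the request strings and parse the responses yourself; it is compatible with all Elasticsearch versions
- Java High Level REST Client (deprecated in 7.15)
  - Built on the low-level REST client, it adds a higher level of abstraction that simplifies working with Elasticsearch.
  - It offers an easier API that wraps the underlying request and response handling, which leads to more readable code.
  - It serializes and deserializes JSON automatically and covers most common operations such as indexing, searching and aggregations.
  - More complex advanced features and custom operations may still require the low-level REST client or the native Elasticsearch REST API
- Java API Client (recommended from 8.X)
  - Before 7.15 the official Java client for Elasticsearch was the Java REST Client
  - In 7.15 Elastic marked the Java High Level REST Client as deprecated and recommended the new Java API Client instead
  - The Java API Client is a Java client library for communicating with an Elasticsearch server
  - It wraps the underlying transport layer and offers synchronous and asynchronous calls, with fluent builders and functional (lambda) style
  - Official documentation
    - https://www.elastic.co/guide/en/elasticsearch/client/java-api-client/8.5/introduction.html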
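
To illustrate what "build the request yourself" means at the lowest level, here is a minimal sketch that assembles a raw _search request with the JDK's own java.net.http API. This is not the Java Low Level REST Client; it only shows the same idea with standard-library classes. The class name RawSearchRequestSketch is hypothetical, and the URL, user and password in main are placeholder assumptions.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper that only builds the HTTP request; it does not send it. */
final class RawSearchRequestSketch {

    static HttpRequest buildSearch(String baseUrl, String index,
                                   String user, String password, String jsonBody) {
        // HTTP Basic authentication: base64("user:password")
        String token = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/_search"))
                .header("Content-Type", "application/json")
                .header("Authorization", "Basic " + token)
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) {
        // Placeholder values; adjust to your own cluster
        HttpRequest request = buildSearch("http://127.0.0.1:9200", "video",
                "elastic", "changeme", "{\"query\":{\"match_all\":{}}}");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Sending the request would be a call such as HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()), after which the JSON response still has to be parsed by hand. The higher-level clients listed above take over exactly that work.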

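Integration (Spring Data)

Spring Data Elasticsearch builds on Spring Data's standardized data access approach and simplifies integration with Elasticsearch.
It provides CRUD operations and query methods, including automatic index management and mapping.
For some advanced features and complex queries Spring Data Elasticsearch may not be flexible enough and needs extra customization.
What is the Spring Data framework
- A development framework that simplifies data access and persistence by providing a unified set of APIs and abstractions
- It makes it easier to work with many data stores (relational databases, NoSQL databases, Elasticsearch, and so on)
- Website: https://spring.io/projects/spring-data

Integrating the Spring Data framework into SpringBoot3.x

Dependencies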

<!-- spring-data-elasticsearch-->
<dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
 </dependency>

    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>3.2.0-M3</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>
    <properties>
        <java.version>17</java.version>
    </properties>
        <dependencies>
        <!--spring boot and web-->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <!--spring boot test-->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
        <!--JUnit 4 test component-->
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
        </dependency>
        <!--lombok-->
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
        </dependency>
        <!--mqtt-->
<!--        <dependency>-->
<!--            <groupId>org.springframework.integration</groupId>-->
<!--            <artifactId>spring-integration-mqtt</artifactId>-->
<!--        </dependency>-->
        <!--spring data elasticsearch-->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
    <repositories>
        <repository>
            <id>spring-milestones</id>
            <name>Spring Milestones</name>
            <url>https://repo.spring.io/milestone</url>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
        </repository>
    </repositories>
    <pluginRepositories>
        <pluginRepository>
            <id>spring-milestones</id>
            <name>Spring Milestones</name>
            <url>https://repo.spring.io/milestone</url>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
        </pluginRepository>
    </pluginRepositories>

Configuration

spring.elasticsearch.uris=http://127.0.0.1:9200
spring.elasticsearch.username=elastic
spring.elasticsearch.password=hXtO*Lzi2GGJ5wUmUA2c

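Index operations

What is ElasticsearchTemplate
- A core class of Spring Data Elasticsearch: the implementation of the ElasticsearchOperations interface that is built on the new ElasticsearchClient (Java API Client)
- Used in Spring Boot to store and query data in Elasticsearch
- Provides methods to save, update, delete and query documents, run aggregations, and so on
Commonly used ElasticsearchTemplate methods
- save(T entity): saves an object to Elasticsearch.
- index(IndexQuery, IndexCoordinates): indexes a document described by an IndexQuery.
- delete(String id, Class): deletes the document with the given id from the index of the entity class.
- get(String id, Class): fetches the document with the given id and maps it to the entity class.
- update(UpdateQuery, IndexCoordinates): performs an update described by an UpdateQuery.
- search(Query, Class): runs a search and maps the hits to objects of the given type.
- count(Query, Class): runs a query and returns the number of matching documents.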

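Common mapping annotations (from Spring Data and Spring Data Elasticsearch)

@Id marks the primary key
@Document maps the entity class to an index
```
indexName: the name of the index
```

@Field configures an ordinary property

type: the Elasticsearch field type; use the FieldType enum to pick one.

text: the value is analyzed (split into terms)

keyword: the value is not analyzed

index: whether the field is indexed; it must be true for the field to be usable as a search condition

analyzer: the analyzer to use for the field.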

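DTO

import java.time.LocalDateTime;

import lombok.AllArgsConstructor;
import lombok.Builder;
import lombok.Data;
import lombok.NoArgsConstructor;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.DateFormat;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;

@Data
@Builder
@NoArgsConstructor
@AllArgsConstructor
@Document(indexName = "video")
public class VideoDTO {
    // Document id; also used as the Elasticsearch _id
    @Id
    @Field(type = FieldType.Text, index = false)
    private Long id;

    @Field(type = FieldType.Text)
    private String title;

    @Field(type = FieldType.Text)
    private String description;

    @Field(type = FieldType.Keyword)
    private String category;

    @Field(type = FieldType.Integer)
    private Integer duration;

    @Field(type = FieldType.Date, format = DateFormat.date_hour_minute_second)
    private LocalDateTime createTime;

    // Convenience constructor used by the batch insert test; createTime defaults to now
    public VideoDTO(Long id, String title, String description, Integer duration, String category) {
        this.id = id;
        this.title = title;
        this.description = description;
        this.duration = duration;
        this.createTime = LocalDateTime.now();
        this.category = category;
    }
}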

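Test

import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

import lombok.extern.slf4j.Slf4j;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.data.elasticsearch.client.elc.ElasticsearchTemplate;
import org.springframework.data.elasticsearch.core.IndexOperations;

@SpringBootTest
@Slf4j
public class EsTest {
    @Autowired
    private ElasticsearchTemplate restTemplate;

    /**
     * Check whether the index exists
     */
    @Test
    public void existsIndex() {
        IndexOperations indexOperations = restTemplate.indexOps(VideoDTO.class);
        boolean exists = indexOperations.exists();
        System.out.println(exists);
    }

    /**
     * Create the index
     */
    @Test
    public void createIndex() {
        // All index-level operations in Spring Data Elasticsearch go through this interface
        IndexOperations indexOperations = restTemplate.indexOps(VideoDTO.class);
        // Delete the index first if it already exists
        if (indexOperations.exists()) {
            indexOperations.delete();
        }

        // Create the index
        indexOperations.create();
        log.info("Test -- index created");

        // Write the mapping. In production, indexes and mappings are usually managed by an architect or administrator rather than created from application code
        restTemplate.indexOps(VideoDTO.class).putMapping();
    }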

    /**
     * Delete the index
     */
    @Test
    public void deleteIndex() {
        IndexOperations indexOperations = restTemplate.indexOps(VideoDTO.class);
        boolean delete = indexOperations.delete();
        System.out.println(delete);
    }

    /**
     * Insert a document
     */
    @Test
    public void insert(){
        VideoDTO videoDTO = new VideoDTO();
        videoDTO.setId(1L);
        videoDTO.setTitle("小滴课堂架构大课和Spring Cloud");
        videoDTO.setCreateTime(LocalDateTime.now());
        videoDTO.setDuration(100);
        videoDTO.setCategory("后端");
        videoDTO.setDescription("这个是综合大型课程，包括了jvm，redis，新版spring boot3.x，架构，监控，性能优化，算法，高并发等多方面内容");

        VideoDTO saved = restTemplate.save(videoDTO);
        System.out.println(saved);
    }

    /**
     * Update: saving a document with an existing id overwrites it
     */
    @Test
    public void update(){
        VideoDTO videoDTO = new VideoDTO();
        videoDTO.setId(1L);
        videoDTO.setTitle("小滴课堂架构大课和Spring Cloud V2");
        videoDTO.setCreateTime(LocalDateTime.now());
        videoDTO.setDuration(102);
        videoDTO.setCategory("后端");
        videoDTO.setDescription("这个是综合大型课程，包括了jvm，redis，新版spring boot3.x，架构，监控，性能优化，算法，高并发等多方面内容");

        VideoDTO saved = restTemplate.save(videoDTO);
        System.out.println(saved);
    }

    /**
     * Batch insert
     */
    @Test
    public void batchInsert() {
        List<VideoDTO> list = new ArrayList<>();
        list.add(new VideoDTO(2L, "老王录制的按摩课程", "主要按摩和会所推荐", 123, "后端"));
        list.add(new VideoDTO(3L, "冰冰的前端性能优化", "前端高手系列", 100042, "前端"));
        list.add(new VideoDTO(4L, "海量数据项目大课", "D哥的后端+大数据综合课程", 5432345, "后端"));
        list.add(new VideoDTO(5L, "小滴课堂永久会员", "可以看海量专题课程，IT技术持续充电平台", 6542, "后端"));
        list.add(new VideoDTO(6L, "大钊-前端低代码平台", "高效开发底层基础平台，效能平台案例", 53422, "前端"));
        list.add(new VideoDTO(7L, "自动化测试平台大课", "微服务架构下的spring cloud架构大课，包括jvm,效能平台", 6542, "后端"));


        Iterable<VideoDTO> result = restTemplate.save(list);
        System.out.println(result);
    }

    /**
     * Look up a document by primary key
     */
    @Test
    public void  searchById(){
        VideoDTO videoDTO = restTemplate.get("3", VideoDTO.class);
        assert videoDTO != null;
        System.out.println(videoDTO);
    }

    /**
     * Delete a document by id
     */
    @Test
    public void deleteById() {
        String delete = restTemplate.delete("2", VideoDTO.class);
        System.out.println(delete);
    }
}

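Search examples

The Query interface in the new Spring Data Elasticsearch
- Query is a Spring Data Elasticsearch interface with several implementations. The official documentation for the new version says little about it, so the examples here are based on reading the source code
  - CriteriaQuery
    - Builds a search from Criteria objects, without needing to know the Elasticsearch query syntax or fundamentals
    - Lets you construct a query by chaining and combining Criteria that describe the conditions a document must satisfy
  - StringQuery
    - Takes the Elasticsearch query as a JSON string; best suited to people who already know the Elasticsearch query syntax
    - Also easier to debug with client tools such as Kibana or Postman
  - NativeQuery
    - The class to use for complex queries or queries that cannot be expressed with the Criteria API, for example when combining a query with aggregations

NativeQuery search

Examples of the new search syntax. The queries use the new lambda expression style, which is more concise.

Search all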

  /**
   * Query all documents
   */
  @Test
  public void searchAll(){

    SearchHits<VideoDTO> search = restTemplate.search(Query.findAll(), VideoDTO.class);
    List<SearchHit<VideoDTO>> searchHits = search.getSearchHits();
    // Iterate over the search hits and collect the content of each hit
    List<VideoDTO> videoDTOS = new ArrayList<>();
    searchHits.forEach(hit -> {
      videoDTOS.add(hit.getContent());
    });
    System.out.println(videoDTOS);
  }

Match search

  /**
   * match query
   */
  @Test
  public void matchQuery(){

    Query query = NativeQuery.builder().withQuery(q -> q
        .match(m -> m
            .field("description") // field name
            .query("spring") // text to match
        )).build();
    SearchHits<VideoDTO> searchHits = restTemplate.search(query, VideoDTO.class);

    // Iterate over the search hits and collect the content of each hit
    List<VideoDTO> videoDTOS = new ArrayList<>();
    searchHits.forEach(hit -> {
      videoDTOS.add(hit.getContent());
    });
    System.out.println(videoDTOS);
  }

Paged query

    /**
     * Paged query: first page, three hits per page
     */
    @Test
    public void pageSearch() {
        Query query = NativeQuery.builder().withQuery(Query.findAll())
                .withPageable(Pageable.ofSize(3).withPage(0)).build();

        SearchHits<VideoDTO> searchHits = restTemplate.search(query, VideoDTO.class);
        // Iterate over the search hits and collect the content of each hit
        List<VideoDTO> videoDTOS = new ArrayList<>();
        searchHits.forEach(hit -> {
            videoDTOS.add(hit.getContent());
        });
        System.out.println(videoDTOS);
    }
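
Pageable uses zero-based page numbers, and a page request is translated into the from and size parameters of the search. The sketch below shows that arithmetic in plain Java; PageWindowSketch and Window are hypothetical names for illustration and are not part of Spring Data.

```java
/** Hypothetical helper; not part of Spring Data. */
final class PageWindowSketch {

    /** The from/size pair that ends up in the search request. */
    record Window(long from, int size) {}

    /** Converts a zero-based page number and a page size into from/size. */
    static Window of(int page, int size) {
        if (page < 0 || size < 1) {
            throw new IllegalArgumentException("page must be >= 0 and size must be >= 1");
        }
        // Widen to long before multiplying so deep pages cannot overflow int
        return new Window((long) page * size, size);
    }

    public static void main(String[] args) {
        System.out.println(of(0, 3)); // first page: hits 0..2
        System.out.println(of(2, 3)); // third page: skips the first six hits
    }
}
```

Keep in mind that from + size is limited by the index.max_result_window setting, which defaults to 10000, so very deep pages need a different approach such as search_after.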

Sorting

# ascending(): ascending order, the default

# descending(): descending order

    /**
     * Sorted query: order by duration, descending
     */
    @Test
    public void sortSearch() {
        Query query = NativeQuery.builder().withQuery(Query.findAll())
                .withPageable(Pageable.ofSize(10).withPage(0))
                .withSort(Sort.by("duration").descending()).build();

        SearchHits<VideoDTO> searchHits = restTemplate.search(query, VideoDTO.class);
        // Iterate over the search hits and collect the content of each hit
        List<VideoDTO> videoDTOS = new ArrayList<>();
        searchHits.forEach(hit -> {
            videoDTOS.add(hit.getContent());
        });
        System.out.println(videoDTOS);
    }

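Raw StringQuery search

What is StringQuery
- Takes the Elasticsearch query as a JSON string; best suited to people who already know the Elasticsearch query syntax
- Also easier to debug with client tools such as Kibana or Postman

Hands-on examples

Example 1: a bool must query that finds videos whose title contains the keyword 架构 ("architecture"), whose description contains the keyword spring, and whose duration is between 10 and 6000 (inclusive)

Raw DSL query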

GET /video/_search
{
  "query": {
    "bool": {
      "must": [{
        "match": {
          "title": "架构"
        }
      }, {
        "match": {
          "description": "spring"
        }
      }, {
        "range": {
          "duration": {
            "gte": 10,
            "lte": 6000
          }
        }
      }]
    }
  }
}

Query with SpringBoot + Spring Data

  @Test
  public void stringQuery() {

        // Title contains 架构, description contains spring, duration between 10 and 6000
        String dsl = """
                   {"bool":{"must":[{"match":{"title":"架构"}},{"match":{"description":"spring"}},{"range":{"duration":{"gte":10,"lte":6000}}}]}}
                """;
        Query query = new StringQuery(dsl);

        List<SearchHit<VideoDTO>> searchHitList = restTemplate.search(query, VideoDTO.class).getSearchHits();

        // Iterate over the search hits and collect the content of each hit
        List<VideoDTO> videoDTOS = new ArrayList<>();
        searchHitList.forEach(hit -> {
            videoDTOS.add(hit.getContent());
        });
        System.out.println(videoDTOS);
    }
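
In practice the keywords and the duration bounds usually come from user input rather than being hard-coded. A minimal sketch of filling the same DSL from parameters could look like the following. BoolQueryDslSketch and mustQuery are hypothetical names, and the sketch assumes the keyword values contain no double quotes or backslashes, since it does no JSON escaping.

```java
/** Hypothetical helper that only builds the JSON string for a StringQuery. */
final class BoolQueryDslSketch {

    /**
     * Builds the bool/must query from the example above.
     * Assumption: the keywords need no JSON escaping.
     */
    static String mustQuery(String titleKeyword, String descriptionKeyword,
                            int minDuration, int maxDuration) {
        // The closing delimiter sits on the content line, so no trailing newline is added
        return """
                {"bool":{"must":[{"match":{"title":"%s"}},{"match":{"description":"%s"}},{"range":{"duration":{"gte":%d,"lte":%d}}}]}}"""
                .formatted(titleKeyword, descriptionKeyword, minDuration, maxDuration);
    }

    public static void main(String[] args) {
        // "architecture" stands in for the Chinese keyword used in the example
        System.out.println(mustQuery("architecture", "spring", 10, 6000));
    }
}
```

The returned string can be passed to new StringQuery(...) exactly like the dsl variable in the test above. For untrusted input, building the query with NativeQuery or a JSON library is the safer choice.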

Aggregation search

Aggregation search example
- Option 1: use the raw DSL
- Option 2: use NativeQuery to run the aggregation

Count the number of videos in each category

GET /video/_search
{
  "size": 1,
  "aggs": {
    "category_group": {
      "terms": {
        "field": "category"
      }
    }
  }
}

    /**
     * Aggregation query
     */
    @Test
    void aggQuery() {
        Query query = NativeQuery.builder()
                .withAggregation("category_group", Aggregation.of(a -> a
                        .terms(ta -> ta.field("category").size(2))))
                .build();

        SearchHits<VideoDTO> searchHits = restTemplate.search(query, VideoDTO.class);

        // Get the aggregation results
        ElasticsearchAggregations aggregationsContainer = (ElasticsearchAggregations) searchHits.getAggregations();
        Map<String, ElasticsearchAggregation> aggregations = Objects.requireNonNull(aggregationsContainer).aggregationsAsMap();

        // Look up the aggregation by name
        ElasticsearchAggregation aggregation = aggregations.get("category_group");
        Buckets<StringTermsBucket> buckets = aggregation.aggregation().getAggregate().sterms().buckets();

        // Print each bucket
        buckets.array().forEach(bucket -> {
            System.out.println("Category: " + bucket.key().stringValue() + ", count: " + bucket.docCount());
        });

        // Iterate over the search hits and collect the content of each hit
        List<VideoDTO> videoDTOS = new ArrayList<>();
        searchHits.forEach(hit -> {
            videoDTOS.add(hit.getContent());
        });
        System.out.println(videoDTOS);
    }
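
To see what the terms aggregation with size(2) is expected to return, the following plain-Java sketch counts categories the same way: group by value, order by document count descending, keep the top N. It does not call Elasticsearch. CategoryTermsSketch is a hypothetical name, and "backend" and "frontend" stand in for the two category values used in the batch insert test.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper, not part of any Elasticsearch client. */
final class CategoryTermsSketch {

    /** Counts each category and keeps the `size` most frequent ones, most frequent first. */
    static Map<String, Long> topCategories(List<String> categories, int size) {
        Map<String, Long> counts = categories.stream()
                .collect(Collectors.groupingBy(c -> c, Collectors.counting()));
        return counts.entrySet().stream()
                // Highest count first; ties are broken by key, ascending
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(size)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        // Categories of the documents with ids 1..7 from the insert tests above
        List<String> categories = List.of("backend", "backend", "frontend",
                "backend", "backend", "frontend", "backend");
        // prints {backend=5, frontend=2}
        System.out.println(topCategories(categories, 2));
    }
}
```

If the deleteById test has already removed document 2, the real aggregation would report one fewer document in the backend category.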

posted @ 2024-02-23 16:43 xietingweia
