ES - Getting Started

Getting to Know ES

Installing Elasticsearch

# 1. Create a network
docker network create es-net


# 2. Load the image
docker load -i es.tar

# 3. Run the container
docker run -d \
    --name es \
    -e "ES_JAVA_OPTS=-Xms512m -Xmx512m" \
    -e "discovery.type=single-node" \
    -v es-data:/usr/share/elasticsearch/data \
    -v es-plugins:/usr/share/elasticsearch/plugins \
    --privileged \
    --network es-net \
    -p 9200:9200 \
    -p 9300:9300 \
    elasticsearch:7.12.1

Parameters:
-d: run in the background
-e "ES_JAVA_OPTS=-Xms512m -Xmx512m": set the JVM heap size via environment variable
-e "discovery.type=single-node": single-node (non-cluster) mode
-v es-data:/usr/share/elasticsearch/data: mount a named volume for the ES data directory
-v es-plugins:/usr/share/elasticsearch/plugins: mount a named volume for the ES plugin directory
--privileged: grant access to the mounted volumes
--network es-net: join the network named es-net
-p 9200:9200: map the HTTP port (9300 is the transport port used for inter-node communication)

Visit http://192.168.184.152:9200/ — if it returns the cluster info JSON, the setup succeeded.
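
A response shaped roughly like the following indicates success (abridged; the node name and UUIDs vary, and "docker-cluster" is the Docker image's default cluster name):

{
  "name" : "...",
  "cluster_name" : "docker-cluster",
  "version" : {
    "number" : "7.12.1",
    ...
  },
  "tagline" : "You Know, for Search"
}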

Installing Kibana

# 1. Load the image
docker load -i kibana.tar

# 2. Run the container
docker run -d \
--name kibana \
-e ELASTICSEARCH_HOSTS=http://es:9200 \
--network=es-net \
-p 5601:5601  \
kibana:7.12.1

--network=es-net: join the es-net network, the same one Elasticsearch is on
-e ELASTICSEARCH_HOSTS=http://es:9200: set the Elasticsearch address; since Kibana and Elasticsearch are on the same network, the container name resolves directly
-p 5601:5601: map Kibana's port

Kibana is usually slow to start, so give it a moment. You can follow its logs with:

docker logs -f kibana

Once the logs show that Kibana's server is up and listening on port 5601, it is ready.

Note that ES must be started before Kibana.

Kibana's Dev Tools console lets you write DSL statements directly:
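
For example, a quick test of the default standard analyzer (a minimal sketch; it motivates installing IK below, because standard splits Chinese character by character):

POST /_analyze
{
  "analyzer": "standard",
  "text": "程序员旺财"
}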

Installing the IK Analyzer

The default standard analyzer handles Chinese poorly (it segments text character by character), so the IK analyzer is commonly used instead.

Download: https://github.com/medcl/elasticsearch-analysis-ik

The IK analyzer has two modes:

  • ik_smart: coarsest-grained segmentation (fewest tokens)
  • ik_max_word: finest-grained segmentation

Offline installation:

1. Find the host directory of the es-plugins volume:

docker volume inspect es-plugins

2. Unzip the downloaded IK analyzer and upload it to /var/lib/docker/volumes/es-plugins/_data

3. Restart the container:

docker restart es

Test the coarse-grained mode:

POST /_analyze
{
  "text":"程序员旺财学习JAVA太开心了",
  "analyzer":"ik_smart"
}

Result:

{
  "tokens" : [
    {
      "token" : "程序员",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "旺",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "财",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "CN_CHAR",
      "position" : 2
    },
    {
      "token" : "学习",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "java",
      "start_offset" : 7,
      "end_offset" : 11,
      "type" : "ENGLISH",
      "position" : 4
    },
    {
      "token" : "太",
      "start_offset" : 11,
      "end_offset" : 12,
      "type" : "CN_CHAR",
      "position" : 5
    },
    {
      "token" : "开心",
      "start_offset" : 12,
      "end_offset" : 14,
      "type" : "CN_WORD",
      "position" : 6
    },
    {
      "token" : "了",
      "start_offset" : 14,
      "end_offset" : 15,
      "type" : "CN_CHAR",
      "position" : 7
    }
  ]
}

Test the finest-grained mode:

POST /_analyze
{
  "text":"程序员旺财学习JAVA太开心了",
  "analyzer":"ik_max_word"
}

Result:

{
  "tokens" : [
    {
      "token" : "程序员",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "程序",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "员",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "CN_CHAR",
      "position" : 2
    },
    {
      "token" : "旺",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "CN_CHAR",
      "position" : 3
    },
    {
      "token" : "财",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "CN_CHAR",
      "position" : 4
    },
    {
      "token" : "学习",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 5
    },
    {
      "token" : "java",
      "start_offset" : 7,
      "end_offset" : 11,
      "type" : "ENGLISH",
      "position" : 6
    },
    {
      "token" : "太",
      "start_offset" : 11,
      "end_offset" : 12,
      "type" : "CN_CHAR",
      "position" : 7
    },
    {
      "token" : "开心",
      "start_offset" : 12,
      "end_offset" : 14,
      "type" : "CN_WORD",
      "position" : 8
    },
    {
      "token" : "了",
      "start_offset" : 14,
      "end_offset" : 15,
      "type" : "CN_CHAR",
      "position" : 9
    }
  ]
}

IK Analyzer Extension and Stopword Dictionaries

The IK analyzer maintains its own vocabulary, but internet slang such as 奥里给 or 白嫖 is not recognized as a word by default. In that case we can extend the vocabulary.

To extend the IK analyzer's vocabulary, edit the IKAnalyzer.cfg.xml file in the config directory of the IK plugin.

File contents:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
        <comment>IK Analyzer extension configuration</comment>
        <!-- Configure your own extension dictionary here -->
        <entry key="ext_dict">ext.dic</entry>
        <!-- Configure your own extension stopword dictionary here -->
        <entry key="ext_stopwords">stopword.dic</entry>
        <!-- Configure a remote extension dictionary here -->
        <!-- <entry key="remote_ext_dict">words_location</entry> -->
        <!-- Configure a remote extension stopword dictionary here -->
        <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>

In the same directory, create a file named ext.dic with the following contents:

传智教育
奥里给
白嫖
旺财

Some stopwords can likewise be maintained in stopword.dic (a file the plugin already ships with, registered under ext_stopwords in the config above), for example the entries shown below.
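
Illustrative stopword.dic entries (assumptions, one word per line):

的
了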

After creating ext.dic or modifying stopword.dic, ES must be restarted.

After the restart, test:

POST /_analyze
{
  "text":"传智教育的课程可以白嫖,而且就业率高达95%,奥里给!",
  "analyzer":"ik_smart"
}

Result:

{
  "tokens" : [
    {
      "token" : "传智教育",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "课程",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "可以",
      "start_offset" : 7,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "白嫖",
      "start_offset" : 9,
      "end_offset" : 11,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "而且",
      "start_offset" : 12,
      "end_offset" : 14,
      "type" : "CN_WORD",
      "position" : 4
    },
    {
      "token" : "就业率",
      "start_offset" : 14,
      "end_offset" : 17,
      "type" : "CN_WORD",
      "position" : 5
    },
    {
      "token" : "高达",
      "start_offset" : 17,
      "end_offset" : 19,
      "type" : "CN_WORD",
      "position" : 6
    },
    {
      "token" : "95",
      "start_offset" : 19,
      "end_offset" : 21,
      "type" : "ARABIC",
      "position" : 7
    },
    {
      "token" : "奥里给",
      "start_offset" : 23,
      "end_offset" : 26,
      "type" : "CN_WORD",
      "position" : 8
    }
  ]
}

The new words 白嫖 and 奥里给 are now recognized, and stopwords (note that 的 is missing from the tokens) are excluded from the output.

Working with Indices

Mapping Properties

A mapping constrains the documents in an index, similar to a schema in a database. Common mapping properties include:

  • type: the field's data type; common simple types:

    • string: text (full-text, analyzed) and keyword (exact value, not analyzed, e.g. brand, country, IP address)
    • numeric: long, integer, short, byte, double, float
    • boolean: boolean
    • date: date
    • object: object
  • index: whether to build an inverted index for the field; defaults to true

  • analyzer: which analyzer to use

  • properties: the field's subfields

Creating an Index

ES exposes RESTful endpoints for operating on indices and documents, with request bodies written as DSL. The DSL for creating an index and its mapping:

Example:

# Create an index
PUT /product
{
  "mappings": {
    "properties": {
      "info": {
        "type": "text",
        "analyzer": "ik_smart" # 指定分词器
      },
      "email": {
        "type": "keyword",
        "index": false  # 不需要参数搜索,因此不用创建倒排索引
      },
      "name": {
        "type": "object",
        "properties": {  # 嵌套字段
          "firstName": { 
            "type": "keyword"
          },
          "lastName": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

Deleting, Querying, and Modifying an Index

# Delete an index
DELETE /product

# Query an index
GET /product

# Modify an index: only new fields can be added
PUT /emp/_mapping 
{
  "properties":{
    "age":{
      "type":"integer"
    }
  }
}

Changing an existing field raises an error; for example, attempting to change the age field's type to long fails.

Document Operations

Create, Query, Delete

Syntax for adding a document: POST /index-name/_doc/document-id, with the document JSON as the request body. Example:

# Add a document
POST /emp/_doc/1
{
  "age":23,
  "info":"国庆放假好开心",
  "email":"xxx@test.com",
  "name":{
    "firstName":"张",
    "lastName":"三"
  }
}


# Response:
{
  "_index" : "emp",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 6,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 7,
  "_primary_term" : 9
}


# Get a document
GET /emp/_doc/1

# Response:
{
  "_index" : "emp",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 6,
  "_seq_no" : 7,
  "_primary_term" : 9,
  "found" : true,
  "_source" : {
    "age" : 23,
    "info" : "国庆放假好开心",
    "email" : "xxx@test.com",
    "name" : {
      "firstName" : "张",
      "lastName" : "三"
    }
  }
}


# Delete a document
DELETE /emp/_doc/1

Updating Documents

Method 1: full replacement. If the document ID exists, the old document is deleted and the new one is written; if it does not exist, the document is simply created.

# Replace an existing document
POST /emp/_doc/1
{
  "age":23,
  "info":"国庆放假好开心",
  "email":"ZhangSan@test.com",
  "name":{
    "firstName":"张",
    "lastName":"三"
  }
}

# Response:
{
  "_index" : "emp",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 7,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 8,
  "_primary_term" : 9
}

# Replacing a document that does not exist simply creates it
POST /emp/_doc/3
{
  "age":23,
  "info":"中秋放假好开心",
  "email":"wangcai@test.com",
  "name":{
    "firstName":"wang",
    "lastName":"cai"
  }
}

Method 2: partial (incremental) update:

# Partial update
POST /emp/_update/3
{
  "doc":{
    "age": 24
  }
}

# Response:
{
  "_index" : "emp",
  "_type" : "_doc",
  "_id" : "3",
  "_version" : 2,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 13,
  "_primary_term" : 9
}

Operating on Indices with RestClient

Official docs: https://www.elastic.co/guide/en/elasticsearch/client/java-api-client/current/index.html

Preparation:

1. Create the table and import the data into MySQL:

CREATE TABLE `tb_hotel`  (
  `id` bigint(20) NOT NULL COMMENT '酒店id',
  `name` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL COMMENT '酒店名称',
  `address` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL COMMENT '酒店地址',
  `price` int(10) NOT NULL COMMENT '酒店价格',
  `score` int(2) NOT NULL COMMENT '酒店评分',
  `brand` varchar(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL COMMENT '酒店品牌',
  `city` varchar(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL COMMENT '所在城市',
  `star_name` varchar(16) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT '酒店星级,1星到5星,1钻到5钻',
  `business` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT '商圈',
  `latitude` varchar(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL COMMENT '纬度',
  `longitude` varchar(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL COMMENT '经度',
  `pic` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT '酒店图片',
  PRIMARY KEY (`id`) USING BTREE
) ENGINE = InnoDB CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci ROW_FORMAT = Compact;

2. Prepare the mapping for the index:


PUT /hotel
{
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "name": {
        "type": "text",
        "ayalzer": "ik_max_work",
        "copy_to": "all"
      },
      "address": {
        "type": "keyword",
        "index": false
      },
      "price": {
        "type": "integer"
      },
      "scope": {
        "type": "integer"
      },
      "brand": {
        "type": "keyword",
        "copy_to": "all"
      },
      "city": {
        "type": "keyword"
      },
      "starName": {
        "type": "keyword"
      },
      "business": {
        "type": "keyword"
      },
      "location": {
        "type": "gen_point"
      },
      "pic": {
        "type": "keyword",
        "index": false
      },
      "all":{
        "type":"text",
        "analyzer": "ik_max_word"
      }
    }
  }
}

Notes:

  • Fields that don't participate in search can have index set to false
  • copy_to: copies several fields into one combined field for indexing, which can improve query efficiency
  • ES supports two geo data types:
    • geo_point: a single point defined by latitude and longitude, e.g. "32.8752345,120.2981576"
    • geo_shape: a complex geometry composed of multiple points, e.g. a line

Initialization

1. Add the dependency (the client version should match the ES server, 7.12.1 here):

        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>elasticsearch-rest-high-level-client</artifactId>
            <version>7.12.1</version>
        </dependency>
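
If the project inherits from spring-boot-starter-parent, Spring Boot's dependency management may pin a different Elasticsearch client version. A common fix (a sketch, assuming a Maven Spring Boot project) is to override the managed version property so it matches the server:

    <properties>
        <elasticsearch.version>7.12.1</elasticsearch.version>
    </properties>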

2. Write a test class:


import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.springframework.boot.test.context.SpringBootTest;

import java.io.IOException;

@SpringBootTest
class HotelDemoApplicationTests {

    private RestHighLevelClient client;

    @Test
    public void testInit(){
        System.out.println(client);
    }


    @BeforeEach
    void setUp(){
        this.client = new RestHighLevelClient(RestClient.builder(
                HttpHost.create("http://192.168.184.152:9200")
        ));
    }

    @AfterEach
    void tearDown() throws IOException {
      this.client.close();
    }
}

Index CRUD

    private static final String  MAPPING_TEMPLATE = "{\n" +
            "  \"mappings\": {\n" +
            "    \"properties\": {\n" +
            "      \"id\": {\n" +
            "        \"type\": \"keyword\"\n" +
            "      },\n" +
            "      \"name\": {\n" +
            "        \"type\": \"text\",\n" +
            "        \"analyzer\": \"ik_max_word\",\n" +
            "        \"copy_to\": \"all\"\n" +
            "      },\n" +
            "      \"address\": {\n" +
            "        \"type\": \"keyword\",\n" +
            "        \"index\": false\n" +
            "      },\n" +
            "      \"price\": {\n" +
            "        \"type\": \"integer\"\n" +
            "      },\n" +
            "      \"scope\": {\n" +
            "        \"type\": \"integer\"\n" +
            "      },\n" +
            "      \"brand\": {\n" +
            "        \"type\": \"keyword\",\n" +
            "        \"copy_to\": \"all\"\n" +
            "      },\n" +
            "      \"city\": {\n" +
            "        \"type\": \"keyword\"\n" +
            "      },\n" +
            "      \"starName\": {\n" +
            "        \"type\": \"keyword\"\n" +
            "      },\n" +
            "      \"business\": {\n" +
            "        \"type\": \"keyword\"\n" +
            "      },\n" +
            "      \"location\": {\n" +
            "        \"type\": \"geo_point\"\n" +
            "      },\n" +
            "      \"pic\": {\n" +
            "        \"type\": \"keyword\",\n" +
            "        \"index\": false\n" +
            "      },\n" +
            "      \"all\":{\n" +
            "        \"type\":\"text\",\n" +
            "        \"analyzer\": \"ik_max_word\"\n" +
            "      }\n" +
            "    }\n" +
            "  }\n" +
            "}";
    @Test
    public void testCreateIndex() throws IOException {
        // 1. Build the request object
        CreateIndexRequest request = new CreateIndexRequest("hotel");
        // 2. Attach the mapping as the request body
        request.source(MAPPING_TEMPLATE, XContentType.JSON);
        // 3. Send the request
        client.indices().create(request, RequestOptions.DEFAULT);
    }
    
    // Delete the index
    @Test
    public void deleteIndex() throws IOException {
        DeleteIndexRequest request = new DeleteIndexRequest("hotel");
        client.indices().delete(request, RequestOptions.DEFAULT);
    }
    
  
    @Test    // Check whether the index exists
    public void getIndex() throws IOException {
        GetIndexRequest getIndexRequest = new GetIndexRequest("hotel");
        boolean exists = client.indices().exists(getIndexRequest, RequestOptions.DEFAULT);
        System.err.println(exists ? "index exists" : "index does not exist");
    }

Operating on Documents with RestClient

The MySQL table's fields don't line up with the index mapping's field names (the table stores longitude and latitude separately, while the index uses a single location field), so the entity needs to be re-wrapped:
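
A minimal sketch of the wrapper class (assuming Lombok for brevity; the essential part is the constructor merging latitude and longitude into the single string that the geo_point location field expects):

import lombok.Data;
import lombok.NoArgsConstructor;

@Data
@NoArgsConstructor
public class HotelDoc {
    private Long id;
    private String name;
    private String address;
    private Integer price;
    private Integer score;
    private String brand;
    private String city;
    private String starName;
    private String business;
    private String location; // "lat,lon" string for the geo_point field
    private String pic;

    // Hotel is the entity mapped to tb_hotel (project-specific import)
    public HotelDoc(Hotel hotel) {
        this.id = hotel.getId();
        this.name = hotel.getName();
        this.address = hotel.getAddress();
        this.price = hotel.getPrice();
        this.score = hotel.getScore();
        this.brand = hotel.getBrand();
        this.city = hotel.getCity();
        this.starName = hotel.getStarName();
        this.business = hotel.getBusiness();
        this.location = hotel.getLatitude() + "," + hotel.getLongitude();
        this.pic = hotel.getPic();
    }
}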

import com.alibaba.fastjson.JSON;
import org.apache.http.HttpHost;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

import java.io.IOException;
// plus project-specific imports: Hotel, HotelDoc, IHotelService

@SpringBootTest
class HotelDocTests {

    @Autowired
    private IHotelService iHotelService;

    private RestHighLevelClient client;


    // Test adding a document
    @Test
    public void addDoc() throws IOException {
        // 1. Query the record from MySQL
        Hotel hotel = iHotelService.getById(395434L);
        // 2. Convert it to the object that maps to the index
        HotelDoc hotelDoc = new HotelDoc(hotel);
        // 3. Build the request
        IndexRequest request = new IndexRequest("hotel").id(hotelDoc.getId().toString());
        // 4. Attach the JSON body; source() takes the document's JSON string
        request.source(JSON.toJSONString(hotelDoc), XContentType.JSON);
        // 5. Send the request
        client.index(request, RequestOptions.DEFAULT);
    }

    @BeforeEach
    void setUp(){
        this.client = new RestHighLevelClient(RestClient.builder(
                HttpHost.create("http://192.168.184.152:9200")
        ));
    }

    @AfterEach
    void tearDown() throws IOException {
      this.client.close();
    }

}


Querying a document:

@Test  // Test getting a document
public void testGetDoc() throws IOException {
    GetRequest request = new GetRequest("hotel", "395434");
    GetResponse response = client.get(request, RequestOptions.DEFAULT);
    String sourceAsString = response.getSourceAsString();
    // Deserialize the JSON into a Java object
    HotelDoc hotelDoc = JSON.parseObject(sourceAsString, HotelDoc.class);
    System.out.println(hotelDoc);
}

Updating and deleting documents:

@Test  // Update a document
public void testUpdateDoc() throws IOException {

    UpdateRequest request = new UpdateRequest("hotel","395434");
    request.doc(
            "address", "东三环北路东方路2号",
            "price", 451
    );

    client.update(request, RequestOptions.DEFAULT);
}

@Test  // Delete a document
public void testDeleteDoc() throws IOException {
    DeleteRequest deleteRequest = new DeleteRequest("hotel","395434");

    client.delete(deleteRequest, RequestOptions.DEFAULT);
}

Bulk insert:

@Test
public void testBatchInsertDoc() throws IOException {
    BulkRequest request = new BulkRequest();

    List<Hotel> list = iHotelService.list();
    for (Hotel hotel : list) {
        HotelDoc hotelDoc = new HotelDoc(hotel);
        request.add(new IndexRequest("hotel")
                .id(hotelDoc.getId().toString())
                .source(JSON.toJSONString(hotelDoc),XContentType.JSON));
    }

    client.bulk(request, RequestOptions.DEFAULT);
}
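
client.bulk(...) returns a BulkResponse (org.elasticsearch.action.bulk.BulkResponse); it is worth checking it for per-item failures — a small addition not in the original test:

BulkResponse response = client.bulk(request, RequestOptions.DEFAULT);
if (response.hasFailures()) {
    System.err.println(response.buildFailureMessage());
}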

Verify by searching the whole index:
GET /hotel/_search

DSL Query Syntax

  • Match all: returns every document, generally for testing. Example: match_all (see the example after this list)
  • Full-text queries: analyze the user's input with a tokenizer, then match against the inverted index. Examples:
    • match_query
    • multi_match_query
  • Exact-value queries: look up exact terms, usually on keyword, numeric, date, or boolean fields. Examples:
    • ids
    • range
    • term
  • Geo queries: search by coordinates. Examples:
    • geo_distance
    • geo_bounding_box
  • Compound queries: combine the conditions above into a single query. Examples:
    • bool
    • function_score
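
A minimal match_all query, as referenced in the first bullet above:

GET /hotel/_search
{
  "query": {
    "match_all": {}
  }
}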

Full-Text Queries

match_query:

GET /hotel/_search
{
  "query": {
    "match": {
      "all": "外滩七天"
    }
  }
}

multi_match_query:

GET /hotel/_search
{
  "query": {
    "multi_match": {
      "query": "外滩七天",
      "fields": ["brand","name","business"]
    }
  }
}

Exact-Value Queries

# term query (exact value)
GET /hotel/_search
{
  "query": {
    "term": {
      "city": {
        "value": "深圳"
      }
    }
  }
}

# range query (exact value)
GET /hotel/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 1000,
        "lte": 2000
      }
    }
  }
}

Relevance Scoring

Compound queries combine simpler queries to express more complex search logic, for example:

  • function_score: a scoring-function query that adjusts document relevance scores and therefore ranking, similar to paid placement in search engines such as Baidu

With a match query, each result is scored by its relevance to the search terms (_score), and results are returned in descending score order.

For example, a search for "虹桥如家" returns results ranked by _score.

FunctionScoreQuery

Syntax: a function_score query wraps an ordinary query, a list of functions (each with an optional filter and a score function such as weight), and a boost_mode that defines how the function score is combined with the original query score.

Case requirement:

  • Boost the last hotel in the results, 如家酒店·neo(上海外滩城隍庙小南门地铁站店), to first place:

GET /hotel/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "all": "外滩"
        }
      },
      "functions": [
        {
          "filter": {"term": {   # 过滤,只要如家酒店
            "brand": "如家"
          }},
          "weight": 10
          
        }
      ],
      "boost_mode": "multiply" 
    }
  }
}

The final score is 3.8000445 × 10 = 38.000445, and the hotel moves to first place in the results.

BooleanQuery

A boolean query is a combination of one or more query clauses. The clause types are:

  • must: every subquery must match; contributes to the score ("AND")
  • should: optional matching; contributes to the score ("OR")
  • must_not: must not match; does not contribute to the score ("NOT")
  • filter: must match; does not contribute to the score

Case 1:
Search for hotels whose name contains 如家, priced at no more than 400, within 10 km of the coordinates 31.21,121.5:

GET /hotel/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {
          "name": "如家"
        }}
      ],
      "must_not": [
        {"range": {
          "price": {
            "gt": 400
          }
        }}
      ],
      "filter": [
        {
          "geo_distance": {
            "distance": "10km",
            "location": {
              "lat": 31.21,
              "lon": 121.5
            }
          }
        }
      ]
    }
  }
}

Case 2:
Brand is 如家 or 7天, priced at no more than 400, within 10 km of the coordinates 31.21,121.5:

GET /hotel/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "brand": {
              "value": "如家"
            }
          }
        },
        {
          "term": {
            "brand": {
              "value": "7天"
            }
          }
        }
      ],
      "must_not": [
        {
          "range": {
            "price": {
              "gt": 400
            }
          }
        }
      ],
      "filter": [
        {
          "geo_distance": {
            "distance": "10km",
            "location": {
              "lat": 31.21,
              "lon": 121.5
            }
          }
        }
      ]
    }
  }
}

Sorting

ES supports sorting search results; the default is by relevance score (_score). Sortable field types include keyword, numeric, geo, and date.

Case 1: sort hotels by user rating descending, and by price ascending when ratings tie:

GET /hotel/_search
{
  "query": {
    "match_all": {}
  }
  , "sort": [
    {
      "score": {
        "order": "desc"
      },
      "price": {
        "order": "asc"
      }
    }
  ]
}

Case 2: find hotels around the coordinates 121.612282,31.034661, sorted by distance ascending:

GET /hotel/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "_geo_distance": {
        "location": {
          "lat": 31.034661,
          "lon": 121.612282
        },
        "order": "asc",
        "unit":"km"
      }
    }
  ]
}

Pagination

By default ES returns only the top 10 hits; to fetch more you adjust the pagination parameters.
ES controls the returned page through the from and size parameters:

# Paged query
# Fetch page 2 at 10 per page; from = (page - 1) * size
GET /hotel/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "price":  "desc"
    }
  ],
  "from": 10,
  "size": 10
}

Deep pagination:

ES is distributed, which makes deep pagination expensive. For example, fetching page 100 sorted by price (results 991–1000) works like this:
1. Each shard sorts its own documents and selects its top 1000.
2. The coordinating node gathers all shards' results and re-sorts them in memory to find the overall top 1000.
3. From those 1000, the 10 documents starting at offset 990 are returned.

The deeper the page (the larger from + size), the more memory and CPU it costs,
so ES caps from + size at 10,000 by default.
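
For paging beyond that limit, ES provides search_after (and the scroll API): keep a stable sort and pass the last hit's sort values from the previous page. A minimal sketch (the search_after values are illustrative; they come from the previous page's last hit):

GET /hotel/_search
{
  "query": { "match_all": {} },
  "size": 10,
  "sort": [
    { "price": "asc" },
    { "id": "asc" }
  ],
  "search_after": [336, "2056126831"]
}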

Highlighting

Highlighting marks the search keywords in the results so they stand out.

How it works:

  • the keywords in the results are wrapped in tags
  • the page then styles those tags with CSS

Syntax: add a highlight section alongside query; each entry under fields names a field to highlight, and pre_tags/post_tags (default <em> and </em>) set the wrapping tags.
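
A sketch of the general shape (the pre_tags/post_tags shown are ES's defaults, written out for clarity):

GET /hotel/_search
{
  "query": {
    "match": { "all": "如家" }
  },
  "highlight": {
    "fields": {
      "name": {
        "pre_tags": "<em>",
        "post_tags": "</em>"
      }
    }
  }
}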

Note:
1. By default, highlighting only applies when the highlighted field is the same field that was searched; if they differ, nothing is highlighted.
For example:

GET /hotel/_search
{
  "query": {
    "match": {
      "brand":"如家"
    }
  },
  "highlight": {
    "fields": {
      "name": {}
    }
  }
}

The result is not highlighted, because the query searches brand while the highlight targets name.

Adding "require_field_match": "false" to the name entry fixes this:

GET /hotel/_search
{
  "query": {
    "match": {
      "brand":"如家"
    }
  },
  "highlight": {
    "fields": {
      "name": {"require_field_match": "false"}
    }
  }
}
