ES - Getting Started

Getting to Know ES

Installing Elasticsearch

# 1. Create a network
docker network create es-net


# 2. Load the image
docker load -i es.tar

# 3. Run the container
docker run -d \
    --name es \
    -e "ES_JAVA_OPTS=-Xms512m -Xmx512m" \
    -e "discovery.type=single-node" \
    -v es-data:/usr/share/elasticsearch/data \
    -v es-plugins:/usr/share/elasticsearch/plugins \
    --privileged \
    --network es-net \
    -p 9200:9200 \
    -p 9300:9300 \
    elasticsearch:7.12.1

Parameters:
-d: run in the background
-e "ES_JAVA_OPTS=-Xms512m -Xmx512m": set the JVM heap size via environment variable
-e "discovery.type=single-node": single-node (non-cluster) mode
-v es-data:/usr/share/elasticsearch/data: mount a named volume for the ES data directory
-v es-plugins:/usr/share/elasticsearch/plugins: mount a named volume for the ES plugin directory
--privileged: grant access to the mounted volumes
--network es-net: join the network named es-net
-p 9200:9200: map the HTTP port (9300 is the transport port used for inter-node communication)

Visit http://192.168.184.152:9200/ — if it returns the cluster info JSON, the setup succeeded.
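
A response shaped roughly like the following indicates success (abridged; the node name and UUIDs vary, and "docker-cluster" is the Docker image's default cluster name):

{
  "name" : "...",
  "cluster_name" : "docker-cluster",
  "version" : {
    "number" : "7.12.1",
    ...
  },
  "tagline" : "You Know, for Search"
}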

Installing Kibana

# 1. Load the image
docker load -i kibana.tar

# 2. Run the container
docker run -d \
--name kibana \
-e ELASTICSEARCH_HOSTS=http://es:9200 \
--network=es-net \
-p 5601:5601  \
kibana:7.12.1

--network=es-net: join the es-net network, the same one Elasticsearch is on
-e ELASTICSEARCH_HOSTS=http://es:9200: set the Elasticsearch address; since Kibana and Elasticsearch are on the same network, the container name resolves directly
-p 5601:5601: map Kibana's port

Kibana is usually slow to start, so give it a moment. You can follow its logs with:

docker logs -f kibana

Once the logs show that Kibana's server is up and listening on port 5601, it is ready.

Note that ES must be started before Kibana.

Kibana's Dev Tools console lets you write DSL statements directly:
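
For example, a quick test of the default standard analyzer (a minimal sketch; it motivates installing IK below, because standard splits Chinese character by character):

POST /_analyze
{
  "analyzer": "standard",
  "text": "程序员旺财"
}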

Installing the IK Analyzer

The default standard analyzer handles Chinese poorly (it segments text character by character), so the IK analyzer is commonly used instead.

Download: https://github.com/medcl/elasticsearch-analysis-ik

The IK analyzer has two modes:

  • ik_smart: coarsest-grained segmentation (fewest tokens)
  • ik_max_word: finest-grained segmentation

Offline installation:

1. Find the host directory of the es-plugins volume:

docker volume inspect es-plugins

2. Unzip the downloaded IK analyzer and upload it to /var/lib/docker/volumes/es-plugins/_data

3. Restart the container:

docker restart es

Test the coarse-grained mode:

POST /_analyze
{
  "text":"程序员旺财学习JAVA太开心了",
  "analyzer":"ik_smart"
}

Result:

{
  "tokens" : [
    {
      "token" : "程序员",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "旺",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "财",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "CN_CHAR",
      "position" : 2
    },
    {
      "token" : "学习",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "java",
      "start_offset" : 7,
      "end_offset" : 11,
      "type" : "ENGLISH",
      "position" : 4
    },
    {
      "token" : "太",
      "start_offset" : 11,
      "end_offset" : 12,
      "type" : "CN_CHAR",
      "position" : 5
    },
    {
      "token" : "开心",
      "start_offset" : 12,
      "end_offset" : 14,
      "type" : "CN_WORD",
      "position" : 6
    },
    {
      "token" : "了",
      "start_offset" : 14,
      "end_offset" : 15,
      "type" : "CN_CHAR",
      "position" : 7
    }
  ]
}

Test the finest-grained mode:

POST /_analyze
{
  "text":"程序员旺财学习JAVA太开心了",
  "analyzer":"ik_max_word"
}

Result:

{
  "tokens" : [
    {
      "token" : "程序员",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "程序",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "员",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "CN_CHAR",
      "position" : 2
    },
    {
      "token" : "旺",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "CN_CHAR",
      "position" : 3
    },
    {
      "token" : "财",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "CN_CHAR",
      "position" : 4
    },
    {
      "token" : "学习",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 5
    },
    {
      "token" : "java",
      "start_offset" : 7,
      "end_offset" : 11,
      "type" : "ENGLISH",
      "position" : 6
    },
    {
      "token" : "太",
      "start_offset" : 11,
      "end_offset" : 12,
      "type" : "CN_CHAR",
      "position" : 7
    },
    {
      "token" : "开心",
      "start_offset" : 12,
      "end_offset" : 14,
      "type" : "CN_WORD",
      "position" : 8
    },
    {
      "token" : "了",
      "start_offset" : 14,
      "end_offset" : 15,
      "type" : "CN_CHAR",
      "position" : 9
    }
  ]
}

IK Analyzer Extension and Stopword Dictionaries

The IK analyzer maintains its own vocabulary, but internet slang such as 奥里给 or 白嫖 is not recognized as a word by default. In that case we can extend the vocabulary.

To extend the IK analyzer's vocabulary, edit the IKAnalyzer.cfg.xml file in the config directory of the IK plugin.

File contents:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
        <comment>IK Analyzer extension configuration</comment>
        <!-- Configure your own extension dictionary here -->
        <entry key="ext_dict">ext.dic</entry>
        <!-- Configure your own extension stopword dictionary here -->
        <entry key="ext_stopwords">stopword.dic</entry>
        <!-- Configure a remote extension dictionary here -->
        <!-- <entry key="remote_ext_dict">words_location</entry> -->
        <!-- Configure a remote extension stopword dictionary here -->
        <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>

In the same directory, create a file named ext.dic with the following contents:

传智教育
奥里给
白嫖
旺财

Some stopwords can likewise be maintained in stopword.dic (a file the plugin already ships with, registered under ext_stopwords in the config above), for example the entries shown below.
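
Illustrative stopword.dic entries (assumptions, one word per line):

的
了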

After creating ext.dic or modifying stopword.dic, ES must be restarted.

After the restart, test:

POST /_analyze
{
  "text":"传智教育的课程可以白嫖,而且就业率高达95%,奥里给!",
  "analyzer":"ik_smart"
}

Result:

{
  "tokens" : [
    {
      "token" : "传智教育",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "课程",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "可以",
      "start_offset" : 7,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "白嫖",
      "start_offset" : 9,
      "end_offset" : 11,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "而且",
      "start_offset" : 12,
      "end_offset" : 14,
      "type" : "CN_WORD",
      "position" : 4
    },
    {
      "token" : "就业率",
      "start_offset" : 14,
      "end_offset" : 17,
      "type" : "CN_WORD",
      "position" : 5
    },
    {
      "token" : "高达",
      "start_offset" : 17,
      "end_offset" : 19,
      "type" : "CN_WORD",
      "position" : 6
    },
    {
      "token" : "95",
      "start_offset" : 19,
      "end_offset" : 21,
      "type" : "ARABIC",
      "position" : 7
    },
    {
      "token" : "奥里给",
      "start_offset" : 23,
      "end_offset" : 26,
      "type" : "CN_WORD",
      "position" : 8
    }
  ]
}

The new words 白嫖 and 奥里给 are now recognized, and stopwords (note that 的 is missing from the tokens) are excluded from the output.

Working with Indices

Mapping Properties

A mapping constrains the documents in an index, similar to a schema in a database. Common mapping properties include:

  • type: the field's data type; common simple types:

    • string: text (full-text, analyzed) and keyword (exact value, not analyzed, e.g. brand, country, IP address)
    • numeric: long, integer, short, byte, double, float
    • boolean: boolean
    • date: date
    • object: object
  • index: whether to build an inverted index for the field; defaults to true

  • analyzer: which analyzer to use

  • properties: the field's subfields

Creating an Index

ES exposes RESTful endpoints for operating on indices and documents, with request bodies written as DSL. The DSL for creating an index and its mapping:

Example:

# Create an index
PUT /product
{
  "mappings": {
    "properties": {
      "info": {
        "type": "text",
        "analyzer": "ik_smart" # 指定分词器
      },
      "email": {
        "type": "keyword",
        "index": false  # 不需要参数搜索,因此不用创建倒排索引
      },
      "name": {
        "type": "object",
        "properties": {  # 嵌套字段
          "firstName": { 
            "type": "keyword"
          },
          "lastName": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

Deleting, Querying, and Modifying an Index

# Delete an index
DELETE /product

# Query an index
GET /product

# Modify an index: only new fields can be added
PUT /emp/_mapping 
{
  "properties":{
    "age":{
      "type":"integer"
    }
  }
}

Changing an existing field raises an error; for example, attempting to change the age field's type to long fails.

Document Operations

Create, Query, Delete

Syntax for adding a document: POST /index-name/_doc/document-id, with the document JSON as the request body. Example:

# Add a document
POST /emp/_doc/1
{
  "age":23,
  "info":"国庆放假好开心",
  "email":"xxx@test.com",
  "name":{
    "firstName":"张",
    "lastName":"三"
  }
}


# Response:
{
  "_index" : "emp",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 6,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 7,
  "_primary_term" : 9
}


# Get a document
GET /emp/_doc/1

# Response:
{
  "_index" : "emp",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 6,
  "_seq_no" : 7,
  "_primary_term" : 9,
  "found" : true,
  "_source" : {
    "age" : 23,
    "info" : "国庆放假好开心",
    "email" : "xxx@test.com",
    "name" : {
      "firstName" : "张",
      "lastName" : "三"
    }
  }
}


# Delete a document
DELETE /emp/_doc/1

Updating Documents

Method 1: full replacement. If the document ID exists, the old document is deleted and the new one is written; if it does not exist, the document is simply created.

# Replace an existing document
POST /emp/_doc/1
{
  "age":23,
  "info":"国庆放假好开心",
  "email":"ZhangSan@test.com",
  "name":{
    "firstName":"张",
    "lastName":"三"
  }
}

# Response:
{
  "_index" : "emp",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 7,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 8,
  "_primary_term" : 9
}

# Replacing a document that does not exist simply creates it
POST /emp/_doc/3
{
  "age":23,
  "info":"中秋放假好开心",
  "email":"wangcai@test.com",
  "name":{
    "firstName":"wang",
    "lastName":"cai"
  }
}

Method 2: partial (incremental) update:

# Partial update
POST /emp/_update/3
{
  "doc":{
    "age": 24
  }
}

# Response:
{
  "_index" : "emp",
  "_type" : "_doc",
  "_id" : "3",
  "_version" : 2,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 13,
  "_primary_term" : 9
}

Operating on Indices with RestClient

Official docs: https://www.elastic.co/guide/en/elasticsearch/client/java-api-client/current/index.html

Preparation:

1. Create the table and import the data into MySQL:

CREATE TABLE `tb_hotel`  (
  `id` bigint(20) NOT NULL COMMENT '酒店id',
  `name` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL COMMENT '酒店名称',
  `address` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL COMMENT '酒店地址',
  `price` int(10) NOT NULL COMMENT '酒店价格',
  `score` int(2) NOT NULL COMMENT '酒店评分',
  `brand` varchar(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL COMMENT '酒店品牌',
  `city` varchar(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL COMMENT '所在城市',
  `star_name` varchar(16) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT '酒店星级,1星到5星,1钻到5钻',
  `business` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT '商圈',
  `latitude` varchar(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL COMMENT '纬度',
  `longitude` varchar(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL COMMENT '经度',
  `pic` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT '酒店图片',
  PRIMARY KEY (`id`) USING BTREE
) ENGINE = InnoDB CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci ROW_FORMAT = Compact;

2. Prepare the mapping for the index:


PUT /hotel
{
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "name": {
        "type": "text",
        "ayalzer": "ik_max_work",
        "copy_to": "all"
      },
      "address": {
        "type": "keyword",
        "index": false
      },
      "price": {
        "type": "integer"
      },
      "scope": {
        "type": "integer"
      },
      "brand": {
        "type": "keyword",
        "copy_to": "all"
      },
      "city": {
        "type": "keyword"
      },
      "starName": {
        "type": "keyword"
      },
      "business": {
        "type": "keyword"
      },
      "location": {
        "type": "gen_point"
      },
      "pic": {
        "type": "keyword",
        "index": false
      },
      "all":{
        "type":"text",
        "analyzer": "ik_max_word"
      }
    }
  }
}

Notes:

  • Fields that don't participate in search can have index set to false
  • copy_to: copies several fields into one combined field for indexing, which can improve query efficiency
  • ES supports two geo data types:
    • geo_point: a single point defined by latitude and longitude, e.g. "32.8752345,120.2981576"
    • geo_shape: a complex geometry composed of multiple points, e.g. a line

Initialization

1. Add the dependency (the client version should match the ES server, 7.12.1 here):

        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>elasticsearch-rest-high-level-client</artifactId>
            <version>7.12.1</version>
        </dependency>
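
If the project inherits from spring-boot-starter-parent, Spring Boot's dependency management may pin a different Elasticsearch client version. A common fix (a sketch, assuming a Maven Spring Boot project) is to override the managed version property so it matches the server:

    <properties>
        <elasticsearch.version>7.12.1</elasticsearch.version>
    </properties>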

2. Write a test class:


import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.springframework.boot.test.context.SpringBootTest;

import java.io.IOException;

@SpringBootTest
class HotelDemoApplicationTests {

    private RestHighLevelClient client;

    @Test
    public void testInit(){
        System.out.println(client);
    }


    @BeforeEach
    void setUp(){
        this.client = new RestHighLevelClient(RestClient.builder(
                HttpHost.create("http://192.168.184.152:9200")
        ));
    }

    @AfterEach
    void tearDown() throws IOException {
      this.client.close();
    }
}

Index CRUD

    private static final String  MAPPING_TEMPLATE = "{\n" +
            "  \"mappings\": {\n" +
            "    \"properties\": {\n" +
            "      \"id\": {\n" +
            "        \"type\": \"keyword\"\n" +
            "      },\n" +
            "      \"name\": {\n" +
            "        \"type\": \"text\",\n" +
            "        \"analyzer\": \"ik_max_word\",\n" +
            "        \"copy_to\": \"all\"\n" +
            "      },\n" +
            "      \"address\": {\n" +
            "        \"type\": \"keyword\",\n" +
            "        \"index\": false\n" +
            "      },\n" +
            "      \"price\": {\n" +
            "        \"type\": \"integer\"\n" +
            "      },\n" +
            "      \"scope\": {\n" +
            "        \"type\": \"integer\"\n" +
            "      },\n" +
            "      \"brand\": {\n" +
            "        \"type\": \"keyword\",\n" +
            "        \"copy_to\": \"all\"\n" +
            "      },\n" +
            "      \"city\": {\n" +
            "        \"type\": \"keyword\"\n" +
            "      },\n" +
            "      \"starName\": {\n" +
            "        \"type\": \"keyword\"\n" +
            "      },\n" +
            "      \"business\": {\n" +
            "        \"type\": \"keyword\"\n" +
            "      },\n" +
            "      \"location\": {\n" +
            "        \"type\": \"geo_point\"\n" +
            "      },\n" +
            "      \"pic\": {\n" +
            "        \"type\": \"keyword\",\n" +
            "        \"index\": false\n" +
            "      },\n" +
            "      \"all\":{\n" +
            "        \"type\":\"text\",\n" +
            "        \"analyzer\": \"ik_max_word\"\n" +
            "      }\n" +
            "    }\n" +
            "  }\n" +
            "}";
    @Test
    public void testCreateIndex() throws IOException {
        // 1. Build the request object
        CreateIndexRequest request = new CreateIndexRequest("hotel");
        // 2. Attach the mapping as the request body
        request.source(MAPPING_TEMPLATE, XContentType.JSON);
        // 3. Send the request
        client.indices().create(request, RequestOptions.DEFAULT);
    }
    
    // Delete the index
    @Test
    public void deleteIndex() throws IOException {
        DeleteIndexRequest request = new DeleteIndexRequest("hotel");
        client.indices().delete(request, RequestOptions.DEFAULT);
    }
    
  
    @Test    // Check whether the index exists
    public void getIndex() throws IOException {
        GetIndexRequest getIndexRequest = new GetIndexRequest("hotel");
        boolean exists = client.indices().exists(getIndexRequest, RequestOptions.DEFAULT);
        System.err.println(exists ? "index exists" : "index does not exist");
    }

Operating on Documents with RestClient

The MySQL table's fields don't line up with the index mapping's field names (the table stores longitude and latitude separately, while the index uses a single location field), so the entity needs to be re-wrapped:
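
A minimal sketch of the wrapper class (assuming Lombok for brevity; the essential part is the constructor merging latitude and longitude into the single string that the geo_point location field expects):

import lombok.Data;
import lombok.NoArgsConstructor;

@Data
@NoArgsConstructor
public class HotelDoc {
    private Long id;
    private String name;
    private String address;
    private Integer price;
    private Integer score;
    private String brand;
    private String city;
    private String starName;
    private String business;
    private String location; // "lat,lon" string for the geo_point field
    private String pic;

    // Hotel is the entity mapped to tb_hotel (project-specific import)
    public HotelDoc(Hotel hotel) {
        this.id = hotel.getId();
        this.name = hotel.getName();
        this.address = hotel.getAddress();
        this.price = hotel.getPrice();
        this.score = hotel.getScore();
        this.brand = hotel.getBrand();
        this.city = hotel.getCity();
        this.starName = hotel.getStarName();
        this.business = hotel.getBusiness();
        this.location = hotel.getLatitude() + "," + hotel.getLongitude();
        this.pic = hotel.getPic();
    }
}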

import com.alibaba.fastjson.JSON;
import org.apache.http.HttpHost;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

import java.io.IOException;
// plus project-specific imports: Hotel, HotelDoc, IHotelService

@SpringBootTest
class HotelDocTests {

    @Autowired
    private IHotelService iHotelService;

    private RestHighLevelClient client;


    // Test adding a document
    @Test
    public void addDoc() throws IOException {
        // 1. Query the record from MySQL
        Hotel hotel = iHotelService.getById(395434L);
        // 2. Convert it to the object that maps to the index
        HotelDoc hotelDoc = new HotelDoc(hotel);
        // 3. Build the request
        IndexRequest request = new IndexRequest("hotel").id(hotelDoc.getId().toString());
        // 4. Attach the JSON body; source() takes the document's JSON string
        request.source(JSON.toJSONString(hotelDoc), XContentType.JSON);
        // 5. Send the request
        client.index(request, RequestOptions.DEFAULT);
    }

    @BeforeEach
    void setUp(){
        this.client = new RestHighLevelClient(RestClient.builder(
                HttpHost.create("http://192.168.184.152:9200")
        ));
    }

    @AfterEach
    void tearDown() throws IOException {
      this.client.close();
    }

}


Querying a document:

@Test  // Test getting a document
public void testGetDoc() throws IOException {
    GetRequest request = new GetRequest("hotel", "395434");
    GetResponse response = client.get(request, RequestOptions.DEFAULT);
    String sourceAsString = response.getSourceAsString();
    // Deserialize the JSON into a Java object
    HotelDoc hotelDoc = JSON.parseObject(sourceAsString, HotelDoc.class);
    System.out.println(hotelDoc);
}

Updating and deleting documents:

@Test  // Update a document
public void testUpdateDoc() throws IOException {

    UpdateRequest request = new UpdateRequest("hotel","395434");
    request.doc(
            "address", "东三环北路东方路2号",
            "price", 451
    );

    client.update(request, RequestOptions.DEFAULT);
}

@Test  // Delete a document
public void testDeleteDoc() throws IOException {
    DeleteRequest deleteRequest = new DeleteRequest("hotel","395434");

    client.delete(deleteRequest, RequestOptions.DEFAULT);
}

Bulk insert:

@Test
public void testBatchInsertDoc() throws IOException {
    BulkRequest request = new BulkRequest();

    List<Hotel> list = iHotelService.list();
    for (Hotel hotel : list) {
        HotelDoc hotelDoc = new HotelDoc(hotel);
        request.add(new IndexRequest("hotel")
                .id(hotelDoc.getId().toString())
                .source(JSON.toJSONString(hotelDoc),XContentType.JSON));
    }

    client.bulk(request, RequestOptions.DEFAULT);
}
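
client.bulk(...) returns a BulkResponse (org.elasticsearch.action.bulk.BulkResponse); it is worth checking it for per-item failures — a small addition not in the original test:

BulkResponse response = client.bulk(request, RequestOptions.DEFAULT);
if (response.hasFailures()) {
    System.err.println(response.buildFailureMessage());
}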

Verify by searching the whole index:
GET /hotel/_search

DSL Query Syntax

  • Match all: returns every document, generally for testing. Example: match_all (see the example after this list)
  • Full-text queries: analyze the user's input with a tokenizer, then match against the inverted index. Examples:
    • match_query
    • multi_match_query
  • Exact-value queries: look up exact terms, usually on keyword, numeric, date, or boolean fields. Examples:
    • ids
    • range
    • term
  • Geo queries: search by coordinates. Examples:
    • geo_distance
    • geo_bounding_box
  • Compound queries: combine the conditions above into a single query. Examples:
    • bool
    • function_score
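
A minimal match_all query, as referenced in the first bullet above:

GET /hotel/_search
{
  "query": {
    "match_all": {}
  }
}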

Full-Text Queries

match_query:

GET /hotel/_search
{
  "query": {
    "match": {
      "all": "外滩七天"
    }
  }
}

multi_match_query:

GET /hotel/_search
{
  "query": {
    "multi_match": {
      "query": "外滩七天",
      "fields": ["brand","name","business"]
    }
  }
}

Exact-Value Queries

# term query (exact value)
GET /hotel/_search
{
  "query": {
    "term": {
      "city": {
        "value": "深圳"
      }
    }
  }
}

# range query (exact value)
GET /hotel/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 1000,
        "lte": 2000
      }
    }
  }
}

Relevance Scoring

Compound queries combine simpler queries to express more complex search logic, for example:

  • function_score: a scoring-function query that adjusts document relevance scores and therefore ranking, similar to paid placement in search engines such as Baidu

With a match query, each result is scored by its relevance to the search terms (_score), and results are returned in descending score order.

For example, a search for "虹桥如家" returns results ranked by _score.

FunctionScoreQuery

Syntax: a function_score query wraps an ordinary query, a list of functions (each with an optional filter and a score function such as weight), and a boost_mode that defines how the function score is combined with the original query score.

Case requirement:

  • Boost the last hotel in the results, 如家酒店·neo(上海外滩城隍庙小南门地铁站店), to first place:

GET /hotel/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "all": "外滩"
        }
      },
      "functions": [
        {
          "filter": {"term": {   # 过滤,只要如家酒店
            "brand": "如家"
          }},
          "weight": 10
          
        }
      ],
      "boost_mode": "multiply" 
    }
  }
}

The final score is 3.8000445 × 10 = 38.000445, and the hotel moves to first place in the results.

BooleanQuery

A boolean query is a combination of one or more query clauses. The clause types are:

  • must: every subquery must match; contributes to the score ("AND")
  • should: optional matching; contributes to the score ("OR")
  • must_not: must not match; does not contribute to the score ("NOT")
  • filter: must match; does not contribute to the score

Case 1:
Search for hotels whose name contains 如家, priced at no more than 400, within 10 km of the coordinates 31.21,121.5:

GET /hotel/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {
          "name": "如家"
        }}
      ],
      "must_not": [
        {"range": {
          "price": {
            "gt": 400
          }
        }}
      ],
      "filter": [
        {
          "geo_distance": {
            "distance": "10km",
            "location": {
              "lat": 31.21,
              "lon": 121.5
            }
          }
        }
      ]
    }
  }
}

Case 2:
Brand is 如家 or 7天, priced at no more than 400, within 10 km of the coordinates 31.21,121.5:

GET /hotel/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "brand": {
              "value": "如家"
            }
          }
        },
        {
          "term": {
            "brand": {
              "value": "7天"
            }
          }
        }
      ],
      "must_not": [
        {
          "range": {
            "price": {
              "gt": 400
            }
          }
        }
      ],
      "filter": [
        {
          "geo_distance": {
            "distance": "10km",
            "location": {
              "lat": 31.21,
              "lon": 121.5
            }
          }
        }
      ]
    }
  }
}

Sorting

ES supports sorting search results; the default is by relevance score (_score). Sortable field types include keyword, numeric, geo, and date.

Case 1: sort hotels by user rating descending, and by price ascending when ratings tie:

GET /hotel/_search
{
  "query": {
    "match_all": {}
  }
  , "sort": [
    {
      "score": {
        "order": "desc"
      },
      "price": {
        "order": "asc"
      }
    }
  ]
}

Case 2: find hotels around the coordinates 121.612282,31.034661, sorted by distance ascending:

GET /hotel/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "_geo_distance": {
        "location": {
          "lat": 31.034661,
          "lon": 121.612282
        },
        "order": "asc",
        "unit":"km"
      }
    }
  ]
}

Pagination

By default ES returns only the top 10 hits; to fetch more you adjust the pagination parameters.
ES controls the returned page through the from and size parameters:

# Paged query
# Fetch page 2 at 10 per page; from = (page - 1) * size
GET /hotel/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "price":  "desc"
    }
  ],
  "from": 10,
  "size": 10
}

Deep pagination:

ES is distributed, which makes deep pagination expensive. For example, fetching page 100 sorted by price (results 991–1000) works like this:
1. Each shard sorts its own documents and selects its top 1000.
2. The coordinating node gathers all shards' results and re-sorts them in memory to find the overall top 1000.
3. From those 1000, the 10 documents starting at offset 990 are returned.

The deeper the page (the larger from + size), the more memory and CPU it costs,
so ES caps from + size at 10,000 by default.
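
For paging beyond that limit, ES provides search_after (and the scroll API): keep a stable sort and pass the last hit's sort values from the previous page. A minimal sketch (the search_after values are illustrative; they come from the previous page's last hit):

GET /hotel/_search
{
  "query": { "match_all": {} },
  "size": 10,
  "sort": [
    { "price": "asc" },
    { "id": "asc" }
  ],
  "search_after": [336, "2056126831"]
}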

Highlighting

Highlighting marks the search keywords in the results so they stand out.

How it works:

  • the keywords in the results are wrapped in tags
  • the page then styles those tags with CSS

Syntax: add a highlight section alongside query; each entry under fields names a field to highlight, and pre_tags/post_tags (default <em> and </em>) set the wrapping tags.
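
A sketch of the general shape (the pre_tags/post_tags shown are ES's defaults, written out for clarity):

GET /hotel/_search
{
  "query": {
    "match": { "all": "如家" }
  },
  "highlight": {
    "fields": {
      "name": {
        "pre_tags": "<em>",
        "post_tags": "</em>"
      }
    }
  }
}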

Note:
1. By default, highlighting only applies when the highlighted field is the same field that was searched; if they differ, nothing is highlighted.
For example:

GET /hotel/_search
{
  "query": {
    "match": {
      "brand":"如家"
    }
  },
  "highlight": {
    "fields": {
      "name": {}
    }
  }
}

The result is not highlighted, because the query searches brand while the highlight targets name.

Adding "require_field_match": "false" to the name entry fixes this:

GET /hotel/_search
{
  "query": {
    "match": {
      "brand":"如家"
    }
  },
  "highlight": {
    "fields": {
      "name": {"require_field_match": "false"}
    }
  }
}
