Operating ElasticSearch with DSL and Java

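With a single-node ElasticSearch and Kibana already set up, we can now operate ElasticSearch through DSL and Java code. For ElasticSearch, DSL (domain-specific language) essentially means RESTful requests combined with JSON bodies. The Java code mainly uses the API of the official RestHighLevelClient. This post covers index operations, document operations, document queries, and aggregation queries; a link to the source code is provided at the end.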

ElasticSearch official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html


1. Setting Up the Project

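Create a new SpringBoot project with the following structure:

(image: project structure)

ElasticSearchConfig configures the RestHighLevelClient instance used to operate ElasticSearch.

MyHotel is the entity class. It mirrors the index structure and serves as the data carrier for displaying data read from the index.

The DSL folder holds the commonly used DSL statements for operating ElasticSearch.

The JSON folder holds the JSON bodies of the DSL statements that RestHighLevelClient sends to ElasticSearch.

Four test classes demonstrate the RestHighLevelClient code for index operations, document operations, document queries, and aggregation queries.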

First, the contents of the pom file:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
         http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.jobs</groupId>
    <artifactId>springboot_elasticsearch</artifactId>
    <version>1.0</version>

    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <!-- Override the version of the es dependencies here.
        RestHighLevelClient has been deprecated since version 7.15,
        so this project uses the last release before the deprecation.
        -->
        <elasticsearch.version>7.14.2</elasticsearch.version>
    </properties>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.4.5</version>
    </parent>
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter</artifactId>
        </dependency>
        <!-- Use the rest high level client to operate elasticsearch -->
        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>elasticsearch-rest-high-level-client</artifactId>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <optional>true</optional>
        </dependency>
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.8</version>
        </dependency>
        <dependency>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-lang3</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
                <version>2.4.5</version>
            </plugin>
        </plugins>
    </build>
</project>

The key dependency is elasticsearch-rest-high-level-client. Because the ElasticSearch dependency version managed by SpringBoot is older, adding an elasticsearch.version entry under properties overrides the version that SpringBoot manages.

This post defines custom ElasticSearch connection settings in application.yml:

# Custom elasticsearch settings
elasticsearch:
  username: elastic
  password: tdGiSi*fhwW0F60*i*Jc
  # Server URLs to connect to; separate multiple URLs with commas
  urls: http://192.168.136.128:9200

ElasticSearchConfig reads these settings, instantiates a RestHighLevelClient, and registers it in the Spring container:

package com.jobs.config;

import org.apache.http.Header;
import org.apache.http.HttpHeaders;
import org.apache.http.HttpHost;
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.client.CredentialsProvider;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.apache.http.message.BasicHeader;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import java.util.ArrayList;

@Configuration
public class ElasticSearchConfig {

    // Username
    @Value("${elasticsearch.username}")
    private String username;

    // Password
    @Value("${elasticsearch.password}")
    private String password;

    // A comma-separated value in the yml file can be bound directly to an array
    @Value("${elasticsearch.urls}")
    private String[] urls;

    @Bean
    public RestHighLevelClient restHighLevelClient() {
        // Set the username and password
        CredentialsProvider credentialsProvider = new BasicCredentialsProvider();
        credentialsProvider.setCredentials(AuthScope.ANY,
                new UsernamePasswordCredentials(username, password));

        // Build an HttpHost array from the urls array
        ArrayList<HttpHost> hostlist = new ArrayList<>();
        for (String url : urls) {
            hostlist.add(HttpHost.create(url));
        }
        HttpHost[] hosts = hostlist.toArray(new HttpHost[hostlist.size()]);
        RestClientBuilder builder = RestClient.builder(hosts);
        builder.setHttpClientConfigCallback(s -> s.setDefaultCredentialsProvider(credentialsProvider));
        // When RestHighLevelClient talks to ElasticSearch 8, the following headers are required;
        // without them, index and document operations report errors
        builder.setDefaultHeaders(new Header[]{
                new BasicHeader(HttpHeaders.ACCEPT,
                        "application/vnd.elasticsearch+json;compatible-with=7"),
                new BasicHeader(HttpHeaders.CONTENT_TYPE,
                        "application/vnd.elasticsearch+json;compatible-with=7")});
        RestHighLevelClient client = new RestHighLevelClient(builder);
        return client;
    }
}
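
The `urls` setting is a comma-separated list that Spring binds to a `String[]`, and each entry is later turned into an `HttpHost`. As a rough, JDK-only illustration of what each entry carries, the sketch below splits such a string and extracts scheme, host, and port with `java.net.URI`. The names `EsUrlParser` and `parse`, the second address in `main`, and the fallback to port 9200 are assumptions of this sketch, not part of the project.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the project.
class EsUrlParser {

    // Splits a comma-separated URL list and returns "scheme|host|port" strings.
    static List<String> parse(String urls) {
        List<String> result = new ArrayList<>();
        for (String raw : urls.split(",")) {
            String trimmed = raw.trim();
            if (trimmed.isEmpty()) {
                continue; // skip empty entries such as "a,,b"
            }
            URI uri = URI.create(trimmed);
            // Assumption of this sketch: fall back to 9200 when no port is given
            int port = uri.getPort() == -1 ? 9200 : uri.getPort();
            result.add(uri.getScheme() + "|" + uri.getHost() + "|" + port);
        }
        return result;
    }

    public static void main(String[] args) {
        // The second address is made up for the example
        System.out.println(parse("http://192.168.136.128:9200, http://192.168.136.129:9200"));
    }
}
```

Trimming each entry keeps the parsing tolerant of spaces after the commas in the yml value.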

The MyHotel entity class is listed below. It carries the data queried from ElasticSearch so that it can be printed and displayed.

package com.jobs.pojo;

import lombok.Data;

@Data
public class MyHotel {
    private Integer id;
    // Hotel name
    private String name;
    // Hotel address
    private String address;
    // Room price
    private Integer price;
    // Brand
    private String brand;
    // City
    private String city;
    // Latitude and longitude
    private String location;
    // Distance
    private Double distance;
}

2. Index Operations

The DSL statements for index operations are stored in IndexDSL.txt, and the IndexTest class contains the corresponding Java code:

# Create the index
PUT /myhotel
{
  "mappings": {
    "properties": {
      "id":{
        "type": "keyword"
      },
      "name":{
        "type": "text",
        "analyzer": "ik_max_word",
        "copy_to": "all"
      },
      "address":{
        "type": "keyword",
        "index": false
      },
      "price":{
        "type": "integer"
      },
      "brand":{
        "type": "keyword",
        "copy_to": "all"
      },
      "city":{
        "type": "keyword"
      },
      "location":{
        "type": "geo_point"
      },
      "all":{
        "type": "text",
        "analyzer": "ik_max_word"
      }
    }
  }
}

# Get the index
GET /myhotel

# Update the index by adding a new field (once an index is created, fields can only be added; no other changes are possible)
PUT /myhotel/_mapping
{
  "properties": {
    "addScore": {
      "type": "integer",
      "index": false
    }
  }
}

# Delete the index
DELETE /myhotel

package com.jobs;

import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.GetIndexRequest;
import org.elasticsearch.client.indices.GetIndexResponse;
import org.elasticsearch.client.indices.PutMappingRequest;
import org.elasticsearch.cluster.metadata.MappingMetadata;
import org.elasticsearch.common.xcontent.XContentType;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.util.ResourceUtils;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.serializer.SerializerFeature;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Map;

@SpringBootTest
public class IndexTest {

    @Autowired
    private RestHighLevelClient client;

    // Create the index (the request fails if the index already exists)
    @Test
    void createIndexTest() throws IOException {
        CreateIndexRequest createIndexRequest = new CreateIndexRequest("myhotel");
        // Read resources/JSON/CreateMyHotelJson.txt
        // (readLine() returns only the first line, so the JSON must be stored on a single line)
        File file = ResourceUtils.getFile("classpath:JSON/CreateMyHotelJson.txt");
        String createJson;
        try (BufferedReader br = new BufferedReader(new FileReader(file))) {
            createJson = br.readLine();
        }
        createIndexRequest.source(createJson, XContentType.JSON);
        client.indices().create(createIndexRequest, RequestOptions.DEFAULT);
    }

    // Check whether the index exists: true means it exists, false means it does not
    @Test
    void existsIndexTest() throws IOException {
        GetIndexRequest request = new GetIndexRequest("myhotel");
        boolean isExists = client.indices().exists(request, RequestOptions.DEFAULT);
        System.out.println(isExists ? "Index exists" : "Index does not exist");
    }

    // Only new fields can be added to an index; no other changes are possible
    @Test
    void updateIndexTest() throws Exception {
        PutMappingRequest putMappingRequest = new PutMappingRequest("myhotel");
        // Read resources/JSON/UpdateMyHotelJson.txt
        File file = ResourceUtils.getFile("classpath:JSON/UpdateMyHotelJson.txt");
        String createJson;
        try (BufferedReader br = new BufferedReader(new FileReader(file))) {
            createJson = br.readLine();
        }
        putMappingRequest.source(createJson, XContentType.JSON);
        client.indices().putMapping(putMappingRequest, RequestOptions.DEFAULT);
        System.out.println("Field added to the index");
    }

    // Print the index mappings as JSON
    @Test
    void getIndexTest() throws IOException {
        GetIndexRequest request = new GetIndexRequest("myhotel");
        GetIndexResponse response = client.indices().get(request, RequestOptions.DEFAULT);
        Map<String, MappingMetadata> mappings = response.getMappings();
        String result = JSON.toJSONString(mappings, SerializerFeature.PrettyFormat);
        System.out.println(result);
    }

    // Delete the index
    @Test
    void deleteIndexTest() throws IOException {
        DeleteIndexRequest request = new DeleteIndexRequest("myhotel");
        client.indices().delete(request, RequestOptions.DEFAULT);
        System.out.println("Index deleted");
    }
}
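
The tests above read each JSON file with `readLine()`, which returns only the first line, so they rely on the JSON being stored on a single line. If the files were pretty-printed across several lines, reading the whole file would be one option. The sketch below shows that with `Files.readAllBytes`, which is available on Java 8. The class and method names (`JsonFileReader`, `readAll`) are hypothetical and not part of the project.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration; not part of the project.
class JsonFileReader {

    // Reads the entire file as UTF-8 text, including all line breaks.
    static String readAll(Path path) throws IOException {
        return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Write a small multi-line JSON file to a temp location, then read it back
        Path tmp = Files.createTempFile("mapping", ".json");
        Files.write(tmp, "{\n  \"properties\": {}\n}".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(tmp));
        Files.delete(tmp);
    }
}
```

In the tests, the `File` returned by `ResourceUtils.getFile(...)` could be converted with `file.toPath()` and passed to such a helper.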

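3. Document Operations

The DSL statements for basic document operations are stored in DocDSL.txt, and the DocTest class contains the corresponding Java code:

# Add a document / fully replace a document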
POST /myhotel/_doc/1
{
   "id": 1,
   "name": "北京海航大厦万豪酒店",
   "address": "霄云路甲26号",
   "price": 1302,
   "brand": "万豪",
   "city": "北京",
   "location": "39.959861, 116.467363"
}

# PUT can also add a document or fully replace one
PUT /myhotel/_doc/1
{
   "id": 1,
   "name": "北京海航大厦万豪酒店",
   "address": "霄云路甲26号",
   "price": 1302,
   "brand": "万豪",
   "city": "北京",
   "location": "39.959861, 116.467363"
}

# Get a document by id
GET /myhotel/_doc/1

# Update some of a document's fields by id
POST /myhotel/_update/1
{
  "doc": {
    "name": "侯胖胖任肥肥合资控股酒店",
    "price": 666
  }
}

# Delete a document
DELETE /myhotel/_doc/1
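
The difference between a full replacement (`PUT` or `POST` to `_doc`) and a partial update (`POST` to `_update` with a `doc` body) can be pictured with plain maps. The sketch below is only an in-memory model of the two behaviours for top-level fields; it does not talk to ElasticSearch, and the names `UpdateSemantics`, `replace`, and `partialUpdate` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// In-memory model for illustration only; not ElasticSearch code.
class UpdateSemantics {

    // Full replacement: the stored document becomes exactly the incoming one.
    static Map<String, Object> replace(Map<String, Object> stored, Map<String, Object> incoming) {
        return new LinkedHashMap<>(incoming);
    }

    // Partial update: fields in the partial document overwrite stored ones,
    // while all other stored fields are kept.
    static Map<String, Object> partialUpdate(Map<String, Object> stored, Map<String, Object> partial) {
        Map<String, Object> merged = new LinkedHashMap<>(stored);
        merged.putAll(partial);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new LinkedHashMap<>();
        stored.put("name", "北京海航大厦万豪酒店");
        stored.put("price", 1302);
        stored.put("city", "北京");

        Map<String, Object> change = new LinkedHashMap<>();
        change.put("price", 666);

        System.out.println(partialUpdate(stored, change)); // name and city survive
        System.out.println(replace(stored, change));       // only price remains
    }
}
```

This is why sending only `price` through `_doc` would drop the other fields, while sending it through `_update` keeps them.
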

package com.jobs;

import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;
import com.jobs.pojo.MyHotel;
import org.apache.commons.lang3.StringUtils;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.util.ResourceUtils;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

@SpringBootTest
public class DocTest {

    @Autowired
    private RestHighLevelClient client;

    // Add a document; if the id already exists, the old document is deleted and the new one is added,
    // so this method also serves as a full update
    @Test
    void addDocTest() throws IOException {
        // Read resources/JSON/CreateDocument.txt
        File file = ResourceUtils.getFile("classpath:JSON/CreateDocument.txt");
        String json;
        try (BufferedReader br = new BufferedReader(new FileReader(file))) {
            json = br.readLine();
        }

        JSONObject jsonObj = JSON.parseObject(json);
        IndexRequest request = new IndexRequest("myhotel").id(jsonObj.getString("id"));
        request.source(json, XContentType.JSON);
        client.index(request, RequestOptions.DEFAULT);
    }

    // Get a document by id
    @Test
    void getDocByIdTest() throws IOException {
        GetRequest request = new GetRequest("myhotel", "1");
        GetResponse response = client.get(request, RequestOptions.DEFAULT);
        // Get and print the document's JSON source
        String json = response.getSourceAsString();
        System.out.println(json);
        // Convert it to MyHotel and print it
        MyHotel myHotel = JSON.parseObject(json, MyHotel.class);
        System.out.println(myHotel);
    }

    // Update some of a document's fields
    @Test
    void updateDocPartTest() throws IOException {
        UpdateRequest request = new UpdateRequest("myhotel", "1");
        // Update name and price; field names and values are passed as alternating arguments
        request.doc("name", "侯胖胖任肥肥合资控股酒店", "price", 666);
        client.update(request, RequestOptions.DEFAULT);
    }

    // Delete a document
    @Test
    void deleteDocTest() throws IOException {
        DeleteRequest request = new DeleteRequest("myhotel", "1");
        client.delete(request, RequestOptions.DEFAULT);
    }

    // Bulk-add sample data
    // A BulkRequest can hold different request types, such as IndexRequest, UpdateRequest, and DeleteRequest
    @Test
    void bulkRequestTest() throws IOException {
        // Read resources/JSON/DemoJsonData.txt (one JSON document per line)
        File file = ResourceUtils.getFile("classpath:JSON/DemoJsonData.txt");
        // Use a BulkRequest to batch the requests
        BulkRequest request = new BulkRequest();
        try (BufferedReader br = new BufferedReader(new FileReader(file))) {
            String json = br.readLine();
            JSONObject jsonObj;
            while (StringUtils.isNotBlank(json)) {
                jsonObj = JSON.parseObject(json);
                // Add a request to the BulkRequest
                request.add(new IndexRequest("myhotel")
                        .id(jsonObj.getString("id"))
                        .source(json, XContentType.JSON));
                json = br.readLine();
            }
        }
        client.bulk(request, RequestOptions.DEFAULT);
    }
}
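
At the REST level, batch indexing goes through the `_bulk` endpoint, whose body is newline-delimited JSON: one action line followed by one source line per document, each terminated by a newline. As a rough sketch of the body that the batch of index requests in `bulkRequestTest` corresponds to, the code below builds such a payload by hand. The names `BulkBodyBuilder` and `build` are hypothetical, and the sketch assumes every document is single-line JSON and that ids need no JSON escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the project.
class BulkBodyBuilder {

    // Builds a _bulk body: for each document, an "index" action line
    // followed by the document source, each ending with '\n'.
    static String build(String index, Map<String, String> docsById) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : docsById.entrySet()) {
            sb.append("{\"index\":{\"_index\":\"").append(index)
              .append("\",\"_id\":\"").append(e.getKey()).append("\"}}\n");
            sb.append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("1", "{\"id\":1,\"brand\":\"万豪\",\"city\":\"北京\"}");
        docs.put("2", "{\"id\":2,\"brand\":\"如家\",\"city\":\"上海\"}");
        System.out.print(build("myhotel", docs));
    }
}
```

The resulting text could be pasted after `POST /_bulk` in Kibana to load the same documents without Java.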

4. Document Queries

The DSL statements for document queries are stored in SearchDSL.txt, and the SearchTest class contains the corresponding Java code:

# Query all documents
GET /myhotel/_search

# Query all documents (explicit match_all)
GET /myhotel/_search
{
  "query": {
    "match_all": {}
  }
}

# Match query on a single field
GET /myhotel/_search
{
  "query": {
    "match": {
      "all": "朝阳如家"
    }
  }
}

# Match query on multiple fields
GET /myhotel/_search
{
  "query": {
    "multi_match": {
      "query": "朝阳如家",
      "fields": ["brand","name"]
    }
  }
}

# term exact-match query
GET /myhotel/_search
{
  "query": {
    "term": {
      "city": {"value": "北京"}
    }
  }
}

# Range query
GET /myhotel/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 100,
        "lte": 300
      }
    }
  }
}

# Compound bool query
GET /myhotel/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "如家"
          }
        }
      ],
      "must_not": [
        {
          "range": {
            "price": {
              "gt": 400
            }
          }
        }
      ],
      "filter": [
        {
          "geo_distance": {
            "distance": "5km",
            "location": {
              "lat": 31.21,
              "lon": 121.5
            }
          }
        }
      ]
    }
  }
}

# Geo distance query
GET /myhotel/_search
{
  "query": {
    "geo_distance": {
      "distance": "5km",
      "location": {
        "lat": 31.21,
        "lon": 121.5
      }
    }
  }
}

# Function score query
GET /myhotel/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "all": "朝阳"
        }
      },
      "functions": [
        {
          "filter": {
            "term": {
              "brand": "如家"
            }
          },
          "weight": 10
        }
      ],
      "boost_mode": "sum"
    }
  }
}

# Sort the results
GET /myhotel/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "price": "asc"
    },
    {
      "id": "asc"
    }
  ]
}

# Sort the results by distance
GET /myhotel/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "_geo_distance": {
        "location": {
          "lat": 31.034661,
          "lon": 121.612282
        },
        "order": "asc",
        "unit": "km"
      }
    }
  ]
}

# Paginate the results
GET /myhotel/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "price":  "asc"
    }
  ],
  "from": 20,
  "size": 10
}

# Highlight matches in the results
GET /myhotel/_search
{
  "query": {
    "match": {
      "all": "北广场"
    }
  },
  "highlight": {
    "fields": {
      "name": {
        "require_field_match": "false"
      }
    }
  }
}
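
The geo_distance query and the _geo_distance sort both revolve around the distance between a document's `location` and a center point. To make that idea concrete, here is a JDK-only sketch that parses a "lat, lon" string as stored in the documents and computes a great-circle distance with the haversine formula. The names `GeoDistanceSketch`, `parse`, `distanceKm`, and `within` are hypothetical, and the mean Earth radius of 6371 km is an assumption of this sketch, so results may differ slightly from what ElasticSearch computes.

```java
// Hypothetical helper for illustration; not part of the project.
class GeoDistanceSketch {

    // Assumption: mean Earth radius in kilometres
    private static final double EARTH_RADIUS_KM = 6371.0;

    // Parses "lat, lon" into {lat, lon}.
    static double[] parse(String latLon) {
        String[] parts = latLon.split(",");
        return new double[]{
                Double.parseDouble(parts[0].trim()),
                Double.parseDouble(parts[1].trim())};
    }

    // Haversine great-circle distance between two points, in kilometres.
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // True when the location lies within radiusKm of the center point.
    static boolean within(String location, double centerLat, double centerLon, double radiusKm) {
        double[] p = parse(location);
        return distanceKm(p[0], p[1], centerLat, centerLon) <= radiusKm;
    }

    public static void main(String[] args) {
        // The Beijing hotel from the document examples versus the Shanghai center point
        System.out.println(within("39.959861, 116.467363", 31.21, 121.5, 5));
    }
}
```

A document passes the 5km filter when `within` would return true for its location, and the distance sort orders hits by the value that `distanceKm` approximates.
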

package com.jobs;

import com.alibaba.fastjson.JSON;
import com.jobs.pojo.MyHotel;
import org.apache.commons.lang3.StringUtils;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.geo.GeoPoint;
import org.elasticsearch.common.lucene.search.function.CombineFunction;
import org.elasticsearch.common.unit.DistanceUnit;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.functionscore.FunctionScoreQueryBuilder;
import org.elasticsearch.index.query.functionscore.ScoreFunctionBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightField;
import org.elasticsearch.search.sort.SortBuilders;
import org.elasticsearch.search.sort.SortOrder;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

import java.io.IOException;
import java.util.Map;

@SpringBootTest
public class SearchTest {

    @Autowired
    private RestHighLevelClient client;

    // Query all documents; elasticsearch paginates by default with 10 hits per page, so this returns page 1
    @Test
    void matchAllTest() throws IOException {
        SearchRequest request = new SearchRequest("myhotel");
        request.source().query(QueryBuilders.matchAllQuery());
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        processResponse(response);
    }

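    // Multi-field query; use it sparingly, because the more fields are queried, the worse the performance.
    // Prefer merging the fields into one new field with copy_to and querying that field instead
    @Test
    void multiMatchTest() throws IOException {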
        SearchRequest request = new SearchRequest("myhotel");
        request.source().query(QueryBuilders.multiMatchQuery("朝阳如家", "name", "brand"));
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        processResponse(response);
    }

    // Single-field query (when the index was created, name and brand were merged into the all field with copy_to),
    // so querying the all field is equivalent to querying both name and brand
    @Test
    void matchSingleTest() throws IOException {
        SearchRequest request = new SearchRequest("myhotel");
        request.source().query(QueryBuilders.matchQuery("all", "朝阳如家"));
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        processResponse(response);
    }

    // term exact-match query
    @Test
    void matchTermTest() throws IOException {
        SearchRequest request = new SearchRequest("myhotel");
        request.source().query(QueryBuilders.termQuery("city", "北京"));
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        processResponse(response);
    }

    // range query
    @Test
    void matchRangeTest() throws IOException {
        SearchRequest request = new SearchRequest("myhotel");
        request.source().query(QueryBuilders.rangeQuery("price").gte(200).lte(500));
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        processResponse(response);
    }

    // Compound bool query
    // must and should contribute to the score; filter and mustNot do not
    // The most common combination is must plus filter
    @Test
    void boolTest() throws IOException {
        SearchRequest request = new SearchRequest("myhotel");
        request.source().query(
                QueryBuilders.boolQuery()
                        .must(QueryBuilders.termQuery("city", "北京"))
                        .filter(QueryBuilders.rangeQuery("price").gte(200).lte(500)));
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        processResponse(response);
    }

    // Paginated query with sorted results
    @Test
    void sortAndPageTest() throws IOException {
        int page = 3, size = 6;
        SearchRequest request = new SearchRequest("myhotel");
        request.source().query(QueryBuilders.matchAllQuery());
        // Sort
        request.source().sort("price", SortOrder.ASC);
        // Paginate
        request.source().from((page - 1) * size).size(size);
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        processResponse(response);
    }

    // Find documents within 5 km of the given center point, sorted by distance in ascending order
    @Test
    void matchDistanceTest() throws IOException {
        // Center point: latitude,longitude
        String myPoint = "31.21,121.5";
        SearchRequest request = new SearchRequest("myhotel");
        // Match documents within 5 km of the center point
        request.source().query(
                QueryBuilders.geoDistanceQuery("location")
                        .point(new GeoPoint(myPoint))
                        .distance(5, DistanceUnit.KILOMETERS));
        // Sort by distance from the center point in ascending order
        request.source().sort(SortBuilders
                .geoDistanceSort("location", new GeoPoint(myPoint))
                .order(SortOrder.ASC).unit(DistanceUnit.KILOMETERS));
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        processResponse(response, "location");
    }

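    // Function score query: boost the score of documents you consider important so that they rank higher
    // Note: do not specify a sort field with a function score query, because once a sort field is specified, scoring is no longer performed.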
    @Test
    void funcTest() throws IOException {
        SearchRequest request = new SearchRequest("myhotel");
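        // Set the query conditions, e.g. hotels in 上海 priced between 200 and 600
        BoolQueryBuilder boolQuery = QueryBuilders.boolQuery();
        boolQuery.must(QueryBuilders.termQuery("city", "上海"));
        boolQuery.filter(QueryBuilders.rangeQuery("price").gte(200).lte(600));
        // Among the matched hotels, make the 速8 brand score and rank higher:
        // add 10 to the score of 速8 hotels, combining as original score + function score; the higher the score, the higher the rank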
        FunctionScoreQueryBuilder functionScoreQuery =
                QueryBuilders.functionScoreQuery(boolQuery,
                        new FunctionScoreQueryBuilder.FilterFunctionBuilder[]{
                                new FunctionScoreQueryBuilder.FilterFunctionBuilder(
                                        QueryBuilders.termQuery("brand", "速8"),
                                        ScoreFunctionBuilders.weightFactorFunction(10))})
                        .boostMode(CombineFunction.SUM);
        request.source().query(functionScoreQuery);
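        // By default elasticsearch sorts results by score
        // A score function lets you customize how the score of selected documents is calculated
        // Once a sort field is specified, elasticsearch stops scoring and no longer sorts by score
        //request.source().sort("price", SortOrder.ASC);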
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        processResponse(response);
    }

    // Query with keyword highlighting
    @Test
    void highlightTest() throws IOException {
        SearchRequest request = new SearchRequest("myhotel");
        request.source().query(QueryBuilders.matchQuery("all", "朝阳如家"));
        // When the queried field differs from the highlighted field, requireFieldMatch must be false for highlighting to apply
        request.source().highlighter(new HighlightBuilder().field("name").requireFieldMatch(false));
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        processResponse(response);
    }


    private void processResponse(SearchResponse response) {
        processResponse(response, null);
    }

    // Process and print the results; sortField is the field the results were sorted by
    private void processResponse(SearchResponse response, String sortField) {
        SearchHits searchHits = response.getHits();
        // Get the total number of hits
        long total = searchHits.getTotalHits().value;
        System.out.println("Total hits: " + total);
        // Get the hits
        SearchHit[] hits = searchHits.getHits();
        if (hits.length > 0) {
            for (SearchHit hit : hits) {
                String json = hit.getSourceAsString();
                MyHotel myhotel = JSON.parseObject(json, MyHotel.class);
                // If there are highlight results, read and apply them
                Map<String, HighlightField> map = hit.getHighlightFields();
                if (map != null && map.size() > 0) {
                    // Take the highlighted fragment for the name field
                    HighlightField highlightField = map.get("name");
                    if (highlightField != null &&
                            highlightField.getFragments().length > 0) {
                        String hName = highlightField.getFragments()[0].toString();
                        // Replace the name in the object with the highlighted text
                        myhotel.setName(hName);
                    }
                }

                // If the results were sorted by distance, read the distance value
                if (StringUtils.isNotBlank(sortField) &&
                        sortField.equalsIgnoreCase("location")) {
                    Object[] sortValues = hit.getSortValues();
                    if (sortValues != null && sortValues.length > 0) {
                        myhotel.setDistance((double) sortValues[0]);
                    }
                }

                System.out.println(myhotel);
            }
        }
    }
}
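
`sortAndPageTest` converts a 1-based page number into the `from` offset with `(page - 1) * size`. With default index settings, ElasticSearch rejects searches where `from + size` exceeds 10,000 (the default of `index.max_result_window`), so deep pages need a different approach. The sketch below captures both rules in a small helper; the names `PageMath`, `from`, and `withinDefaultWindow` are hypothetical and not part of the project.

```java
// Hypothetical helper for illustration; not part of the project.
class PageMath {

    // Default value of index.max_result_window
    static final int MAX_RESULT_WINDOW = 10_000;

    // Offset of the first hit on a 1-based page.
    static int from(int page, int size) {
        if (page < 1 || size < 1) {
            throw new IllegalArgumentException("page and size must be >= 1");
        }
        return (page - 1) * size;
    }

    // True when from + size stays within the default result window.
    static boolean withinDefaultWindow(int page, int size) {
        long end = (long) (page - 1) * size + size; // long avoids int overflow
        return page >= 1 && size >= 1 && end <= MAX_RESULT_WINDOW;
    }

    public static void main(String[] args) {
        System.out.println(from(3, 6));                    // offset used by sortAndPageTest
        System.out.println(withinDefaultWindow(1001, 10)); // from + size would be 10010
    }
}
```

Checking the window before sending the request makes the failure mode explicit instead of surfacing as a server-side error.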

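5. Aggregation Queries

The DSL statements for common aggregation queries are stored in AggDSL.txt, and the AggTest class contains the corresponding Java code:

# Aggregate by brand; buckets are sorted by document count in descending order by default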
GET /myhotel/_search
{
  # A size of 0 means no individual documents are returned;
  # the response only needs to contain the aggregation results
  "size": 0,
  "aggs": {
    "brandAgg": {
      "terms": {
        "field": "brand",
        "size": 20
      }
    }
  }
}

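# Set the sort order: sort buckets by document count in ascending order
# Without an explicit order, buckets are sorted by document count in descending order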
GET /myhotel/_search
{
  "size": 0,
  "aggs": {
    "brandAgg": {
      "terms": {
        "field": "brand",
        "size": 20,
        "order": {
          "_count": "asc"
        }
      }
    }
  }
}

# First select documents with a price of at most 200, then aggregate those results by brand
GET /myhotel/_search
{
  "query": {
    "range": {
      "price": {
        "lte": 200
      }
    }
  },
  "size": 0,
  "aggs": {
    "brandAgg": {
      "terms": {
        "field": "brand",
        "size": 20
      }
    }
  }
}

# Nested aggregation; stats computes min, max, sum, and avg
GET /myhotel/_search
{
  "size": 0,
  "aggs": {
    "brandAgg": {
      "terms": {
        "field": "brand",
        "size": 20,
        "order": {
          "priceAgg.min": "asc"
        }
      },
      "aggs": {
        "priceAgg": {
          "stats": {
            "field": "price"
          }
        }
      }
    }
  }
}

# Nested aggregation, written another way to get min, max, sum, and avg
# This form lets you request only some of the metrics, for example only avg
GET /myhotel/_search
{
  "size": 0,
  "aggs": {
    "brandAgg": {
      "terms": {
        "field": "brand",
        "size": 20,
        "order": {
          "minAgg": "asc"
        }
      },
      "aggs": {
        "minAgg": {
          "min": {
            "field": "price"
          }
        },
        "maxAgg":{
          "max": {
            "field": "price"
          }
        },
        "sumAgg":{
          "sum": {
            "field": "price"
          }
        },
        "avgAgg":{
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}

package com.jobs;

import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.Aggregations;
import org.elasticsearch.search.aggregations.BucketOrder;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
import org.elasticsearch.search.aggregations.bucket.terms.TermsAggregationBuilder;
import org.elasticsearch.search.aggregations.metrics.*;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

import java.io.IOException;
import java.util.List;

@SpringBootTest
public class AggTest {

    @Autowired
    private RestHighLevelClient client;

    // For hotels priced between 200 and 500, count documents per brand and take the top 10 buckets
    // By default, buckets are sorted by document count in descending order
    @Test
    void aggTest() throws IOException {
        SearchRequest request = new SearchRequest("myhotel");
        request.source().query(QueryBuilders.rangeQuery("price").gte(200).lte(500));
        // Do not return the documents themselves
        request.source().size(0);
        // Return the aggregation results, sorted by document count in descending order by default
        request.source().aggregation(AggregationBuilders
                // Give the aggregation a custom name; it is needed later to read the result
                .terms("brandAgg").field("brand").size(10));
        // To sort by document count in ascending order instead:
        //request.source().aggregation(AggregationBuilders
        //        .terms("brandAgg").field("brand").size(10)
        //        .order(BucketOrder.count(true)));
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        // Read the aggregation results from the response
        Aggregations aggregations = response.getAggregations();
        Terms brandAgg = aggregations.get("brandAgg");
        List<? extends Terms.Bucket> buckets = brandAgg.getBuckets();
        if (buckets.size() > 0) {
            for (Terms.Bucket bucket : buckets) {
                String brandName = bucket.getKeyAsString();
                long docCount = bucket.getDocCount();
                System.out.println("brand:" + brandName + ",count:" + docCount);
            }
        }
    }

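    // For hotels priced between 200 and 500,
    // compute per brand: document count, minimum price, maximum price, sum of prices, and average price,
    // and sort the buckets by minimum price in ascending order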
    @Test
    void aggStatsTest() throws IOException {
        SearchRequest request = new SearchRequest("myhotel");
        request.source().query(QueryBuilders.rangeQuery("price").gte(200).lte(500));
        // Do not return the documents themselves
        request.source().size(0);
        TermsAggregationBuilder termsAggregationBuilder =
                // Give the aggregation a custom name; it is needed later to read the result
                AggregationBuilders.terms("brandAgg").field("brand")
                        // Sort the buckets by priceAgg.min in ascending order
                        .order(BucketOrder.aggregation("priceAgg.min", true))
                        // Give the sub-aggregation a custom name; it is needed later to read the result
                        .subAggregation(AggregationBuilders.stats("priceAgg").field("price"));
        // Return the aggregation results, ordered as configured above
        request.source().aggregation(termsAggregationBuilder);
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        // Read the aggregation results from the response
        Aggregations aggregations = response.getAggregations();
        Terms brandAgg = aggregations.get("brandAgg");
        List<? extends Terms.Bucket> buckets = brandAgg.getBuckets();
        if (buckets.size() > 0) {
            for (Terms.Bucket bucket : buckets) {
                String brandName = bucket.getKeyAsString();
                long docCount = bucket.getDocCount();
                // Read the sub-aggregation results
                Stats priceAgg = bucket.getAggregations().get("priceAgg");
                String min = priceAgg.getMinAsString();
                String max = priceAgg.getMaxAsString();
                String sum = priceAgg.getSumAsString();
                String avg = priceAgg.getAvgAsString();
                System.out.println("brand:" + brandName + ",count:" + docCount +
                        ",min:" + min + ",max:" + max + ",sum:" + sum + ",avg:" + avg);
            }
        }
    }

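    // The same query written another way
    // For hotels priced between 200 and 500,
    // compute per brand: document count, minimum price, maximum price, sum of prices, and average price,
    // and sort the buckets by minimum price in ascending order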
    @Test
    void aggregationTest() throws IOException {
        SearchRequest request = new SearchRequest("myhotel");
        request.source().query(QueryBuilders.rangeQuery("price").gte(200).lte(500));
        // Do not return the documents themselves
        request.source().size(0);
        TermsAggregationBuilder termsAggregationBuilder =
                // Give the aggregation a custom name; it is needed later to read the result
                AggregationBuilders.terms("brandAgg").field("brand")
                        // Sort the buckets by the minAgg value in ascending order
                        .order(BucketOrder.aggregation("minAgg", true));

        MinAggregationBuilder minAgg = AggregationBuilders.min("minAgg").field("price");
        MaxAggregationBuilder maxAgg = AggregationBuilders.max("maxAgg").field("price");
        SumAggregationBuilder sumAgg = AggregationBuilders.sum("sumAgg").field("price");
        AvgAggregationBuilder avgAgg = AggregationBuilders.avg("avgAgg").field("price");

        termsAggregationBuilder.subAggregation(minAgg);
        termsAggregationBuilder.subAggregation(maxAgg);
        termsAggregationBuilder.subAggregation(sumAgg);
        termsAggregationBuilder.subAggregation(avgAgg);

        // Return the aggregation results, ordered as configured above
        request.source().aggregation(termsAggregationBuilder);
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        // Read the aggregation results from the response
        Aggregations aggregations = response.getAggregations();
        Terms brandAgg = aggregations.get("brandAgg");
        List<? extends Terms.Bucket> buckets = brandAgg.getBuckets();
        if (buckets.size() > 0) {
            for (Terms.Bucket bucket : buckets) {
                String brandName = bucket.getKeyAsString();
                long docCount = bucket.getDocCount();
                // Read the sub-aggregation results
                Aggregations bucketAgg = bucket.getAggregations();
                Min min = bucketAgg.get("minAgg");
                Max max = bucketAgg.get("maxAgg");
                Sum sum = bucketAgg.get("sumAgg");
                Avg avg = bucketAgg.get("avgAgg");
                System.out.println("brand:" + brandName + ",count:" + docCount +
                        ",min:" + min.getValueAsString() +
                        ",max:" + max.getValueAsString() +
                        ",sum:" + sum.getValueAsString() +
                        ",avg:" + avg.getValueAsString());
            }
        }
    }
}
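
To see what the terms aggregation with a nested stats sub-aggregation computes, it can help to run the same grouping over a small in-memory list. The sketch below groups hotels by brand, collects count, min, max, sum, and average of the price with `IntSummaryStatistics`, and orders the groups by minimum price ascending, mirroring the `priceAgg.min` ordering. It is only an analogy in plain Java: the names `BrandStatsSketch`, `Hotel`, and `statsByBrand` and the sample prices are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// In-memory analogy for illustration only; not ElasticSearch code.
class BrandStatsSketch {

    static final class Hotel {
        final String brand;
        final int price;

        Hotel(String brand, int price) {
            this.brand = brand;
            this.price = price;
        }
    }

    // Groups by brand (like a terms aggregation), summarizes price (like stats),
    // and orders the groups by minimum price ascending.
    static Map<String, IntSummaryStatistics> statsByBrand(List<Hotel> hotels) {
        Map<String, IntSummaryStatistics> grouped = hotels.stream()
                .collect(Collectors.groupingBy((Hotel h) -> h.brand,
                        Collectors.summarizingInt((Hotel h) -> h.price)));
        List<Map.Entry<String, IntSummaryStatistics>> entries =
                new ArrayList<>(grouped.entrySet());
        entries.sort((a, b) -> Integer.compare(a.getValue().getMin(), b.getValue().getMin()));
        Map<String, IntSummaryStatistics> ordered = new LinkedHashMap<>();
        for (Map.Entry<String, IntSummaryStatistics> e : entries) {
            ordered.put(e.getKey(), e.getValue());
        }
        return ordered;
    }

    public static void main(String[] args) {
        // Sample data invented for the example
        List<Hotel> hotels = Arrays.asList(
                new Hotel("如家", 300), new Hotel("速8", 250),
                new Hotel("如家", 200), new Hotel("万豪", 480));
        statsByBrand(hotels).forEach((brand, s) -> System.out.println(
                "brand:" + brand + ",count:" + s.getCount() + ",min:" + s.getMin()
                        + ",max:" + s.getMax() + ",sum:" + s.getSum() + ",avg:" + s.getAverage()));
    }
}
```

Each map entry plays the role of one bucket, and the statistics object plays the role of the `priceAgg` result read inside the loop in `aggStatsTest`.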

The source code for this post can be downloaded from: https://files.cnblogs.com/files/blogs/699532/springboot_elasticsearch.zip

Posted 2023-10-19 14:37 by 乔京飞