ES Query DSL, and the difference between matchQuery and termQuery in Java

  Reference: https://www.wenjiangs.com/doc/iwlst1pcp

1. A brief introduction to the Query DSL

The official documentation introduces it as follows:

  Elasticsearch provides a full Query DSL (Domain Specific Language) based on JSON to define queries. Think of the Query DSL as an AST (Abstract Syntax Tree) of queries, consisting of two types of clauses:

Leaf query clauses
  Leaf query clauses look for a particular value in a particular field, such as the match, term or range queries. These queries can be used by themselves.
Compound query clauses
  Compound query clauses wrap other leaf or compound queries and are used to combine multiple queries in a logical fashion (such as the bool or dis_max query), or to alter their behaviour (such as the constant_score query).
  Query clauses behave differently depending on whether they are used in query context or filter context.

2. Building the test data

1. Create the indices

1. Create an accounts index with the following fields:

PUT /accounts
{
    "mappings": {
        "properties": {
            "userid": {
                "type": "long"
            },
            "username": {
                "type": "keyword"
            },
            "fullname": {
                "type": "text"
            },
            "sex": {
                "type": "double"
            },
            "birth": {
                "type": "date"
            }
        }
    }
}

2. Create an orders index

PUT /orders
{
    "mappings": {
        "properties": {
            "orderid": {
                "type": "long"
            },
            "ordernum": {
                "type": "keyword"
            },
            "username": {
                "type": "keyword"
            },
            "description": {
                "type": "text"
            },
            "createTime": {
                "type": "date"
            },
            "amount": {
                "type": "double"
            }
        }
    }
}

2. View the index mappings

liqiang@root MINGW64 ~/Desktop
$ curl -X GET http://localhost:9200/accounts/_mapping?pretty=true
{
  "accounts" : {
    "mappings" : {
      "properties" : {
        "birth" : {
          "type" : "date"
        },
        "fullname" : {
          "type" : "text"
        },
        "sex" : {
          "type" : "double"
        },
        "userid" : {
          "type" : "long"
        },
        "username" : {
          "type" : "keyword"
        }
      }
    }
  }
}


liqiang@root MINGW64 ~/Desktop
$ curl -X GET http://localhost:9200/orders/_mapping?pretty=true
{
  "orders" : {
    "mappings" : {
      "properties" : {
        "amount" : {
          "type" : "double"
        },
        "createTime" : {
          "type" : "date"
        },
        "description" : {
          "type" : "text"
        },
        "orderid" : {
          "type" : "long"
        },
        "ordernum" : {
          "type" : "keyword"
        },
        "username" : {
          "type" : "keyword"
        }
      }
    }
  }
}

3. Create ten documents

1. Create the account data

    private static void createDocument() throws UnknownHostException, IOException, InterruptedException {
        // on startup
        Settings settings = Settings.builder().put("cluster.name", "my-application").build();
        TransportClient client = new PreBuiltTransportClient(settings)
                .addTransportAddress(new TransportAddress(InetAddress.getByName("127.0.0.1"), 9300));

        for (int i = 0; i < 10; i++) {
            XContentBuilder builder = XContentFactory.jsonBuilder().startObject().field("username", "zhangsan" + i)
                    .field("fullname", "张三" + i).field("sex", i % 2 == 0 ? 1 : 2).field("userid", (i + 1))
                    .field("birth", new Date()).endObject();
            // store the document into the accounts index
            IndexResponse response = client.prepareIndex("accounts", "_doc").setSource(builder).get();

            // print the generated document id
            String _id = response.getId();
            System.out.println("_id " + _id);

            Thread.sleep(1 * 1000);
        }

        // on shutdown
        client.close();
    }

Result:

_id BpeN0nMBntNcepW152XL
_id B5eN0nMBntNcepW17WVO
_id CJeN0nMBntNcepW18mWF
_id CZeN0nMBntNcepW192XD
_id CpeN0nMBntNcepW1_GXZ
_id C5eO0nMBntNcepW1AWWe
_id DJeO0nMBntNcepW1BmUf
_id DZeO0nMBntNcepW1CmXE
_id DpeO0nMBntNcepW1D2Xh
_id D5eO0nMBntNcepW1FGVL

 

Searching the data in Kibana's Discover looks like this:

2. Create the order data

    private static void createDocument() throws UnknownHostException, IOException, InterruptedException {
        // on startup
        Settings settings = Settings.builder().put("cluster.name", "my-application").build();
        TransportClient client = new PreBuiltTransportClient(settings)
                .addTransportAddress(new TransportAddress(InetAddress.getByName("127.0.0.1"), 9300));

        for (int i = 0; i < 10; i++) {
            XContentBuilder builder = XContentFactory.jsonBuilder().startObject().field("amount", i)
                    .field("createTime", new Date()).field("description", "订单描述" + i).field("orderid", (i + 1))
                    .field("ordernum", "order" + i).field("username", "zhangsan" + (i % 5)).endObject();
            // store the document into the orders index
            IndexResponse response = client.prepareIndex("orders", "_doc").setSource(builder).get();

            // print the generated document id
            String _id = response.getId();
            System.out.println("_id " + _id);

            Thread.sleep(1 * 1000);
        }

        // on shutdown
        client.close();
    }

Result:

_id EJfo0nMBntNcepW15mUP
_id EZfo0nMBntNcepW16mW3
_id Epfo0nMBntNcepW172VR
_id E5fo0nMBntNcepW182Xr
_id FJfo0nMBntNcepW1-WXO
_id FZfo0nMBntNcepW1_mU5
_id Fpfp0nMBntNcepW1AmV2
_id F5fp0nMBntNcepW1BmXi
_id GJfp0nMBntNcepW1C2VR
_id GZfp0nMBntNcepW1D2WO

 

Viewing the data in Kibana:

(1) Kibana: Management -> Index patterns -> Create index pattern

 

(2) View the data in Discover

4. Create nine news documents

(1) The field mappings are as follows; the content field is tokenized with the IK analyzer:

{
  "properties": {
    "id": { "type": "long" },
    "creator": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
    "createTime": { "type": "date" },
    "description": { "type": "double" },
    "type": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
    "title": { "type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_smart" },
    "content": { "type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_smart" }
  }
}

(2) The data:

{"creator":"creator1","createTime":"2020-08-27T02:52:24.491Z","type":"java","title":"java记录","content":"这里是java记录"}
{"creator":"creator2","createTime":"2020-08-27T02:52:31.677Z","type":"vue","title":"vue记录","content":"这里是vue记录"}
{"creator":"creator3","createTime":"2020-08-27T02:52:31.915Z","type":"js","title":"js记录","content":"这里是js记录"}
{"creator":"creator4","createTime":"2020-08-27T02:52:32.067Z","type":"es","title":"js记录","content":"这里是js记录"}
{"creator":"creator7","createTime":"2020-08-27T02:52:33.733Z","type":"vue","title":"vue记录","content":"这里是vue记录"}
{"creator":"creator6","createTime":"2020-08-27T02:52:32.395Z","type":"java","title":"java记录","content":"这里是java记录"}
{"creator":"creator0","createTime":"2020-08-27T02:52:14.353Z","type":"杂文","title":"杂文记录","content":"这里是杂文记录"}
{"creator":"creator5","createTime":"2020-08-27T02:52:32.202Z","type":"杂文","title":"杂文记录","content":"这里是杂文记录"}
{"creator":"creator8","createTime":"2020-08-27T02:52:34.030Z","type":"js","title":"js记录","content":"JS是真的强"}
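
Lines like these can be loaded with the _bulk API, alternating an action line with a source line. A sketch showing only the first two documents:

POST /news/_bulk
{ "index": {} }
{"creator":"creator1","createTime":"2020-08-27T02:52:24.491Z","type":"java","title":"java记录","content":"这里是java记录"}
{ "index": {} }
{"creator":"creator2","createTime":"2020-08-27T02:52:31.677Z","type":"vue","title":"vue记录","content":"这里是vue记录"}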

3. DSL queries in Kibana

1. Query context and filter context

The fullname field contains the word 张三
The username field contains zhangsan2
The sex field contains the exact value 1
The birth field contains a date from 1 Jan 2015 onwards

GET /_search
{
  "query": { 
    "bool": { 
      "must": [
        { "match": { "fullname":   "张三"        }},
        { "match": { "username": "zhangsan2" }}
      ],
      "filter": [ 
        { "term":  { "sex": 1 }},
        { "range": { "birth": { "gte": "2015-01-01" }}}
      ]
    }
  }
}

Result:

{
  "took" : 20,
  "timed_out" : false,
  "_shards" : {
    "total" : 6,
    "successful" : 6,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 2.0854702,
    "hits" : [
      {
        "_index" : "accounts",
        "_type" : "_doc",
        "_id" : "CJeN0nMBntNcepW18mWF",
        "_score" : 2.0854702,
        "_source" : {
          "username" : "zhangsan2",
          "fullname" : "张三2",
          "sex" : 1,
          "userid" : 3,
          "birth" : "2020-08-09T09:29:44.832Z"
        }
      }
    ]
  }
}

...

4. DSL queries in Java

===== The queries below all run against the orders and news indices =====

1. matchAllQuery: match all documents (every document scores 1.0F)

    private static void matchAllQuery() throws UnknownHostException {
        // on startup
        Settings settings = Settings.builder().put("cluster.name", "my-application").build();
        TransportClient client = new PreBuiltTransportClient(settings)
                .addTransportAddress(new TransportAddress(InetAddress.getByName("127.0.0.1"), 9300));

        // 1. build and execute the query
        MatchAllQueryBuilder matchAllQuery = QueryBuilders.matchAllQuery();
        SearchResponse searchResponse = client.prepareSearch("orders").setTypes("_doc").setQuery(matchAllQuery).get();
        // 2. print the results
        SearchHits hits = searchResponse.getHits(); // the hits: how many documents matched
        System.out.println("The query returned: " + hits.getTotalHits());
        Iterator<SearchHit> iterator = hits.iterator();
        while (iterator.hasNext()) {
            SearchHit searchHit = iterator.next(); // each matched document
            System.out.println(searchHit.getSourceAsString()); // print the source as a string
        }

        // on shutdown
        client.close();
    }

Result:

The query returned: 10 hits
{"amount":0,"createTime":"2020-08-09T11:09:05.259Z","description":"订单描述0","orderid":1,"ordernum":"order0","username":"zhangsan0"}
{"amount":1,"createTime":"2020-08-09T11:09:06.611Z","description":"订单描述1","orderid":2,"ordernum":"order1","username":"zhangsan1"}
{"amount":2,"createTime":"2020-08-09T11:09:07.789Z","description":"订单描述2","orderid":3,"ordernum":"order2","username":"zhangsan2"}
{"amount":3,"createTime":"2020-08-09T11:09:08.966Z","description":"订单描述3","orderid":4,"ordernum":"order3","username":"zhangsan3"}
{"amount":4,"createTime":"2020-08-09T11:09:10.468Z","description":"订单描述4","orderid":5,"ordernum":"order4","username":"zhangsan4"}
{"amount":5,"createTime":"2020-08-09T11:09:11.605Z","description":"订单描述5","orderid":6,"ordernum":"order5","username":"zhangsan0"}
{"amount":6,"createTime":"2020-08-09T11:09:12.692Z","description":"订单描述6","orderid":7,"ordernum":"order6","username":"zhangsan1"}
{"amount":7,"createTime":"2020-08-09T11:09:13.823Z","description":"订单描述7","orderid":8,"ordernum":"order7","username":"zhangsan2"}
{"amount":8,"createTime":"2020-08-09T11:09:14.958Z","description":"订单描述8","orderid":9,"ordernum":"order8","username":"zhangsan3"}
{"amount":9,"createTime":"2020-08-09T11:09:16.043Z","description":"订单描述9","orderid":10,"ordernum":"order9","username":"zhangsan4"}

2. Full text queries: the query string is analyzed (mainly for text fields; the query text is tokenized and analyzed before searching)

The high-level full text queries are usually used for running full text searches on full text fields such as the body of an email. They understand how the field being queried is analyzed and apply each field's analyzer (or search_analyzer) to the query string before executing.

1. match query

  The standard query for performing full text search, including options for fuzzy matching and phrase or proximity queries.

The behavior of a match query is controlled mainly by two parameters:

(1) operator: how the individual terms produced by analyzing the query text must match within the field. Defaults to or; can be set to and. For example:

GET /_search
{
    "query": {
        "match" : {
            "message" : "this is a test"
        }
    }
}

With the default or, the pseudocode is roughly:

if (doc.message contains "this" or doc.message contains "is" or doc.message contains "a" or doc.message contains "test") 
return doc

With and, the pseudocode is roughly:

if (doc.message contains "this" and doc.message contains "is" and doc.message contains "a" and doc.message contains "test") 
return doc
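
In the JSON DSL, the operator is set via the long form of the match clause; a sketch reusing the message example above:

GET /_search
{
  "query": {
    "match": {
      "message": {
        "query": "this is a test",
        "operator": "and"
      }
    }
  }
}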

(2) minimum_should_match: the minimum number (or percentage) of analyzed terms that must match; it can be thought of as a similarity threshold.

For example:

    private static void matchQuery() throws UnknownHostException {
        // on startup
        Settings settings = Settings.builder().put("cluster.name", "my-application").build();
        TransportClient client = new PreBuiltTransportClient(settings)
                .addTransportAddress(new TransportAddress(InetAddress.getByName("127.0.0.1"), 9300));

        QueryBuilder qb = QueryBuilders.matchQuery("content", // field name
                "java有点强" // query text
        );
        SearchResponse searchResponse = client.prepareSearch("news").setTypes("_doc").setQuery(qb).get();

        // 2. print the results
        SearchHits hits = searchResponse.getHits(); // the hits: how many documents matched
        System.out.println("The query returned: " + hits.getTotalHits());
        Iterator<SearchHit> iterator = hits.iterator();

        while (iterator.hasNext()) {
            SearchHit searchHit = iterator.next(); // each matched document
            System.out.println(searchHit.getSourceAsString()); // print the source as a string
        }

        // on shutdown
        client.close();
    }

Result:

The query returned: 3 hits
{"creator":"creator1","createTime":"2020-08-27T02:52:24.491Z","type":"java","title":"java记录","content":"这里是java记录"}
{"creator":"creator8","createTime":"2020-08-27T02:52:34.030Z","type":"js","title":"js记录","content":"JS是真的强"}
{"creator":"creator6","createTime":"2020-08-27T02:52:32.395Z","type":"java","title":"java记录","content":"这里是java记录"}

 

Specify the and operator and set a minimum match percentage:

QueryBuilder qb = QueryBuilders.matchQuery("content", "这里是js").operator(Operator.AND).minimumShouldMatch("50%");

Result:

{"creator":"creator3","createTime":"2020-08-27T02:52:31.915Z","type":"js","title":"js记录","content":"这里是js记录"}
{"creator":"creator4","createTime":"2020-08-27T02:52:32.067Z","type":"es","title":"js记录","content":"这里是js记录"}

2. matchPhraseQuery: terms must appear near each other

  A match_phrase query first analyzes the query string into a list of terms, then searches for those terms, but only keeps documents that contain all of them, in the same relative positions as in the query.

QueryBuilder qb = QueryBuilders.matchPhraseQuery("content", "这里记录");

This returns no documents.

 

You can add a slop parameter; with slop set to 3 below, terms that are within 3 positions of each other are still treated as adjacent.

QueryBuilder qb = QueryBuilders.matchPhraseQuery("content", "这里记录").slop(3);

Result:

{"creator":"creator0","createTime":"2020-08-27T02:52:14.353Z","type":"杂文","title":"杂文记录","content":"这里是杂文记录"}
{"creator":"creator5","createTime":"2020-08-27T02:52:32.202Z","type":"杂文","title":"杂文记录","content":"这里是杂文记录"}
{"creator":"creator6","createTime":"2020-08-27T02:52:32.395Z","type":"java","title":"java记录","content":"这里是java记录"}
{"creator":"creator1","createTime":"2020-08-27T02:52:24.491Z","type":"java","title":"java记录","content":"这里是java记录"}
{"creator":"creator2","createTime":"2020-08-27T02:52:31.677Z","type":"vue","title":"vue记录","content":"这里是vue记录"}
{"creator":"creator3","createTime":"2020-08-27T02:52:31.915Z","type":"js","title":"js记录","content":"这里是js记录"}
{"creator":"creator4","createTime":"2020-08-27T02:52:32.067Z","type":"es","title":"js记录","content":"这里是js记录"}
{"creator":"creator7","createTime":"2020-08-27T02:52:33.733Z","type":"vue","title":"vue记录","content":"这里是vue记录"}

3. multi_match query (query multiple fields)

Runs a match query against multiple fields:

        // the first argument is the query text; the varargs that follow are the fields
        QueryBuilder qb = QueryBuilders.multiMatchQuery("java和JS真的强", "content", "title");

 

4. query_string query

  A query tightly coupled with the Lucene query syntax; it lets you use special operator keywords such as AND, OR and NOT in a single query string, across multiple fields.

        // + means the term must be present, - means it must be absent
        QueryBuilder qb = QueryBuilders.queryStringQuery("+js -强").field("content");

Result:

{"creator":"creator3","createTime":"2020-08-27T02:52:31.915Z","type":"js","title":"js记录","content":"这里是js记录"}
{"creator":"creator4","createTime":"2020-08-27T02:52:32.067Z","type":"es","title":"js记录","content":"这里是js记录"}
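
query_string can also target several fields at once and mix the boolean operators in one expression; a sketch against the news index:

GET /news/_search
{
  "query": {
    "query_string": {
      "fields": ["title", "content"],
      "query": "(java OR js) AND NOT vue"
    }
  }
}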

 

3. Term level queries: exact matching; the query string is not analyzed

  Usually used for structured data such as numbers, dates and enums rather than full text fields. They also let you craft low-level queries that bypass the analysis process.

1.  term query

  Find documents which contain the exact term specified in the field specified.

        TermQueryBuilder termQuery = QueryBuilders.termQuery("orderid", 1);

Result:

The query returned: 1 hits
{"amount":0,"createTime":"2020-08-09T11:09:05.259Z","description":"订单描述0","orderid":1,"ordernum":"order0","username":"zhangsan0"}

 

Note: termQuery can also be used on text fields; the query string is treated as a single term and is not analyzed again.

For example, searching with "这里是":

QueryBuilder qb = QueryBuilders.termQuery("content", "这里是");

Result:

{"creator":"creator0","createTime":"2020-08-27T02:52:14.353Z","type":"杂文","title":"杂文记录","content":"这里是杂文记录"}
{"creator":"creator5","createTime":"2020-08-27T02:52:32.202Z","type":"杂文","title":"杂文记录","content":"这里是杂文记录"}
{"creator":"creator6","createTime":"2020-08-27T02:52:32.395Z","type":"java","title":"java记录","content":"这里是java记录"}
{"creator":"creator1","createTime":"2020-08-27T02:52:24.491Z","type":"java","title":"java记录","content":"这里是java记录"}
{"creator":"creator2","createTime":"2020-08-27T02:52:31.677Z","type":"vue","title":"vue记录","content":"这里是vue记录"}
{"creator":"creator3","createTime":"2020-08-27T02:52:31.915Z","type":"js","title":"js记录","content":"这里是js记录"}
{"creator":"creator4","createTime":"2020-08-27T02:52:32.067Z","type":"es","title":"js记录","content":"这里是js记录"}
{"creator":"creator7","createTime":"2020-08-27T02:52:33.733Z","type":"vue","title":"vue记录","content":"这里是vue记录"}

The results show that only documents containing the exact term "这里是" are returned. Let's analyze how "这里是java记录" is tokenized:

POST /_analyze
{
  "analyzer":"ik_max_word",
  "text": "这里是java记录"
}

The analysis result:

{
  "tokens" : [
    {
      "token" : "这里是",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "这里",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "是",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "CN_CHAR",
      "position" : 2
    },
    {
      "token" : "java",
      "start_offset" : 3,
      "end_offset" : 7,
      "type" : "ENGLISH",
      "position" : 3
    },
    {
      "token" : "记录",
      "start_offset" : 7,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 4
    }
  ]
}

 

2. terms query (every document scores 1.0F)
  Find documents which contain any of the exact terms specified in the field specified.

        TermsQueryBuilder termsQuery = QueryBuilders.termsQuery("orderid", "1", "2");

Result:

The query returned: 2 hits
{"amount":0,"createTime":"2020-08-09T11:09:05.259Z","description":"订单描述0","orderid":1,"ordernum":"order0","username":"zhangsan0"}
{"amount":1,"createTime":"2020-08-09T11:09:06.611Z","description":"订单描述1","orderid":2,"ordernum":"order1","username":"zhangsan1"}

 

3. range query
  Find documents where the field specified contains values (dates, numbers, or strings) in the range specified.

        RangeQueryBuilder includeUpper = QueryBuilders.rangeQuery("amount").from(5).to(10).includeLower(true)
                .includeUpper(false);

Parameter semantics:

include lower value means that from is gt when false or gte when true
include upper value means that to is lt when false or lte when true

Result:

The query returned: 5 hits
{"amount":5,"createTime":"2020-08-09T11:09:11.605Z","description":"订单描述5","orderid":6,"ordernum":"order5","username":"zhangsan0"}
{"amount":6,"createTime":"2020-08-09T11:09:12.692Z","description":"订单描述6","orderid":7,"ordernum":"order6","username":"zhangsan1"}
{"amount":7,"createTime":"2020-08-09T11:09:13.823Z","description":"订单描述7","orderid":8,"ordernum":"order7","username":"zhangsan2"}
{"amount":8,"createTime":"2020-08-09T11:09:14.958Z","description":"订单描述8","orderid":9,"ordernum":"order8","username":"zhangsan3"}
{"amount":9,"createTime":"2020-08-09T11:09:16.043Z","description":"订单描述9","orderid":10,"ordernum":"order9","username":"zhangsan4"}

 

The above is equivalent to:

        RangeQueryBuilder includeUpper = QueryBuilders.rangeQuery("amount").gte("5").lt("10");

 

4. exists query
   Find documents where the field specified contains any non-null value.

        ExistsQueryBuilder existsQuery = QueryBuilders.existsQuery("createTime");

 

5. prefix query
  Find documents where the field specified contains terms which begin with the exact prefix specified.

PrefixQueryBuilder prefixQuery = QueryBuilders.prefixQuery("description", "描述");


6. wildcard query
  Find documents where the field specified contains terms which match the pattern specified, where the pattern supports single character wildcards (?) and multi-character wildcards (*)

        // terms in description that start with 描述
        WildcardQueryBuilder wildcardQuery = QueryBuilders.wildcardQuery("description", "描述*");

 

7. regexp query
  Find documents where the field specified contains terms which match the regular expression specified.

        RegexpQueryBuilder regexpQuery = QueryBuilders.regexpQuery("ordernum", "order.*");

 

8. fuzzy query
  Find documents where the field specified contains terms which are fuzzily similar to the specified term. Fuzziness is measured as a Levenshtein edit distance of 1 or 2.

FuzzyQueryBuilder fuzzyQuery = QueryBuilders.fuzzyQuery("ordernum", "order");

 

9. ids query
  Find documents with the specified type and IDs.

IdsQueryBuilder addIds = QueryBuilders.idsQuery().addIds("EJfo0nMBntNcepW15mUP", "EZfo0nMBntNcepW16mW3");

4. Compound queries

1. constant_score query: give every match a fixed score
  A query which wraps another query, but executes it in filter context. All matching documents are given the same “constant” _score.

        ConstantScoreQueryBuilder boost = QueryBuilders
                .constantScoreQuery(QueryBuilders.termQuery("ordernum", "order4")).boost(2F);

Result:

The query returned: 1 hits
2.0  {"amount":4,"createTime":"2020-08-27T13:42:41.559Z","description":"订单描述4","orderid":5,"ordernum":"order4","username":"zhangsan4"}

 

2. bool query  
  The default query for combining multiple leaf or compound query clauses, as must, should, must_not, or filter clauses. The must and should clauses have their scores combined — the more matching clauses, the better — while the must_not and filter clauses are executed in filter context. 


must: all clauses must match (equivalent to AND in MySQL) and they contribute to the score.

must_not: none of the clauses may match (equivalent to NOT in MySQL). It does not affect scoring; it only excludes non-matching documents.

filter: returned documents must satisfy the filter clauses, but unlike must they do not contribute to the score; a document matched only by filter clauses has a _score of 0.

should: at least one clause should match (equivalent to OR in MySQL). In a bool query with no must or filter clauses, one or more should clauses means a document is returned as long as one of them matches; if must or filter is present, should clauses act as boosts that raise the score.

 

  If a query has both filter and should clauses, at least one should clause must be included. bool queries also used to support disabling coordination scoring via the disable_coord option. The score generally depends on all of the query clauses; bool queries use a more-matches-is-better approach, so the scores of documents satisfying must and should clauses are combined.

Example: on the news index, use filter to match all documents (score reset to 0), then use should to boost documents whose type is java:

        BoolQueryBuilder filter = QueryBuilders.boolQuery();
        filter.filter(QueryBuilders.matchAllQuery()); // match everything; scores are 0
        filter.should(QueryBuilders.termQuery("type", "java")); // boost documents whose type is java

Example: on the orders index

1) Using must with termsQuery: every document scores 1.0F

        BoolQueryBuilder filter = QueryBuilders.boolQuery()
                .must(QueryBuilders.termsQuery("ordernum", "order4", "order5", "order6"));

Result:

The query returned: 3 hits
1.0 {"amount":4,"createTime":"2020-08-27T13:42:41.559Z","description":"订单描述4","orderid":5,"ordernum":"order4","username":"zhangsan4"}
1.0 {"amount":5,"createTime":"2020-08-27T13:42:42.662Z","description":"订单描述5","orderid":6,"ordernum":"order5","username":"zhangsan0"}
1.0 {"amount":6,"createTime":"2020-08-27T13:42:43.932Z","description":"订单描述6","orderid":7,"ordernum":"order6","username":"zhangsan1"}

(2) Use should to give username zhangsan0 the default boost of 1F and username zhangsan1 a boost of 0.1F, producing the ordering zhangsan0, zhangsan1, zhangsan4:

        BoolQueryBuilder filter = QueryBuilders.boolQuery()
                .must(QueryBuilders.termsQuery("ordernum", "order4", "order5", "order6"))
                .should(QueryBuilders.termQuery("username", "zhangsan0")) // default boost is 1F
                .should(QueryBuilders.termQuery("username", "zhangsan1").boost(0.1F));

Result:

The query returned: 3 hits

2.4816046  {"amount":5,"createTime":"2020-08-27T13:42:42.662Z","description":"订单描述5","orderid":6,"ordernum":"order5","username":"zhangsan0"}
1.1481605  {"amount":6,"createTime":"2020-08-27T13:42:43.932Z","description":"订单描述6","orderid":7,"ordernum":"order6","username":"zhangsan1"}
1.0  {"amount":4,"createTime":"2020-08-27T13:42:41.559Z","description":"订单描述4","orderid":5,"ordernum":"order4","username":"zhangsan4"}

 

Note: score calculation

  A bool query computes the relevance score _score for every document, sums the _score of all matching must and should clauses, and then divides by the total number of must and should clauses.

Note: controlling precision

  All must clauses must match, and all must_not clauses must not match, but how many should clauses have to match? By default none of the should clauses is required, with one exception: when there are no must clauses, at least one should clause must match.

  The minimum_should_match parameter controls how many should clauses need to match; it can be an absolute number or a percentage.

For example:

GET /my_index/my_type/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "brown" }},
        { "match": { "title": "fox"   }},
        { "match": { "title": "dog"   }}
      ],
      "minimum_should_match": 2 
    }
  }
}

   minimum_should_match can also be expressed as a percentage, e.g. "70%".

 

3. dis_max query

  A query that runs multiple sub-queries and returns any document that matches any of the query clauses. Unlike a bool query, which combines the scores of all matching clauses, a dis_max query uses only the score of the best-matching clause (the other matching clauses contribute via the tie_breaker factor).

        DisMaxQueryBuilder tieBreaker = QueryBuilders.disMaxQuery().add(QueryBuilders.termQuery("ordernum", "order0"))
                .add(QueryBuilders.termQuery("ordernum", "order1")).boost(1.2f).tieBreaker(0.7f);

Result:

{"amount":0,"createTime":"2020-08-09T11:09:05.259Z","description":"订单描述0","orderid":1,"ordernum":"order0","username":"zhangsan0"}
{"amount":1,"createTime":"2020-08-09T11:09:06.611Z","description":"订单描述1","orderid":2,"ordernum":"order1","username":"zhangsan1"}

4. boosting query: demote some matches

  Return documents which match a positive query, but reduce the score of documents which also match a negative query.

  Use it when you want results containing certain content to appear lower in the ranking rather than be excluded. The first argument of boostingQuery is the positive query; the second is the negative query whose matches are demoted.

        BoostingQueryBuilder negativeBoost = QueryBuilders
                .boostingQuery(QueryBuilders.termsQuery("orderid", "1", "2"), QueryBuilders.termQuery("orderid", "1"))
                .negativeBoost(0.2f);

Result:

The query returned: 2 hits
{"amount":1,"createTime":"2020-08-09T11:09:06.611Z","description":"订单描述1","orderid":2,"ordernum":"order1","username":"zhangsan1"}
{"amount":0,"createTime":"2020-08-09T11:09:05.259Z","description":"订单描述0","orderid":1,"ordernum":"order0","username":"zhangsan0"}

5. Joining queries

1. nested query
  Documents may contain fields of type nested. These fields are used to index arrays of objects, where each object can be queried (with the nested query) as an independent document.
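
A minimal sketch of what this looks like in the DSL; the orders-nested index, its items field and the sku/count sub-fields are all hypothetical, invented for illustration:

PUT /orders-nested
{
  "mappings": {
    "properties": {
      "items": {
        "type": "nested",
        "properties": {
          "sku": { "type": "keyword" },
          "count": { "type": "long" }
        }
      }
    }
  }
}

GET /orders-nested/_search
{
  "query": {
    "nested": {
      "path": "items",
      "query": {
        "bool": {
          "must": [
            { "term": { "items.sku": "sku-1" } },
            { "range": { "items.count": { "gte": 2 } } }
          ]
        }
      }
    }
  }
}

Because items is nested, both conditions must hold on the same array element, which an ordinary object field cannot guarantee.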

 

2. has_child and has_parent queries (parent-child)

  A parent-child relationship can exist between two document types within a single index. The has_child query returns parent documents whose child documents match the specified query, while the has_parent query returns child documents whose parent document matches the specified query.
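
A minimal sketch using a join field (the family index and the parent_doc/child_doc relation names are hypothetical):

PUT /family
{
  "mappings": {
    "properties": {
      "relation": {
        "type": "join",
        "relations": { "parent_doc": "child_doc" }
      }
    }
  }
}

GET /family/_search
{
  "query": {
    "has_child": {
      "type": "child_doc",
      "query": { "match_all": {} }
    }
  }
}

Note that child documents must be indexed with a routing value equal to their parent's id so that parent and child land on the same shard.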

 

6. Specialized queries

1. more_like_this query: similarity search, useful e.g. for recommending related documents
   This query finds documents which are similar to the specified text, document, or collection of documents.


(1) Similarity search by fields and query text

        String[] fields = { "content" }; // fields to match against
        String[] texts = { "这里是java记录" }; // the text to analyze
        MoreLikeThisQueryBuilder qb = QueryBuilders.moreLikeThisQuery(fields, texts, null).minTermFreq(1)
                .maxQueryTerms(12).minimumShouldMatch("70%");

(2) The second form matches against documents that already exist in ES

        // similarity query based on an existing ES document; the Item's first argument is the index, the second the document ID
        Item item = new Item("news", "CyLULXQBRGkNEPJ3ya9x");
        Item[] items = { item };
        MoreLikeThisQueryBuilder qb = QueryBuilders.moreLikeThisQuery(items).minTermFreq(1).maxQueryTerms(12)
                .minimumShouldMatch("70%");

The important parameters:

1) The three arguments of the factory method:

    /**
     * A more like this query that finds documents that are "like" the provided texts or documents
     * which is checked against the fields the query is constructed with.
     *
     * @param fields the field names that will be used when generating the 'More Like This' query.
     * @param likeTexts the text to use when generating the 'More Like This' query.
     * @param likeItems the documents to use when generating the 'More Like This' query.
     */
    public static MoreLikeThisQueryBuilder moreLikeThisQuery(String[] fields, String[] likeTexts, Item[] likeItems) {
        return new MoreLikeThisQueryBuilder(fields, likeTexts, likeItems);
    }

fields: the fields to run the match against; defaults to all fields.

likeTexts: the text to match.

likeItems: documents already in ES; when passed, the similarity query is based on those documents.

2) The matching parameters:

max_query_terms:The maximum number of query terms that will be selected. Increasing this value gives greater accuracy at the expense of query execution speed. Defaults to 25.

min_term_freq:The minimum term frequency below which the terms will be ignored from the input document. Defaults to 2.

min_doc_freq:The minimum document frequency below which the terms will be ignored from the input document. Defaults to 5.

max_doc_freq:The maximum document frequency above which the terms will be ignored from the input document. This could be useful in order to ignore highly frequent words such as stop words. Defaults to unbounded (0).

min_word_length:The minimum word length below which the terms will be ignored. The old name min_word_len is deprecated. Defaults to 0.

max_word_length:The maximum word length above which the terms will be ignored. The old name max_word_len is deprecated. Defaults to unbounded (0).

stop_words:An array of stop words. Any word in this set is considered "uninteresting" and ignored. If the analyzer allows for stop words, you might want to tell MLT to explicitly ignore them, as for the purposes of document similarity it seems reasonable to assume that "a stop word is never interesting".

analyzer:The analyzer that is used to analyze the free form text. Defaults to the analyzer associated with the first field in fields.

 

minimum_should_match:After the disjunctive query has been formed, this parameter controls the number of terms that must match. The syntax is the same as the minimum should match. (Defaults to "30%").

 

Note: ES ships with Lucene's LevenshteinDistance class, a string metric for the difference between two character sequences. The Levenshtein distance is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other; getDistance returns a normalized similarity score, where 1.0 means identical.

        LevenshteinDistance ld = new LevenshteinDistance();
        float distance = ld.getDistance("这里是java记录", "这里是java记录");
        System.out.println(distance);
        float distance2 = ld.getDistance("这里是java记录", "这里是js记录");
        System.out.println(distance2);

Result:

1.0
0.6666666

 

2.percolate query
  This query finds percolator queries based on documents.
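
A minimal sketch (the my-queries index is hypothetical): store a query in a percolator field, then feed in a document and get back the stored queries that match it:

PUT /my-queries
{
  "mappings": {
    "properties": {
      "query": { "type": "percolator" },
      "content": { "type": "text" }
    }
  }
}

PUT /my-queries/_doc/1
{
  "query": { "match": { "content": "java" } }
}

GET /my-queries/_search
{
  "query": {
    "percolate": {
      "field": "query",
      "document": { "content": "这里是java记录" }
    }
  }
}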


3.wrapper query
  A query that accepts other queries as json or yaml string.
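
A minimal sketch: over REST the wrapped query must be base64-encoded (the string below encodes {"term":{"ordernum":"order0"}}), while in Java QueryBuilders.wrapperQuery accepts the raw JSON string:

GET /orders/_search
{
  "query": {
    "wrapper": {
      "query": "eyJ0ZXJtIjp7Im9yZGVybnVtIjoib3JkZXIwIn19"
    }
  }
}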

 

Note: documents returned by matchAllQuery and termsQuery all have a score of 1.

 

posted @ 2020-08-28 15:15  QiaoZhi