elasticsearch-高级进阶篇6高级检索-多字段检索mutil_match

1、多字段检索

1.1 多字段检索（multi_match）是啥？

概念：多字段检索，是组合查询的另一种形态，考试的时候如果考察多字段检索，并不一定必须使用multi_match，使用bool query，只要结果正确亦可，除非题目中明确要求（目前没有强制要求过）

语法：

GET <index>/_search
{
  "query": {
    "multi_match": {
      "query": "<query keyword>",
      "type": "<multi_match_type>",
      "fields": [
        "<field_a>",
        "<field_b>"
      ]
    }
  }
}

1.2 multi_match和_source区别

multi_match：从哪些字段中检索，指的是查询条件
_source：查询的结果包含哪些字段，指的是元数据

打个形象的比喻，在MySQL中，Select * From Table Where a=x and b = x，那么multi_match即指的是a和b两个字段，而_source指的是*。

1.3 best_fields：

1.3.1 概念：

侧重于“字段”维度，单个字段的得分权重大，对于同一个query，单个field匹配更多的term，则优先排序。

1.3.2 用法：

注意，best_fields是multi_match中type的默认值

GET product/_search
{
  "query": {
    "multi_match" : {
      "query":      "super charge",
      "type":       "best_fields", // 默认
      "fields":     [ "name^2", "desc" ],
      "tie_breaker": 0.3
    }
  }
}

1.3.3 案例

针对于以下查询，包含两个查询条件：分别是条件1和条件2

GET product/_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "name": "chiji shouji" }},   #条件1
        { "match": { "desc": "chiji shouji" }}        #条件2
      ]
    }
  }
}

假设上述查询的执行得到以下结果，best_fields策略强调hits中单个字段的评分权重。打个比方：每一条hit代表一个奥运会的参加国，每个字段代表该国家的参赛运动员，但是限定每个国家只能派出一名运动员，其成绩就代表该国家的成绩，最后以该运动员的成绩代表国家进行排名。所谓“best_fields”就是说最好的字段嘛，用最好的字段的评分代表当前文档的最终评分，即侧重字段权重。在这个例子中，多个查询条件并未起到关键性作用。

1.3.4 tie_breaker参数

在best_fields策略中给其他剩余字段设置的权重值，取值范围 [0,1]，其中 0 代表使用 dis_max 最佳匹配语句的普通逻辑，1表示所有匹配语句同等重要。最佳的精确值需要根据数据与查询调试得出，但是合理值应该与零接近（处于 0.1 - 0.4 之间），这样就不会颠覆 dis_max （Disjunction Max Query）最佳匹配性质的根本。

在上面例子中，如果一个国家仅仅由一个运动员的成绩来决定，显然不是很有代表性，因为一个国家可能整体实力很弱，但是有一个运动员（假设叫做阿尔法）就是特别的出类拔萃，世界第一！但是其他人都很弱，这时他就不能代表整个国家的实力了。反而可能另一个国家，虽然国内成绩最好的运动员没有“阿尔法”的成绩好，但是这个国家包揽了世界的第二名到第十名，并且实力比较接近，那这样这个国家的整体实力仍然是可以排第一的。所以我们不应该让第一名完全代表一个国家的成绩，一个更好的做法是：每个国家的最终成绩由所有运动员的成绩经过计算得来，每个运动员的成绩都可能影响总成绩，但是这个国家排名第一的运动员的成绩占的权重最大。这种做法更容易凸显一个国家的整体实力，这个整体实力就等价于我们搜索结果排名中的相关度。

用法：

GET product/_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "name": "super charge" }},
        { "match": { "desc": "super charge" }}
      ],
      "tie_breaker": 0.3 # 代表次要评分字段的权重是 0.3
    }
  }
}

1.3.5 类比

以下两个查询等价

查询1

GET product/_search
{
  "query": {
    "dis_max": {
      "queries": [
        {
          "match": {
            "name": {
              "query": "chiji shouji",
              "boost": 2    # name字段评分两倍权重
            }
          }
        },
        {
          "match": {
            "desc": "chiji shouji"
          }
        }
      ],
      "tie_breaker": 0.3
    }
  }
}

查询2

GET product/_search
{
  "query": {
    "multi_match" : {
      "query":      "super charge",
      "type":       "best_fields", // 默认
      "fields":     [ "name^2", "desc" ], # name字段评分两倍权重
      "tie_breaker": 0.3
    }
  }
}

1.4 most_fields：

1.4.1 概念

侧重于“查询”维度，单个查询条件的得分权重大，如果一次请求中，对于同一个doc，匹配到某个term的field越多，则越优先排序。

1.4.2 类比

以下两个查询脚本等价

查询1：

# 下面查询中包含两个查询条件
GET product/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "name": "chiji shouji"
          }
        },
        {
          "match": {
            "desc": "chiji shouji"
          }
        }
      ]
    }
  }
}

查询2

GET product/_search
{
  "query": {
    "multi_match": {
      "query": "chiji shouji",
      "type": "most_fields",
      "fields": [
        "name",
        "desc"
      ]
    }
  }
}

1.5 cross_fields:

注意：理解“cross_fields”的概念之前，需要对ES的评分规则有基本的了解，戳：评分，学习ES评分的基本原理

1.5.1 概念

将任何与任一查询匹配的文档作为结果返回，但只将最佳匹配的评分作为查询的评分结果返回

1.5.2 用法

以下查询语义：

吴必须包含在 name.姓或者 name.名里
磊必须包含在 name.姓或者 name.名里

GET teacher/_search
{
  "query": {
    "multi_match" : {
      "query":      "吴磊",
      "type":       "cross_fields",
      "fields":     [ "name.姓", "name.名" ],
      "operator":   "or"
    }
  }
}

1.5.3 案例
假设我们有如下“teacher”索引，索引中包含了“name”字段，包含“姓”和“名”两个field（案例中使用中文为方便观察和理解，切勿在生产环境中使用中文命名，一定要遵循命名规范）。

POST /teacher/_bulk
{ "index": { "_id": "1"} }
{ "name" : {"姓" : "吴", "名" : "磊"} }
{ "index": { "_id": "2"} }    
{ "name" : {"姓" : "连", "名" : "鹏鹏"} }
{ "index": { "_id": "3"} }
{ "name" : { "姓" : "张","名" : "明明"} }
{ "index": { "_id": "4"} }
{ "name" : { "姓" : "周","名" : "志志"} }
{ "index": { "_id": "5"} }
{ "name" : {"姓" : "吴", "名" : "亦凡"} }
{ "index": { "_id": "6"} }
{ "name" : {"姓" : "吴", "名" : "京"} }
{ "index": { "_id": "7"} }
{ "name" : {"姓" : "吴", "名" : "彦祖"} }
{ "index": { "_id": "8"} }
{ "name" : {"姓" : "帅", "名" : "吴"} }
{ "index": { "_id": "9"} }
{ "name" : {"姓" : "连", "名" : "磊"} }
{ "index": { "_id": "10"} }
{ "name" : {"姓" : "周", "名" : "磊"} }
{ "index": { "_id": "11"} }
{ "name" : {"姓" : "张", "名" : "磊"} }
{ "index": { "_id": "12"} }
{ "name" : {"姓" : "马", "名" : "磊"} }
#{ "index": { "_id": "13"} }
#{ "name" : {"姓" : "诸葛", "名" : "吴磊"} }

我们执行以上代码创建teacher索引，并且执行以下查询语句

# 语义： 默认分词器的对“吴磊”的分词结果为“吴”和“磊”
# name.姓 中包含 “吴” 或者 “磊”  
# OR  
# name.名 中包含 “吴” 或者 “磊” 
# 如果设置了"operator": "and"，则中间 OR 的关系变为 AND
GET teacher/_search
{
  "query": {
    "multi_match": {
      "query": "吴 磊",
      "type": "most_fields",
      "fields": [
        "name.姓",
        "name.名"
      ]
      // ,"operator": "and"
    }
  }
}

根据上面查询的语义，我们期望的结果是：姓为“吴”并且名为“磊”的doc评分最高，然而结果却如下：

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 9,
      "relation" : "eq"
    },
    "max_score" : 2.4548545,
    "hits" : [
      {
        "_index" : "teacher",
        "_type" : "_doc",
        "_id" : "8",
        "_score" : 2.4548545,
        "_source" : {
          "name" : {
            "姓" : "帅",
            "名" : "吴"
          }
        }
      },
      {
        "_index" : "teacher",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 2.03873,
        "_source" : {
          "name" : {
            "姓" : "吴",
            "名" : "磊"
          }
        }
      },
      {
        "_index" : "teacher",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 1.060872,
        "_source" : {
          "name" : {
            "姓" : "吴",
            "名" : "亦凡"
          }
        }
      },
      {
        "_index" : "teacher",
        "_type" : "_doc",
        "_id" : "6",
        "_score" : 1.060872,
        "_source" : {
          "name" : {
            "姓" : "吴",
            "名" : "京"
          }
        }
      },
      {
        "_index" : "teacher",
        "_type" : "_doc",
        "_id" : "7",
        "_score" : 1.060872,
        "_source" : {
          "name" : {
            "姓" : "吴",
            "名" : "彦祖"
          }
        }
      },
      {
        "_index" : "teacher",
        "_type" : "_doc",
        "_id" : "9",
        "_score" : 0.977858,
        "_source" : {
          "name" : {
            "姓" : "连",
            "名" : "磊"
          }
        }
      },
      {
        "_index" : "teacher",
        "_type" : "_doc",
        "_id" : "10",
        "_score" : 0.977858,
        "_source" : {
          "name" : {
            "姓" : "周",
            "名" : "磊"
          }
        }
      },
      {
        "_index" : "teacher",
        "_type" : "_doc",
        "_id" : "11",
        "_score" : 0.977858,
        "_source" : {
          "name" : {
            "姓" : "张",
            "名" : "磊"
          }
        }
      },
      {
        "_index" : "teacher",
        "_type" : "_doc",
        "_id" : "12",
        "_score" : 0.977858,
        "_source" : {
          "name" : {
            "姓" : "马",
            "名" : "磊"
          }
        }
      }
    ]
  }
}

View Code

上面结果显示，名叫“帅磊”的doc排在了最前面，即便我们使用best_fields策略结果也是“帅磊”评分最高，因为导致这个结果的原因和使用哪种搜索策略并无关系。这些然不是我们希望的结果。那么导致上述问题的原因是什么呢？看以下三条基本的评分规则：

评分基本规则：

词频（TF term frequency ）：关键词在每个doc中出现的次数，词频越高，评分越高
反词频（ IDF inverse doc frequency）：关键词在整个索引中出现的次数，反词频越高，评分越低
每个doc的长度，越长相关度评分越低

分析结果：

在上述案例中，“吴磊”作为预期结果，其中“吴”字作为姓氏是非常常见的，“磊”作为名也是非常常见的。反应在索引中，他们的IDF都是非常高的，而反词频越高则评分越低，因此吴磊在索引中的IDF评分则会很低。

而“帅磊”中“帅”作为姓氏却是非常少见的，因此IDF的得分就很高。在词频相同的情况下就会导致以上不符合常理的搜索预期。

解决办法：

为了避免某一个字段的词频或者反词频对结果产生巨大影响，我们需要把“姓”和“名”作为一个整体来查询，体现在代码上即：

 吴 必须包含在 name.姓 或者 name.名 里
# 并且
# 磊 必须包含在 name.姓 或者 name.名 里
GET teacher/_search
{
  "query": {
    "multi_match" : {
      "query":      "吴磊",
      "type":       "cross_fields",
      "fields":     [ "name.姓", "name.名" ],
      "operator":   "and"
    }
  }
}

ES中的Multi_match深入解读：best_fields、most_fields、cross_fields用法一览

综合实战

PUT product
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "ik_max_word"
      },
      "desc":{
        "type": "text",
        "analyzer": "ik_max_word"
      }
    }
  }
}


PUT /product/_doc/1
{
  "name": "chiji shouji，游戏神器，super ",
  "desc": "基于TX深度定制，流畅游戏不发热，物理外挂，charge",
  "price": 3999,
  "createtime": "2020-05-20",
  "collected_num": 99,
  "tags": [
    "性价比",
    "发烧",
    "不卡"
  ]
}

PUT /product/_doc/2
{
  "name": "xiaomi NFC shouji",
  "desc": "支持全功能NFC,专业 chiji，charge",
  "price": 4999,
  "createtime": "2020-05-20",
  "collected_num": 299,
  "tags": [
    "性价比",
    "发烧",
    "公交卡"
  ]
}

PUT /product/_doc/3
{
  "name": "NFC shouji，super ",
  "desc": "shouji 中的轰炸机",
  "price": 2999,
  "createtime": "2020-05-20",
  "collected_num": 1299,
  "tags": [
    "性价比",
    "发烧",
    "门禁卡"
  ]
}
GET /product/_search
PUT /product/_doc/4
{
  "name": "xiaomi 耳机",
  "desc": "耳机中的黄焖鸡",
  "price": 999,
  "createtime": "2020-05-20",
  "collected_num": 9,
  "tags": [
    "低调",
    "防水",
    "音质好"
  ]
}

PUT /product/_doc/5
{
  "name": "红米耳机",
  "desc": "耳机中的肯德基",
  "price": 399,
  "createtime": "2020-05-20",
  "collected_num": 0,
  "tags": [
    "牛逼",
    "续航长",
    "质量好"
  ]
}

# _source
GET product/_search
{
  "_source": false, 
  "from": 0,
  "size": 10, 
  "query": {
    "match_all": {}
  }
}

GET product/_mapping
GET product/_search


GET _analyze
{
  "analyzer": "ik_max_word",
  "text": ["xiaomi NFC shouji msb tech"]
}
GET product/_search
{
  "query": {
    "match_phrase": {
      "name": "NFC shouji msb"
    }
  }
}

GET product/_search
GET product/_search
{
  "from": 0,
  "size": 20, 
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "price": {
        "order": "asc"
      }
    },
    {
      "_score": {
        "order": "desc"
      }
    }
  ]
}

GET product/_mapping
GET product/_search
{
  "query": {
    "term": {
      "name": "xiaomi"
    }
  },
  "highlight": {
    "fields": {
      "name": {
        "pre_tags": [
          "<b>"
        ],
        "post_tags": [
          "</b>"
        ]
      }
    }
  }
}



PUT /product2/_doc/1
{
    "name" : "xiaomi phone",
    "desc" :  "shouji zhong de zhandouji",
    "date": "2021-06-01",
    "price" :  3999,
    "tags": [ "xingjiabi", "fashao", "buka" ]
}
PUT /product2/_doc/2
{
    "name" : "xiaomi nfc phone",
    "desc" :  "zhichi quangongneng nfc,shouji zhong de jianjiji",
    "date": "2021-06-02",
    "price" :  4999,
    "tags": [ "xingjiabi", "fashao", "gongjiaoka" ]
}

GET product2/_search

GET _cat/indices
GET product2/_mapping
GET product2/_search
{
  "query": {
    "term": {
      "name.keyword": "xiaomi phone"
    }
  }
}

# where tags in ("xingjiabi","gongjiaoka")
GET product2/_search
{
  "query": {
    "terms": {
      "tags": ["xingjiabi","gongjiaoka"]
    }
  }
}


GET product/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 4000,
        "lte": 5000
      }
    }
  }
}
GET product2/_search
{
  "query": {
    "match": {
      "name": "xiaomi phone"
    }
  }
}

GET product2/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "name.keyword": "xiaomi phone"
        }
      }
    }
  }
}
GET product/_search
GET product/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "name": "chiji shouji"
          }
        }
      ],
      "should": [
        {
          "match_phrase": {
            "name": "msb tech"
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}
GET product/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "match_phrase": {
            "name": "msb tech"
          }
        }
      ]
    }
  }
}

# most_fields ★ 
# 评分计算规则 ☆  有时间看看，没时间过
GET product/_search
GET product/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "name": "chiji shouji"
          }
        },
        {
          "match": {
            "desc": "chiji shouji"
          }
        }
      ]
    }
  }
}
GET product/_search
{
  "query": {
    "multi_match": {
      "query": "chiji shouji",
      "type": "most_fields",
      "fields": [
        "name",
        "desc"
      ]
    }
  }
}

# 默认 best_fields ★ 
GET product/_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "name": "chiji shouji" }},
        { "match": { "desc": "chiji shouji" }}
      ]
    }
  }
}
# tie_breaker": 其他字段的权重值 ★
GET product/_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "name": "super charge" }},
        { "match": { "desc": "super charge" }}
      ]
    }
  }
}

GET product/_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "name": "super charge" }},
        { "match": { "desc": "super charge" }}
      ],
      "tie_breaker": 0.3,
      "boost": 2
    }
  }
}
GET product/_search
{
  "query": {
    "multi_match" : {
      "query":      "super charge",
      "type":       "best_fields", // 默认
      "fields":     [ "name^2", "desc" ],
      "tie_breaker": 0.3
    }
  }
}

#控制匹配精度 
GET product/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "name": {
              "query": "xiaomi shouji",
              "minimum_should_match": "100%"
            }
          }
        }
      ]
    }
  }
}
GET product/_search
{
  "query": {
    "bool": {
      "should": [
        {"match": {"name": "xiaomi "}},
        {"match": {"name": "shouji"}}
      ],
      "minimum_should_match": "50%"
    }
  }
}
# 略
GET product/_search
{
    "query": {
        "dis_max": {
            "queries": [
                {
                  "match": {
                    "name": {
                      "query": "super charge",
                      "minimum_should_match": "50%",
                      "boost": 2
                    }
                  }
                },
                {
                  "match": {
                    "desc":{
                      "query": "super charge",
                      "minimum_should_match": "50%",
                      "boost": 1
                    }
                  }
                }
            ],
            "tie_breaker": 0.7
        }
    }
}

GET product/_search
{
    "query": {
        "dis_max": {
            "queries": [
                {
                  "match": {
                    "name": {
                      "query": "super charge",
                      "minimum_should_match": "50%",
                      "boost": 2
                    }
                  }
                },
                {
                  "match": {
                    "desc":{
                      "query": "super charge",
                      "minimum_should_match": "50%",
                      "boost": 1
                    }
                  }
                }
            ],
            "tie_breaker": 0.7
        }
    }
}

# cross_fields ?
POST /teacher/_bulk
{ "index": { "_id": "1"} }
{ "name" : {"姓" : "吴", "名" : "磊"} }
{ "index": { "_id": "2"} }    
{ "name" : {"姓" : "连", "名" : "鹏鹏"} }
{ "index": { "_id": "3"} }
{ "name" : { "姓" : "张","名" : "明明"} }
{ "index": { "_id": "4"} }
{ "name" : { "姓" : "周","名" : "志志"} }
{ "index": { "_id": "5"} }
{ "name" : {"姓" : "吴", "名" : "亦凡"} }
{ "index": { "_id": "6"} }
{ "name" : {"姓" : "吴", "名" : "京"} }
{ "index": { "_id": "7"} }
{ "name" : {"姓" : "吴", "名" : "彦祖"} }
{ "index": { "_id": "8"} }
{ "name" : {"姓" : "帅", "名" : "吴"} }
{ "index": { "_id": "9"} }
{ "name" : {"姓" : "连", "名" : "磊"} }
{ "index": { "_id": "10"} }
{ "name" : {"姓" : "周", "名" : "磊"} }
{ "index": { "_id": "11"} }
{ "name" : {"姓" : "张", "名" : "磊"} }
{ "index": { "_id": "12"} }
{ "name" : {"姓" : "马", "名" : "磊"} }
#{ "index": { "_id": "13"} }
#{ "name" : {"姓" : "诸葛", "名" : "吴磊"} }

GET teacher/_mapping
# standard
GET _analyze
{
  "analyzer": "standard",
  "text": ["吴 磊"]
}
GET teacher/_search
{
  "query": {
    "multi_match": {
      "query": "吴 磊",
      "type": "most_fields",
      "fields": [
        "name.姓",
        "name.名"
      ]
    }
  }
}


# 吴 必须包含在 name.姓 或者 name.名 里
# 并且
# 磊 必须包含在 name.姓 或者 name.名 里
GET teacher/_search
{
  "query": {
    "multi_match" : {
      "query":      "吴磊",
      "type":       "cross_fields",
      "fields":     [ "name.姓", "name.名" ],
      "operator":   "or"
    }
  }
}

综合实战多字段搜索

2、搜索模板

# 创建搜索模板
PUT _scripts/my-search-template
{
  "script": {
    "lang": "mustache",
    "source": {
      "query": {
        "match": {
          "message": "{{query_string}}"
        }
      },
      "from": "{{from}}",
      "size": "{{size}}"
    },
    "params": {
      "query_string": "nfc phone"
    }
  }
}
# 验证使用模板
POST _render/template
{
  "id": "my-search-template",
  "params": {
    "query_string": "nfc phone",
    "from": 20,
    "size": 10
  }
}

# 使用搜索模板
GET product/_search/template
{
  "id":"my-search-template",
  "params": {
    "query_string":"nfc phone",
    "from":0,
    "size":10
  }
}
## 案例 Elastic认证考试原题
# 写入search template，名字为ser_tem_1
# 根据search template写出对应的match query，有my_field、my_value、size字段
# 通过参数来调用这个search template
# 这个search template运用到kibana_sample_data_flights中
# my_field是DestCountry，而my_value是CN，size为5.
POST _scripts/ser_temp_1
{
    "script": {
    "lang": "mustache",
    "source": {
      "query": {
        "match": {
          "{{my_field}}": "{{my_value}}"
        }
      },
      "size": "{{my_size}}"
    }
  }
}
GET kibana_sample_data_flights/_search/template
{
  "id":"ser_temp_1",
  "params": {
    "my_field":"DestCountry",
    "my_value":"CN",
    "my_size":5
  }
}
# 同时运行多个搜索模板
GET product,kibana_sample_data_flights/_msearch/template
{}
{"id":"my-search-template","params":{"query_string":"nfc","from":0,"size":10}}
{}
{"id":"ser_temp_1","params":{"my_field":"DestCountry","my_value":"CN","my_size":5}}

GET _scripts/my-search-template
GET _cluster/state/metadata?pretty&filter_path=metadata
#
DELETE _scripts/my-search-template
#设置默认值
PUT _scripts/default-template
{
  "script":{
    "lang":"mustache",
    "source": {
      "query":{
        "range":{
          "createtime":{
            "gte":"{{startdate}}",
            "lte":"{{enddate}}{{^enddate}}now/d{{/enddate}}"
          }
        }
      }
    }
  }
}
GET product/_search/template
{
  "id":"default-template",
  "params": {
    "startdate":"2020-05-01"
  }
}

3、Term Vector

3.1 场景及作用

3.2 基本用法

3.2.1 Term Vector选项

映射选项	解释
`no`	不存储术语向量。（默认）
`yes`	只存储字段中的术语。
`with_positions`	存储条款和位置。
`with_offsets`	存储术语和字符偏移。
`with_positions_offsets`	术语、位置和字符偏移被存储。
`with_positions_payloads`	存储术语、位置和有效载荷。
`with_positions_offsets_payloads`	存储术语、位置、偏移量和有效载荷。

3.2.2 查询参数

fields（可选，字符串）要包含在统计信息中的字段的逗号分隔列表或通配符表达式。

除非在completion_fields或fielddata_fields参数中提供了特定字段列表，否则用作默认列表。

field_statistics（可选，布尔值）如果true，则响应包括文档计数、文档频率总和和总词频总和。默认为true.
<offsets>（可选，布尔值）如果true，则响应包括术语偏移量。默认为true.
payloads（可选，布尔值）如果true，则响应包括术语有效负载。默认为true.
positions（可选，布尔值）如果true，则响应包括术语位置。默认为true.
preference（可选，字符串）指定应在其上执行操作的节点或分片。默认随机。
routing（可选，字符串）用于将操作路由到特定分片的自定义值。
realtime（可选，布尔值）如果true，则请求是实时的，而不是接近实时的。默认为true. 请参阅实时。
term_statistics（可选，布尔值）如果true，则响应包括词频和文档频率。默认为false.
version（可选，布尔值）如果true，则返回文档版本作为命中的一部分。
version_type（可选，枚举）特定版本类型：external, external_gte.

DELETE term_vector_index
GET term_vector_index/_mapping
PUT term_vector_index
{
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "term_vector": "with_positions_offsets_payloads"
      }
    }
  }
}
PUT term_vector_index/_doc/1
{
  "text": "you can kill me but can't kuck me",
  "desc": "test test test"
}
PUT term_vector_index/_doc/2
{
  "text": "a big house bling bling me",
  "desc": "test2 test2 test2"
}
PUT term_vector_index/_doc/3
{
  "text" : "one day is your teacher,day day is your father"
}
PUT term_vector_index/_doc/4
{
  "text" : "good good study, day day up"
}
GET term_vector_index/_settings
GET term_vector_index/_search
GET term_vector_index/_doc/2
# 1: 9
# 2: 8
# 3: 12
# 4: 9
GET term_vector_index/_termvectors/1
{
  "fields": ["text"],
  "offsets": true,
  "payloads": true,
  "positions": true,
  "term_statistics": true,
  "field_statistics": true
}

POST _bulk
{"index":{"_index":"hightlight_index","_id":1}}
{"title":"宝强买了一顶帽子颜色特别好看，我感觉特别帅","content":"宝强买了一顶帽子颜色特别好看，我感觉特别帅。吧啦吧啦吧啦。。。"}
{"index":{"_index":"hightlight_index","_id":1}}
{"title":"宝强哥有两个孩子，一个和楼上长得特别像，一个和楼下长得特别像","content":"宝强哥有两个孩子，一个和楼上长得特别像，一个和楼下长得特别像。吧啦吧啦吧啦。。。"}
{"index":{"_index":"hightlight_index","_id":1}}
{"title":"宝强哥的钱被人卷走了，骗子是在太可恶了","content":"宝强买了一顶帽子颜色特别好看，我感觉特别帅。吧啦吧啦吧啦。。。"}

GET hightlight_index/_search
{
  "query": {
    "match": {
      "title": "宝强"
    }
  },
  "highlight": {
    "fields": {
      "title": {},
      "content": {}
    }
  }
}
GET hightlight_index/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": "宝强"
          }
        },
        {
          "match": {
            "content": "宝强"
          }
        }
      ]
    }
  }
}
GET hightlight_index/_search
{
  "query": {
    "multi_match": {
      "query": "宝强",
      "type": "most_fields",
      "fields": [
        "title",
        "content"
      ]
    }
  }
}

4、高亮查询

4.1 多字段高亮

4.2 三种高亮显示器（荧光笔）

4.2.1 unified highlighter

4.2.2 Plain highlighter

4.2.3 Fast vector highlighter

4.3 高亮的作用域

POST _bulk
{"index":{"_index":"hightlight_index","_id":1}}
{"title":"宝强买了一顶帽子颜色特别好看，我感觉特别帅","content":"宝强买了一顶帽子颜色特别好看，我感觉特别帅。吧啦吧啦吧啦。。。"}
{"index":{"_index":"hightlight_index","_id":1}}
{"title":"宝强哥有两个孩子，一个和楼上长得特别像，一个和楼下长得特别像","content":"宝强哥有两个孩子，一个和楼上长得特别像，一个和楼下长得特别像。吧啦吧啦吧啦。。。"}
{"index":{"_index":"hightlight_index","_id":1}}
{"title":"宝强哥的钱被人卷走了，骗子是在太可恶了","content":"宝强买了一顶帽子颜色特别好看，我感觉特别帅。吧啦吧啦吧啦。。。"}

GET hightlight_index/_search
{
  "query": {
    "match": {
      "title": "宝强"
    }
  },
  "highlight": {
    "fields": {
      "title": {},
      "content": {}
    }
  }
}
GET hightlight_index/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": "宝强"
          }
        },
        {
          "match": {
            "content": "宝强"
          }
        }
      ]
    }
  }
}
GET hightlight_index/_search
{
  "query": {
    "multi_match": {
      "query": "宝强",
      "type": "most_fields",
      "fields": [
        "title",
        "content"
      ]
    }
  }
}

5、地理位置搜索

5.1 使用场景

5.2 两种数据类型

5.2.1 geo_point：

概念：经纬度坐标，只支持WGS84坐标系，坐标范围Lat值为[-90,90]，Lon为[-180,180]

latitude：维度缩写：lat
longitude：经度缩写：lon
ignore_malformed：则忽略格式错误的地理位置。如果false（默认）

五种存储方式

5.2.2 geo_shape

概念：ES的特殊类型之一，用来描述复杂的几何图形的类型，比如点、线、面，多边形等二维几何模型。

GeoJSON：GeoJSON是一种用于编码各种地理数据结构的格式，支持以下几种几何类型：
- Point：点
- LineString：线段
- Polygon：多边形
- MultiPoint：多点
- MultiLineString：多线段
- MultiPolygon：多边形集合
- Feature：具有其他属性的对象
WKT（Well-Known Text）：POINT(125.6 10.1)

GeoJSON（OGC）和WKT到Elasticsearch类型的映射关系

GeoJSON类型	WKT类型	Elasticsearch类型	描述
Point	POINT	point	单个地理坐标。注意：Elasticsearch仅使用WGS-84坐标。
LineString	LINESTRING	linestring	给定两个或两个以上点的任意线。
Polygon	POLYGON	polygon	一个封闭的多边形，其第一个点和最后一个点必须匹配，因此需要n + 1顶点创建一个带n边的多边形和一个最小的4顶点。
MultiPoint	MULTIPOINT	multipoint	一组未连接但可能相关的点。
MultiLineString	MULTILINESTRING	multilinestring	单独的线串数组。
MultiPolygon	MULTIPOLYGON	multipolygon	一组单独的多边形。
GeometryCollection	GEOMETRYCOLLECTION	geometrycollection	与JSON形状相似的GeoJSON形状， multi*但可以同时存在多种类型（例如，Point和LineString）。
N/A	BBOX	envelope	通过仅指定左上和右下点指定的边界矩形。
N/A	N/A	circle	由中心点和半径指定的圆，单位为，默认为METERS。

5.3 Geo_point Based Request

5.3.1 矩形查询（geo_bounding box）

概念：在同一个平面内，两个点确定一个矩形，搜索矩形内的坐标。

top_left：矩形左上点坐标
bottom_right：矩形右上角表

5.3.2 半径查询（geo_distance）

概念：以某个点为圆心查找指定半径的圆内的坐标。

distance：距离单位，默认是米，支持以下选项
- Mile（英里）：mi 或者 miles
- Yard（码）：yd 或者 yards
- Feet（英尺）：ft 或者 feet
- Inch（英寸）：in 或者 inch
- Kilometer（公里）：km 或者 kilometers
- Meter（米）：m 或者 meters
- Centimeter（厘米）：cm 或者 centimeters
- Millimeter（毫米）： mm 或者 millimeters
- Nautical mile（海里）： NM , nmi , 或者 nauticalmiles
distance_type：计算距离的方式
- arc（默认值）：更准确，但是速度慢
- plane：（更快，但在长距离和极点附近不准确）

5.3.3 多边形（geo_polygon）

概念：查找给定多个点连成的多边形内的坐标。

5.4 Geo_shape Based Request

概念：支持指定几何图形相交、包含或是不相交等图形检索

5.4.1 地理几何分类（geo_shape type）

点（point）
矩形（envelope）
多边形（polygon）
圆形（circle）

5.4.2 地理几何存储

注：圆形处理精度解释

表示圆的多边形的精度定义为error_distance。这种差异越小，多边形越接近理想圆。下表是旨在帮助捕获在给定不同输入的情况下圆的半径如何影响多边形的边数的表格。最小边数为4，最大为1000。

error_distance	半径（米）	多边形的边数
1	1	4
1	10	14
1	100	45
1	1,000	141
1	10,000	445
1	100,000	1000

5.4.3 地理几何检索

Inline Shape Definition：内联形状
Pre-Indexed Shape：预定义形状
- id- 包含预索引形状的文档ID。
- index- 索引的名称，其中预索引形状为：默认形状。
- routing- 非必须。
- path- 包含预索引形状的指定路径，默认形状。
Spatial Relations：空间关系
- INTERSECTS- (default) Return all documents whose shape field intersects the query geometry。
- DISJOINT - Return all documents whose shape field has nothing in common with the query geometry
- WITHIN - Return all documents whose shape field is within the query geometry。
- CONTAINS- Return all documents whose shape field contains the query geometry。

# 地理位置搜索
DELETE geo_point
GET geo_point/_mapping
PUT geo_point
{
  "mappings": {
    "properties": {
      "location": {
        "type": "geo_point"
      }
    }
  }
}
## 位置信息的五种存储方式
## 第一种
PUT geo_point/_doc/1
{
  "name": "天安门",
  "location": { 
    "lat": 40.12,
    "lon": -71.34
  }
}
GET geo_point/_search
## 第二种 lat lon
PUT geo_point/_doc/2
{
  "name": "前门",
  "location": "40.12,-71.34"
}
## 第三种 lon lat 
PUT geo_point/_doc/3
{
  "name": "后门",
  "location": [-71.34,40.12]
}

## 第四种 WKT 
PUT geo_point/_doc/4
{
  "name": "西直门",
  "location":"POINT (-70 40.12)" 
}

GET geo_point/_mapping
GET geo_point/_search

## 第五种 
#Geo哈希 https://www.cnblogs.com/LBSer/p/3310455.html


GET _cat/indices
PUT test_index/_doc/1
{
  "tet":"sss"
}
GET test_index/_mapping

# 四种查询
GET geo_point/_mapping
## 矩形检索
GET geo_point/_search
{
  "query": {
    "geo_bounding_box":{
      "location":{
        "top_left":{
          "lat": 50.73,
          "lon": -74.1
        },
        "bottom_right":{
          "lat": 30.01,
          "lon": -61.12
        }
      }
    }
  }
}
## 半径查找（圆形查找）
GET geo_point/_search
{
  "query": {
    "geo_distance":{
      "distance": "50km",
      "location": {
        "lat": 40,
        "lon": -71
      }
    }
  }
}
## 多边形查找
GET geo_point/_search
{
  "query": {
    "geo_polygon": {
      "location": {
        "points": [
          {
            "lat": 40,
            "lon": -70
          },
          {
            "lat": 40,
            "lon": -80
          },
          {
            "lat": 50,
            "lon": -90
          }
        ]
      }
    }
  }
}

#评分和排序
GET geo_point/_search
{
  "query": {
    "geo_distance": {
      "distance": "87km",
      "location": {
        "lat": 40,
        "lon": -71
      }
    }
  },
  "sort": [
    {
      "_geo_distance": {
        "location": {
          "lat": 40,
          "lon": -71
        },
        "order": "asc"
      }
    }
  ]
}
## 特殊几何图形 geo_shape

DELETE geo_shape

PUT geo_shape
{
  "mappings": {
    "properties": {
      "location": {
        "type": "geo_shape"
      }
    }
  }
}

## 存储“点”
POST /geo_shape/_doc/1
{
  "name":"中国 香海",
  "location":{
    "type":"point",
    "coordinates":[13.400544, 52.530286]
  }
}
###
POST /geo_shape/_doc/1
{
  "name":"中国 香海",
  "location":"POINT (13.400544 52.530286)"
}

## 存储线段
POST /geo_shape/_doc/2
{
  "name":"湘江路",
  "location":{
    "type":"linestring",
    "coordinates":[[13.400544, 52.530286],[-77.400544, 38.530286]]
  }
}
### WKT
POST /geo_shape/_doc/2
{
  "name": "湘江路",
  "location":"LINESTRING (13.400544 52.530286,-77.400544 38.530286)"
}

## 存储矩形


## 存储多边形
POST /geo_shape/_doc/3
{
  "name": "湘江路",
  "location": {
    "type": "multipolygon",
    "coordinates": [
      [
        [
          [
            100,
            0
          ],
          [
            101,
            0
          ],
          [
            101,
            1
          ],
          [
            100,
            1
          ],
          [
            100,
            0
          ]
        ],
        [
          [
            100.2,
            0.2
          ],
          [
            100.8,
            0.2
          ],
          [
            100.8,
            0.8
          ],
          [
            100.2,
            0.8
          ],
          [
            100.2,
            0.2
          ]
        ]
      ],
      [
        [
          [
            100,
            0
          ],
          [
            101,
            0
          ],
          [
            101,
            1
          ],
          [
            100,
            1
          ],
          [
            100,
            0
          ]
        ],
        [
          [
            100.2,
            0.2
          ],
          [
            100.8,
            0.2
          ],
          [
            100.8,
            0.8
          ],
          [
            100.2,
            0.8
          ],
          [
            100.2,
            0.2
          ]
        ]
      ]
    ]
  }
}

##

PUT _ingest/pipeline/polygonize_circles
{
  "description": "圆形转换成多边形",
  "processors": [
    {
      "circle": {
        "field": "location",
        "error_distance": 0,
        "shape_type": "geo_shape"
      }
    }
  ]
}
POST /geo_shape/_doc/4?pipeline=polygonize_circles
{
  "name": "安全区",
  "location": {
    "type": "circle",
    "coordinates": [
      30,
      10
    ],
    "radius": "100m"
  }
}
GET geo_shape/_doc/4

200-16
184-4
180/4
45

4020-16-4
4000/4
1000



#inline Shape Definition
## 存储一个geo_shape坐标先
POST /geo_shape/_doc/1
{
    "name": "中国，香海",
    "location": {
        "type": "point",
        "coordinates": [13.400544, 52.530286]
    }
}

# 使用几何图形查找 “点”
# geo_point的检索方式
# 天安门
#"lat" : 40.12,
#"lon" : -71.34
GET geo_point/_search
{
  "query": {
    "geo_bounding_box":{
      "location":{
        "top_left":{
          "lat": 50.73,
          "lon": -74.1
        },
        "bottom_right":{
          "lat": 30.01,
          "lon": -61.12
        }
      }
    }
  }
}

# geo_point query 查 geo_shape 查不出来
GET geo_point/_search
{
  "query": {
    "geo_bounding_box":{
      "location":{
        "top_left":{
          "lat": 53,
          "lon": 13
        },
        "bottom_right":{
          "lat": 52,
          "lon": 14
        }
      }
    }
  }
}
# geo_shape的检索方式 内联查询 补充geo_shape envelope 矩形
GET geo_shape/_search
{
  "query": {
    "geo_shape": {
      "location": {
        "shape": {
          "type": "envelope",
          "coordinates": [
            [
              13,
              53
            ],
            [
              14,
              52
            ]
          ]
        },
        "relation": "within"
      }
    }
  }
}

# geo_shape的查询去检索geo_point存储的坐标
GET geo_shape/_search
{
  "query": {
    "geo_shape": {
      "location": {
        "shape": {
          "type": "envelope",
          "coordinates": [
            [
              -100,
              50
            ],
            [
              0,
              0
            ]
          ]
        },
        "relation": "within"
      }
    }
  }
}
GET geo_shape/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "geo_shape": {
            "location": {
              "shape": {
                "type": "envelope",
                "coordinates": [
                  [
                    13,
                    53
                  ],
                  [
                    14,
                    52
                  ]
                ]
              },
              "relation": "within"
            }
          }
        }
      ]
    }
  }
}

GET geo_shape/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "geo_shape": {
          "location": {
            "shape": {
              "type": "envelope",
              "coordinates": [
                [
                  13,
                  53
                ],
                [
                  14,
                  52
                ]
              ]
            },
            "relation": "within"
          }
        }
      }
    }
  }
}



GET /geo_shape/_search
{
  "query": {
    "bool": {
      "filter": {
        "geo_shape": {
          "location": {
            "indexed_shape": {
              "index": "geo_shape",
              "id": "4",
              "path": "location"
            },
            "relation": "within"
          }
        }
      }
    }
  }
}




PUT geo_shape_relation_test
{
  "mappings": {
    "properties": {
      "location": {
        "type": "geo_shape"
      }
    }
  }
}

#存矩形
POST /geo_shape_relation_test/_doc/A
{
  "location": {
    "type": "envelope",
    "coordinates": [[1,9],[10,1]]
  }
}
POST /geo_shape_relation_test/_doc/B
{
  "location": {
    "type": "envelope",
    "coordinates": [[10,6],[13,5]]
  }
}
POST /geo_shape_relation_test/_doc/C
{
  "location": {
    "type": "envelope",
    "coordinates": [[2,5],[5,2]]
  }
}
POST /geo_shape_relation_test/_doc/D
{
  "location": {
    "type": "envelope",
    "coordinates": [[1,9],[10,1]]
  }
}
POST /geo_shape_relation_test/_doc/E
{
  "location": {
    "type": "envelope",
    "coordinates": [[10,9],[14,1]]
  }
}
POST /geo_shape_relation_test/_doc/F
{
  "location": {
    "type": "envelope",
    "coordinates": [[0,1],[1,0]]
  }
}
#P1
POST /geo_shape_relation_test/_doc/P1
{
  "name":"P1",
  "location":{
    "type":"point",
    "coordinates":[3, 3]
  }
}
#P2
POST /geo_shape_relation_test/_doc/P2
{
  "name":"P2",
  "location":{
    "type":"point",
    "coordinates":[8, 7]
  }
}
#P3
POST /geo_shape_relation_test/_doc/P3
{
  "name":"P3",
  "location":{
    "type":"point",
    "coordinates":[11, 7]
  }
}
#P4
POST /geo_shape_relation_test/_doc/P4
{
  "name":"P4",
  "location":{
    "type":"point",
    "coordinates":[4, 7]
  }
}
#P5
POST /geo_shape_relation_test/_doc/P5
{
  "name":"P5",
  "location":{
    "type":"point",
    "coordinates":[11, 3]
  }
}
#P6
POST /geo_shape_relation_test/_doc/P6
{
  "name":"P6",
  "location":{
    "type":"point",
    "coordinates":[3, 1]
  }
}
#P7
POST /geo_shape_relation_test/_doc/P7
{
  "name":"P7",
  "location":{
    "type":"point",
    "coordinates":[1, 9]
  }
}
GET /geo_shape_relation_test/_search
{
  "query": {
    "bool": {
      "filter": {
        "geo_shape": {
          "location": {
            "indexed_shape": {
              "index": "geo_shape_relation_test",
              "id": "A",
              "path": "location"
            },
            "relation": "intersects"
          }
        }
      }
    }
  }
}

地理位置检索实战

posted on 2023-05-14 17:02 孙龙-程序员阅读(1903) 评论(0) 收藏举报

刷新页面返回顶部