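Elasticsearch 7.8.0 Tutorial (Part 2)

1. Request Body Search in Depth

1.1 Term Queries

A term is the smallest unit that carries meaning, and almost every search works with terms in some form.

Term-level queries include Term Query, Range Query, and others.

In ES, a term query does not analyze its input. It treats the input as a single unit and looks up that exact term in the inverted index. A term query can also be wrapped in Constant Score to turn it into a filter, which skips scoring and can use the cache, making the query more efficient.

1.1.1 Find all movies whose title contains the word beautiful; the search word is not analyzed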

# Returns 98 hits
GET movies/_search
{
  "query": {
    "term": {
      "title": {
        "value": "beautiful"
      }
    }
  }
}

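# Returns 0 hits (a term query is not analyzed, so it searches for Beautiful exactly, but the movies index was written with the default standard analyzer, so the term stored in the inverted index is beautiful)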
GET movies/_search
{
  "query": {
    "term": {
      "title": {
        "value": "Beautiful"
      }
    }
  }
}
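
The difference between the two requests above can be reproduced with a small Python sketch. It is only an illustration under stated assumptions: analyze is a rough stand-in for the standard analyzer (split on non-alphanumerics, lowercase), and the sample titles are hypothetical, not the real movies data set.

```python
import re

def analyze(text):
    """Rough stand-in for the standard analyzer: split into words, lowercase."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]

def build_inverted_index(docs):
    """Map each indexed term to the set of document ids that contain it."""
    index = {}
    for doc_id, title in docs.items():
        for term in analyze(title):
            index.setdefault(term, set()).add(doc_id)
    return index

def term_query(index, value):
    """Term query: look the value up as-is, without analyzing it."""
    return index.get(value, set())

# Hypothetical sample titles
docs = {1: "Beautiful Mind, A", 2: "Life Is Beautiful", 3: "Julia"}
index = build_inverted_index(docs)

lower_hits = term_query(index, "beautiful")
upper_hits = term_query(index, "Beautiful")
print(sorted(lower_hits))  # [1, 2]
print(sorted(upper_hits))  # []
```

Because titles are lowercased at index time while the term query value is not, only the lowercase lookup finds anything.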

1.1.2 Find all movies whose title contains the word beautiful or the word mind; the search words are not analyzed

GET movies/_search
{
  "query": {
    "terms": {
      "title": [
        "beautiful",
        "mind"
      ]
    }
  }
}

1.1.3 Find all movies released from 2016 to 2018, sorted by release year in descending order

GET movies/_search
{
  "query": {
    "range": {
      "year": {
        "gte": 2016,
        "lte": 2018
      }
    }
  },
  "sort": [
    {
      "year": {
        "order": "desc"
      }
    }
  ]
}

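1.1.4 Constant Score query (its filter can wrap any query; a term query is used here): find all movies whose title contains beautiful without relevance scoring; the filter result can be cached, which improves efficiency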

GET movies/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "title": "beautiful"
        }
      },
      "boost": 1.2
    }
  }
}

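1.2 Full-Text Queries

Full-text queries include Match Query, Match Phrase Query, Query String Query, and others.

Analysis is applied at both index time and search time. At query time the input is analyzed, each resulting term is looked up individually, and the results are merged.

1.2.1 match: find documents whose title contains beautiful or mind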

GET movies/_search
{
  "query": {
    "match": {
      "title": "beautiful mind"
    }
  }
}
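
As a rough illustration of what the match query above does, the following Python sketch analyzes the query text, looks up each term, and merges the posting sets. The analyzer is a simplified stand-in for the standard analyzer, the sample titles are hypothetical, and scoring is not modeled. The operator argument mirrors the operator parameter of the match query (or by default, and to require every term).

```python
import re

def analyze(text):
    """Simplified analyzer: split into words and lowercase them."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]

def build_inverted_index(docs):
    """Map each indexed term to the set of document ids that contain it."""
    index = {}
    for doc_id, title in docs.items():
        for term in analyze(title):
            index.setdefault(term, set()).add(doc_id)
    return index

def match_query(index, text, operator="or"):
    """Analyze the query text, look up each term, then merge the posting sets."""
    postings = [index.get(term, set()) for term in analyze(text)]
    if not postings:
        return set()
    if operator == "and":
        return set.intersection(*postings)
    return set.union(*postings)

# Hypothetical sample titles
docs = {1: "Beautiful Mind, A", 2: "Life Is Beautiful", 3: "Mind Game", 4: "Julia"}
index = build_inverted_index(docs)

any_term = match_query(index, "Beautiful MIND")
all_terms = match_query(index, "Beautiful MIND", operator="and")
print(sorted(any_term))   # [1, 2, 3]
print(sorted(all_terms))  # [1]
```

Unlike the term query, the query text is lowercased before lookup, so the capitalization of the input does not matter.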

1.2.2 match: find documents whose title contains beautiful or mind, returning only the specified source fields

GET movies/_search
{
  "_source": ["title", "id", "year"], 
  "query": {
    "match": {
      "title": "beautiful mind"
    }
  }
}

1.2.3 range: find documents whose year is in the interval [1990, 1992]

GET movies/_search
{
  "query": {
    "range": {
      "year": {
        "gte": 1990,
        "lte": 1992
      }
    }
  }
}

1.2.4 range: find documents whose year is in the interval [1990, 1992], with pagination

GET movies/_search
{
  "query": {
    "range": {
      "year": {
        "gte": 1990,
        "lte": 1992
      }
    }
  },
  "from": 5,
  "size": 10
}

1.2.5 bool: find documents whose year is in the interval [1990, 1992] and whose title contains beautiful or mind

GET movies/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "year": {
              "gte": 1990,
              "lte": 1992
            }
          }
        },
        {
          "match": {
            "title": "beautiful mind"
          }
        }
      ]
    }
  }
}

# Error: the query object accepts only one top-level query clause
GET movies/_search
{
  "_source": ["title", "id", "year"], 
  "query": {
    "match": {
      "title": "beautiful mind"
    },
    "range": {
      "year": {
        "gte": 1990,
        "lte": 1992
      }
    }
  }
}

1.2.6 match_phrase: find all documents whose title contains the phrase "beautiful mind" (the following three queries return the same result)

GET movies/_search
{
  "query": {
    "match_phrase": {
      "title": "beautiful mind"
    }
  }
}

GET movies/_search
{
  "query": {
    "match_phrase": {
      "title": "Beautiful mind"
    }
  }
}

GET movies/_search
{
  "query": {
    "match_phrase": {
      "title": "BEautiful mind"
    }
  }
}

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 13.474829,
    "hits" : [
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "4995",
        "_score" : 13.474829,
        "_source" : {
          "title" : "Beautiful Mind, A",
          "genre" : [
            "Drama",
            "Romance"
          ],
          "year" : 2001,
          "id" : "4995",
          "@version" : "1"
        }
      }
    ]
  }
}

1.2.7 match_all: return all documents

GET movies/_search
{
  "query": {
    "match_all": {}
  }
}

# Equivalent to sending no request body
GET movies/_search

1.2.8 multi_match: return the first 20 documents whose title or genre contains beautiful or adventure

GET movies/_search
{
  "query": {
    "multi_match": {
      "query": "beautiful adventure",
      "fields": ["title", "genre"]
    }
  },
  "size": 20
}

1.2.9 query_string

# this OR that
GET movies/_search
{
  "query": {
    "query_string": {
      "default_field": "title",
      "query": "this that"
    }
  }
}

# this OR that
GET movies/_search
{
  "query": {
    "query_string": {
      "default_field": "title",
      "query": "this that",
      "default_operator": "OR"
    }
  }
}

# this AND that
GET movies/_search
{
  "query": {
    "query_string": {
      "default_field": "title",
      "query": "this AND that"
    }
  }
}

# this AND that
GET movies/_search
{
  "query": {
    "query_string": {
      "default_field": "title",
      "query": "this that",
      "default_operator": "AND"
    }
  }
}

1.2.10 simple_query_string

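Find documents whose title contains beautiful, and, or mind (simple_query_string does not treat AND as an operator, so it is searched as an ordinary word)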

GET movies/_search
{
  "query": {
    "simple_query_string": {
      "query": "beautiful AND mind",
      "fields": ["title"]
    }
  }
}

Find documents whose title contains both beautiful and mind (either set default_operator to AND or use the + operator)

GET movies/_search
{
  "query": {
    "simple_query_string": {
      "query": "beautiful mind",
      "fields": ["title"],
      "default_operator": "AND"
    }
  }
}

GET movies/_search
{
  "query": {
    "simple_query_string": {
      "query": "beautiful + mind",
      "fields": ["title"]
    }
  }
}

Find all movies whose title contains the phrase "beautiful mind" (used much like match_phrase)

GET movies/_search
{
  "query": {
    "simple_query_string": {
      "query": "\"beautiful mind\"",
      "fields": ["title"]
    }
  }
}

Find all movies whose title or genre contains any of the three words beautiful, mind, romance (similar to multi_match)

GET movies/_search
{
  "query": {
    "simple_query_string": {
      "query": "beautiful mind Romance",
      "fields": ["title", "genre"]
    }
  }
}

Find all movies whose title or genre contains the phrase "beautiful mind" or the phrase "Modern Romance"

GET movies/_search
{
  "query": {
    "simple_query_string": {
      "query": "\"beautiful mind\" | \"Modern Romance\"",
      "fields": ["title", "genre"]
    }
  }
}

Find all documents whose title or genre contains both beautiful and mind, or all five of Comedy, Romance, Musical, Drama, and Children

GET movies/_search
{
  "query": {
    "simple_query_string": {
      "query": "(beautiful + mind) | (Comedy + Romance + Musical + Drama + Children)",
      "fields": ["title","genre"]
    }
  }
}

Find all documents whose title contains beautiful and people but not Animals

GET movies/_search
{
  "query": {
    "simple_query_string": {
      "query": "beautiful + people + -Animals",
      "fields": ["title"]
    }
  }
}

1.3 Fuzzy Search

# 3 hits
GET movies/_search
{
  "query": {
    "fuzzy": {
      "title": {
        "value": "neverending"
      }
    }
  }
}

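neverending to neverendign: one edit; neverending to neverendong: one edit; neverending to neverendogn: two edits; neverending to neverendoon: three edits

Judging from the results below: when fuzziness is omitted it defaults to AUTO, which allows up to 2 edits for a term of this length; when fuzziness is given, that number of edits is used. Valid fuzziness values lie in [0, 2].

# 3 hits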
GET movies/_search
{
  "query": {
    "fuzzy": {
      "title": {
        "value": "neverendign"
      }
    }
  }
}

# 3 hits
GET movies/_search
{
  "query": {
    "fuzzy": {
      "title": {
        "value": "neverendong"
      }
    }
  }
}

# 3 hits
GET movies/_search
{
  "query": {
    "fuzzy": {
      "title": {
        "value": "neverendong",
        "fuzziness": 1
      }
    }
  }
}

# 3 hits
GET movies/_search
{
  "query": {
    "fuzzy": {
      "title": {
        "value": "neverendong",
        "fuzziness": 2
      }
    }
  }
}

# 3 hits
GET movies/_search
{
  "query": {
    "fuzzy": {
      "title": {
        "value": "neverendogn"
      }
    }
  }
}

# 0 hits
GET movies/_search
{
  "query": {
    "fuzzy": {
      "title": {
        "value": "neverendogn",
        "fuzziness": 1
      }
    }
  }
}

# 3 hits
GET movies/_search
{
  "query": {
    "fuzzy": {
      "title": {
        "value": "neverendogn",
        "fuzziness": 2
      }
    }
  }
}

# 0 hits
GET movies/_search
{
  "query": {
    "fuzzy": {
      "title": {
        "value": "neverendoon"
      }
    }
  }
}

# 0 hits
GET movies/_search
{
  "query": {
    "fuzzy": {
      "title": {
        "value": "neverendoon",
        "fuzziness": 3
      }
    }
  }
}

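Find all documents whose title matches neverendign with at most one edit, where the first 5 characters must match exactly and the edit may only occur from the 6th character onward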

GET movies/_search
{
  "query": {
    "fuzzy": {
      "title": {
        "value": "neverendign",
        "fuzziness": 1, 
        "prefix_length": 5
      }
    }
  }
}
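
The edit counts used in this section can be checked with a short Python sketch. It assumes the usual fuzzy matching model in which an insertion, a deletion, a substitution, or a swap of two adjacent characters each count as one edit. The function names are illustrative, and auto_fuzziness reflects the default AUTO thresholds (terms of 0 to 2 characters allow no edits, 3 to 5 characters allow one, longer terms allow two).

```python
def edit_distance(a, b):
    """Optimal string alignment distance: insert, delete, substitute, or swap adjacent characters."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[rows - 1][cols - 1]

def auto_fuzziness(term):
    """Number of edits allowed by fuzziness AUTO with its default thresholds."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2

variants = ["neverendign", "neverendong", "neverendogn", "neverendoon"]
distances = {word: edit_distance("neverending", word) for word in variants}
print(distances)
# {'neverendign': 1, 'neverendong': 1, 'neverendogn': 2, 'neverendoon': 3}
print(auto_fuzziness("neverendoon"))  # 2
```

This lines up with the hit counts above: variants within two edits match under the default, while neverendoon needs three edits and returns nothing.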

1.4 Multi-Condition Queries

1.4.1 Find all movies whose title contains the word beautiful or mind and that were released from 2016 to 2018

GET movies/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "simple_query_string": {
            "query": "beautiful mind",
            "fields": ["title"]
          }
        },
        {
          "range": {
            "year": {
              "gte": 2016,
              "lte": 2018
            }
          }
        }
      ]
    }
  }
}

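1.4.2 Find all movies whose title contains beautiful or mind but not brain, and that were released from 2016 to 2018

# must clauses have to match and must_not clauses have to not match; if there is only a must_not clause, no relevance scoring is performed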
GET movies/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "simple_query_string": {
            "query": "beautiful mind",
            "fields": ["title"]
          }
        },
        {
          "range": {
            "year": {
              "gte": 2016,
              "lte": 2018
            }
          }
        }
      ],
      "must_not": [
        {
          "simple_query_string": {
            "query": "brain",
            "fields": ["title"]
          }
        }
      ]
    }
  }
}

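1.4.3 Find all movies whose title contains the word beautiful and that were released from 1990 to 1992, without relevance scoring

# filter clauses are not scored and their results can be cached, so they are more efficient than must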
GET movies/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "title": [
              "beautiful"
            ]
          }
        },
        {
          "range": {
            "year": {
              "gte": 1990,
              "lte": 1992
            }
          }
        }
      ]
    }
  }
}

1.4.4 Find all movies whose title contains the word beautiful or that were released from 1990 to 1992

GET movies/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "terms": {
            "title": [
              "beautiful"
            ]
          }
        },
        {
          "range": {
            "year": {
              "gte": 1990,
              "lte": 1992
            }
          }
        }
      ]
    }
  }
}
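
How the four bool clause types combine can be sketched in Python with plain predicates. This is only a model of the matching logic: scoring and caching are not represented, the helper names (title_has, year_between, bool_matches, search) are made up for illustration, and the sample documents are hypothetical. The rule that at least one should clause must match when there are no must or filter clauses follows the default behaviour of the bool query.

```python
def title_has(word):
    """Clause: the title contains the given word (lowercased, whitespace-split)."""
    return lambda doc: word in doc["title"].lower().split()

def year_between(low, high):
    """Clause: the year lies in [low, high]."""
    return lambda doc: low <= doc["year"] <= high

def bool_matches(doc, must=(), filters=(), should=(), must_not=()):
    """Return True if the document satisfies the combined clauses."""
    required = list(must) + list(filters)
    if not all(clause(doc) for clause in required):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    # Without must/filter clauses, at least one should clause has to match
    if should and not required:
        return any(clause(doc) for clause in should)
    return True

def search(docs, **clauses):
    """Return the ids of all documents that match the bool clauses."""
    return [doc["id"] for doc in docs if bool_matches(doc, **clauses)]

# Hypothetical sample documents
docs = [
    {"id": 1, "title": "A Beautiful Day", "year": 1991},
    {"id": 2, "title": "Beautiful Mind", "year": 2001},
    {"id": 3, "title": "Quiet River", "year": 1990},
]

both = search(docs, filters=[title_has("beautiful"), year_between(1990, 1992)])
either = search(docs, should=[title_has("beautiful"), year_between(1990, 1992)])
excluded = search(docs, must=[title_has("beautiful")], must_not=[title_has("mind")])
print(both, either, excluded)  # [1] [1, 2, 3] [1]
```

The filter and should variants correspond to examples 1.4.3 and 1.4.4: the same two clauses give an intersection in one case and a union in the other.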

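2. Mapping

A mapping is similar to a schema in a database. It is used to:

Define the fields in the index;

Define each field's data type, for example boolean, string, number, date, and so on;

Configure how each field is handled in the inverted index.

2.1 Data Types

Type	Description
Text/Keyword	Strings. Keyword means the string is not analyzed: whatever is written is stored in ES as-is. With dynamic mapping, ES automatically adds a keyword sub-field to a string field mapped as text
Date	Date type
Integer/Float/Long	Numeric types
Boolean	Boolean type

ES also has object/nested types and special types (geo_point/geo_shape).

2.2 Defining a Mapping

Recommended way to define a mapping: write a sample document into a temporary index so that ES generates the mapping automatically, fetch that mapping through the mapping API, edit the generated mapping into the form you need, create the real index with it, and then delete the temporary index. In short, the temporary index is discarded once it has served its purpose.

The syntax is as follows: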

PUT users
{
    "mappings": {
    // define your mappings here
    }
}

View the mapping

GET movies/_mapping

Searching on the keyword sub-field

GET movies/_search
{
  "query": {
    "match": {
      "title.keyword": "Julia"
    }
  }
}

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 9.717158,
    "hits" : [
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "32234",
        "_score" : 9.717158,
        "_source" : {
          "title" : "Julia",
          "genre" : [
            "Drama"
          ],
          "year" : 1977,
          "id" : "32234",
          "@version" : "1"
        }
      },
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "58937",
        "_score" : 9.717158,
        "_source" : {
          "title" : "Julia",
          "genre" : [
            "Drama",
            "Thriller"
          ],
          "year" : 2008,
          "id" : "58937",
          "@version" : "1"
        }
      },
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "129333",
        "_score" : 9.717158,
        "_source" : {
          "title" : "Julia",
          "genre" : [
            "Horror",
            "Thriller"
          ],
          "year" : 2014,
          "id" : "129333",
          "@version" : "1"
        }
      }
    ]
  }
}

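2.3 Common Parameters

2.3.1 index

A field can be given a boolean index parameter that controls whether the field is added to the inverted index, that is, whether it can be searched on.

2.3.2 null_value

When data is indexed into ES, a field whose value is null cannot be searched. The null_value parameter specifies a replacement value that is indexed whenever the field is null. null_value must have the same data type as the field and is not supported on text fields, so for strings it is used on keyword fields.

3. Advanced Search

3.1 Aggregations

The syntax of an aggregation search is as follows: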

GET indexName/_search
{
    "aggs": {
        # the aggregation name is chosen by the user
        "aggs_name": {
            "aggs_type": {
            // aggregation body
            }
        }
    }
}

Create the mapping for the employee index

PUT employee
{
  "mappings": {
    "properties": {
      "id": {
        "type": "integer"
      },
      "name": {
        "type": "keyword"
      },
      "job": {
        "type": "keyword"
      },
      "age": {
        "type": "integer"
      },
      "gender": {
        "type": "keyword"
      }
    }
  }
}

{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "employee"
}

Write data into the employee index

PUT employee/_bulk
{"index": {"_id": 1}}
{"id": 1, "name": "Bob", "job": "java", "age": 21, "sal": 8000, "gender": "female"}
{"index": {"_id": 2}}
{"id": 2, "name": "Rod", "job": "html", "age": 31, "sal": 18000, "gender": "female"}
{"index": {"_id": 3}}
{"id": 3, "name": "Gaving", "job": "java", "age": 24, "sal": 12000, "gender": "male"}
{"index": {"_id": 4}}
{"id": 4, "name": "King", "job": "dba", "age": 26, "sal": 15000, "gender": "female"}
{"index": {"_id": 5}}
{"id": 5, "name": "Jonhson", "job": "dba", "age": 29, "sal": 16000, "gender": "male"}
{"index": {"_id": 6}}
{"id": 6, "name": "Douge", "job": "java", "age": 41, "sal": 20000, "gender": "female"}
{"index": {"_id": 7}}
{"id": 7, "name": "cutting", "job": "dba", "age": 27, "sal": 7000, "gender": "male"}
{"index": {"_id": 8}}
{"id": 8, "name": "Bona", "job": "html", "age": 22, "sal": 14000, "gender": "female"}
{"index": {"_id": 9}}
{"id": 9, "name": "Shyon", "job": "dba", "age": 20, "sal": 19000, "gender": "female"}
{"index": {"_id": 10}}
{"id": 10, "name": "James", "job": "html", "age": 18, "sal": 22000, "gender": "male"}
{"index": {"_id": 11}}
{"id": 11, "name": "Golsling", "job": "java", "age": 32, "sal": 23000, "gender": "female"}
{"index": {"_id": 12}}
{"id": 12, "name": "Lily", "job": "java", "age": 24, "sal": 2000, "gender": "male"}
{"index": {"_id": 13}}
{"id": 13, "name": "Jack", "job": "html", "age": 23, "sal": 3000, "gender": "female"}
{"index": {"_id": 14}}
{"id": 14, "name": "Rose", "job": "java", "age": 36, "sal": 6000, "gender": "female"}
{"index": {"_id": 15}}
{"id": 15, "name": "Will", "job": "dba", "age": 38, "sal": 4500, "gender": "male"}
{"index": {"_id": 16}}
{"id": 16, "name": "smith", "job": "java", "age": 32, "sal": 23000, "gender": "male"}

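3.1.1 Single-Value Output

Most metric aggregations in ES output a single value, for example min, max, sum, avg, and cardinality.

# 1. Sum of all salaries; sum_sal is a user-defined name; the response also contains the matching documents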
GET employee/_search
{
  "aggs": {
    "sum_sal": {
      "sum": {
        "field": "sal"
      }
    }
  }
}
# Return only the aggregation result, without the documents
GET employee/_search
{
  "size": 0,
  "aggs": {
    "sum_sal": {
      "sum": {
        "field": "sal"
      }
    }
  }
}

# 2. Average salary
GET employee/_search
{
  "size": 0,
  "aggs": {
    "avg_sal": {
      "avg": {
        "field": "sal"
      }
    }
  }
}

# 3. Number of distinct jobs (a count over the de-duplicated field values)
GET employee/_search
{
  "size": 0,
  "aggs": {
    "sum_job": {
      "cardinality": {
        "field": "job"
      }
    }
  }
}

# 4. Maximum, minimum, and average of the average flight ticket price
GET kibana_sample_data_flights/_search
{
  "size": 0, 
  "aggs": {
    "max_ticket_price": {
      "max": {
        "field": "AvgTicketPrice"
      }
    },
    "min_ticket_price": {
      "min": {
        "field": "AvgTicketPrice"
      }
    },
    "avg_ticket_price": {
      "avg": {
        "field": "AvgTicketPrice"
      }
    }
  }
}
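
The employee queries in this subsection (sum, avg, cardinality) can be cross-checked by computing the same values in plain Python. The (job, sal) pairs are copied from the bulk request above. Note that the cardinality aggregation in ES is an approximate count, whereas the set used here is exact; for a data set this small the two should agree.

```python
# (job, sal) pairs copied from the employee bulk request, in _id order
employees = [
    ("java", 8000), ("html", 18000), ("java", 12000), ("dba", 15000),
    ("dba", 16000), ("java", 20000), ("dba", 7000), ("html", 14000),
    ("dba", 19000), ("html", 22000), ("java", 23000), ("java", 2000),
    ("html", 3000), ("java", 6000), ("dba", 4500), ("java", 23000),
]

salaries = [sal for _, sal in employees]

sum_sal = sum(salaries)                        # sum aggregation on sal
avg_sal = sum_sal / len(salaries)              # avg aggregation on sal
job_count = len({job for job, _ in employees})  # cardinality aggregation on job

print(sum_sal, avg_sal, job_count)  # 212500 13281.25 3
```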

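3.1.2 Multi-Value Output

ES also has aggregations that output several values at once, such as terms and stats.

# 1. Salary statistics for all employees (numeric field)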
GET employee/_search
{
  "size": 0, 
  "aggs": {
    "sal_info": {
      "stats": {
        "field": "sal"
      }
    }
  }
}

# 2. Number of flights per destination country (grouping)
GET kibana_sample_data_flights/_search
{
  "size": 0,
  "aggs": {
    "dest_country_info": {
      "terms": {
        "field": "DestCountry",
        "size": 10
      }
    }
  }
}

# 3. Number of employees per job
GET employee/_search
{
  "size": 0,
  "aggs": {
    "job_emps_num": {
      "terms": {
        "field": "job",
        "size": 10
      }
    }
  }
}

# 4. Number of flights per destination country, broken down by destination weather (sub-aggregation)
GET kibana_sample_data_flights/_search
{
  "size": 0,
  "aggs": {
    "dest_country_info": {
      "terms": {
        "field": "DestCountry"
      },
      "aggs": {
        "dest_country_weather_info": {
          "terms": {
            "field": "DestWeather"
          }
        }
      }
    }
  }
}

# 5. Salary statistics per job (average, maximum, minimum, and so on)
GET employee/_search
{
  "size": 0, 
  "aggs": {
    "job_info": {
      "terms": {
        "field": "job"
      },
      "aggs": {
        "diff_job_sal_info": {
          "stats": {
            "field": "sal"
          }
        }
      }
    }
  }
}

# 6. Number of male and female employees per job, then salary statistics for each gender within each job
GET employee/_search
{
  "size": 0,
  "aggs": {
    "job_info": {
      "terms": {
        "field": "job"
      },
      "aggs": {
        "diff_job_gender_no": {
          "terms": {
            "field": "gender"
          },
          "aggs": {
            "diff_job_gender_sal_info": {
              "stats": {
                "field": "sal"
              }
            }
          }
        }
      }
    }
  }
}

# 7. The two oldest employees
GET employee/_search
{
  "size": 0,
  "aggs": {
    "older_two_emp": {
      "top_hits": {
        "size": 2,
        "sort": [
          {
            "age": {
              "order": "desc"
            }
          }
        ]
      }
    }
  }
}

# 8. Employee statistics per salary range (document count in each range)
GET employee/_search
{
  "size": 0,
  "aggs": {
    "rang_sal_info": {
      "range": {
        "field": "sal",
        "ranges": [
          {
            "key": "0 <= sal < 10001", 
            "to": 10001
          },
          {
            "key": "10001 <= sal < 20001", 
            "from": 10001, 
            "to": 20001
          },
          {
            "key": "20001 <= sal < 30001", 
            "from": 20001, 
            "to": 30001
          }
        ]
      }
    }
  }
}

# 9. Salary distribution as a histogram with one bucket per 5000
GET employee/_search
{
  "size": 0,
  "aggs": {
    "range_sal_info": {
      "histogram": {
        "field": "sal",
        "interval": 5000,
        "extended_bounds": {
          "min": 0,
          "max": 15000
        }
      }
    }
  }
}
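
To make the bucketing rules of examples 8 and 9 concrete, here is a Python sketch that assigns the sal values from the bulk request to range and histogram buckets. It relies on two rules: in a range aggregation, from is inclusive and to is exclusive, and a histogram bucket key is floor(value / interval) * interval. The function names are illustrative only.

```python
from collections import Counter

# sal values copied from the employee bulk request, in _id order
salaries = [8000, 18000, 12000, 15000, 16000, 20000, 7000, 14000,
            19000, 22000, 23000, 2000, 3000, 6000, 4500, 23000]

def range_counts(values, ranges):
    """Count values per range; the lower bound is inclusive, the upper bound exclusive."""
    counts = {}
    for key, lower, upper in ranges:
        counts[key] = sum(
            1 for v in values
            if (lower is None or v >= lower) and (upper is None or v < upper)
        )
    return counts

def histogram_counts(values, interval):
    """Assign each value to the bucket keyed by floor(value / interval) * interval."""
    buckets = Counter((v // interval) * interval for v in values)
    return dict(sorted(buckets.items()))

ranges = [
    ("0 <= sal < 10001", None, 10001),
    ("10001 <= sal < 20001", 10001, 20001),
    ("20001 <= sal < 30001", 20001, 30001),
]

by_range = range_counts(salaries, ranges)
by_histogram = histogram_counts(salaries, 5000)
print(by_range)
# {'0 <= sal < 10001': 6, '10001 <= sal < 20001': 7, '20001 <= sal < 30001': 3}
print(by_histogram)
# {0: 3, 5000: 3, 10000: 2, 15000: 4, 20000: 4}
```

The histogram still contains the 20000 bucket even though extended_bounds sets max to 15000 in the request: extended_bounds can only add empty buckets, it does not filter existing ones.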

# 10. The job with the lowest average salary
GET employee/_search
{
  "size": 0,
  "aggs": {
    "job_info": {
      "terms": {
        "field": "job"
      },
      "aggs": {
        "diff_job_avg_sal": {
          "avg": {
            "field": "sal"
          }
        }
      }
    },
    "min_avg_sal_job": {
      "min_bucket": {
        "buckets_path": "job_info>diff_job_avg_sal"
      }
    }
  }
}
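
What the terms, avg, and min_bucket combination in example 10 computes can be reproduced in Python: group the employees by job, average each group, then pick the group with the smallest average. The data is copied from the bulk request above; the function name avg_by_job is illustrative.

```python
from collections import defaultdict

# (job, sal) pairs copied from the employee bulk request, in _id order
employees = [
    ("java", 8000), ("html", 18000), ("java", 12000), ("dba", 15000),
    ("dba", 16000), ("java", 20000), ("dba", 7000), ("html", 14000),
    ("dba", 19000), ("html", 22000), ("java", 23000), ("java", 2000),
    ("html", 3000), ("java", 6000), ("dba", 4500), ("java", 23000),
]

def avg_by_job(rows):
    """terms aggregation on job with an avg sub-aggregation on sal."""
    grouped = defaultdict(list)
    for job, sal in rows:
        grouped[job].append(sal)
    return {job: sum(sals) / len(sals) for job, sals in grouped.items()}

averages = avg_by_job(employees)

# min_bucket: the bucket whose avg value is the smallest
min_job = min(averages, key=averages.get)
print(min_job, averages[min_job])  # dba 12300.0
```

The same grouping step underlies examples 5 and 6, which attach stats instead of avg to each bucket.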

# 11. Average salary of employees older than 30
GET employee/_search
{
  "size": 0, 
  "query": {
    "range": {
      "age": {
        "gt": 30
      }
    }
  },
  "aggs": {
    "gt_30_emp_avg_sal": {
      "avg": {
        "field": "sal"
      }
    }
  }
}

# 12. Average salary of Java employees (no relevance scoring, so more efficient)
GET employee/_search
{
  "size": 0, 
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "job": "java"
        }
      },
      "boost": 1.2
    }
  },
  "aggs": {
    "java_emp_avg_sal": {
      "avg": {
        "field": "sal"
      }
    }
  }
}

# 13. Average salary of employees older than 30, alongside the average salary of all employees
GET employee/_search
{
  "size": 0,
  "aggs": {
    "all_emp_avg_sal": {
      "avg": {
        "field": "sal"
      }
    },
    "gt_30_emp_avg_info": {
      "filter": {
        "range": {
          "age": {
            "gt": 30
          }
        }
      },
      "aggs": {
        "gt_30_emp_avg_sal": {
          "avg": {
            "field": "sal"
          }
        }
      }
    }
  }
}

3.2 Suggestions

When a search returns nothing because a word was misspelled, we would like ES to suggest alternative search terms.

GET movies/_search
{
  "suggest": {
    # title_suggestion is a name we choose ourselves
    "title_suggestion": {
      "text": "drema",
      "term": {
        "field": "title",
        "suggest_mode": "popular"
      }
    }
  }
}

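suggest_mode takes one of three values: popular, missing, always.

popular suggests only terms that occur in more documents than the original term.

missing suggests only when the searched term is not in the index. (default)

always suggests regardless of the situation.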

GET movies/_search
{
  "suggest": {
    "title_suggestion": {
      "text": "beauti",
      "term": {
        "field": "title"
      }
    }
  }
}

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "suggest" : {
    "title_suggestion" : [
      {
        "text" : "beauti",
        "offset" : 0,
        "length" : 6,
        "options" : [
          {
            "text" : "beauty",
            "score" : 0.8333333,
            "freq" : 66
          },
          {
            "text" : "beasts",
            "score" : 0.6666666,
            "freq" : 9
          },
          {
            "text" : "beauties",
            "score" : 0.6666666,
            "freq" : 5
          },
          {
            "text" : "beastie",
            "score" : 0.6666666,
            "freq" : 2
          },
          {
            "text" : "beatie",
            "score" : 0.6666666,
            "freq" : 1
          }
        ]
      }
    ]
  }
}

GET movies/_search
{
  "suggest": {
    "title_suggestion": {
      "text": "beauty",
      "term": {
        "field": "title"
      }
    }
  }
}

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "suggest" : {
    "title_suggestion" : [
      {
        "text" : "beauty",
        "offset" : 0,
        "length" : 6,
        "options" : [ ]
      }
    ]
  }
}

GET movies/_search
{
  "suggest": {
    "title_suggestion": {
      "text": "beauty",
      "term": {
        "field": "title",
        "suggest_mode": "always"
      }
    }
  }
}

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "suggest" : {
    "title_suggestion" : [
      {
        "text" : "beauty",
        "offset" : 0,
        "length" : 6,
        "options" : [
          {
            "text" : "beasts",
            "score" : 0.6666666,
            "freq" : 9
          },
          {
            "text" : "bearly",
            "score" : 0.6666666,
            "freq" : 1
          },
          {
            "text" : "beastly",
            "score" : 0.6666666,
            "freq" : 1
          },
          {
            "text" : "beast",
            "score" : 0.6,
            "freq" : 74
          },
          {
            "text" : "betty",
            "score" : 0.6,
            "freq" : 13
          }
        ]
      }
    ]
  }
}

GET movies/_search
{
  "suggest": {
    "title_suggestion": {
      "text": "beauty",
      "term": {
        "field": "title",
        "suggest_mode": "popular"
      }
    }
  }
}

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "suggest" : {
    "title_suggestion" : [
      {
        "text" : "beauty",
        "offset" : 0,
        "length" : 6,
        "options" : [
          {
            "text" : "beast",
            "score" : 0.6,
            "freq" : 74
          }
        ]
      }
    ]
  }
}

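3.3 Autocomplete

Autocomplete is probably the search feature we encounter most often in day-to-day development, for example in Baidu search and JD.com product search.

Autocomplete is extremely performance-sensitive: a request is sent to look up matches every time the user types a character. ES implements it with a different data structure rather than the inverted index, and the field must be mapped with the completion type, so the mapping has to be defined before the data is indexed into ES.

3.3.1 View the mapping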

GET movies/_mapping

{
  "movies" : {
    "mappings" : {
      "properties" : {
        "@version" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "genre" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "id" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "title" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "year" : {
          "type" : "long"
        }
      }
    }
  }
}

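3.3.2 Delete the index, redefine the mapping, and re-import the data

First query the mapping

GET movies/_mapping

Edit the mapping returned above, delete the index, create the index again with the new mapping, and then re-import the data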

DELETE movies

PUT movies
{
  "mappings": {
    "properties": {
      "@version": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "genre": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "id": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "title": {
        "type": "completion"
      },
      "year": {
        "type": "long"
      }
    }
  }
}

3.3.3 Prefix Search

GET movies/_search
{
  "_source": [""], 
  "suggest": {
    "title_prefix_suggest": {
      "prefix": "bu",
      "completion": {
        "field": "title",
        "skip_duplicates": true,
        "size": 10
      }
    }
  }
}

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "suggest" : {
    "title_prefix_suggest" : [
      {
        "text" : "bu",
        "offset" : 0,
        "length" : 2,
        "options" : [
          {
            "text" : "'burbs, The",
            "_index" : "movies",
            "_type" : "_doc",
            "_id" : "2072",
            "_score" : 1.0,
            "_source" : { }
          },
          {
            "text" : "Bubba Ho-tep",
            "_index" : "movies",
            "_type" : "_doc",
            "_id" : "6755",
            "_score" : 1.0,
            "_source" : { }
          },
          {
            "text" : "Bubble",
            "_index" : "movies",
            "_type" : "_doc",
            "_id" : "38188",
            "_score" : 1.0,
            "_source" : { }
          },
          {
            "text" : "Bubble Boy",
            "_index" : "movies",
            "_type" : "_doc",
            "_id" : "4732",
            "_score" : 1.0,
            "_source" : { }
          },
          {
            "text" : "Bubble, The",
            "_index" : "movies",
            "_type" : "_doc",
            "_id" : "55132",
            "_score" : 1.0,
            "_source" : { }
          },
          {
            "text" : "Bubblegum",
            "_index" : "movies",
            "_type" : "_doc",
            "_id" : "188595",
            "_score" : 1.0,
            "_source" : { }
          },
          {
            "text" : "Bubblegum and Broken Fingers",
            "_index" : "movies",
            "_type" : "_doc",
            "_id" : "162072",
            "_score" : 1.0,
            "_source" : { }
          },
          {
            "text" : "Bubu",
            "_index" : "movies",
            "_type" : "_doc",
            "_id" : "143753",
            "_score" : 1.0,
            "_source" : { }
          },
          {
            "text" : "Buccaneer, The",
            "_index" : "movies",
            "_type" : "_doc",
            "_id" : "75994",
            "_score" : 1.0,
            "_source" : { }
          },
          {
            "text" : "Buchanan Rides Alone",
            "_index" : "movies",
            "_type" : "_doc",
            "_id" : "82298",
            "_score" : 1.0,
            "_source" : { }
          }
        ]
      }
    ]
  }
}

skip_duplicates: whether duplicate suggestions are skipped.

size: how many suggestions to return.
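
To give a feel for why prefix lookup does not need an inverted index, here is a Python sketch that answers a prefix request with a sorted list and binary search. This is not the data structure ES uses for completion fields; it is a simplified substitute, with lowercasing standing in for analysis. The function names are illustrative, and the titles are a handful taken from the responses in this tutorial, with one duplicated on purpose.

```python
from bisect import bisect_left

def build_completion_index(titles):
    """Sort (normalized, original) pairs so entries sharing a prefix are adjacent."""
    return sorted((title.lower(), title) for title in titles)

def prefix_suggest(index, prefix, size=10, skip_duplicates=True):
    """Return up to `size` original titles whose normalized form starts with `prefix`."""
    prefix = prefix.lower()
    # Binary search for the first entry that is >= the prefix
    start = bisect_left(index, (prefix,))
    results = []
    for normalized, original in index[start:]:
        if not normalized.startswith(prefix):
            break  # sorted order: no later entry can match
        if skip_duplicates and original in results:
            continue
        results.append(original)
        if len(results) == size:
            break
    return results

titles = ["Bubble", "Julia", "Bubba Ho-tep", "Bubu", "Bubble",
          "Beautiful Mind, A", "Buccaneer, The"]
index = build_completion_index(titles)

suggestions = prefix_suggest(index, "bu")
top_two = prefix_suggest(index, "bu", size=2)
print(suggestions)  # ['Bubba Ho-tep', 'Bubble', 'Bubu', 'Buccaneer, The']
print(top_two)      # ['Bubba Ho-tep', 'Bubble']
```

Only the entries that share the prefix are visited, which is what keeps a lookup on every keystroke cheap.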

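3.4 Highlighting

Highlighting is also very common in real applications; search sites such as Baidu and Geek Time highlight the matched keywords in their results.

# Highlight every occurrence of romance in title and genre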
GET movies/_search
{
  "query": {
    "multi_match": {
      "query": "romance",
      "fields": ["title", "genre"]
    }
  },
  "highlight": {
    "pre_tags": "<span>",
    "post_tags": "</span>", 
    "fields": {
      "title": {},
      "genre": {
        "pre_tags": "<em>",
        "post_tags": "</em>"
      }
    }
  }
}

{
  "took" : 77,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 7428,
      "relation" : "eq"
    },
    "max_score" : 9.80649,
    "hits" : [
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "2894",
        "_score" : 9.80649,
        "_source" : {
          "year" : 1999,
          "id" : "2894",
          "@version" : "1",
          "genre" : [
            "Drama",
            "Romance"
          ],
          "title" : "Romance"
        },
        "highlight" : {
          "genre" : [
            "<em>Romance</em>"
          ],
          "title" : [
            "<span>Romance</span>"
          ]
        }
      },
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "116867",
        "_score" : 9.80649,
        "_source" : {
          "year" : 1930,
          "id" : "116867",
          "@version" : "1",
          "genre" : [
            "Drama",
            "Romance"
          ],
          "title" : "Romance"
        },
        "highlight" : {
          "genre" : [
            "<em>Romance</em>"
          ],
          "title" : [
            "<span>Romance</span>"
          ]
        }
      },
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "124991",
        "_score" : 9.80649,
        "_source" : {
          "year" : 2008,
          "id" : "124991",
          "@version" : "1",
          "genre" : [
            "Romance"
          ],
          "title" : "Romance"
        },
        "highlight" : {
          "genre" : [
            "<em>Romance</em>"
          ],
          "title" : [
            "<span>Romance</span>"
          ]
        }
      },
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "3501",
        "_score" : 8.259426,
        "_source" : {
          "year" : 1985,
          "id" : "3501",
          "@version" : "1",
          "genre" : [
            "Comedy",
            "Romance"
          ],
          "title" : "Murphy's Romance"
        },
        "highlight" : {
          "genre" : [
            "<em>Romance</em>"
          ],
          "title" : [
            "Murphy's <span>Romance</span>"
          ]
        }
      },
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "555",
        "_score" : 8.259426,
        "_source" : {
          "year" : 1993,
          "id" : "555",
          "@version" : "1",
          "genre" : [
            "Crime",
            "Thriller"
          ],
          "title" : "True Romance"
        },
        "highlight" : {
          "title" : [
            "True <span>Romance</span>"
          ]
        }
      },
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "40342",
        "_score" : 8.259426,
        "_source" : {
          "year" : 2005,
          "id" : "40342",
          "@version" : "1",
          "genre" : [
            "Comedy",
            "Drama",
            "Musical",
            "Romance"
          ],
          "title" : "Romance & Cigarettes"
        },
        "highlight" : {
          "genre" : [
            "<em>Romance</em>"
          ],
          "title" : [
            "<span>Romance</span> & Cigarettes"
          ]
        }
      },
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "149446",
        "_score" : 8.259426,
        "_source" : {
          "year" : 2010,
          "id" : "149446",
          "@version" : "1",
          "genre" : [
            "Comedy",
            "Drama"
          ],
          "title" : "Petty Romance"
        },
        "highlight" : {
          "title" : [
            "Petty <span>Romance</span>"
          ]
        }
      },
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "150016",
        "_score" : 8.259426,
        "_source" : {
          "year" : 2012,
          "id" : "150016",
          "@version" : "1",
          "genre" : [
            "Comedy",
            "Drama"
          ],
          "title" : "Brasserie Romance"
        },
        "highlight" : {
          "title" : [
            "Brasserie <span>Romance</span>"
          ]
        }
      },
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "133712",
        "_score" : 8.259426,
        "_source" : {
          "year" : 1977,
          "id" : "133712",
          "@version" : "1",
          "genre" : [
            "Comedy",
            "Romance"
          ],
          "title" : "Office Romance"
        },
        "highlight" : {
          "genre" : [
            "<em>Romance</em>"
          ],
          "title" : [
            "Office <span>Romance</span>"
          ]
        }
      },
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "5769",
        "_score" : 8.259426,
        "_source" : {
          "year" : 1981,
          "id" : "5769",
          "@version" : "1",
          "genre" : [
            "Comedy",
            "Romance"
          ],
          "title" : "Modern Romance"
        },
        "highlight" : {
          "genre" : [
            "<em>Romance</em>"
          ],
          "title" : [
            "Modern <span>Romance</span>"
          ]
        }
      }
    ]
  }
}

# Find movies from 2012 whose title contains romance and highlight romance in title; for these movies, also highlight the word Children in genre
GET movies/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "year": "2012"
          }
        },
        {
          "match": {
            "title": "romance"
          }
        }
      ]
    }
  },
  "highlight": {
    "fields": {
      "title": {},
      "genre": {
        "pre_tags": "<span>",
        "post_tags": "</span>",
        "highlight_query": {
          "match": {
            "genre": "Children"
          }
        }
      }
    }
  }
}

{
  "took" : 9,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 9.259426,
    "hits" : [
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "150016",
        "_score" : 9.259426,
        "_source" : {
          "year" : 2012,
          "id" : "150016",
          "@version" : "1",
          "genre" : [
            "Comedy",
            "Drama"
          ],
          "title" : "Brasserie Romance"
        },
        "highlight" : {
          "title" : [
            "Brasserie <em>Romance</em>"
          ]
        }
      },
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "158946",
        "_score" : 7.2784586,
        "_source" : {
          "year" : 2012,
          "id" : "158946",
          "@version" : "1",
          "genre" : [
            "Children",
            "Romance"
          ],
          "title" : "A Taste of Romance"
        },
        "highlight" : {
          "genre" : [
            "<span>Children</span>"
          ],
          "title" : [
            "A Taste of <em>Romance</em>"
          ]
        }
      }
    ]
  }
}
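
The response above shows that each field is highlighted with its own query and its own tags, and that a field without a match gets no highlight entry at all. The following is a toy sketch of that behaviour in plain Python. It is not how Elasticsearch's highlighter works internally; the function names and the `field_specs` structure are made up for illustration.

```python
import re

def wrap(text, term, pre_tag, post_tag):
    """Wrap whole-word, case-insensitive matches of term; return None if nothing matched."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    result, count = pattern.subn(lambda m: pre_tag + m.group(0) + post_tag, text)
    return result if count else None

def highlight_hit(source, field_specs):
    """Build a highlight section in which every field uses its own term and tags."""
    highlight = {}
    for field, spec in field_specs.items():
        values = source[field]
        if isinstance(values, str):
            values = [values]
        fragments = []
        for value in values:
            fragment = wrap(value, spec["term"],
                            spec.get("pre_tag", "<em>"), spec.get("post_tag", "</em>"))
            if fragment is not None:
                fragments.append(fragment)
        if fragments:  # fields without a match are left out, as in the response
            highlight[field] = fragments
    return highlight

# Hypothetical per-field settings mirroring the request above.
specs = {
    "title": {"term": "romance"},
    "genre": {"term": "Children", "pre_tag": "<span>", "post_tag": "</span>"},
}
print(highlight_hit({"title": "Brasserie Romance", "genre": ["Comedy", "Drama"]}, specs))
print(highlight_hit({"title": "A Taste of Romance", "genre": ["Children", "Romance"]}, specs))
```

The first document only gets a title highlight because its genre list does not contain Children, which matches what the real response shows for movie 150016.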

4. Installing analyzers

4.1 IK analyzer

4.1.1 Download

https://github.com/medcl/elasticsearch-analysis-ik/releases

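4.1.2 Installation

The IK analyzer is installed the same way on every operating system: in the plugins directory under the ES home directory, create a folder named ik, copy the downloaded zip package into ik, and unzip it there. Download the release that matches your Elasticsearch version, and restart Elasticsearch after installing so that the plugin is loaded.

The IK plugin provides two analyzers:

Analyzer name	Description
ik_smart	Performs the coarsest-grained split; for example, “中华人民共和国国歌” is split into “中华人民共和国, 国歌”. Suited to Phrase queries.
ik_max_word	Performs the finest-grained split; for example, “中华人民共和国国歌” is split into “中华人民共和国, 中华人民, 中华, 华人, 人民共和国, 人民, 人, 民, 共和国, 共和, 和, 国国, 国歌”, exhausting every possible combination. Suited to Term Query.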
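
To get a feel for the difference between the two granularities, here is a toy dictionary-matching sketch. It is only an illustration under strong assumptions: the dictionary is a tiny hand-picked set, the coarse mode is a plain greedy longest match, and the fine mode simply lists every dictionary word found at every position. The real IK plugin uses a much larger dictionary and its own disambiguation logic, so its output can differ (for example, it also emits some single characters).

```python
# Toy dictionary (an assumption for this sketch); the real IK dictionary is far larger.
DICTIONARY = {
    "中华人民共和国", "中华人民", "中华", "华人",
    "人民共和国", "人民", "共和国", "共和", "国歌",
}

def fine_grained(text, dictionary=DICTIONARY):
    """Emit every dictionary word found at every start position (ik_max_word-like)."""
    tokens = []
    for start in range(len(text)):
        for end in range(len(text), start, -1):  # try the longest candidate first
            if text[start:end] in dictionary:
                tokens.append(text[start:end])
    return tokens

def coarse_grained(text, dictionary=DICTIONARY):
    """Greedy forward longest match (ik_smart-like); unknown characters become single tokens."""
    tokens = []
    start = 0
    while start < len(text):
        for end in range(len(text), start, -1):
            if text[start:end] in dictionary:
                break
        else:
            end = start + 1  # no dictionary word starts here: emit one character
        tokens.append(text[start:end])
        start = end
    return tokens

print(coarse_grained("中华人民共和国国歌"))
print(fine_grained("中华人民共和国"))
```

With this toy dictionary the coarse mode yields 中华人民共和国 and 国歌, while the fine mode lists the overlapping words 中华人民共和国, 中华人民, 中华, 华人, 人民共和国, 人民, 共和国 and 共和.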

4.1.3 Verification

The standard analyzer cannot segment Chinese into words; it splits the text into single characters:

GET _analyze
{
  "analyzer": "standard",
  "text": "教育"
}

{
  "tokens" : [
    {
      "token" : "教",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<IDEOGRAPHIC>",
      "position" : 0
    },
    {
      "token" : "育",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "<IDEOGRAPHIC>",
      "position" : 1
    }
  ]
}

Using the ik_smart analyzer:

GET _analyze
{
  "analyzer": "ik_smart",
  "text": "教育"
}

{
  "tokens" : [
    {
      "token" : "教育",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 0
    }
  ]
}

GET _analyze
{
  "analyzer": "ik_smart",
  "text": "中华人民共和国"
}

{
  "tokens" : [
    {
      "token" : "中华人民共和国",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 0
    }
  ]
}

Using the ik_max_word analyzer:

GET _analyze
{
  "analyzer": "ik_max_word",
  "text": "中华人民共和国"
}

{
  "tokens" : [
    {
      "token" : "中华人民共和国",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "中华人民",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "中华",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "华人",
      "start_offset" : 1,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "人民共和国",
      "start_offset" : 2,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 4
    },
    {
      "token" : "人民",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 5
    },
    {
      "token" : "共和国",
      "start_offset" : 4,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 6
    },
    {
      "token" : "共和",
      "start_offset" : 4,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 7
    },
    {
      "token" : "国",
      "start_offset" : 6,
      "end_offset" : 7,
      "type" : "CN_CHAR",
      "position" : 8
    }
  ]
}

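4.1.4 Custom dictionaries

Business-specific terms are often missing from the IK analyzer's built-in dictionary, so we need to define a dictionary of our own. In the example below, 正井猫 and up主 are split apart, but we want each of them kept as a single token. In addition, 的 is a Chinese stop word that we do not want to appear in the tokens. We therefore need both a custom dictionary and a custom stop-word dictionary.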

GET _analyze
{
  "analyzer": "ik_smart",
  "text": "请关注正井猫up主，你们的支持是我坚持的动力。"
}

{
  "tokens" : [
    {
      "token" : "请",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "关注",
      "start_offset" : 1,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "正",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "CN_CHAR",
      "position" : 2
    },
    {
      "token" : "井",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "CN_CHAR",
      "position" : 3
    },
    {
      "token" : "猫",
      "start_offset" : 5,
      "end_offset" : 6,
      "type" : "CN_CHAR",
      "position" : 4
    },
    {
      "token" : "up",
      "start_offset" : 6,
      "end_offset" : 8,
      "type" : "ENGLISH",
      "position" : 5
    },
    {
      "token" : "主",
      "start_offset" : 8,
      "end_offset" : 9,
      "type" : "CN_CHAR",
      "position" : 6
    },
    {
      "token" : "你们",
      "start_offset" : 10,
      "end_offset" : 12,
      "type" : "CN_WORD",
      "position" : 7
    },
    {
      "token" : "的",
      "start_offset" : 12,
      "end_offset" : 13,
      "type" : "CN_CHAR",
      "position" : 8
    },
    {
      "token" : "支持",
      "start_offset" : 13,
      "end_offset" : 15,
      "type" : "CN_WORD",
      "position" : 9
    },
    {
      "token" : "是",
      "start_offset" : 15,
      "end_offset" : 16,
      "type" : "CN_CHAR",
      "position" : 10
    },
    {
      "token" : "我",
      "start_offset" : 16,
      "end_offset" : 17,
      "type" : "CN_CHAR",
      "position" : 11
    },
    {
      "token" : "坚持",
      "start_offset" : 17,
      "end_offset" : 19,
      "type" : "CN_WORD",
      "position" : 12
    },
    {
      "token" : "的",
      "start_offset" : 19,
      "end_offset" : 20,
      "type" : "CN_CHAR",
      "position" : 13
    },
    {
      "token" : "动力",
      "start_offset" : 20,
      "end_offset" : 22,
      "type" : "CN_WORD",
      "position" : 14
    }
  ]
}

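Go to the $ES_HOME/plugins/ik/config directory, create a directory named custom, and inside it create the files mydic.dic and ext_stopword.dic. (The file names can be chosen freely, but the files must have the .dic extension.)

Add two lines to mydic.dic: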

正井猫
up主

Add two lines to ext_stopword.dic:

的
是
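
If you prefer to script this step, the two files can be generated with a few lines of Python. This is a minimal sketch: the function name is made up, the demo writes into a temporary directory rather than a real $ES_HOME/plugins/ik/config, and it assumes the dictionaries should be UTF-8 text with one entry per line.

```python
import tempfile
from pathlib import Path

def write_ik_dictionaries(config_dir, words, stopwords):
    """Write custom/mydic.dic and custom/ext_stopword.dic, one UTF-8 entry per line."""
    custom_dir = Path(config_dir) / "custom"
    custom_dir.mkdir(parents=True, exist_ok=True)
    word_file = custom_dir / "mydic.dic"
    stop_file = custom_dir / "ext_stopword.dic"
    word_file.write_text("\n".join(words) + "\n", encoding="utf-8")
    stop_file.write_text("\n".join(stopwords) + "\n", encoding="utf-8")
    return word_file, stop_file

# Demo in a throwaway directory; point config_dir at your own
# plugins/ik/config directory to use it for real.
with tempfile.TemporaryDirectory() as demo_dir:
    word_file, stop_file = write_ik_dictionaries(demo_dir, ["正井猫", "up主"], ["的", "是"])
    print(word_file.read_text(encoding="utf-8").splitlines())
    print(stop_file.read_text(encoding="utf-8").splitlines())
```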

Finally, edit $ES_HOME/plugins/ik/config/IKAnalyzer.cfg.xml so that it reads as follows:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
	<comment>IK Analyzer extension configuration</comment>
	<!-- Configure your own extension dictionary here -->
	<entry key="ext_dict">custom/mydic.dic</entry>
	<!-- Configure your own extension stop-word dictionary here -->
	<entry key="ext_stopwords">custom/ext_stopword.dic</entry>
	<!-- Configure a remote extension dictionary here -->
	<!-- <entry key="remote_ext_dict">words_location</entry> -->
	<!-- Configure a remote extension stop-word dictionary here -->
	<!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>

Restart Elasticsearch and run the same command again. The result is now:

GET _analyze
{
  "analyzer": "ik_smart",
  "text": "请关注正井猫up主，你们的支持是我坚持的动力。"
}

{
  "tokens" : [
    {
      "token" : "请",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "关注",
      "start_offset" : 1,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "正井猫",
      "start_offset" : 3,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "up主",
      "start_offset" : 6,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "你们",
      "start_offset" : 10,
      "end_offset" : 12,
      "type" : "CN_WORD",
      "position" : 4
    },
    {
      "token" : "支持",
      "start_offset" : 13,
      "end_offset" : 15,
      "type" : "CN_WORD",
      "position" : 5
    },
    {
      "token" : "我",
      "start_offset" : 16,
      "end_offset" : 17,
      "type" : "CN_CHAR",
      "position" : 6
    },
    {
      "token" : "坚持",
      "start_offset" : 17,
      "end_offset" : 19,
      "type" : "CN_WORD",
      "position" : 7
    },
    {
      "token" : "动力",
      "start_offset" : 20,
      "end_offset" : 22,
      "type" : "CN_WORD",
      "position" : 8
    }
  ]
}

GET _analyze
{
  "analyzer": "ik_max_word",
  "text": "请关注正井猫up主，你们的支持是我坚持的动力。"
}

{
  "tokens" : [
    {
      "token" : "请",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "关注",
      "start_offset" : 1,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "正井猫",
      "start_offset" : 3,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "up主",
      "start_offset" : 6,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "up",
      "start_offset" : 6,
      "end_offset" : 8,
      "type" : "ENGLISH",
      "position" : 4
    },
    {
      "token" : "主",
      "start_offset" : 8,
      "end_offset" : 9,
      "type" : "CN_CHAR",
      "position" : 5
    },
    {
      "token" : "你们",
      "start_offset" : 10,
      "end_offset" : 12,
      "type" : "CN_WORD",
      "position" : 6
    },
    {
      "token" : "支持",
      "start_offset" : 13,
      "end_offset" : 15,
      "type" : "CN_WORD",
      "position" : 7
    },
    {
      "token" : "我",
      "start_offset" : 16,
      "end_offset" : 17,
      "type" : "CN_CHAR",
      "position" : 8
    },
    {
      "token" : "坚持",
      "start_offset" : 17,
      "end_offset" : 19,
      "type" : "CN_WORD",
      "position" : 9
    },
    {
      "token" : "动力",
      "start_offset" : 20,
      "end_offset" : 22,
      "type" : "CN_WORD",
      "position" : 10
    }
  ]
}

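4.1.5 Creating a mapping that specifies the analyzer (standard is used if none is specified): analyzer is the analyzer applied when documents are indexed into ES, and search_analyzer is the analyzer applied at search time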

PUT news
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      },
      "content": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      }
    }
  }
}

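After customizing the analyzer (for example, its dictionary), documents that are already indexed can still be re-analyzed and reindexed in place with POST news/_update_by_query.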
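
As a rough sketch of how that call could be issued from a script, the snippet below prepares the request with Python's standard library. The host http://localhost:9200 and the helper name are assumptions; the conflicts=proceed parameter is optional and tells Elasticsearch to continue when it hits a version conflict instead of aborting. The request is only built here, not sent.

```python
import urllib.request

def build_update_by_query(base_url, index):
    """Prepare (but do not send) POST <index>/_update_by_query?conflicts=proceed."""
    url = f"{base_url.rstrip('/')}/{index}/_update_by_query?conflicts=proceed"
    return urllib.request.Request(url, method="POST")

request = build_update_by_query("http://localhost:9200", "news")
print(request.get_method(), request.full_url)

# To run it against a live cluster (assumed to be reachable at base_url):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```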

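Using the IK analyzer and dynamically reindexing data

Adding a dynamic dictionary to the IK analyzer

4.2 pinyin analyzer

4.2.1 Download

Download link: https://github.com/medcl/elasticsearch-analysis-pinyin/releases

4.2.2 Installation

The pinyin analyzer is installed the same way on every operating system: in the plugins directory under the ES home directory, create a folder named pinyin, copy the downloaded zip package into pinyin, and unzip it there.

4.2.3 Verification

Run the following command: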

GET _analyze
{
 "analyzer": "pinyin",
 "text": "正井猫"
}

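Advanced usage of the pinyin analyzer (1)

Advanced usage of the pinyin analyzer (2)

4.3 Custom analyzers and their use

For the input <p>刘德华</p>, we want to obtain the following tokens: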

{
  "tokens": [
    {
      "token": "刘德华",
      "start_offset": 0,
      "end_offset": 3,
      "type": "word",
      "position": 0
    },
    {
      "token": "liudehua",
      "start_offset": 0,
      "end_offset": 3,
      "type": "word",
      "position": 0
    },
    {
      "token": "ldh",
      "start_offset": 0,
      "end_offset": 3,
      "type": "word",
      "position": 0
    }
  ]
}

4.3.1 Defining the analyzer

PUT test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "char_filter": [
            "html_strip"
          ],
          "tokenizer": "keyword",
          "filter": "my_pinyin_filter"
        }
      },
      "filter": {
        "my_pinyin_filter": {
          "type": "pinyin",
          "keep_first_letter": true,
          "keep_full_pinyin": false,
          "keep_joined_full_pinyin": true,
          "keep_original": true,
          "keep_none_chinese": false,
          "keep_none_chinese_in_joined_full_pinyin": true
        }
      }
    }
  }
}

4.3.2 Verifying the analyzer

GET test/_analyze
{
    "analyzer": "my_analyzer",
    "text": ["刘德华"]
}
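
The custom analyzer is a three-stage pipeline: a char filter (html_strip), a tokenizer (keyword), and a token filter (the pinyin filter). The sketch below imitates those stages in plain Python to show how <p>刘德华</p> could end up as three tokens. It is only an illustration: the three-character pinyin table is hypothetical, the tag stripping is a crude regex, and only three of the filter options are modelled.

```python
import re

# Hypothetical mini pinyin table; the real plugin ships a complete one.
PINYIN = {"刘": "liu", "德": "de", "华": "hua"}

def html_strip(text):
    """Char filter stage: drop HTML tags (a crude stand-in for html_strip)."""
    return re.sub(r"<[^>]+>", "", text)

def keyword_tokenizer(text):
    """Tokenizer stage: emit the whole input as a single token."""
    return [text]

def pinyin_filter(token, keep_original=True, keep_joined_full_pinyin=True,
                  keep_first_letter=True):
    """Token filter stage: expand a token into original, joined pinyin, and initials."""
    syllables = [PINYIN[ch] for ch in token if ch in PINYIN]
    tokens = []
    if keep_original:
        tokens.append(token)
    if keep_joined_full_pinyin and syllables:
        tokens.append("".join(syllables))
    if keep_first_letter and syllables:
        tokens.append("".join(s[0] for s in syllables))
    return tokens

def my_analyzer(text):
    """Chain the three stages in the same order as the analyzer definition."""
    tokens = []
    for token in keyword_tokenizer(html_strip(text)):
        tokens.extend(pinyin_filter(token))
    return tokens

print(my_analyzer("<p>刘德华</p>"))
```

Because the keyword tokenizer keeps the whole name as one token, the filter produces the joined forms liudehua and ldh rather than one token per syllable.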

4.3.3 Assigning the analyzer to a field

Update the mapping so that the name field of the test index uses the custom analyzer.

PUT test/_mapping
{
    "properties": {
        "name": {
            "type": "completion",
            "analyzer": "my_analyzer"
        }
    }
}

4.3.4 Verifying the result

Run the following command to add data:

POST test/_bulk
{"index": {}}
{"name": "刘德华"}
{"index": {}}
{"name": "张学友"}
{"index": {}}
{"name": "柳岩"}

Then run a prefix suggestion query against the name field.
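
As a hedged sketch of what such a query could look like, the snippet below builds a completion-suggester request body for the name field. The suggestion name name_suggest, the size, and the sample prefix are assumptions chosen for illustration; the resulting JSON would be sent as the body of POST test/_search.

```python
import json

def build_prefix_suggest(prefix, field="name", suggest_name="name_suggest", size=5):
    """Build a completion-suggester body; suggest_name and size are illustrative choices."""
    return {
        "suggest": {
            suggest_name: {
                "prefix": prefix,
                "completion": {"field": field, "size": size},
            }
        }
    }

# Body for: POST test/_search
print(json.dumps(build_prefix_suggest("ldh"), ensure_ascii=False, indent=2))
```

Pinyin prefixes such as ldh or liu, as well as Chinese prefixes, are the kinds of input the custom analyzer is meant to support.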

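Study the last result above carefully to understand the sentence highlighted in red at the beginning of section 5 of "Elasticsearch Tutorial (Part 1)".

5. Importing MySQL data into ES

The initial MySQL data can be imported into ES either programmatically or with a tool. This tutorial uses Logstash for the initial import. First copy the MySQL driver jar into the $logstash/logstash-core/lib/jars/ directory; then, in the $logstash/config/ directory, create a file named logstash-mysqlnews.conf with the following content: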

input {
    jdbc {
        jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
        jdbc_connection_string => "jdbc:mysql://localhost:3306/es?useSSL=false&serverTimezone=UTC"
        jdbc_user => "root"
        jdbc_password => "123456"
        # Enable tracking; if true, tracking_column must be specified
        use_column_value => true
        # The column to track
        tracking_column => "id"
        # Type of the tracking column; currently only numeric and timestamp are supported, numeric is the default
        tracking_column_type => "numeric"
        # Record the value from the last run
        record_last_run => true
        # Where the value from the last run is saved
        last_run_metadata_path => "mysql-position.txt"
        statement => "SELECT * FROM news where id > :sql_last_value"
        schedule => "* * * * * *"
    }
}
filter {
    mutate {
    	split => { "tags" => ","}
    }
}
output {
    elasticsearch {
        document_id => "%{id}"
        document_type => "_doc"
        index => "news"
        hosts => ["http://localhost:9200"]
    }
    stdout{
    	codec => rubydebug
    }
}
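
The key idea in this configuration is incremental import: each run selects only rows whose id is greater than the last value seen, and then remembers the largest id for the next run. The following self-contained sketch simulates that idea with an in-memory SQLite table. It is not Logstash's implementation; the title column and the helper name are assumptions made for the demo.

```python
import sqlite3

def fetch_new_rows(connection, last_value):
    """Return rows with id > last_value together with the updated tracking value."""
    rows = connection.execute(
        "SELECT id, title FROM news WHERE id > ? ORDER BY id", (last_value,)
    ).fetchall()
    if rows:
        last_value = rows[-1][0]  # plays the role of the tracked id column
    return rows, last_value

connection = sqlite3.connect(":memory:")
# Hypothetical schema: only id and title are modelled here.
connection.execute("CREATE TABLE news (id INTEGER PRIMARY KEY, title TEXT)")
connection.executemany("INSERT INTO news VALUES (?, ?)", [(1, "first"), (2, "second")])

last_value = 0  # starting point for a numeric tracking column
first_batch, last_value = fetch_new_rows(connection, last_value)

connection.execute("INSERT INTO news VALUES (3, 'third')")
second_batch, last_value = fetch_new_rows(connection, last_value)
third_batch, last_value = fetch_new_rows(connection, last_value)

print(first_batch)   # both initial rows
print(second_batch)  # only the newly inserted row
print(third_batch)   # nothing new, so an empty list
```

In the Logstash setup the remembered value is persisted to the file named by last_run_metadata_path, so the import can resume after a restart.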

Posted 2021-10-06 17:08 by no1486
