Elasticsearch 7.8.0 Tutorial (Part 2)

1. Request Body Search in Depth

1.1 Term queries

A term is the smallest unit of meaning, and almost every search ultimately works on terms.

Term-level queries include the Term Query, the Range Query, and others.

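In ES, a term query does not analyze its input. The input is treated as a single unit and looked up as an exact term in the inverted index. You can also wrap the query in a Constant Score query to turn it into a filter. That skips scoring, lets ES use the cache, and makes the query more efficient.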

1.1.1 Find all movies whose title contains the word beautiful (the query word is not analyzed)

#Returns 98 results
GET movies/_search
{
  "query": {
    "term": {
      "title": {
        "value": "beautiful"
      }
    }
  }
}

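#Returns 0 results (a term query does not analyze its input, so Beautiful is looked up as-is; but the title field of the movies index was indexed with the default standard analyzer, which lowercases tokens, so the inverted index only contains beautiful)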
GET movies/_search
{
  "query": {
    "term": {
      "title": {
        "value": "Beautiful"
      }
    }
  }
}
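The following minimal Python sketch of an inverted index shows why the lowercase query finds 98 movies while Beautiful finds none. It makes some assumptions. The analyzer is simplified: it splits on non-alphanumeric characters and lowercases, standing in for the standard analyzer. The document ids are made up. It is not how Lucene actually stores the index.

```python
import re
from collections import defaultdict


def analyze(text: str) -> list[str]:
    # Simplified stand-in for the standard analyzer: split on non-alphanumerics, lowercase.
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    # Inverted index: term -> ids of the documents whose title contains that term.
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, title in docs.items():
        for term in analyze(title):
            index[term].add(doc_id)
    return index


def term_query(index: dict[str, set[str]], value: str) -> set[str]:
    # A term query does NOT analyze its input: the value is looked up verbatim.
    return set(index.get(value, set()))


docs = {"doc1": "Beautiful Mind, A", "doc2": "Life Is Beautiful"}  # hypothetical sample
index = build_index(docs)
print(sorted(term_query(index, "beautiful")))  # ['doc1', 'doc2']
print(sorted(term_query(index, "Beautiful")))  # []
```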

1.1.2 Find all movies whose title contains beautiful or mind (the query words are not analyzed)

GET movies/_search
{
  "query": {
    "terms": {
      "title": [
        "beautiful",
        "mind"
      ]
    }
  }
}

1.1.3 Find all movies released between 2016 and 2018, sorted by release year in descending order

GET movies/_search
{
  "query": {
    "range": {
      "year": {
        "gte": 2016,
        "lte": 2018
      }
    }
  },
  "sort": [
    {
      "year": {
        "order": "desc"
      }
    }
  ]
}

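1.1.4 Constant Score query: find all movies whose title contains beautiful, without relevance scoring. Every hit gets the same score, equal to boost. Filter results can be cached, which improves efficiency.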

GET movies/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "title": "beautiful"
        }
      },
      "boost": 1.2
    }
  }
}

1.2 Full-text queries

Full-text queries include the Match Query, the Match Phrase Query, the Query String Query, and others.

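Text is analyzed both at index time and at search time. At query time, the input is split into terms, each term is looked up in the underlying index, and the per-term results are merged.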
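The analyze, look up and merge steps described above can be sketched in Python. This is an illustration under stated assumptions:

- the same simplified lowercase analyzer stands in for the standard analyzer;
- the titles are made up;
- results are merged by set union (OR, the default operator) or by intersection (AND);
- relevance scoring is not modeled.

```python
import re
from collections import defaultdict


def analyze(text: str) -> list[str]:
    # Simplified stand-in for the standard analyzer.
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


docs = {  # hypothetical titles
    "doc1": "Beautiful Mind, A",
    "doc2": "Beautiful Creatures",
    "doc3": "Mind Game",
    "doc4": "Brain Games",
}
index: dict[str, set[str]] = defaultdict(set)
for doc_id, title in docs.items():
    for term in analyze(title):
        index[term].add(doc_id)


def match_query(text: str, operator: str = "OR") -> set[str]:
    # 1. analyze the input, 2. look up each term, 3. merge the posting lists.
    postings = [index.get(term, set()) for term in analyze(text)]
    if not postings:
        return set()
    if operator == "AND":
        return set.intersection(*postings)
    return set.union(*postings)


print(sorted(match_query("Beautiful mind")))         # ['doc1', 'doc2', 'doc3']
print(sorted(match_query("Beautiful mind", "AND")))  # ['doc1']
```

The AND branch corresponds to what default_operator AND does in the query_string and simple_query_string examples later in this section.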

1.2.1 match: find documents whose title contains beautiful or mind

GET movies/_search
{
  "query": {
    "match": {
      "title": "beautiful mind"
    }
  }
}

1.2.2 match: the same query, returning only the title, id and year fields

GET movies/_search
{
  "_source": ["title", "id", "year"], 
  "query": {
    "match": {
      "title": "beautiful mind"
    }
  }
}

1.2.3 range: find documents whose year is in [1990, 1992]

GET movies/_search
{
  "query": {
    "range": {
      "year": {
        "gte": 1990,
        "lte": 1992
      }
    }
  }
}

1.2.4 range: the same query with pagination (skip the first 5 hits, return the next 10)

GET movies/_search
{
  "query": {
    "range": {
      "year": {
        "gte": 1990,
        "lte": 1992
      }
    }
  },
  "from": 5,
  "size": 10
}

1.2.5 bool: find documents whose year is in [1990, 1992] and whose title contains beautiful or mind

GET movies/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "year": {
              "gte": 1990,
              "lte": 1992
            }
          }
        },
        {
          "match": {
            "title": "beautiful mind"
          }
        }
      ]
    }
  }
}

#Fails: query accepts only one top-level query; to combine several conditions, wrap them in a bool query as above
GET movies/_search
{
  "_source": ["title", "id", "year"], 
  "query": {
    "match": {
      "title": "beautiful mind"
    },
    "range": {
      "year": {
        "gte": 1990,
        "lte": 1992
      }
    }
  }
}

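1.2.6 match_phrase: find all documents whose title contains the phrase "beautiful mind". The three queries below return the same result, because match_phrase analyzes its input and the standard analyzer lowercases it.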

GET movies/_search
{
  "query": {
    "match_phrase": {
      "title": "beautiful mind"
    }
  }
}

GET movies/_search
{
  "query": {
    "match_phrase": {
      "title": "Beautiful mind"
    }
  }
}

GET movies/_search
{
  "query": {
    "match_phrase": {
      "title": "BEautiful mind"
    }
  }
}
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 13.474829,
    "hits" : [
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "4995",
        "_score" : 13.474829,
        "_source" : {
          "title" : "Beautiful Mind, A",
          "genre" : [
            "Drama",
            "Romance"
          ],
          "year" : 2001,
          "id" : "4995",
          "@version" : "1"
        }
      }
    ]
  }
}
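All three spellings above find "Beautiful Mind, A" because match_phrase first analyzes the input and then requires the terms to appear at consecutive positions. Below is a minimal Python sketch with the default slop of 0. It again assumes a simplified lowercase analyzer rather than the real standard analyzer, and the second title is made up.

```python
import re


def analyze(text: str) -> list[str]:
    # Simplified stand-in for the standard analyzer.
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


def match_phrase(field_value: str, phrase: str) -> bool:
    # True if the analyzed phrase occurs as consecutive tokens (slop 0) in the field.
    doc_tokens = analyze(field_value)
    phrase_tokens = analyze(phrase)
    n = len(phrase_tokens)
    if n == 0:
        return False
    return any(doc_tokens[i:i + n] == phrase_tokens
               for i in range(len(doc_tokens) - n + 1))


for query in ["beautiful mind", "Beautiful mind", "BEautiful mind"]:
    print(query, match_phrase("Beautiful Mind, A", query))  # True for all three
print(match_phrase("Mind Over Beautiful", "beautiful mind"))  # False: wrong order
```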

1.2.7 match_all: return all documents

GET movies/_search
{
  "query": {
    "match_all": {}
  }
}

#Equivalent to sending no request body
GET movies/_search

1.2.8 multi_match: return the first 20 documents whose title or genre contains beautiful or adventure

GET movies/_search
{
  "query": {
    "multi_match": {
      "query": "beautiful adventure",
      "fields": ["title", "genre"]
    }
  },
  "size": 20
}

1.2.9 query_string

#this OR that
GET movies/_search
{
  "query": {
    "query_string": {
      "default_field": "title",
      "query": "this that"
    }
  }
}

#this OR that (OR is the default operator)
GET movies/_search
{
  "query": {
    "query_string": {
      "default_field": "title",
      "query": "this that",
      "default_operator": "OR"
    }
  }
}

#this AND that
GET movies/_search
{
  "query": {
    "query_string": {
      "default_field": "title",
      "query": "this AND that"
    }
  }
}

#this AND that
GET movies/_search
{
  "query": {
    "query_string": {
      "default_field": "title",
      "query": "this that",
      "default_operator": "AND"
    }
  }
}

1.2.10 simple_query_string

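Find movies whose title contains beautiful, and, or mind. simple_query_string does not support the AND keyword, so AND is treated as an ordinary word and the default OR operator applies.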

GET movies/_search
{
  "query": {
    "simple_query_string": {
      "query": "beautiful AND mind",
      "fields": ["title"]
    }
  }
}

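Find movies whose title contains both beautiful and mind. The following two queries are equivalent: one uses default_operator AND, the other uses the + operator.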

GET movies/_search
{
  "query": {
    "simple_query_string": {
      "query": "beautiful mind",
      "fields": ["title"],
      "default_operator": "AND"
    }
  }
}

GET movies/_search
{
  "query": {
    "simple_query_string": {
      "query": "beautiful + mind",
      "fields": ["title"]
    }
  }
}

Find all movies whose title contains the phrase "beautiful mind" (similar to match_phrase)

GET movies/_search
{
  "query": {
    "simple_query_string": {
      "query": "\"beautiful mind\"",
      "fields": ["title"]
    }
  }
}

Find all movies whose title or genre contains any of the words beautiful, mind, or romance (similar to multi_match)

GET movies/_search
{
  "query": {
    "simple_query_string": {
      "query": "beautiful mind Romance",
      "fields": ["title", "genre"]
    }
  }
}

Find all movies whose title or genre contains the phrase "beautiful mind" or the phrase "Modern Romance"

GET movies/_search
{
  "query": {
    "simple_query_string": {
      "query": "\"beautiful mind\" | \"Modern Romance\"",

      "fields": ["title", "genre"]
    }
  }
}

Find all documents whose title or genre contains either both beautiful and mind, or all five words Comedy, Romance, Musical, Drama and Children

GET movies/_search
{
  "query": {
    "simple_query_string": {
      "query": "(beautiful + mind) | (Comedy + Romance + Musical + Drama + Children)",
      "fields": ["title","genre"]
    }
  }
}

Find all documents whose title contains beautiful and people but not Animals

GET movies/_search
{
  "query": {
    "simple_query_string": {
      "query": "beautiful + people + -Animals",
      "fields": ["title"]
    }
  }
}

1.3 fuzzy: fuzzy search

#3 results
GET movies/_search
{
  "query": {
    "fuzzy": {
      "title": {
        "value": "neverending"
      }
    }
  }
}

The variations tested below:

- neverendign: one edit (the last two letters are swapped);
- neverendong: one edit;
- neverendogn: two edits;
- neverendoon: three edits.

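The results below show the following.

- Without fuzziness, up to two edits are allowed. The default is AUTO, which allows 2 edits for terms of 6 or more characters, such as these.
- When fuzziness is set, at most that many edits are allowed.
- fuzziness ranges over [0, 2]. Even fuzziness: 3 does not match a term that is three edits away.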
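The edit counts above can be checked with a small Python sketch. It works as follows:

- It computes the optimal string alignment distance: Levenshtein plus swaps of adjacent characters. This matches the fuzzy query's default transpositions: true.
- It applies the AUTO rule: exact match for terms of 1 to 2 characters, 1 edit for 3 to 5, and 2 edits for 6 or more.
- It caps explicit fuzziness values at 2. This is an assumption, consistent with the fuzziness: 3 result below.
- The prefix_length argument mirrors the parameter used at the end of this section.

It compares single terms only and is not Lucene's automaton-based implementation.

```python
def osa_distance(a: str, b: str) -> int:
    # Edit distance where insert, delete, substitute and swapping two
    # adjacent characters each cost one edit.
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]


def auto_fuzziness(term: str) -> int:
    # AUTO: length 0-2 -> exact match, 3-5 -> 1 edit, 6 or more -> 2 edits.
    if len(term) < 3:
        return 0
    return 1 if len(term) < 6 else 2


def fuzzy_matches(indexed_term: str, query: str, fuzziness=None, prefix_length=0) -> bool:
    # Assumption: an explicit fuzziness above 2 behaves like 2.
    max_edits = auto_fuzziness(query) if fuzziness is None else min(fuzziness, 2)
    # The first prefix_length characters must match exactly.
    if indexed_term[:prefix_length] != query[:prefix_length]:
        return False
    return osa_distance(indexed_term[prefix_length:], query[prefix_length:]) <= max_edits


for q in ["neverendign", "neverendong", "neverendogn", "neverendoon"]:
    print(q, osa_distance("neverending", q))  # distances 1, 1, 2, 3 respectively
```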

#3 results
GET movies/_search
{
  "query": {
    "fuzzy": {
      "title": {
        "value": "neverendign"
      }
    }
  }
}

#3 results
GET movies/_search
{
  "query": {
    "fuzzy": {
      "title": {
        "value": "neverendong"
      }
    }
  }
}

#3 results
GET movies/_search
{
  "query": {
    "fuzzy": {
      "title": {
        "value": "neverendong",
        "fuzziness": 1
      }
    }
  }
}

#3 results
GET movies/_search
{
  "query": {
    "fuzzy": {
      "title": {
        "value": "neverendong",
        "fuzziness": 2
      }
    }
  }
}

#3 results
GET movies/_search
{
  "query": {
    "fuzzy": {
      "title": {
        "value": "neverendogn"
      }
    }
  }
}

#0 results
GET movies/_search
{
  "query": {
    "fuzzy": {
      "title": {
        "value": "neverendogn",
        "fuzziness": 1
      }
    }
  }
}

#3 results
GET movies/_search
{
  "query": {
    "fuzzy": {
      "title": {
        "value": "neverendogn",
        "fuzziness": 2
      }
    }
  }
}

#0 results
GET movies/_search
{
  "query": {
    "fuzzy": {
      "title": {
        "value": "neverendoon"
      }
    }
  }
}

#0 results
GET movies/_search
{
  "query": {
    "fuzzy": {
      "title": {
        "value": "neverendoon",
        "fuzziness": 3
      }
    }
  }
}

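Find all movies whose title contains a term that matches neverendign with at most one edit. The first 5 characters must match exactly, so edits may only occur from the 6th character on.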

GET movies/_search
{
  "query": {
    "fuzzy": {
      "title": {
        "value": "neverendign",
        "fuzziness": 1, 
        "prefix_length": 5
      }
    }
  }
}

1.4 Multi-condition queries

1.4.1 Find all movies whose title contains beautiful or mind and that were released between 2016 and 2018

GET movies/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "simple_query_string": {
            "query": "beautiful mind",
            "fields": ["title"]
          }
        },
        {
          "range": {
            "year": {
              "gte": 2016,
              "lte": 2018
            }
          }
        }
      ]
    }
  }
}

1.4.2 Find all movies whose title contains beautiful or mind but not brain, released between 2016 and 2018

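# must clauses must match and must_not clauses must not match; must_not clauses do not contribute to the score, so a bool query with only must_not performs no relevance scoring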
GET movies/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "simple_query_string": {
            "query": "beautiful mind",
            "fields": ["title"]
          }
        },
        {
          "range": {
            "year": {
              "gte": 2016,
              "lte": 2018
            }
          }
        }
      ],
      "must_not": [
        {
          "simple_query_string": {
            "query": "brain",
            "fields": ["title"]
          }
        }
      ]
    }
  }
}

1.4.3 Find all movies whose title contains the word beautiful and that were released between 1990 and 1992, without relevance scoring

#filter clauses skip relevance scoring and their results can be cached, so they are more efficient than must
GET movies/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "title": [
              "beautiful"
            ]
          }
        },
        {
          "range": {
            "year": {
              "gte": 1990,
              "lte": 1992
            }
          }
        }
      ]
    }
  }
}

1.4.4 Find all movies whose title contains the word beautiful or that were released between 1990 and 1992

GET movies/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "terms": {
            "title": [
              "beautiful"
            ]
          }
        },
        {
          "range": {
            "year": {
              "gte": 1990,
              "lte": 1992
            }
          }
        }
      ]
    }
  }
}
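The way bool combines its clauses can be sketched with Python sets. Each clause is represented by the set of document ids it matches. This is an illustration only: it ignores scoring, and the ids and clause results are made up. The should rule follows the documented default. If the query has no must or filter clause, at least one should clause has to match. Otherwise should clauses are optional and only affect the score.

```python
def bool_query(all_docs, must=(), filters=(), should=(), must_not=()):
    # must / filters: every clause has to match (intersection)
    result = set(all_docs)
    for clause in [*must, *filters]:
        result &= clause
    # must_not: none of the clauses may match (difference)
    for clause in must_not:
        result -= clause
    # should: at least one must match, but only when there is no must/filter clause
    if should and not (must or filters):
        result = {doc for doc in result if any(doc in clause for clause in should)}
    return result


all_docs = {1, 2, 3, 4, 5}   # hypothetical document ids
title_beautiful = {1, 2}     # docs matched by the title clause
year_1990_1992 = {2, 3}      # docs matched by the range clause

print(sorted(bool_query(all_docs, should=[title_beautiful, year_1990_1992])))   # [1, 2, 3]
print(sorted(bool_query(all_docs, filters=[title_beautiful, year_1990_1992])))  # [2]
print(sorted(bool_query(all_docs, must=[title_beautiful], must_not=[year_1990_1992])))  # [1]
```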

2. Mapping

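A mapping is similar to a schema in a database. It is used to:

  1. define the names of the fields in an index;
  2. define each field's data type, e.g. boolean, string, number, date, ...;
  3. configure the inverted-index settings of each field.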

2.1 Data types

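Text/Keyword: strings.
- Keyword content is not analyzed: it is stored in ES exactly as it was written.
- Text content is analyzed.
- When dynamic mapping detects a string, ES maps it as text and automatically adds a keyword sub-field.

Date: dates.

Integer/Float/Long: numbers.

Boolean: booleans.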

ES also has object/nested types and special types such as geo_point and geo_shape.

2.2 Defining a mapping

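A recommended way to define a mapping:

1. Write a sample document into a temporary index and let ES generate the mapping automatically.
2. Retrieve that mapping with the mapping API and adjust it to what you need.
3. Create the real index with the adjusted mapping.
4. Delete the temporary index.

In short, use the temporary index and then throw it away.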

The syntax is:

PUT users
{
    "mappings": {
    // define your mappings here
    }
}

View the mapping

GET movies/_mapping

Searching the keyword sub-field (exact, non-analyzed match: only titles that are exactly Julia are returned)

GET movies/_search
{
  "query": {
    "match": {
      "title.keyword": "Julia"
    }
  }
}
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 9.717158,
    "hits" : [
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "32234",
        "_score" : 9.717158,
        "_source" : {
          "title" : "Julia",
          "genre" : [
            "Drama"
          ],
          "year" : 1977,
          "id" : "32234",
          "@version" : "1"
        }
      },
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "58937",
        "_score" : 9.717158,
        "_source" : {
          "title" : "Julia",
          "genre" : [
            "Drama",
            "Thriller"
          ],
          "year" : 2008,
          "id" : "58937",
          "@version" : "1"
        }
      },
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "129333",
        "_score" : 9.717158,
        "_source" : {
          "title" : "Julia",
          "genre" : [
            "Horror",
            "Thriller"
          ],
          "year" : 2014,
          "id" : "129333",
          "@version" : "1"
        }
      }
    ]
  }
}

2.3 Common parameters

2.3.1 index

You can give a field a boolean index parameter. It controls whether the field is added to the inverted index, and therefore whether the field can be searched.

2.3.2 null_value

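When a document is indexed into ES, a field whose value is null is not indexed and cannot be searched. The null_value parameter specifies a replacement value that is indexed whenever the field is null. null_value cannot be used on text fields. It applies to keyword and to most other non-text types, such as numbers, dates and booleans.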

3. Advanced Search

3.1 Aggregations

The syntax of an aggregation search is:

GET indexName/_search
{
    "aggs": {
        "aggs_name": { # the aggregation name is user-defined
            "aggs_type": {
            // aggregation body
            }
        }
    }
}

Create the mapping for the employee index

PUT employee
{
  "mappings": {
    "properties": {
      "id": {
        "type": "integer"
      },
      "name": {
        "type": "keyword"
      },
      "job": {
        "type": "keyword"
      },
      "age": {
        "type": "integer"
      },
      "gender": {
        "type": "keyword"
      }
    }
  }
}
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "employee"
}

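Write data into the employee index (sal is not declared in the mapping above, so dynamic mapping adds it as a long field)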

PUT employee/_bulk
{"index": {"_id": 1}}
{"id": 1, "name": "Bob", "job": "java", "age": 21, "sal": 8000, "gender": "female"}
{"index": {"_id": 2}}
{"id": 2, "name": "Rod", "job": "html", "age": 31, "sal": 18000, "gender": "female"}
{"index": {"_id": 3}}
{"id": 3, "name": "Gaving", "job": "java", "age": 24, "sal": 12000, "gender": "male"}
{"index": {"_id": 4}}
{"id": 4, "name": "King", "job": "dba", "age": 26, "sal": 15000, "gender": "female"}
{"index": {"_id": 5}}
{"id": 5, "name": "Jonhson", "job": "dba", "age": 29, "sal": 16000, "gender": "male"}
{"index": {"_id": 6}}
{"id": 6, "name": "Douge", "job": "java", "age": 41, "sal": 20000, "gender": "female"}
{"index": {"_id": 7}}
{"id": 7, "name": "cutting", "job": "dba", "age": 27, "sal": 7000, "gender": "male"}
{"index": {"_id": 8}}
{"id": 8, "name": "Bona", "job": "html", "age": 22, "sal": 14000, "gender": "female"}
{"index": {"_id": 9}}
{"id": 9, "name": "Shyon", "job": "dba", "age": 20, "sal": 19000, "gender": "female"}
{"index": {"_id": 10}}
{"id": 10, "name": "James", "job": "html", "age": 18, "sal": 22000, "gender": "male"}
{"index": {"_id": 11}}
{"id": 11, "name": "Golsling", "job": "java", "age": 32, "sal": 23000, "gender": "female"}
{"index": {"_id": 12}}
{"id": 12, "name": "Lily", "job": "java", "age": 24, "sal": 2000, "gender": "male"}
{"index": {"_id": 13}}
{"id": 13, "name": "Jack", "job": "html", "age": 23, "sal": 3000, "gender": "female"}
{"index": {"_id": 14}}
{"id": 14, "name": "Rose", "job": "java", "age": 36, "sal": 6000, "gender": "female"}
{"index": {"_id": 15}}
{"id": 15, "name": "Will", "job": "dba", "age": 38, "sal": 4500, "gender": "male"}
{"index": {"_id": 16}}
{"id": 16, "name": "smith", "job": "java", "age": 32, "sal": 23000, "gender": "male"}

3.1.1 Single-value output

Many metric aggregations in ES output a single value, e.g. min, max, sum, avg and cardinality.

# 1. Total salary; sum_sal is a user-defined name. Without size: 0, the matching documents are returned along with the aggregation
GET employee/_search
{
  "aggs": {
    "sum_sal": {
      "sum": {
        "field": "sal"
      }
    }
  }
}
# Return only the aggregation result, without the documents
GET employee/_search
{
  "size": 0,
  "aggs": {
    "sum_sal": {
      "sum": {
        "field": "sal"
      }
    }
  }
}

# 2. Average salary
GET employee/_search
{
  "size": 0,
  "aggs": {
    "avg_sal": {
      "avg": {
        "field": "sal"
      }
    }
  }
}

# 3. Number of distinct jobs (cardinality counts distinct values; it is approximate for very large cardinalities)
GET employee/_search
{
  "size": 0,
  "aggs": {
    "sum_job": {
      "cardinality": {
        "field": "job"
      }
    }
  }
}

# 4. Maximum, minimum and average of AvgTicketPrice in the Kibana flight sample data
GET kibana_sample_data_flights/_search
{
  "size": 0, 
  "aggs": {
    "max_ticket_price": {
      "max": {
        "field": "AvgTicketPrice"
      }
    },
    "min_ticket_price": {
      "min": {
        "field": "AvgTicketPrice"
      }
    },
    "avg_ticket_price": {
      "avg": {
        "field": "AvgTicketPrice"
      }
    }
  }
}
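As a cross-check, the single-value metrics from examples 1 to 3 can be recomputed in plain Python. The input is the 16 documents written with _bulk above; only the sal and job values are copied. This is a local sanity check, not how ES evaluates aggregations. In particular, ES computes cardinality approximately when there are many distinct values.

```python
# sal and job values copied from the _bulk request above (documents 1 to 16, in order).
sal = [8000, 18000, 12000, 15000, 16000, 20000, 7000, 14000,
       19000, 22000, 23000, 2000, 3000, 6000, 4500, 23000]
job = ["java", "html", "java", "dba", "dba", "java", "dba", "html",
       "dba", "html", "java", "java", "html", "java", "dba", "java"]

sum_sal = sum(sal)             # sum aggregation
avg_sal = sum_sal / len(sal)   # avg aggregation
distinct_jobs = len(set(job))  # cardinality (exact here; approximate in ES)

print(sum_sal, avg_sal, distinct_jobs)  # 212500 13281.25 3
```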

3.1.2 Multi-value output

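Some aggregations output several values at once. stats returns count, min, max, avg and sum in one go. terms returns one bucket, with a document count, per distinct value.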

# 1. Salary statistics (numeric field)
GET employee/_search
{
  "size": 0, 
  "aggs": {
    "sal_info": {
      "stats": {
        "field": "sal"
      }
    }
  }
}

# 2. Number of flights to each destination country (grouping)
GET kibana_sample_data_flights/_search
{
  "size": 0,
  "aggs": {
    "dest_country_info": {
      "terms": {
        "field": "DestCountry",
        "size": 10
      }
    }
  }
}

# 3. Number of employees per job
GET employee/_search
{
  "size": 0,
  "aggs": {
    "job_emps_num": {
      "terms": {
        "field": "job",
        "size": 10
      }
    }
  }
}

# 4. Flights per destination country, and the weather distribution at each destination (sub-aggregation)
GET kibana_sample_data_flights/_search
{
  "size": 0,
  "aggs": {
    "dest_country_info": {
      "terms": {
        "field": "DestCountry"
      },
      "aggs": {
        "dest_country_weather_info": {
          "terms": {
            "field": "DestWeather"
          }
        }
      }
    }
  }
}

# 5. Salary statistics (average, maximum, minimum, etc.) per job
GET employee/_search
{
  "size": 0, 
  "aggs": {
    "job_info": {
      "terms": {
        "field": "job"
      },
      "aggs": {
        "diff_job_sal_info": {
          "stats": {
            "field": "sal"
          }
        }
      }
    }
  }
}

# 6. Number of male and female employees per job, then salary statistics per gender within each job
GET employee/_search
{
  "size": 0,
  "aggs": {
    "job_info": {
      "terms": {
        "field": "job"
      },
      "aggs": {
        "diff_job_gender_no": {
          "terms": {
            "field": "gender"
          },
          "aggs": {
            "diff_job_gender_sal_info": {
              "stats": {
                "field": "sal"
              }
            }
          }
        }
      }
    }
  }
}

# 7. The two oldest employees
GET employee/_search
{
  "size": 0,
  "aggs": {
    "older_two_emp": {
      "top_hits": {
        "size": 2,
        "sort": [
          {
            "age": {
              "order": "desc"
            }
          }
        ]
      }
    }
  }
}

# 8. Number of employees in each salary range (from is inclusive, to is exclusive)
GET employee/_search
{
  "size": 0,
  "aggs": {
    "rang_sal_info": {
      "range": {
        "field": "sal",
        "ranges": [
          {
            "key": "0 <= sal < 10001", 
            "to": 10001
          },
          {
            "key": "10001 <= sal < 20001", 
            "from": 10001, 
            "to": 20001
          },
          {
            "key": "20001 <= sal < 30001", 
            "from": 20001, 
            "to": 30001
          }
        ]
      }
    }
  }
}

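# 9. Histogram of salaries in buckets of 5000 (extended_bounds forces empty buckets to be returned between min and max; it does not drop data outside that range)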
GET employee/_search
{
  "size": 0,
  "aggs": {
    "range_sal_info": {
      "histogram": {
        "field": "sal",
        "interval": 5000,
        "extended_bounds": {
          "min": 0,
          "max": 15000
        }
      }
    }
  }
}
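The bucketing in examples 8 and 9 can be reproduced in Python from the sal values in the _bulk request above. The sketch makes these assumptions:

- range buckets include from and exclude to;
- histogram keys are computed as floor(value / interval) * interval;
- extended_bounds can only add empty buckets.

So with this data the 20000 bucket should still appear even though max is 15000.

```python
sal = [8000, 18000, 12000, 15000, 16000, 20000, 7000, 14000,
       19000, 22000, 23000, 2000, 3000, 6000, 4500, 23000]

ranges = [
    {"key": "0 <= sal < 10001", "to": 10001},
    {"key": "10001 <= sal < 20001", "from": 10001, "to": 20001},
    {"key": "20001 <= sal < 30001", "from": 20001, "to": 30001},
]


def range_counts(values, ranges):
    # "from" is inclusive, "to" is exclusive; a missing bound is unbounded.
    counts = {}
    for r in ranges:
        low = r.get("from", float("-inf"))
        high = r.get("to", float("inf"))
        counts[r["key"]] = sum(1 for v in values if low <= v < high)
    return counts


def histogram_counts(values, interval, bounds_min=None, bounds_max=None):
    # Bucket key = floor(value / interval) * interval. Empty buckets between the
    # lowest and highest key are kept; extended bounds can only widen that key range.
    counts = {}
    for v in values:
        key = (v // interval) * interval
        counts[key] = counts.get(key, 0) + 1
    keys = list(counts)
    for bound in (bounds_min, bounds_max):
        if bound is not None:
            keys.append((bound // interval) * interval)
    return {k: counts.get(k, 0) for k in range(min(keys), max(keys) + 1, interval)}


print(range_counts(sal, ranges))
# {'0 <= sal < 10001': 6, '10001 <= sal < 20001': 7, '20001 <= sal < 30001': 3}
print(histogram_counts(sal, 5000, 0, 15000))
# {0: 3, 5000: 3, 10000: 2, 15000: 4, 20000: 4}
```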

# 10. The job with the lowest average salary (min_bucket pipeline aggregation)
GET employee/_search
{
  "size": 0,
  "aggs": {
    "job_info": {
      "terms": {
        "field": "job"
      },
      "aggs": {
        "diff_job_avg_sal": {
          "avg": {
            "field": "sal"
          }
        }
      }
    },
    "min_avg_sal_job": {
      "min_bucket": {
        "buckets_path": "job_info>diff_job_avg_sal"
      }
    }
  }
}
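Example 10 chains three steps:

1. a terms bucket aggregation;
2. an avg sub-aggregation in every bucket;
3. a min_bucket pipeline aggregation that picks the bucket with the smallest value.

Below is the same chain in Python, using the sal and job values from the _bulk request above. It is a local sketch, not ES internals.

```python
from collections import defaultdict

sal = [8000, 18000, 12000, 15000, 16000, 20000, 7000, 14000,
       19000, 22000, 23000, 2000, 3000, 6000, 4500, 23000]
job = ["java", "html", "java", "dba", "dba", "java", "dba", "html",
       "dba", "html", "java", "java", "html", "java", "dba", "java"]

# terms: one bucket per distinct job
buckets = defaultdict(list)
for j, s in zip(job, sal):
    buckets[j].append(s)

# avg sub-aggregation inside every bucket
avg_by_job = {j: sum(values) / len(values) for j, values in buckets.items()}

# min_bucket with buckets_path "job_info>diff_job_avg_sal"
min_job = min(avg_by_job, key=avg_by_job.get)
print(min_job, avg_by_job[min_job])  # dba 12300.0
```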

# 11. Average salary of employees older than 30
GET employee/_search
{
  "size": 0, 
  "query": {
    "range": {
      "age": {
        "gt": 30
      }
    }
  },
  "aggs": {
    "gt_30_emp_avg_sal": {
      "avg": {
        "field": "sal"
      }
    }
  }
}

# 12. Average salary of java employees (constant_score skips relevance scoring, which is more efficient)
GET employee/_search
{
  "size": 0, 
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "job": "java"
        }
      },
      "boost": 1.2
    }
  },
  "aggs": {
    "java_emp_avg_sal": {
      "avg": {
        "field": "sal"
      }
    }
  }
}

# 13. Average salary of employees older than 30 and of all employees (filter aggregation)
GET employee/_search
{
  "size": 0,
  "aggs": {
    "all_emp_avg_sal": {
      "avg": {
        "field": "sal"
      }
    },
    "gt_30_emp_avg_info": {
      "filter": {
        "range": {
          "age": {
            "gt": 30
          }
        }
      },
      "aggs": {
        "gt_30_emp_avg_sal": {
          "avg": {
            "field": "sal"
          }
        }
      }
    }
  }
}
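Example 13 returns two averages in one request. all_emp_avg_sal runs over every document. The filter aggregation narrows its sub-aggregation to employees older than 30, so it should give the same value as the query in example 11. A quick Python check with the age and sal values from the _bulk request above:

```python
age = [21, 31, 24, 26, 29, 41, 27, 22, 20, 18, 32, 24, 23, 36, 38, 32]
sal = [8000, 18000, 12000, 15000, 16000, 20000, 7000, 14000,
       19000, 22000, 23000, 2000, 3000, 6000, 4500, 23000]

all_emp_avg_sal = sum(sal) / len(sal)

# filter aggregation: keep only the documents matching range age > 30
over_30 = [s for a, s in zip(age, sal) if a > 30]
gt_30_emp_avg_sal = sum(over_30) / len(over_30)

print(all_emp_avg_sal, gt_30_emp_avg_sal)  # 13281.25 15750.0
```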

3.2 Search suggestions

When a search returns nothing because a word is misspelled, we want ES to suggest alternative terms.

GET movies/_search
{
  "suggest": {
    # title_suggestion is a user-defined name
    "title_suggestion": {
      "text": "drema",
      "term": {
        "field": "title",
        "suggest_mode": "popular"
      }
    }
  }
}

suggest_mode takes one of three values: popular, missing, always

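  1. popular: only suggests terms that occur in more documents than the input term.
  2. missing: only suggests when the input term does not exist in the index (the default).
  3. always: suggests whether or not the input term exists.
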
GET movies/_search
{
  "suggest": {
    "title_suggestion": {
      "text": "beauti",
      "term": {
        "field": "title"
      }
    }
  }
}
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "suggest" : {
    "title_suggestion" : [
      {
        "text" : "beauti",
        "offset" : 0,
        "length" : 6,
        "options" : [
          {
            "text" : "beauty",
            "score" : 0.8333333,
            "freq" : 66
          },
          {
            "text" : "beasts",
            "score" : 0.6666666,
            "freq" : 9
          },
          {
            "text" : "beauties",
            "score" : 0.6666666,
            "freq" : 5
          },
          {
            "text" : "beastie",
            "score" : 0.6666666,
            "freq" : 2
          },
          {
            "text" : "beatie",
            "score" : 0.6666666,
            "freq" : 1
          }
        ]
      }
    ]
  }
}
GET movies/_search
{
  "suggest": {
    "title_suggestion": {
      "text": "beauty",
      "term": {
        "field": "title"
      }
    }
  }
}
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "suggest" : {
    "title_suggestion" : [
      {
        "text" : "beauty",
        "offset" : 0,
        "length" : 6,
        "options" : [ ]
      }
    ]
  }
}
GET movies/_search
{
  "suggest": {
    "title_suggestion": {
      "text": "beauty",
      "term": {
        "field": "title",
        "suggest_mode": "always"
      }
    }
  }
}
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "suggest" : {
    "title_suggestion" : [
      {
        "text" : "beauty",
        "offset" : 0,
        "length" : 6,
        "options" : [
          {
            "text" : "beasts",
            "score" : 0.6666666,
            "freq" : 9
          },
          {
            "text" : "bearly",
            "score" : 0.6666666,
            "freq" : 1
          },
          {
            "text" : "beastly",
            "score" : 0.6666666,
            "freq" : 1
          },
          {
            "text" : "beast",
            "score" : 0.6,
            "freq" : 74
          },
          {
            "text" : "betty",
            "score" : 0.6,
            "freq" : 13
          }
        ]
      }
    ]
  }
}
GET movies/_search
{
  "suggest": {
    "title_suggestion": {
      "text": "beauty",
      "term": {
        "field": "title",
        "suggest_mode": "popular"
      }
    }
  }
}
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "suggest" : {
    "title_suggestion" : [
      {
        "text" : "beauty",
        "offset" : 0,
        "length" : 6,
        "options" : [
          {
            "text" : "beast",
            "score" : 0.6,
            "freq" : 74
          }
        ]
      }
    ]
  }
}
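The three responses above are consistent with the suggest_mode rules. The Python sketch below applies those rules to the candidate list from the always response. It uses the document frequency of beauty (66) shown in the beauti response. This is a simplification: ES applies the mode while collecting candidates, before truncating to the requested size. As a result, popular may return terms that are not in the always top five.

```python
def apply_suggest_mode(input_freq, candidates, mode="missing"):
    # candidates: (text, score, freq) tuples, already ranked by score.
    if mode == "missing":
        # Suggest only when the input term is not in the index at all.
        return [] if input_freq > 0 else list(candidates)
    if mode == "popular":
        # Keep only terms that occur in more documents than the input term.
        return [c for c in candidates if c[2] > input_freq]
    if mode == "always":
        return list(candidates)
    raise ValueError(f"unknown suggest_mode: {mode}")


beauty_freq = 66  # freq of "beauty" in the "beauti" response above
always_options = [  # options of the suggest_mode "always" response above
    ("beasts", 0.6666666, 9),
    ("bearly", 0.6666666, 1),
    ("beastly", 0.6666666, 1),
    ("beast", 0.6, 74),
    ("betty", 0.6, 13),
]

print(apply_suggest_mode(beauty_freq, always_options, "missing"))  # []
print(apply_suggest_mode(beauty_freq, always_options, "popular"))  # [('beast', 0.6, 74)]
```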

3.3 Autocomplete

Autocomplete is probably the search feature we meet most often in everyday development, e.g. in the Baidu search box or in JD product search.

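Autocomplete has very high performance requirements, because a request is sent for every character the user types. ES therefore implements it with a different data structure (an in-memory FST) instead of the inverted index. The field must be mapped with type completion, so the mapping has to be defined before the data is indexed.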

3.3.1 View the mapping

GET movies/_mapping
{
  "movies" : {
    "mappings" : {
      "properties" : {
        "@version" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "genre" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "id" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "title" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "year" : {
          "type" : "long"
        }
      }
    }
  }
}

3.3.2 Delete the index, redefine the mapping and re-import the data

First get the current mapping

GET movies/_mapping

Modify the retrieved mapping (here title is changed to a completion field). Then delete the index, create it again with the new mapping, and re-import the data. The DELETE has to come first, because PUT fails if the index already exists.

DELETE movies

PUT movies
{
  "mappings": {
    "properties": {
      "@version": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "genre": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "id": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "title": {
        "type": "completion"
      },
      "year": {
        "type": "long"
      }
    }
  }
}

3.3.3 Prefix search

GET movies/_search
{
  "_source": [""], 
  "suggest": {
    "title_prefix_suggest": {
      "prefix": "bu",
      "completion": {
        "field": "title",
        "skip_duplicates": true,
        "size": 10
      }
    }
  }
}
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "suggest" : {
    "title_prefix_suggest" : [
      {
        "text" : "bu",
        "offset" : 0,
        "length" : 2,
        "options" : [
          {
            "text" : "'burbs, The",
            "_index" : "movies",
            "_type" : "_doc",
            "_id" : "2072",
            "_score" : 1.0,
            "_source" : { }
          },
          {
            "text" : "Bubba Ho-tep",
            "_index" : "movies",
            "_type" : "_doc",
            "_id" : "6755",
            "_score" : 1.0,
            "_source" : { }
          },
          {
            "text" : "Bubble",
            "_index" : "movies",
            "_type" : "_doc",
            "_id" : "38188",
            "_score" : 1.0,
            "_source" : { }
          },
          {
            "text" : "Bubble Boy",
            "_index" : "movies",
            "_type" : "_doc",
            "_id" : "4732",
            "_score" : 1.0,
            "_source" : { }
          },
          {
            "text" : "Bubble, The",
            "_index" : "movies",
            "_type" : "_doc",
            "_id" : "55132",
            "_score" : 1.0,
            "_source" : { }
          },
          {
            "text" : "Bubblegum",
            "_index" : "movies",
            "_type" : "_doc",
            "_id" : "188595",
            "_score" : 1.0,
            "_source" : { }
          },
          {
            "text" : "Bubblegum and Broken Fingers",
            "_index" : "movies",
            "_type" : "_doc",
            "_id" : "162072",
            "_score" : 1.0,
            "_source" : { }
          },
          {
            "text" : "Bubu",
            "_index" : "movies",
            "_type" : "_doc",
            "_id" : "143753",
            "_score" : 1.0,
            "_source" : { }
          },
          {
            "text" : "Buccaneer, The",
            "_index" : "movies",
            "_type" : "_doc",
            "_id" : "75994",
            "_score" : 1.0,
            "_source" : { }
          },
          {
            "text" : "Buchanan Rides Alone",
            "_index" : "movies",
            "_type" : "_doc",
            "_id" : "82298",
            "_score" : 1.0,
            "_source" : { }
          }
        ]
      }
    ]
  }
}
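To illustrate why a sorted structure makes prefix lookups cheap, here is a hypothetical Python stand-in for the completion suggester. It uses a sorted list searched with bisect instead of the FST that ES really builds. It makes these assumptions:

- titles are normalized roughly the way the default simple analyzer would (letter runs, lowercased);
- there are no weights;
- results come back in normalized-key order, which is not necessarily the order ES returns.

```python
import bisect
import re


def normalize(title: str) -> str:
    # Rough stand-in for the "simple" analyzer: letter runs, lowercased, space-joined.
    return " ".join(w.lower() for w in re.findall(r"[A-Za-z]+", title))


class PrefixSuggester:
    """Hypothetical sorted-list stand-in for a completion field's FST."""

    def __init__(self, titles):
        self.entries = sorted((normalize(t), t) for t in titles)
        self.keys = [key for key, _ in self.entries]

    def suggest(self, prefix, size=5, skip_duplicates=False):
        prefix = normalize(prefix)
        i = bisect.bisect_left(self.keys, prefix)  # binary search for the first candidate
        out, seen = [], set()
        while i < len(self.entries) and self.keys[i].startswith(prefix) and len(out) < size:
            text = self.entries[i][1]
            if not (skip_duplicates and text in seen):
                out.append(text)
                seen.add(text)
            i += 1
        return out


suggester = PrefixSuggester(["Bubble", "Bubble", "'burbs, The", "Beast", "Buccaneer, The"])
print(suggester.suggest("bu", size=10, skip_duplicates=True))
# ['Bubble', 'Buccaneer, The', "'burbs, The"]
```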

skip_duplicates: skip suggestions whose text duplicates one already returned.

size: the number of suggestions to return (5 by default).

3.4 Highlighting

Highlighting is also common in real applications. For example, Baidu and Geek Time highlight the matched keywords in their search results.

#Highlight every match of romance in title and genre (the genre field overrides the tags with <em>)
GET movies/_search
{
  "query": {
    "multi_match": {
      "query": "romance",
      "fields": ["title", "genre"]
    }
  },
  "highlight": {
    "pre_tags": "<span>",
    "post_tags": "</span>", 
    "fields": {
      "title": {},
      "genre": {
        "pre_tags": "<em>",
        "post_tags": "</em>"
      }
    }
  }
}
{
  "took" : 77,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 7428,
      "relation" : "eq"
    },
    "max_score" : 9.80649,
    "hits" : [
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "2894",
        "_score" : 9.80649,
        "_source" : {
          "year" : 1999,
          "id" : "2894",
          "@version" : "1",
          "genre" : [
            "Drama",
            "Romance"
          ],
          "title" : "Romance"
        },
        "highlight" : {
          "genre" : [
            "<em>Romance</em>"
          ],
          "title" : [
            "<span>Romance</span>"
          ]
        }
      },
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "116867",
        "_score" : 9.80649,
        "_source" : {
          "year" : 1930,
          "id" : "116867",
          "@version" : "1",
          "genre" : [
            "Drama",
            "Romance"
          ],
          "title" : "Romance"
        },
        "highlight" : {
          "genre" : [
            "<em>Romance</em>"
          ],
          "title" : [
            "<span>Romance</span>"
          ]
        }
      },
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "124991",
        "_score" : 9.80649,
        "_source" : {
          "year" : 2008,
          "id" : "124991",
          "@version" : "1",
          "genre" : [
            "Romance"
          ],
          "title" : "Romance"
        },
        "highlight" : {
          "genre" : [
            "<em>Romance</em>"
          ],
          "title" : [
            "<span>Romance</span>"
          ]
        }
      },
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "3501",
        "_score" : 8.259426,
        "_source" : {
          "year" : 1985,
          "id" : "3501",
          "@version" : "1",
          "genre" : [
            "Comedy",
            "Romance"
          ],
          "title" : "Murphy's Romance"
        },
        "highlight" : {
          "genre" : [
            "<em>Romance</em>"
          ],
          "title" : [
            "Murphy's <span>Romance</span>"
          ]
        }
      },
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "555",
        "_score" : 8.259426,
        "_source" : {
          "year" : 1993,
          "id" : "555",
          "@version" : "1",
          "genre" : [
            "Crime",
            "Thriller"
          ],
          "title" : "True Romance"
        },
        "highlight" : {
          "title" : [
            "True <span>Romance</span>"
          ]
        }
      },
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "40342",
        "_score" : 8.259426,
        "_source" : {
          "year" : 2005,
          "id" : "40342",
          "@version" : "1",
          "genre" : [
            "Comedy",
            "Drama",
            "Musical",
            "Romance"
          ],
          "title" : "Romance & Cigarettes"
        },
        "highlight" : {
          "genre" : [
            "<em>Romance</em>"
          ],
          "title" : [
            "<span>Romance</span> & Cigarettes"
          ]
        }
      },
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "149446",
        "_score" : 8.259426,
        "_source" : {
          "year" : 2010,
          "id" : "149446",
          "@version" : "1",
          "genre" : [
            "Comedy",
            "Drama"
          ],
          "title" : "Petty Romance"
        },
        "highlight" : {
          "title" : [
            "Petty <span>Romance</span>"
          ]
        }
      },
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "150016",
        "_score" : 8.259426,
        "_source" : {
          "year" : 2012,
          "id" : "150016",
          "@version" : "1",
          "genre" : [
            "Comedy",
            "Drama"
          ],
          "title" : "Brasserie Romance"
        },
        "highlight" : {
          "title" : [
            "Brasserie <span>Romance</span>"
          ]
        }
      },
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "133712",
        "_score" : 8.259426,
        "_source" : {
          "year" : 1977,
          "id" : "133712",
          "@version" : "1",
          "genre" : [
            "Comedy",
            "Romance"
          ],
          "title" : "Office Romance"
        },
        "highlight" : {
          "genre" : [
            "<em>Romance</em>"
          ],
          "title" : [
            "Office <span>Romance</span>"
          ]
        }
      },
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "5769",
        "_score" : 8.259426,
        "_source" : {
          "year" : 1981,
          "id" : "5769",
          "@version" : "1",
          "genre" : [
            "Comedy",
            "Romance"
          ],
          "title" : "Modern Romance"
        },
        "highlight" : {
          "genre" : [
            "<em>Romance</em>"
          ],
          "title" : [
            "Modern <span>Romance</span>"
          ]
        }
      }
    ]
  }
}
#Find 2012 movies whose title contains romance; highlight romance in the title, and additionally highlight the word Children in the genre field of those movies
GET movies/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "year": "2012"
          }
        },
        {
          "match": {
            "title": "romance"
          }
        }
      ]
    }
  },
  "highlight": {
    "fields": {
      "title": {},
      "genre": {
        "pre_tags": "<span>",
        "post_tags": "</span>",
        "highlight_query": {
          "match": {
            "genre": "Children"
          }
        }
      }
    }
  }
}
{
  "took" : 9,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 9.259426,
    "hits" : [
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "150016",
        "_score" : 9.259426,
        "_source" : {
          "year" : 2012,
          "id" : "150016",
          "@version" : "1",
          "genre" : [
            "Comedy",
            "Drama"
          ],
          "title" : "Brasserie Romance"
        },
        "highlight" : {
          "title" : [
            "Brasserie <em>Romance</em>"
          ]
        }
      },
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "158946",
        "_score" : 7.2784586,
        "_source" : {
          "year" : 2012,
          "id" : "158946",
          "@version" : "1",
          "genre" : [
            "Children",
            "Romance"
          ],
          "title" : "A Taste of Romance"
        },
        "highlight" : {
          "genre" : [
            "<span>Children</span>"
          ],
          "title" : [
            "A Taste of <em>Romance</em>"
          ]
        }
      }
    ]
  }
}
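If you send this request from your own code instead of Kibana, a minimal Python sketch could look like the following. It builds the same body as the request above and collects the highlight fragments per document id. The function names are hypothetical. Tags are passed as lists here, which is the documented form; the request above used plain strings, and ES accepted those too. `highlight_query` only changes what gets highlighted, not which documents match or how they score.

```python
import json


def build_highlight_query(year, title_term, genre_term,
                          pre_tag="<span>", post_tag="</span>"):
    """Build a bool/must search with per-field highlighting (hypothetical helper)."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"year": str(year)}},
                    {"match": {"title": title_term}},
                ]
            }
        },
        "highlight": {
            "fields": {
                # Default tags (<em>...</em>), highlighting the query's own terms.
                "title": {},
                # Custom tags; terms come from highlight_query, which does not
                # affect matching or scoring.
                "genre": {
                    "pre_tags": [pre_tag],
                    "post_tags": [post_tag],
                    "highlight_query": {"match": {"genre": genre_term}},
                },
            }
        },
    }


def highlights_by_id(response):
    """Map each hit's _id to its highlight fragments ({} if it has none)."""
    return {hit["_id"]: hit.get("highlight", {})
            for hit in response["hits"]["hits"]}


if __name__ == "__main__":
    body = build_highlight_query(2012, "romance", "Children")
    print(json.dumps(body, indent=2))
```

Hits whose genre does not contain Children, such as 150016 above, simply have no `genre` key in their highlight map.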

4. Installing analyzers

4.1 IK analyzer

4.1.1 Download

https://github.com/medcl/elasticsearch-analysis-ik/releases

4.1.2 Installation

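The IK analyzer installs the same way on every operating system. Create a folder named ik under the plugins directory of the ES home directory, copy the downloaded zip package into ik, and unzip it there. Pick the release whose version exactly matches your Elasticsearch version (7.8.0 here). Restart Elasticsearch so that it loads the new plugin.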
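The unzip step can also be scripted. Below is a minimal Python sketch, assuming the release zip keeps the plugin files (such as plugin-descriptor.properties) at its root, as the instructions above imply. The helper name and paths are hypothetical. It extracts directly into the target folder instead of first copying the zip there.

```python
import zipfile
from pathlib import Path


def install_plugin_zip(zip_path, es_home, plugin_dir_name):
    """Unzip a plugin release into $ES_HOME/plugins/<plugin_dir_name> (hypothetical helper)."""
    target = Path(es_home) / "plugins" / plugin_dir_name
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        # extractall() strips absolute paths and ".." components from member names.
        zf.extractall(target)
    return target


# Example (paths are assumptions):
# install_plugin_zip("elasticsearch-analysis-ik-7.8.0.zip", "/opt/elasticsearch-7.8.0", "ik")
```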

The IK plugin provides two analyzers:

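Analyzer      Description
ik_smart      Coarsest-grained split. For example, 中华人民共和国国歌 becomes 中华人民共和国, 国歌. Suitable for Phrase queries.
ik_max_word   Finest-grained split, exhausting all possible word combinations. For example, 中华人民共和国国歌 becomes 中华人民共和国, 中华人民, 中华, 华人, 人民共和国, 人民, 人, 民, 共和国, 共和, 和, 国国, 国歌. Suitable for Term Queries.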
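The difference in granularity can be made concrete with a toy Python illustration. This is not IK's algorithm: IK has its own segmenter with ambiguity handling. The sketch uses plain forward maximum matching for the coarse mode and exhaustive dictionary lookup for the fine mode. The small dictionary is hand-written and purely hypothetical.

```python
def coarse_segment(text, dictionary):
    """Forward maximum matching: at each position take the longest dictionary word."""
    max_len = max(map(len, dictionary))
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            word = text[i:i + size]
            # Fall back to a single character when nothing in the dictionary matches.
            if word in dictionary or size == 1:
                tokens.append(word)
                i += size
                break
    return tokens


def fine_segment(text, dictionary):
    """Emit every dictionary word starting at every position, longest first."""
    max_len = max(map(len, dictionary))
    tokens = []
    for i in range(len(text)):
        for size in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + size] in dictionary:
                tokens.append(text[i:i + size])
    return tokens


# Hypothetical toy dictionary, not IK's real word list.
TOY_DICT = {"中华人民共和国", "中华人民", "中华", "华人", "人民共和国",
            "人民", "共和国", "共和", "国歌"}
```

With this dictionary, `coarse_segment("中华人民共和国国歌", TOY_DICT)` yields the same two tokens as the ik_smart example. `fine_segment` yields the overlapping multi-character words. It does not yield the single characters, because the toy dictionary does not list them.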

4.1.3 Verification

The standard analyzer cannot segment Chinese words. It splits Chinese text into single characters:

GET _analyze
{
  "analyzer": "standard",
  "text": "教育"
}
{
  "tokens" : [
    {
      "token" : "教",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<IDEOGRAPHIC>",
      "position" : 0
    },
    {
      "token" : "育",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "<IDEOGRAPHIC>",
      "position" : 1
    }
  ]
}

Using the ik_smart analyzer:

GET _analyze
{
  "analyzer": "ik_smart",
  "text": "教育"
}
{
  "tokens" : [
    {
      "token" : "教育",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 0
    }
  ]
}
GET _analyze
{
  "analyzer": "ik_smart",
  "text": "中华人民共和国"
}
{
  "tokens" : [
    {
      "token" : "中华人民共和国",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 0
    }
  ]
}

Using the ik_max_word analyzer:

GET _analyze
{
  "analyzer": "ik_max_word",
  "text": "中华人民共和国"
}
{
  "tokens" : [
    {
      "token" : "中华人民共和国",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "中华人民",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "中华",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "华人",
      "start_offset" : 1,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "人民共和国",
      "start_offset" : 2,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 4
    },
    {
      "token" : "人民",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 5
    },
    {
      "token" : "共和国",
      "start_offset" : 4,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 6
    },
    {
      "token" : "共和",
      "start_offset" : 4,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 7
    },
    {
      "token" : "国",
      "start_offset" : 6,
      "end_offset" : 7,
      "type" : "CN_CHAR",
      "position" : 8
    }
  ]
}
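To compare analyzers from a script rather than Kibana, a minimal Python sketch using only the standard library might look like this. It assumes ES listens on http://localhost:9200 and uses `POST _analyze`, which ES accepts as well as GET. The function names are hypothetical. Building the request and reading the tokens are kept separate, so the parsing part works on any saved response such as the ones above.

```python
import json
import urllib.request


def analyze_request(text, analyzer, host="http://localhost:9200", index=None):
    """Build a POST _analyze request (hypothetical helper; host is an assumption)."""
    path = f"/{index}/_analyze" if index else "/_analyze"
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(host + path, data=body, method="POST",
                                  headers={"Content-Type": "application/json"})


def analyze(text, analyzer, **kwargs):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(analyze_request(text, analyzer, **kwargs)) as resp:
        return json.load(resp)


def token_strings(analyze_response):
    """Return just the token strings, in the order ES returned them."""
    return [t["token"] for t in analyze_response["tokens"]]


# Example against a running cluster:
# print(token_strings(analyze("中华人民共和国", "ik_max_word")))
```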

4.1.4 Custom dictionaries

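Business-specific terms are often missing from the IK analyzer's built-in dictionary, so we need to maintain our own. In the example below, 正井猫 and up主 are split into single characters, but we want each of them kept as one word. In addition, 的 and 是 are Chinese stop words that should not appear in the tokens at all. So we need both a custom dictionary and a custom stop-word dictionary.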

GET _analyze
{
  "analyzer": "ik_smart",
  "text": "请关注正井猫up主,你们的支持是我坚持的动力。"
}
{
  "tokens" : [
    {
      "token" : "请",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "关注",
      "start_offset" : 1,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "正",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "CN_CHAR",
      "position" : 2
    },
    {
      "token" : "井",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "CN_CHAR",
      "position" : 3
    },
    {
      "token" : "猫",
      "start_offset" : 5,
      "end_offset" : 6,
      "type" : "CN_CHAR",
      "position" : 4
    },
    {
      "token" : "up",
      "start_offset" : 6,
      "end_offset" : 8,
      "type" : "ENGLISH",
      "position" : 5
    },
    {
      "token" : "主",
      "start_offset" : 8,
      "end_offset" : 9,
      "type" : "CN_CHAR",
      "position" : 6
    },
    {
      "token" : "你们",
      "start_offset" : 10,
      "end_offset" : 12,
      "type" : "CN_WORD",
      "position" : 7
    },
    {
      "token" : "的",
      "start_offset" : 12,
      "end_offset" : 13,
      "type" : "CN_CHAR",
      "position" : 8
    },
    {
      "token" : "支持",
      "start_offset" : 13,
      "end_offset" : 15,
      "type" : "CN_WORD",
      "position" : 9
    },
    {
      "token" : "是",
      "start_offset" : 15,
      "end_offset" : 16,
      "type" : "CN_CHAR",
      "position" : 10
    },
    {
      "token" : "我",
      "start_offset" : 16,
      "end_offset" : 17,
      "type" : "CN_CHAR",
      "position" : 11
    },
    {
      "token" : "坚持",
      "start_offset" : 17,
      "end_offset" : 19,
      "type" : "CN_WORD",
      "position" : 12
    },
    {
      "token" : "的",
      "start_offset" : 19,
      "end_offset" : 20,
      "type" : "CN_CHAR",
      "position" : 13
    },
    {
      "token" : "动力",
      "start_offset" : 20,
      "end_offset" : 22,
      "type" : "CN_WORD",
      "position" : 14
    }
  ]
}

Go to the $ES_HOME/plugins/ik/config directory and create a custom directory. Inside it, create the files mydic.dic and ext_stopword.dic. The file names can be anything, but they must be .dic files.

Add two lines to mydic.dic:

正井猫
up主

Add two lines to ext_stopword.dic:

的
是
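When these files are generated or deployed by a script, a minimal Python sketch could write them as follows: one entry per line, saved as UTF-8. UTF-8 is what IK expects, to the best of our understanding. The function name and defaults are hypothetical, and `ik_config_dir` stands for $ES_HOME/plugins/ik/config.

```python
from pathlib import Path


def write_ik_dicts(ik_config_dir, words, stopwords,
                   dict_name="mydic.dic", stop_name="ext_stopword.dic"):
    """Write custom/<dict_name> and custom/<stop_name>, one entry per line (hypothetical helper)."""
    custom = Path(ik_config_dir) / "custom"
    custom.mkdir(parents=True, exist_ok=True)
    dict_path = custom / dict_name
    stop_path = custom / stop_name
    dict_path.write_text("\n".join(words) + "\n", encoding="utf-8")
    stop_path.write_text("\n".join(stopwords) + "\n", encoding="utf-8")
    return dict_path, stop_path


# Example (path is an assumption):
# write_ik_dicts("/opt/elasticsearch-7.8.0/plugins/ik/config", ["正井猫", "up主"], ["的", "是"])
```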

Finally, edit $ES_HOME/plugins/ik/config/IKAnalyzer.cfg.xml so that it reads as follows:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
	<comment>IK Analyzer extension configuration</comment>
	<!-- Configure your own extension dictionary here -->
	<entry key="ext_dict">custom/mydic.dic</entry>
	<!-- Configure your own extension stop-word dictionary here -->
	<entry key="ext_stopwords">custom/ext_stopword.dic</entry>
	<!-- Configure a remote extension dictionary here -->
	<!-- <entry key="remote_ext_dict">words_location</entry> -->
	<!-- Configure a remote extension stop-word dictionary here -->
	<!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>
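A typo in these paths is easy to miss. Below is a minimal Python sketch that reads the active (uncommented) `<entry>` elements and lists any local dictionary file that does not exist, resolving paths relative to the config file's directory as above. Splitting on ";" assumes IK's convention of semicolon-separated dictionary lists. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def ik_config_entries(cfg_path):
    """Return {key: [paths]} for active <entry> elements; XML comments are ignored."""
    root = ET.parse(cfg_path).getroot()
    entries = {}
    for entry in root.iter("entry"):
        value = (entry.text or "").strip()
        # Assumed convention: several dictionaries separated by ";".
        entries[entry.get("key")] = [p.strip() for p in value.split(";") if p.strip()]
    return entries


def missing_dict_files(cfg_path):
    """List local dictionary paths (ext_dict / ext_stopwords) that are not files."""
    base = Path(cfg_path).parent
    entries = ik_config_entries(cfg_path)
    return [p for key in ("ext_dict", "ext_stopwords")
            for p in entries.get(key, [])
            if not (base / p).is_file()]
```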

Restart Elasticsearch and run the same command again. The result is now:

GET _analyze
{
  "analyzer": "ik_smart",
  "text": "请关注正井猫up主,你们的支持是我坚持的动力。"
}
{
  "tokens" : [
    {
      "token" : "请",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "关注",
      "start_offset" : 1,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "正井猫",
      "start_offset" : 3,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "up主",
      "start_offset" : 6,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "你们",
      "start_offset" : 10,
      "end_offset" : 12,
      "type" : "CN_WORD",
      "position" : 4
    },
    {
      "token" : "支持",
      "start_offset" : 13,
      "end_offset" : 15,
      "type" : "CN_WORD",
      "position" : 5
    },
    {
      "token" : "我",
      "start_offset" : 16,
      "end_offset" : 17,
      "type" : "CN_CHAR",
      "position" : 6
    },
    {
      "token" : "坚持",
      "start_offset" : 17,
      "end_offset" : 19,
      "type" : "CN_WORD",
      "position" : 7
    },
    {
      "token" : "动力",
      "start_offset" : 20,
      "end_offset" : 22,
      "type" : "CN_WORD",
      "position" : 8
    }
  ]
}
The same sentence with ik_max_word:

GET _analyze
{
  "analyzer": "ik_max_word",
  "text": "请关注正井猫up主,你们的支持是我坚持的动力。"
}
{
  "tokens" : [
    {
      "token" : "请",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "关注",
      "start_offset" : 1,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "正井猫",
      "start_offset" : 3,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "up主",
      "start_offset" : 6,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "up",
      "start_offset" : 6,
      "end_offset" : 8,
      "type" : "ENGLISH",
      "position" : 4
    },
    {
      "token" : "主",
      "start_offset" : 8,
      "end_offset" : 9,
      "type" : "CN_CHAR",
      "position" : 5
    },
    {
      "token" : "你们",
      "start_offset" : 10,
      "end_offset" : 12,
      "type" : "CN_WORD",
      "position" : 6
    },
    {
      "token" : "支持",
      "start_offset" : 13,
      "end_offset" : 15,
      "type" : "CN_WORD",
      "position" : 7
    },
    {
      "token" : "我",
      "start_offset" : 16,
      "end_offset" : 17,
      "type" : "CN_CHAR",
      "position" : 8
    },
    {
      "token" : "坚持",
      "start_offset" : 17,
      "end_offset" : 19,
      "type" : "CN_WORD",
      "position" : 9
    },
    {
      "token" : "动力",
      "start_offset" : 20,
      "end_offset" : 22,
      "type" : "CN_WORD",
      "position" : 10
    }
  ]
}

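4.1.5 Specifying analyzers in a mapping

When creating the mapping, specify an analyzer for each field; if none is specified, the standard analyzer is used. analyzer sets the analyzer applied when documents are indexed into ES. search_analyzer sets the analyzer applied to the query text at search time; if it is omitted, it defaults to analyzer.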

PUT news
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      },
      "content": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      }
    }
  }
}
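When several text fields share the same pair of analyzers, the mapping body can be generated. Below is a minimal Python sketch that reproduces the news mapping above and builds the corresponding `PUT news` request. It assumes ES on http://localhost:9200, and the helper names are hypothetical.

```python
import json
import urllib.request


def ik_text_mapping(field_names, index_analyzer="ik_max_word",
                    search_analyzer="ik_smart"):
    """Mapping body giving every listed field the same index/search analyzers."""
    return {"mappings": {"properties": {
        name: {"type": "text",
               "analyzer": index_analyzer,          # used when documents are indexed
               "search_analyzer": search_analyzer}  # used on the query text
        for name in field_names}}}


def create_index_request(index, body, host="http://localhost:9200"):
    """Build (but do not send) the PUT <index> request."""
    return urllib.request.Request(f"{host}/{index}",
                                  data=json.dumps(body).encode("utf-8"),
                                  method="PUT",
                                  headers={"Content-Type": "application/json"})


# Example: urllib.request.urlopen(create_index_request("news", ik_text_mapping(["title", "content"])))
```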

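After the custom dictionary changes, documents that were indexed earlier still carry their old tokens. They can be re-analyzed and reindexed in place with POST news/_update_by_query.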
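Triggering that reindex from a script could look like the minimal Python sketch below. The host and helper name are assumptions. `conflicts=proceed` tells ES to skip documents that hit version conflicts instead of aborting. No body is sent, so every document in the index is rewritten.

```python
import urllib.request


def update_by_query_request(index, host="http://localhost:9200", conflicts="proceed"):
    """Build a bodiless POST <index>/_update_by_query request (hypothetical helper)."""
    url = f"{host}/{index}/_update_by_query?conflicts={conflicts}"
    # No data: urllib sends no Content-Type header, and http.client adds Content-Length: 0.
    return urllib.request.Request(url, method="POST")


# Example: urllib.request.urlopen(update_by_query_request("news"))
```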

Applying the IK analyzer and dynamically reindexing data

Adding a dynamic dictionary to the IK analyzer

4.2 Pinyin analyzer

4.2.1 Download

Download: https://github.com/medcl/elasticsearch-analysis-pinyin/releases

4.2.2 Installation

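The pinyin analyzer installs the same way on every operating system. Create a folder named pinyin under the plugins directory of the ES home directory, copy the downloaded zip package into pinyin, and unzip it there.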

4.2.3 Verification

Run the following command:

GET _analyze
{
 "analyzer": "pinyin",
 "text": "正井猫"
}

Advanced use of the pinyin analyzer (part 1)

Advanced use of the pinyin analyzer (part 2)

4.3 Custom analyzers and how to use them

For the input <p>刘德华</p>, we want the following analysis result:
{
  "tokens": [
    {
      "token": "刘德华",
      "start_offset": 0,
      "end_offset": 3,
      "type": "word",
      "position": 0
    },
    {
      "token": "liudehua",
      "start_offset": 0,
      "end_offset": 3,
      "type": "word",
      "position": 0
    },
    {
      "token": "ldh",
      "start_offset": 0,
      "end_offset": 3,
      "type": "word",
      "position": 0
    }
  ]
}

4.3.1 Defining the analyzer

PUT test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "char_filter": [
            "html_strip"
          ],
          "tokenizer": "keyword",
          "filter": "my_pinyin_filter"
        }
      },
      "filter": {
        "my_pinyin_filter": {
          "type": "pinyin",
          "keep_first_letter": true,
          "keep_full_pinyin": false,
          "keep_joined_full_pinyin": true,
          "keep_original": true,
          "keep_none_chinese": false,
          "keep_none_chinese_in_joined_full_pinyin": true
        }
      }
    }
  }
}
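The chain defined above has three stages. The html_strip character filter removes the tags, the keyword tokenizer keeps the whole input as one token, and the pinyin filter expands that token. In the filter, keep_original keeps 刘德华, keep_joined_full_pinyin adds liudehua, keep_first_letter adds ldh, and keep_full_pinyin: false suppresses separate liu/de/hua tokens. The toy Python emulation below only illustrates that data flow and is not the plugin's code. The three-character pinyin table is hand-written and hypothetical, and the real filter's token order and offsets may differ.

```python
import html
import re

# Hypothetical, hand-written pinyin table covering only this example.
PINYIN = {"刘": "liu", "德": "de", "华": "hua"}


def html_strip(text):
    """Rough stand-in for the html_strip char filter: drop tags, decode entities."""
    return html.unescape(re.sub(r"<[^>]+>", "", text))


def keyword_tokenizer(text):
    """The keyword tokenizer emits the entire input as a single token."""
    return [text]


def pinyin_filter(token, keep_original=True, keep_joined_full_pinyin=True,
                  keep_first_letter=True):
    """Expand one token the way the settings above describe (toy version)."""
    syllables = [PINYIN[ch] for ch in token if ch in PINYIN]
    out = []
    if keep_original:
        out.append(token)
    if keep_joined_full_pinyin:
        out.append("".join(syllables))
    if keep_first_letter:
        out.append("".join(s[0] for s in syllables))
    return out


def my_analyzer(text):
    tokens = []
    for tok in keyword_tokenizer(html_strip(text)):
        tokens.extend(pinyin_filter(tok))
    return tokens
```

For the input `<p>刘德华</p>` this emulation produces the three tokens listed as the target result: 刘德华, liudehua and ldh.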

4.3.2 Verifying the analyzer

GET test/_analyze
{
    "analyzer": "my_analyzer",
    "text": ["刘德华"]
}

4.3.3 Assigning the analyzer to a field

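Set the mappings so that the name field of the test index uses the custom analyzer my_analyzer. The field type is completion, which is what the prefix suggestion queries below rely on.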

PUT test/_mapping
{
    "properties": {
        "name": {
            "type": "completion",
            "analyzer": "my_analyzer"
        }
    }
}

4.3.4 Verifying the result

Add some documents with the following command:

POST test/_bulk
{"index": {}}
{"name": "刘德华"}
{"index": {}}
{"name": "张学友"}
{"index": {}}
{"name": "柳岩"}
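The _bulk body is newline-delimited JSON: an action line, then the document, and the whole body must end with a newline. A minimal Python sketch that builds the same body for any list of documents is shown below; the helper name is hypothetical. `ensure_ascii=False` keeps the Chinese names readable, so encode the string as UTF-8 before sending. Send it as `POST test/_bulk` with `Content-Type: application/x-ndjson`.

```python
import json


def bulk_index_body(docs):
    """NDJSON body for _bulk: an {"index": {}} action line before each document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))          # let ES generate the _id
        lines.append(json.dumps(doc, ensure_ascii=False))
    return "\n".join(lines) + "\n"                        # trailing newline is required


# Example: bulk_index_body([{"name": "刘德华"}, {"name": "张学友"}, {"name": "柳岩"}]).encode("utf-8")
```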

Then run prefix suggestion (completion suggester) queries against the name field:
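A completion-suggester request of this kind can be sketched in Python as follows. The suggestion name `name_suggest`, the default size and the sample response in the test are hypothetical. The structure follows the suggest API: a named suggestion with a `prefix` and a `completion` block on the completion field. The response contains a list of entries, and each entry has `options`. Because the name field is analyzed with my_analyzer, prefixes such as 刘, liu or ldh should all be able to match 刘德华.

```python
def completion_suggest_body(prefix, field="name", suggestion_name="name_suggest",
                            size=5, skip_duplicates=True):
    """Body for POST <index>/_search running one completion suggestion (hypothetical helper)."""
    return {
        "suggest": {
            suggestion_name: {
                "prefix": prefix,
                "completion": {
                    "field": field,
                    "size": size,
                    "skip_duplicates": skip_duplicates,  # drop options with identical text
                },
            }
        }
    }


def suggestion_texts(response, suggestion_name="name_suggest"):
    """Collect the option texts from a suggest response."""
    return [opt["text"]
            for entry in response["suggest"][suggestion_name]
            for opt in entry["options"]]
```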

Look carefully at the last result above and relate it to the sentence highlighted in red at the beginning of Section 5 of the previous post, "Elasticsearch Tutorial (Part 1)".

5. Importing MySQL data into ES

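The initial MySQL data can be imported into ES either programmatically or with a tool. This tutorial uses Logstash for the initial import. First copy the MySQL driver jar into the $logstash/logstash-core/lib/jars/ directory. Then create a file named logstash-mysqlnews.conf under $logstash/config/ with the following content: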

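input {
    jdbc {
        jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
        # The connection string must stay on a single line
        jdbc_connection_string => "jdbc:mysql://localhost:3306/es?useSSL=false&serverTimezone=UTC"
        jdbc_user => "root"
        jdbc_password => "123456"
        # Track a column value; if true, tracking_column must be set
        use_column_value => true
        # The column to track
        tracking_column => "id"
        # Type of the tracking column: only numeric and timestamp are supported, default is numeric
        tracking_column_type => "numeric"
        # Record the value from the last run
        record_last_run => true
        # Where that value is stored
        last_run_metadata_path => "mysql-position.txt"
        statement => "SELECT * FROM news where id > :sql_last_value"
        # Cron-like schedule; six fields (seconds first) means run every second
        schedule => "* * * * * *"
    }
}
filter {
    mutate {
        # Split the comma-separated tags column into an array
        split => { "tags" => "," }
    }
}
output {
    elasticsearch {
        # Reuse the MySQL id as the document _id, so re-imported rows overwrite instead of duplicating
        document_id => "%{id}"
        document_type => "_doc"
        index => "news"
        hosts => ["http://localhost:9200"]
    }
    stdout {
        codec => rubydebug
    }
}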
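The tracking settings are easier to follow with a simulation. Below is a hypothetical Python simulation of what one scheduled run does; it is not how the jdbc input is implemented. An in-memory SQLite table stands in for MySQL so the sketch is self-contained. A position file plays the role of :sql_last_value, which starts at 0 for a numeric column. The comma split mirrors the mutate filter, and the MySQL id becomes the document id. The query orders by id because, as we understand the plugin, the recorded value comes from the last row returned. An unordered SELECT like the one above could therefore record an id smaller than the maximum.

```python
import sqlite3
from pathlib import Path


def sync_once(conn, position_file):
    """One scheduled run: fetch rows with id > last recorded id, then record the new last id."""
    path = Path(position_file)
    last = int(path.read_text()) if path.exists() else 0  # numeric tracking starts at 0
    rows = conn.execute(
        "SELECT id, title, tags FROM news WHERE id > ? ORDER BY id", (last,)
    ).fetchall()
    docs = []
    for id_, title, tags in rows:
        # document_id => "%{id}" and mutate split on ","
        docs.append({"_id": str(id_), "title": title, "tags": tags.split(",")})
    if rows:
        path.write_text(str(rows[-1][0]))  # like record_last_run / last_run_metadata_path
    return docs


# Example setup (SQLite stand-in for the MySQL news table):
# conn = sqlite3.connect(":memory:")
# conn.execute("CREATE TABLE news (id INTEGER PRIMARY KEY, title TEXT, tags TEXT)")
```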

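6. Referenced videos, blogs and projects

1. Elasticsearch(7.8.1)沥血之作(包含仿百度搜索案例) (Elasticsearch 7.8.1 course, including a Baidu-style search case study) - bilibili

2. kiramie/elasticsearch-demo1 - gitee.com

3. kiramie/elasticsearch-demo2 - gitee.com

4. Elasticsearch入门及掌握其JavaAPI (Getting started with Elasticsearch and its Java API) - Juejin (juejin.cn)

5. ES + Spring boot的正确姿势 (ES系列三) (The right way to use ES with Spring Boot, ES series part 3) - Juejin (juejin.cn)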

posted @ 2021-10-06 17:08  no1486