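1 Introduction
Prefix, wildcard and regexp queries are rarely used in practice. They all work against the inverted index built for full-text search and have to enumerate its terms: even after a matching term is found, the search keeps scanning the remaining terms, so these queries are comparatively slow.
2 Examples
2.1 Preparing the data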
DELETE /product
PUT /product/_doc/1
{
"name" : "xiaomi phone",
"desc" : "shouji zhong de zhandouji",
"price" : 3999,
"tags": [ "xingjiabi", "fashao", "buka" ]
}
PUT /product/_doc/2
{
"name" : "xiaomi nfc phone",
"desc" : "zhichi quangongneng nfc,shouji zhong de jianjiji",
"price" : 4999,
"tags": [ "xingjiabi", "fashao", "gongjiaoka" ]
}
PUT /product/_doc/3
{
"name" : "nfc phone",
"desc" : "shouji zhong de hongzhaji",
"price" : 2999,
"tags": [ "xingjiabi", "fashao", "menjinka" ]
}
PUT /product/_doc/4
{
"name" : "xiaomi erji",
"desc" : "erji zhong de huangmenji",
"price" : 999,
"tags": [ "low", "bufangshui", "yinzhicha" ]
}
PUT /product/_doc/5
{
"name" : "hongmi erji",
"desc" : "erji zhong de kendeji",
"price" : 399,
"tags": [ "lowbee", "xuhangduan", "zhiliangx" ]
}
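2.2 Prefix search
2.2.1 Official documentation
2.2.2 Example
Prefix search looks up every term in the inverted index that starts with "er". Because all matching terms have to be enumerated, it is relatively slow.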
GET /product/_search
{
"query": {
"prefix": {
"desc": {
"value": "er"
}
}
}
}
Result
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "product",
"_type" : "_doc",
"_id" : "4",
"_score" : 1.0,
"_source" : {
"name" : "xiaomi erji",
"desc" : "erji zhong de huangmenji",
"price" : 999,
"tags" : [
"low",
"bufangshui",
"yinzhicha"
]
}
},
{
"_index" : "product",
"_type" : "_doc",
"_id" : "5",
"_score" : 1.0,
"_source" : {
"name" : "hongmi erji",
"desc" : "erji zhong de kendeji",
"price" : 399,
"tags" : [
"lowbee",
"xuhangduan",
"zhiliangx"
]
}
}
]
}
}
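To make the cost concrete, here is a minimal Python sketch of what a prefix query conceptually does. It assumes desc was dynamically mapped as a text field using the standard analyzer, approximated here by lowercasing and splitting on non-alphanumeric characters. The names DOCS, analyze, build_inverted_index and prefix_query are hypothetical. Lucene actually intersects an automaton with its sorted term dictionary rather than testing every term in a loop, but in both cases the query expands into a set of terms whose postings are then merged.

```python
import re

# desc values of the five sample documents, keyed by _id
DOCS = {
    1: "shouji zhong de zhandouji",
    2: "zhichi quangongneng nfc,shouji zhong de jianjiji",
    3: "shouji zhong de hongzhaji",
    4: "erji zhong de huangmenji",
    5: "erji zhong de kendeji",
}


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the sorted list of _ids whose desc contains it."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def prefix_query(index, prefix):
    """Collect every term starting with prefix, then merge their postings."""
    terms = sorted(t for t in index if t.startswith(prefix))
    doc_ids = sorted({d for t in terms for d in index[t]})
    return terms, doc_ids


index = build_inverted_index(DOCS)
print(prefix_query(index, "er"))  # (['erji'], [4, 5])
```

Only the term "erji" starts with "er", and it occurs in documents 4 and 5, which matches the two hits returned above.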
2.3 Wildcard query
2.3.1 Official documentation
2.3.2 Example
Find documents whose desc field contains a term that includes "ji":
GET /product/_search
{
"query": {
"wildcard": {
"desc": {
"value": "*ji*"
}
}
}
}
Result
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 5,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "product",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"name" : "xiaomi phone",
"desc" : "shouji zhong de zhandouji",
"price" : 3999,
"tags" : [
"xingjiabi",
"fashao",
"buka"
]
}
},
{
"_index" : "product",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"name" : "xiaomi nfc phone",
"desc" : "zhichi quangongneng nfc,shouji zhong de jianjiji",
"price" : 4999,
"tags" : [
"xingjiabi",
"fashao",
"gongjiaoka"
]
}
},
{
"_index" : "product",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"name" : "nfc phone",
"desc" : "shouji zhong de hongzhaji",
"price" : 2999,
"tags" : [
"xingjiabi",
"fashao",
"menjinka"
]
}
},
{
"_index" : "product",
"_type" : "_doc",
"_id" : "4",
"_score" : 1.0,
"_source" : {
"name" : "xiaomi erji",
"desc" : "erji zhong de huangmenji",
"price" : 999,
"tags" : [
"low",
"bufangshui",
"yinzhicha"
]
}
},
{
"_index" : "product",
"_type" : "_doc",
"_id" : "5",
"_score" : 1.0,
"_source" : {
"name" : "hongmi erji",
"desc" : "erji zhong de kendeji",
"price" : 399,
"tags" : [
"lowbee",
"xuhangduan",
"zhiliangx"
]
}
}
]
}
}
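The wildcard pattern is likewise tested against individual terms, not against the whole field value. Below is a minimal Python sketch that uses fnmatch.fnmatchcase as a stand-in for Elasticsearch wildcard matching. Note that fnmatch also understands [...] character classes, which the wildcard query does not, so only * and ? are used here. The analyzer approximation and the function names are assumptions. A pattern that begins with *, as *ji* does, cannot narrow the term dictionary by prefix, which is why the Elasticsearch documentation advises against leading wildcards.

```python
import re
from fnmatch import fnmatchcase

# desc values of the five sample documents, keyed by _id
DOCS = {
    1: "shouji zhong de zhandouji",
    2: "zhichi quangongneng nfc,shouji zhong de jianjiji",
    3: "shouji zhong de hongzhaji",
    4: "erji zhong de huangmenji",
    5: "erji zhong de kendeji",
}


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def wildcard_query(docs, pattern):
    """Return {_id: matching terms} for docs with at least one term matching pattern."""
    hits = {}
    for doc_id, text in docs.items():
        matched = [term for term in analyze(text) if fnmatchcase(term, pattern)]
        if matched:
            hits[doc_id] = matched
    return hits


for doc_id, terms in wildcard_query(DOCS, "*ji*").items():
    print(doc_id, terms)
# 1 ['shouji', 'zhandouji']
# 2 ['shouji', 'jianjiji']
# 3 ['shouji', 'hongzhaji']
# 4 ['erji', 'huangmenji']
# 5 ['erji', 'kendeji']
```

Every document has at least one term containing "ji", which is why all five are returned.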
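2.4 Regular expression (regexp) query
2.4.1 Official documentation
2.4.2 Example
Find documents whose desc field contains a term that includes "rj":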
GET /product/_search
{
"query": {
"regexp": {
"desc": {
"value": ".*rj.*"
}
}
}
}
Result
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "product",
"_type" : "_doc",
"_id" : "4",
"_score" : 1.0,
"_source" : {
"name" : "xiaomi erji",
"desc" : "erji zhong de huangmenji",
"price" : 999,
"tags" : [
"low",
"bufangshui",
"yinzhicha"
]
}
},
{
"_index" : "product",
"_type" : "_doc",
"_id" : "5",
"_score" : 1.0,
"_source" : {
"name" : "hongmi erji",
"desc" : "erji zhong de kendeji",
"price" : 399,
"tags" : [
"lowbee",
"xuhangduan",
"zhiliangx"
]
}
}
]
}
}
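A rough Python equivalent, under the same analyzer approximation: each term is tested with re.fullmatch, because Lucene regular expressions are always anchored and must match the entire term. That is why the pattern needs the leading and trailing .*. Python's re syntax is not identical to Lucene's regexp syntax, so treat this only as an illustration for simple patterns; regexp_query and the other names are hypothetical.

```python
import re

# desc values of the five sample documents, keyed by _id
DOCS = {
    1: "shouji zhong de zhandouji",
    2: "zhichi quangongneng nfc,shouji zhong de jianjiji",
    3: "shouji zhong de hongzhaji",
    4: "erji zhong de huangmenji",
    5: "erji zhong de kendeji",
}


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def regexp_query(docs, pattern):
    """Return {_id: matching terms}; the pattern must match a whole term (anchored)."""
    compiled = re.compile(pattern)
    hits = {}
    for doc_id, text in docs.items():
        matched = [term for term in analyze(text) if compiled.fullmatch(term)]
        if matched:
            hits[doc_id] = matched
    return hits


print(regexp_query(DOCS, ".*rj.*"))  # {4: ['erji'], 5: ['erji']}
print(regexp_query(DOCS, "rj"))      # {} because "rj" alone is not a whole term
```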
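3 Fuzzy query
Reposted from: https://www.jianshu.com/p/06f43b537a29
3.1 Official documentation
3.2 Introduction
In Elasticsearch, fuzzy queries provide approximate matching. Search is often imprecise: we frequently want to return the right results even when the user's query contains some mistakes. Computers cannot understand natural language, so this understanding has to be approximated with algorithms. Prefix queries are easy to implement but rarely give satisfying results; for fuzzy matching, the Elasticsearch fuzzy query offers a reasonable trade-off between complexity and effectiveness.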
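3.3 String similarity: edit distance
3.3.1 Introduction
Edit distance quantifies how different two strings are: the minimum number of edits needed to turn one string into the other. For example, lucene and lucece differ in a single character, so their edit distance is 1. lucene and luceen have an edit distance of 2 under classic Levenshtein distance: their last two characters are swapped, and a swap counts as two edits.
3.3.2 Levenshtein distance
A type of edit distance: the minimum number of single-character edits required to transform one string into the other.
The allowed edits are:
- substituting one character for another
- inserting a character
- deleting a character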
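The definition above translates directly into the classic dynamic-programming algorithm. A minimal Python sketch (the function name levenshtein is our own) that keeps only the previous row of the distance table:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic Levenshtein distance: substitution, insertion and deletion each cost 1."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the characters are equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("lucene", "lucece"))  # 1
print(levenshtein("lucene", "luceen"))  # 2
```

lucene to luceen costs 2 here, as in 3.3.1, because swapping adjacent characters is not a primitive operation in this distance.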
3.3.3 Damerau–Levenshtein distance
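An extension of Levenshtein distance that counts swapping two adjacent characters (a transposition) as a single edit, whereas classic Levenshtein distance counts it as two. Under this distance, lucene and luceen are only 1 edit apart.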
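One common way to add transpositions is the restricted variant, often called optimal string alignment (OSA), which extends the Levenshtein table with one extra case. The Python sketch below (osa_distance is a hypothetical name) implements that variant. It differs from unrestricted Damerau-Levenshtein only in edge cases where a transposed pair is edited again: for example, ca to abc is 3 here but 2 in the unrestricted version. This article does not say which exact variant Lucene implements internally.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Same as Levenshtein, plus: swapping two adjacent characters costs 1.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
            # Transposition of two adjacent characters counts as one edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance("lucene", "luceen"))  # 1 (one adjacent swap)
print(osa_distance("lucene", "lucece"))  # 1
```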
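3.4 Distance algorithms and fuzzy query types supported by ES
Elasticsearch supports both classic Levenshtein distance and Damerau-Levenshtein distance (whether a transposition counts as one edit is controlled by transpositions in the fuzzy query and fuzzy_transpositions in the match query, both true by default). Fuzzy matching is available in two ways, the match query and the fuzzy query:
- match query: the query string is analyzed (split into terms)
- fuzzy query: the query string is not analyzed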
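The practical difference is which terms end up being compared against the index. A minimal Python sketch, again approximating the standard analyzer by lowercasing and splitting on non-alphanumeric characters (all names are hypothetical): the match query sends each analyzed token through fuzzy matching separately, while the fuzzy query treats the whole input, space included, as one term.

```python
import re

# desc values of the five sample documents
DESCS = [
    "shouji zhong de zhandouji",
    "zhichi quangongneng nfc,shouji zhong de jianjiji",
    "shouji zhong de hongzhaji",
    "erji zhong de huangmenji",
    "erji zhong de kendeji",
]


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


# All distinct terms of the desc field, i.e. its term dictionary
TERMS = sorted({term for desc in DESCS for term in analyze(desc)})


def match_query_terms(query):
    """match query: the query string is analyzed into separate terms."""
    return analyze(query)


def fuzzy_query_terms(query):
    """fuzzy query: the query string is used verbatim as a single term."""
    return [query]


for name, terms in [("match", match_query_terms("erji zhong")),
                    ("fuzzy", fuzzy_query_terms("erji zhong"))]:
    print(name, terms, [term in TERMS for term in terms])
# match ['erji', 'zhong'] [True, True]
# fuzzy ['erji zhong'] [False]
```

No indexed term contains a space. A single edit to "erji zhong" can at most delete or replace the space, and neither "erjizhong" nor any "erji?zhong" variant is an indexed term, so a fuzzy query with fuzziness 1 on that input cannot match anything.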
3.5 match query syntax
GET product/_search
{
"query": {
"match": {
"desc": {
"query": "erji zhong",
"fuzziness": 0,
"prefix_length": 0,
"max_expansions": 50
}
}
}
}
3.6 fuzzy query syntax
GET /product/_search
{
"query": {
"fuzzy": {
"desc": {
"value": "erxi",
"fuzziness": 1,
"prefix_length": 0,
"max_expansions": 50
}
}
}
}
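3.7 Parameters
1) fuzziness
The maximum edit distance allowed for this query. The match query does not do fuzzy matching by default, which is equivalent to fuzziness=0 (the fuzzy query itself defaults to AUTO).
Supported formats:
- A number (0, 1 or 2): a fixed maximum edit distance.
- Automatic mode, in the form AUTO:[low],[high]. For a query term whose length is
  - in [0, low): edit distance 0 (the term must match exactly)
  - in [low, high): one edit allowed
  - high or more: two edits allowed
- Plain AUTO: the default automatic mode, equivalent to AUTO:3,6.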
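The AUTO rule is simple enough to write out. A minimal Python sketch: the function max_edits is hypothetical, accepts the same strings as the fuzziness parameter, and measures term length with len (that is, in characters).

```python
def max_edits(fuzziness: str, term: str) -> int:
    """Maximum edit distance allowed for term under a fuzziness setting."""
    spec = fuzziness.strip().upper()
    if not spec.startswith("AUTO"):
        return int(spec)  # fixed distance: "0", "1" or "2"
    low, high = 3, 6  # plain "AUTO" means AUTO:3,6
    if ":" in spec:
        low, high = (int(x) for x in spec.split(":", 1)[1].split(","))
    n = len(term)
    if n < low:
        return 0  # [0, low): exact match only
    if n < high:
        return 1  # [low, high): one edit
    return 2      # length >= high: two edits


for t in ["de", "erxi", "zhong", "lucene"]:
    print(t, max_edits("AUTO", t))
# de 0
# erxi 1
# zhong 1
# lucene 2
```

A six-character term such as lucene already gets two edits under plain AUTO, since the upper bound is inclusive of high.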
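3.8 match query example
Both words in the query are misspelled ("erxi" for "erji", "zhomg" for "zhong"). With fuzziness 1, each is still within one edit of an indexed term.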
GET product/_search
{
"query": {
"match": {
"desc": {
"query": "erxi zhomg",
"fuzziness": 1,
"prefix_length": 0,
"max_expansions": 50
}
}
}
}
Result
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 5,
"relation" : "eq"
},
"max_score" : 0.7671452,
"hits" : [
{
"_index" : "product",
"_type" : "_doc",
"_id" : "4",
"_score" : 0.7671452,
"_source" : {
"name" : "xiaomi erji",
"desc" : "erji zhong de huangmenji",
"price" : 999,
"tags" : [
"low",
"bufangshui",
"yinzhicha"
]
}
},
{
"_index" : "product",
"_type" : "_doc",
"_id" : "5",
"_score" : 0.7671452,
"_source" : {
"name" : "hongmi erji",
"desc" : "erji zhong de kendeji",
"price" : 399,
"tags" : [
"lowbee",
"xuhangduan",
"zhiliangx"
]
}
},
{
"_index" : "product",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.07353278,
"_source" : {
"name" : "xiaomi phone",
"desc" : "shouji zhong de zhandouji",
"price" : 3999,
"tags" : [
"xingjiabi",
"fashao",
"buka"
]
}
},
{
"_index" : "product",
"_type" : "_doc",
"_id" : "3",
"_score" : 0.07353278,
"_source" : {
"name" : "nfc phone",
"desc" : "shouji zhong de hongzhaji",
"price" : 2999,
"tags" : [
"xingjiabi",
"fashao",
"menjinka"
]
}
},
{
"_index" : "product",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.05736515,
"_source" : {
"name" : "xiaomi nfc phone",
"desc" : "zhichi quangongneng nfc,shouji zhong de jianjiji",
"price" : 4999,
"tags" : [
"xingjiabi",
"fashao",
"gongjiaoka"
]
}
}
]
}
}
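Why all five documents come back, and why 4 and 5 rank first, can be reproduced with a small Python sketch. It assumes the standard analyzer (approximated as before) and the match query's default or operator, and it only reports which query terms each document matched within one edit; it does not model BM25 scoring. All names are hypothetical.

```python
import re

# desc values of the five sample documents, keyed by _id
DOCS = {
    1: "shouji zhong de zhandouji",
    2: "zhichi quangongneng nfc,shouji zhong de jianjiji",
    3: "shouji zhong de hongzhaji",
    4: "erji zhong de huangmenji",
    5: "erji zhong de kendeji",
}


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def levenshtein(a, b):
    """Classic Levenshtein distance (substitute, insert, delete each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def fuzzy_match(docs, query, fuzziness=1):
    """For each doc, list the analyzed query terms within `fuzziness` edits of
    at least one of its terms (OR semantics: one matching term is enough)."""
    query_terms = analyze(query)
    result = {}
    for doc_id, text in docs.items():
        doc_terms = set(analyze(text))
        matched = [q for q in query_terms
                   if any(levenshtein(q, t) <= fuzziness for t in doc_terms)]
        if matched:
            result[doc_id] = matched
    return result


print(fuzzy_match(DOCS, "erxi zhomg"))
# {1: ['zhomg'], 2: ['zhomg'], 3: ['zhomg'], 4: ['erxi', 'zhomg'], 5: ['erxi', 'zhomg']}
```

Documents 4 and 5 match both erxi (via erji) and zhomg (via zhong), while documents 1 to 3 match only zhomg, which is consistent with the score ordering in the result above.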
3.9 fuzzy query example
GET /product/_search
{
"query": {
"fuzzy": {
"desc": {
"value": "erxi",
"fuzziness": 1,
"prefix_length": 0,
"max_expansions": 50
}
}
}
}
Result
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 0.69361246,
"hits" : [
{
"_index" : "product",
"_type" : "_doc",
"_id" : "4",
"_score" : 0.69361246,
"_source" : {
"name" : "xiaomi erji",
"desc" : "erji zhong de huangmenji",
"price" : 999,
"tags" : [
"low",
"bufangshui",
"yinzhicha"
]
}
},
{
"_index" : "product",
"_type" : "_doc",
"_id" : "5",
"_score" : 0.69361246,
"_source" : {
"name" : "hongmi erji",
"desc" : "erji zhong de kendeji",
"price" : 399,
"tags" : [
"lowbee",
"xuhangduan",
"zhiliangx"
]
}
}
]
}
}
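Conceptually, the fuzzy query first expands the single term erxi into the indexed terms within fuzziness edits, then searches for those. The Python sketch below illustrates how prefix_length (leading characters that must stay unchanged) and max_expansions (maximum number of expanded terms) limit that expansion. It is a simplification: Lucene walks the term dictionary with a Levenshtein automaton instead of computing the distance to every term, and chooses which expansions to keep in its own way, whereas this sketch simply keeps the closest ones. The helper names are hypothetical.

```python
import re

# desc values of the five sample documents
DESCS = [
    "shouji zhong de zhandouji",
    "zhichi quangongneng nfc,shouji zhong de jianjiji",
    "shouji zhong de hongzhaji",
    "erji zhong de huangmenji",
    "erji zhong de kendeji",
]


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


# All distinct terms of the desc field, i.e. its term dictionary
TERMS = sorted({term for desc in DESCS for term in analyze(desc)})


def levenshtein(a, b):
    """Classic Levenshtein distance (substitute, insert, delete each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def expand_fuzzy(term, terms, fuzziness=1, prefix_length=0, max_expansions=50):
    """Return indexed terms within `fuzziness` edits of `term`, closest first.

    Candidates must share the first `prefix_length` characters with `term`,
    and at most `max_expansions` of them are kept.
    """
    prefix = term[:prefix_length]
    candidates = sorted(
        (levenshtein(term, t), t) for t in terms if t.startswith(prefix)
    )
    return [t for d, t in candidates if d <= fuzziness][:max_expansions]


print(expand_fuzzy("erxi", TERMS))                   # ['erji']
print(expand_fuzzy("erxi", TERMS, prefix_length=3))  # [] since "erji" does not start with "erx"
```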
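Example 2
Because the fuzzy query does not analyze its input, "erji zhong" is treated as a single term. No indexed term is within one edit of it, so nothing is returned.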
GET /product/_search
{
"query": {
"fuzzy": {
"desc": {
"value": "erji zhong",
"fuzziness": 1,
"prefix_length": 0,
"max_expansions": 50
}
}
}
}
Result
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}