ES Queries
1. Setting up the environment
Open Kibana, go to Dev Tools, then create the index and insert the test data with the following requests:
PUT /logs/_doc/1
{
  "Name": "燕麦",
  "Desc": "燕麦商品描述",
  "Price": 111,
  "Tags": ["Breakfast", "Carbon", "Cheap"]
}

PUT /logs/_doc/2
{
  "Name": "牛奶",
  "Desc": "牛奶商品描述",
  "Price": 222,
  "Tags": ["Breakfast", "Nutrition", "Expensive"]
}

PUT /logs/_doc/3
{
  "Name": "面包",
  "Desc": "牛奶商品描述",
  "Price": 333,
  "Tags": ["Breakfast", "Barley", "Cheap", "Carbon"]
}

PUT /logs/_doc/4
{
  "Name": "玉米",
  "Desc": "玉米商品描述",
  "Price": 444,
  "Tags": ["Breakfast", "Vegetables", "Cheap", "Carbon"]
}

PUT /logs/_doc/5
{
  "Name": "葡萄",
  "Desc": "葡萄商品描述",
  "Price": 555,
  "Tags": ["Breakfast", "Fruits", "Expensive", "Carbon"]
}
Run the requests above, then run a search to check that the data was inserted. A simple search looks like this:
GET /logs/_search
or
GET /logs/_search
{
"query": {
"match_all": {}
}
}
Note: this is equivalent to select * from <table> where 1=1; match_all matches every document.
The result set looks like this:
{
  "took": 2, // time taken by this request, in milliseconds
  "timed_out": false, // whether this request timed out
  // shard status for this request
  "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 },
  "hits": {
    "total": {
      "value": 5, // this request matched 5 documents
      "relation": "eq" // the value above is an exact count
    },
    "max_score": 1, // highest _score among the matching documents, 1.0 here
    "hits": [
      {
        "_index": "logs", "_id": "1", "_score": 1,
        "_source": {
          "Name": "燕麦",
          "Desc": "燕麦商品描述",
          "Price": 111,
          "Tags": ["Breakfast", "Carbon", "Cheap"]
        }
      },
      {
        "_index": "logs", "_id": "2", "_score": 1,
        "_source": {
          "Name": "牛奶",
          "Desc": "牛奶商品描述",
          "Price": 222,
          "Tags": ["Breakfast", "Nutrition", "Expensive"]
        }
      },
      {
        "_index": "logs", "_id": "3", "_score": 1,
        "_source": {
          "Name": "面包",
          "Desc": "牛奶商品描述",
          "Price": 333,
          "Tags": ["Breakfast", "Barley", "Cheap", "Carbon"]
        }
      },
      {
        "_index": "logs", "_id": "4", "_score": 1,
        "_source": {
          "Name": "玉米",
          "Desc": "玉米商品描述",
          "Price": 444,
          "Tags": ["Breakfast", "Vegetables", "Cheap", "Carbon"]
        }
      },
      {
        "_index": "logs", "_id": "5", "_score": 1,
        "_source": {
          "Name": "葡萄",
          "Desc": "葡萄商品描述",
          "Price": 555,
          "Tags": ["Breakfast", "Fruits", "Expensive", "Carbon"]
        }
      }
    ]
  }
}
The data was inserted successfully.
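2. Relevance score
Each hit in the result set of the demo in section 1 carries a _score field, which is its relevance score, and max_score is the highest of those scores. When a search request does not specify a sort field, ES sorts the hits by their score, highest first. Relevance scoring involves two algorithms, which will be covered in later articles.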
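To see the default score ordering and how an explicit sort replaces it, the two requests below can be compared. This is only a sketch against the test data from section 1; the search term 牛奶 and the choice of Price as the sort field are example values, not something the demo prescribes.

```
# No sort specified: hits come back ordered by _score, highest first
GET /logs/_search
{
  "query": {
    "match": { "Desc": "牛奶" }
  }
}

# Explicit sort: hits are ordered by Price instead,
# and _score is returned as null unless track_scores is set
GET /logs/_search
{
  "query": {
    "match": { "Desc": "牛奶" }
  },
  "sort": [
    { "Price": { "order": "desc" } }
  ]
}
```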
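3. Metadata
The _source field in the result set of the demo in section 1 is a metadata field that holds the original JSON document as it was submitted at index time. Its structure looks roughly like this:
"_source": {
  "Name": "燕麦",
  "Desc": "燕麦商品描述",
  "Price": 111,
  "Tags": ["Breakfast", "Carbon", "Cheap"]
}
It is comparable to the columns of a table in a relational database together with their values.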
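3.1 Disabling metadata (_source)
3.1.1 Pros and cons
Pro: saves storage.
Cons: the update, update_by_query and reindex APIs are not supported; highlighting is not supported; the index cannot be reindexed, whether to change the mapping or analyzers or to upgrade to a new version; queries and aggregations can no longer be debugged by looking at the original document used at index time; and automatic repair of a corrupted index, should ES offer it in the future, would not be possible.
Note: if the only goal is to save storage, raising the index compression level is a better choice than disabling _source.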
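A minimal sketch of the compression alternative, assuming the logs index does not exist yet. It uses the index.codec setting with the value best_compression, set here at index creation time; whether the space saved is worth the extra CPU cost depends on the data and is not measured here.

```
# Create the index with a higher compression level instead of disabling _source
PUT /logs
{
  "settings": {
    "index": {
      "codec": "best_compression"
    }
  }
}
```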
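3.1.2 Controlling _source through the mapping at index creation
Drawback: whether the mapping restricts which fields _source keeps or disables _source altogether, the setting cannot be changed once it has been applied.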
(1) Disabling _source completely
PUT /logs
{
  "mappings": {
    "_source": {
      "enabled": false
    }
  }
}
(2) Excluding some fields from _source
PUT /logs
{
  "mappings": {
    "_source": {
      "includes": ["Name", "Tags"],
      "excludes": ["Price", "Desc"]
    }
  }
}
includes and excludes specify which fields are kept in _source and which are dropped.
Note: controlling _source through the mapping is not recommended.
3.1.3 Controlling _source in the query
(1) Disabling _source at query time
GET /logs/_search
{
  "_source": false,
  "query": {
    "match_all": {}
  }
}
The hits then carry no _source at all, only fields such as _index, _id and _score.
(2) Returning selected fields
GET /logs/_search
{
  "_source": ["Name", "Tags"],
  "query": {
    "match_all": {}
  }
}
Specifying _source in the request controls which fields the result returns, similar to select Name,Tags from logs; in a relational database.
(3) Wildcard selection
Delete the index from the demo in section 1 and index the following document into a fresh logs index:
DELETE /logs

PUT /logs/_doc/1
{
  "Name": "燕麦",
  "Desc": "燕麦商品描述",
  "Price": 111,
  "Items": {
    "Name": "子名称",
    "Price": 222
  },
  "Tags": ["Breakfast", "Carbon", "Cheap"]
}
The document in the logs index now contains an object property named Items. To return only the contents of Items from a search, run:
GET /logs/_search
{
  "_source": ["Items.*"],
  "query": {
    "match_all": {}
  }
}
The result is:
"hits": [
  {
    "_index": "logs",
    "_id": "1",
    "_score": 1,
    "_source": {
      "Items": {
        "Price": 222,
        "Name": "子名称"
      }
    }
  }
]
(4) Selecting with includes and excludes
GET /logs/_search
{
  "_source": {
    "includes": ["Items.*", "Price", "Desc"],
    "excludes": ["Name"]
  },
  "query": {
    "match_all": {}
  }
}
This specifies which fields the result should contain and which it should leave out.
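The same kind of filtering can also be expressed as URL parameters rather than in the request body. The sketch below assumes a recent ES version, where the parameters are named _source_includes and _source_excludes, and applies them to a get-by-id request for document 1; the field patterns are example values only.

```
# Fetch document 1, keeping only the Items object but dropping Items.Price
GET /logs/_doc/1?_source_includes=Items.*&_source_excludes=Items.Price
```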
4. QueryString queries
4.1 Querying everything
GET /logs/_search
4.2 Paginated search
GET /logs/_search?from=0&size=3&sort=Price:desc
The query starts at offset from, returns size documents in total, and sorts them by Price in descending order.
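For comparison, the same pagination can be written in the request body. This is a sketch of the equivalent body form, using the same example values (from 0, size 3, Price descending) and a match_all query to stand in for the absent q parameter.

```
# Request-body form of ?from=0&size=3&sort=Price:desc
GET /logs/_search
{
  "from": 0,
  "size": 3,
  "sort": [
    { "Price": "desc" }
  ],
  "query": {
    "match_all": {}
  }
}
```

Raising from by size on each request (0, 3, 6, ...) walks through the following pages.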
4.3 Matching on a specific field
GET /logs/_search?q=Name:牛奶
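This searches the index of the Name field in logs for documents whose Name matches 牛奶.
Note: by default ES indexes every field, so a search of the form q=field:value looks the value up in the index of the named field and returns the matching documents. For an analyzed text field such as Name the lookup works on the analyzed terms, so the match is not necessarily on the exact whole value.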
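The q parameter uses the query string syntax, so the same search can be sketched in the request body with a query_string query. The field name and value are the ones from the example above; nothing else is assumed.

```
# Request-body form of ?q=Name:牛奶
GET /logs/_search
{
  "query": {
    "query_string": {
      "query": "Name:牛奶"
    }
  }
}
```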
4.4 Searching all fields
GET /logs/_search?q=111
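Important: by default ES indexes every field, so when the query is not in the q=field:value form used in 4.3, ES searches all indexed fields. The result set here is therefore:
"hits": [
  {
    "_index": "logs",
    "_id": "2",
    "_score": 1.3167865,
    "_source": {
      "Name": "牛奶",
      "Desc": "牛奶商品描述111",
      "Price": 222,
      "Tags": ["Breakfast", "Nutrition", "Expensive"]
    }
  },
  {
    "_index": "logs",
    "_id": "1",
    "_score": 1,
    "_source": {
      "Name": "燕麦",
      "Desc": "燕麦商品描述",
      "Price": 111,
      "Tags": ["Breakfast", "Carbon", "Cheap"]
    }
  }
]
Every document with 111 in Price or Desc was retrieved: document 1 matches on Price, and document 2 matches on Desc, whose value in this run is 牛奶商品描述111 rather than the value inserted in section 1.
Note: watch out for exact matching here. In the demo the Desc field is tokenized before it is matched, whereas Price is not.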
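One way to check why the two fields behave differently is to look at the mapping ES generated and at the terms the analyzer produces. The sketch below assumes the index was created implicitly with dynamic mapping, as in this article; in that case Desc would normally show up as a text field and Price as a numeric one. The text passed to _analyze is just the Desc value from the result above.

```
# Show the field types ES assigned to Name, Desc, Price and Tags
GET /logs/_mapping

# Show the terms that the analyzer of the Desc field produces for a sample value
GET /logs/_analyze
{
  "field": "Desc",
  "text": "牛奶商品描述111"
}
```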