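Elasticsearch: Indices, Documents, Compound Queries, Sorted Queries, and Filter Operations
# The Elasticsearch inverted index (see 扩展阅读.md)
- Each piece of text is split into terms (tokenized), and an index entry is built for every term, recording which documents contain it.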
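The idea of an inverted index can be sketched in a few lines of Python. This is only a toy illustration, not how Elasticsearch implements it: the whitespace tokenizer and the helper name `build_inverted_index` are assumptions made for the example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Toy analyzer (assumption): lowercase, then split on whitespace.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


docs = {
    1: "quick brown fox",
    2: "quick red fox",
    3: "lazy brown dog",
}
index = build_inverted_index(docs)
print(index["quick"])  # [1, 2]
print(index["brown"])  # [1, 3]
```

Looking up a term is then a single dictionary access instead of a scan over every document, which is the property that makes full-text search fast.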
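For details of each operation, see the official documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/7.5/indices.html
The official Chinese documentation (written for Elasticsearch 2.x):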
https://www.elastic.co/guide/cn/elasticsearch/guide/current/index-settings.html
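1. Initializing an index
# Create an index named lqz2 with 5 primary shards and 1 replica per primary shard
PUT lqz2
{
  "settings": {
    "index": {
      "number_of_shards": 5,
      "number_of_replicas": 1
    }
  }
}
number_of_shards: the number of primary shards per index. The default is 1 since Elasticsearch 7.0 (it was 5 in earlier versions). This setting cannot be changed after the index is created.
number_of_replicas: the number of replica copies of each primary shard. The default is 1. This setting can be changed at any time on a live index.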
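To make the two settings concrete, the following sketch builds the same request body in Python and counts how many shard copies the cluster has to host. The helper names `index_settings` and `total_shard_copies` are hypothetical and exist only for this illustration; nothing here talks to a cluster.

```python
import json


def index_settings(number_of_shards, number_of_replicas):
    """Build the request body for `PUT <index>` (hypothetical helper)."""
    return {
        "settings": {
            "index": {
                "number_of_shards": number_of_shards,
                "number_of_replicas": number_of_replicas,
            }
        }
    }


def total_shard_copies(number_of_shards, number_of_replicas):
    """Primaries plus all of their replica copies."""
    return number_of_shards * (1 + number_of_replicas)


body = index_settings(5, 1)
print(json.dumps(body))
print(total_shard_copies(5, 1))  # 10
```

With 5 primaries and 1 replica each, the index occupies 10 shards in total; raising the replica count to 2 later would bring it to 15 without touching the number of primaries.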
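2. Viewing index settings
# Get the settings of the lqz2 index
GET lqz2/_settings
# Get the settings of all indices
GET _all/_settings
# Same as above
GET _settings
# Get the settings of both the lqz and lqz2 indices
GET lqz,lqz2/_settings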
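3. Updating an index
# Change the number of replicas to 2
PUT lqz/_settings
{
  "number_of_replicas": 2
}
# If this fails with cluster_block_exception: the data directory of an Elasticsearch node has run short of disk space, so the node can no longer receive replicated data. To protect the data, the cluster automatically marks the affected indices read-only. After freeing disk space, clear the block with:
PUT _all/_settings
{
  "index": {
    "blocks": { "read_only_allow_delete": false }
  }
}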
4. Deleting an index
# Delete the lqz index
DELETE lqz
1. Adding a document
# Add a book with id 1 (both POST and PUT work)
# Alternative: POST lqz/_doc/1
# Alternative: POST lqz/_doc generates the id automatically; this form must use POST
POST lqz/_doc/1/_create
{
  "title": "红楼梦",
  "price": 12,
  "publish_addr": { "province": "黑龙江", "city": "鹤岗" },
  "publish_date": "2013-11-11",
  "read_num": 199,
  "tag": ["古典", "名著"]
}
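2. Retrieving a document
# Get the document with id 1 from the lqz index
GET lqz/_doc/1
# Get the document with id 7 from the lqz index, returning only the title field
GET lqz/_doc/7?_source=title
# Get the document with id 7 from the lqz index, returning only the title and price fields
GET lqz/_doc/7?_source=title,price
# Get the document with id 7 from the lqz index, returning all fields
GET lqz/_doc/7?_source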
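3. Modifying a document
# Overwrite the document (full replacement: fields that are not sent again are lost)
PUT lqz/_doc/1
{
  "title": "xxxx",
  "price": 333,
  "publish_addr": { "province": "黑龙江", "city": "福州" }
}
# Partial update: change only some fields (note that this uses POST, and the changes must be wrapped in "doc")
POST lqz/_update/1
{
  "doc": { "title": "修改" }
}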
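The difference between the two requests can be imitated on plain Python dictionaries. This is a sketch under assumptions: `full_replace` and `partial_update` are hypothetical helpers, and the recursive merge of nested objects is meant to mirror how a partial document is merged into the stored one, not to reproduce Elasticsearch's code.

```python
import copy


def full_replace(existing, new_doc):
    """Imitates PUT <index>/_doc/<id>: the old document is discarded."""
    return copy.deepcopy(new_doc)


def partial_update(existing, partial):
    """Imitates POST <index>/_update/<id> with a "doc" body: fields are merged."""
    merged = copy.deepcopy(existing)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested objects are merged field by field.
            merged[key] = partial_update(merged[key], value)
        else:
            # Scalars and arrays simply replace the old value.
            merged[key] = copy.deepcopy(value)
    return merged


book = {
    "title": "红楼梦",
    "price": 12,
    "publish_addr": {"province": "黑龙江", "city": "鹤岗"},
    "read_num": 199,
}
replaced = full_replace(book, {"title": "xxxx", "price": 333})
updated = partial_update(book, {"title": "修改"})
print("read_num" in replaced)  # False
print(updated["price"])        # 12
```

After the full replacement `read_num` is gone, while the partial update changes only `title` and leaves every other field as it was.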
4. Deleting a document
# Delete the document with id 10
DELETE lqz/_doc/10
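5. Batch operations: _mget
# In one request, fetch the document with id 2 from the lqz index and the document with id 1 from the lqz2 index (both of type _doc)
GET _mget
{
  "docs": [
    { "_index": "lqz", "_type": "_doc", "_id": 2 },
    { "_index": "lqz2", "_type": "_doc", "_id": 1 }
  ]
}
# Fetch the documents with ids 1 and 2 from the lqz index
GET lqz/_mget
{
  "docs": [ { "_id": 2 }, { "_id": 1 } ]
}
# Same as above
GET lqz/_mget
{
  "ids": [1, 2]
}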
6. Batch operations: bulk
PUT test/_doc/2/_create
{ "field1" : "value22" }
POST _bulk
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_id" : "2" } }
{ "create" : { "_index" : "test", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }
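The body of a `_bulk` request is newline-delimited JSON: one action line, optionally followed by one source line, and a final newline at the end. The sketch below assembles such a body in Python; `build_bulk_body` is a hypothetical helper written for this example and does not send anything to a cluster.

```python
import json


def build_bulk_body(operations):
    """Serialize (action, metadata, source_or_None) tuples as NDJSON."""
    lines = []
    for action, meta, source in operations:
        lines.append(json.dumps({action: meta}))
        if source is not None:
            # "delete" has no source line; the other actions do.
            lines.append(json.dumps(source))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


ops = [
    ("index", {"_index": "test", "_id": "1"}, {"field1": "value1"}),
    ("delete", {"_index": "test", "_id": "2"}, None),
    ("create", {"_index": "test", "_id": "3"}, {"field1": "value3"}),
    ("update", {"_index": "test", "_id": "1"}, {"doc": {"field2": "value2"}}),
]
body = build_bulk_body(ops)
print(body, end="")
```

Each JSON object has to stay on a single line, which is why the bulk request above cannot be pretty-printed the way the other request bodies can.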
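1. Introduction
Elasticsearch provides two ways to query:
- Query string search: the condition is passed in the request URL through the `q` parameter.
- Query DSL: the query is written as DSL statements. The DSL is a rich and flexible query language provided by Elasticsearch; it takes the form of a JSON request body and is sent to Elasticsearch through RESTful requests.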
2. Preparing the data
# Note: the examples in this part use the custom type name doc, and the outputs shown have the pre-7.0 response shape. On Elasticsearch 7.x, use _doc in the path instead (for example PUT lqz/_doc/1).
PUT lqz/doc/1
{ "name": "顾老二", "age": 30, "from": "gu", "desc": "皮肤黑、武器长、性格直", "tags": ["黑", "长", "直"] }
PUT lqz/doc/2
{ "name": "大娘子", "age": 18, "from": "sheng", "desc": "肤白貌美,娇憨可爱", "tags": ["白", "富", "美"] }
PUT lqz/doc/3
{ "name": "龙套偏房", "age": 22, "from": "gu", "desc": "mmp,没怎么看,不知道怎么形容", "tags": ["造数据", "真", "难"] }
PUT lqz/doc/4
{ "name": "石头", "age": 29, "from": "gu", "desc": "粗中有细,狐假虎威", "tags": ["粗", "大", "猛"] }
PUT lqz/doc/5
{ "name": "魏行首", "age": 25, "from": "广云台", "desc": "仿佛兮若轻云之蔽月,飘飘兮若流风之回雪,mmp,最后竟然没有嫁给顾老二!", "tags": ["闭月", "羞花"] }
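3. Query string search
GET lqz/doc/_search?q=from:gu
We still use the GET command and query through `_search`. The condition is given by `q=from:gu`: find everyone whose `from` field is `gu`, that is, everyone from the Gu family.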
The result is as follows:
{ "took" : 1, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : 3, "max_score" : 0.6931472, "hits" : [ { "_index" : "lqz", "_type" : "doc", "_id" : "4", "_score" : 0.6931472, "_source" : { "name" : "石头", "age" : 29, "from" : "gu", "desc" : "粗中有细,狐假虎威", "tags" : [ "粗", "大", "猛" ] } }, { "_index" : "lqz", "_type" : "doc", "_id" : "1", "_score" : 0.2876821, "_source" : { "name" : "顾老二", "age" : 30, "from" : "gu", "desc" : "皮肤黑、武器长、性格直", "tags" : [ "黑", "长", "直" ] } }, { "_index" : "lqz", "_type" : "doc", "_id" : "3", "_score" : 0.2876821, "_source" : { "name" : "龙套偏房", "age" : 22, "from" : "gu", "desc" : "mmp,没怎么看,不知道怎么形容", "tags" : [ "造数据", "真", "难" ] } } ] } }
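Let's focus on `hits`. It is the returned result set, i.e., all documents whose `from` field is `gu`. The most important part is `_score`, the relevance score: it is computed by a scoring algorithm from how well the document matches the query, and the better the match, the higher the score. How the algorithm works is covered later.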
4. Structured queries (Query DSL)
Now let's use the DSL to run the same query and see who comes from the Gu family.
GET lqz/doc/_search
{
  "query": {
    "match": { "from": "gu" }
  }
}
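In the example above, the query is built up step by step: the condition is placed inside `match`, and `match` returns every document whose `from` field contains `gu`. Unsurprisingly, the result is unchanged: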
{ "took" : 0, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : 3, "max_score" : 0.6931472, "hits" : [ { "_index" : "lqz", "_type" : "doc", "_id" : "4", "_score" : 0.6931472, "_source" : { "name" : "石头", "age" : 29, "from" : "gu", "desc" : "粗中有细,狐假虎威", "tags" : [ "粗", "大", "猛" ] } }, { "_index" : "lqz", "_type" : "doc", "_id" : "1", "_score" : 0.2876821, "_source" : { "name" : "顾老二", "age" : 30, "from" : "gu", "desc" : "皮肤黑、武器长、性格直", "tags" : [ "黑", "长", "直" ] } }, { "_index" : "lqz", "_type" : "doc", "_id" : "3", "_score" : 0.2876821, "_source" : { "name" : "龙套偏房", "age" : 22, "from" : "gu", "desc" : "mmp,没怎么看,不知道怎么形容", "tags" : [ "造数据", "真", "难" ] } } ] } }
GET lqz/doc/_search
{
  "query": {
    "match": { "from": "gu" }
  },
  "sort": [
    { "age": { "order": "desc" } }
  ]
}
{ "took" : 0, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : 3, "max_score" : null, "hits" : [ { "_index" : "lqz", "_type" : "doc", "_id" : "1", "_score" : null, "_source" : { "name" : "顾老二", "age" : 30, "from" : "gu", "desc" : "皮肤黑、武器长、性格直", "tags" : [ "黑", "长", "直" ] }, "sort" : [ 30 ] }, { "_index" : "lqz", "_type" : "doc", "_id" : "4", "_score" : null, "_source" : { "name" : "石头", "age" : 29, "from" : "gu", "desc" : "粗中有细,狐假虎威", "tags" : [ "粗", "大", "猛" ] }, "sort" : [ 29 ] }, { "_index" : "lqz", "_type" : "doc", "_id" : "3", "_score" : null, "_source" : { "name" : "龙套偏房", "age" : 22, "from" : "gu", "desc" : "mmp,没怎么看,不知道怎么形容", "tags" : [ "造数据", "真", "难" ] }, "sort" : [ 22 ] } ] } }
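In the example above, the results are returned in descending order of `age`. Because the results are sorted by a field, `_score` is not computed and shows as null.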
GET lqz/doc/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    { "age": { "order": "asc" } }
  ]
}
{ "took" : 0, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : 5, "max_score" : null, "hits" : [ { "_index" : "lqz", "_type" : "doc", "_id" : "2", "_score" : null, "_source" : { "name" : "大娘子", "age" : 18, "from" : "sheng", "desc" : "肤白貌美,娇憨可爱", "tags" : [ "白", "富", "美" ] }, "sort" : [ 18 ] }, { "_index" : "lqz", "_type" : "doc", "_id" : "3", "_score" : null, "_source" : { "name" : "龙套偏房", "age" : 22, "from" : "gu", "desc" : "mmp,没怎么看,不知道怎么形容", "tags" : [ "造数据", "真", "难" ] }, "sort" : [ 22 ] }, { "_index" : "lqz", "_type" : "doc", "_id" : "5", "_score" : null, "_source" : { "name" : "魏行首", "age" : 25, "from" : "广云台", "desc" : "仿佛兮若轻云之蔽月,飘飘兮若流风之回雪,mmp,最后竟然没有嫁给顾老二!", "tags" : [ "闭月", "羞花" ] }, "sort" : [ 25 ] }, { "_index" : "lqz", "_type" : "doc", "_id" : "4", "_score" : null, "_source" : { "name" : "石头", "age" : 29, "from" : "gu", "desc" : "粗中有细,狐假虎威", "tags" : [ "粗", "大", "猛" ] }, "sort" : [ 29 ] }, { "_index" : "lqz", "_type" : "doc", "_id" : "1", "_score" : null, "_source" : { "name" : "顾老二", "age" : 30, "from" : "gu", "desc" : "皮肤黑、武器长、性格直", "tags" : [ "黑", "长", "直" ] }, "sort" : [ 30 ] } ] } }
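GET lqz/doc/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    { "age": { "order": "desc" } }
  ],
  "from": 2,
  "size": 1
}
# The example above queries all documents sorted by `age` in descending order, and adds two properties, `from` and `size`, to control which slice of the result set is returned.
- from: the offset of the first result to return (counting from 0)
- size: the number of results to return
# How do we paginate with this? With 10 documents per page:
Page 1: "from": 0, "size": 10
Page 2: "from": 10, "size": 10
Page 3: "from": 20, "size": 10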
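The page arithmetic above generalizes to `from = (page - 1) * size`. A minimal Python sketch, assuming 1-based page numbers; `page_to_from_size` is a hypothetical helper name:

```python
def page_to_from_size(page, page_size=10):
    """Translate a 1-based page number into the `from`/`size` pair of a search body."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {"from": (page - 1) * page_size, "size": page_size}


for page in (1, 2, 3):
    print(page, page_to_from_size(page))
# 1 {'from': 0, 'size': 10}
# 2 {'from': 10, 'size': 10}
# 3 {'from': 20, 'size': 10}
```

The returned dictionary can be merged into a search body next to `query` and `sort`. Keep in mind that `from` + `size` is capped by the `index.max_result_window` setting (10000 by default), so very deep pages need a different approach.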
{ "took" : 0, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : 5, "max_score" : null, "hits" : [ { "_index" : "lqz", "_type" : "doc", "_id" : "5", "_score" : null, "_source" : { "name" : "魏行首", "age" : 25, "from" : "广云台", "desc" : "仿佛兮若轻云之蔽月,飘飘兮若流风之回雪,mmp,最后竟然没有嫁给顾老二!", "tags" : [ "闭月", "羞花" ] }, "sort" : [ 25 ] } ] } }
A bool query combines multiple conditions:
- must (and)
- should (or)
- must_not (not)
- filter
Compound queries: must
# Find documents whose `from` is `gu` and whose `age` is 30
GET lqz/doc/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "from": "gu" } },
        { "match": { "age": "30" } }
      ]
    }
  }
}
{ "took" : 8, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : 1, "max_score" : 1.287682, "hits" : [ { "_index" : "lqz", "_type" : "doc", "_id" : "1", "_score" : 1.287682, "_source" : { "name" : "顾老二", "age" : 30, "from" : "gu", "desc" : "皮肤黑、武器长、性格直", "tags" : [ "黑", "长", "直" ] } } ] } }
# Find documents whose `from` is `gu` or whose `tags` contain `闭月`
GET lqz/doc/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "from": "gu" } },
        { "match": { "tags": "闭月" } }
      ]
    }
  }
}
{ "took" : 1, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : 4, "max_score" : 0.6931472, "hits" : [ { "_index" : "lqz", "_type" : "doc", "_id" : "4", "_score" : 0.6931472, "_source" : { "name" : "石头", "age" : 29, "from" : "gu", "desc" : "粗中有细,狐假虎威", "tags" : [ "粗", "大", "猛" ] } }, { "_index" : "lqz", "_type" : "doc", "_id" : "5", "_score" : 0.5753642, "_source" : { "name" : "魏行首", "age" : 25, "from" : "广云台", "desc" : "仿佛兮若轻云之蔽月,飘飘兮若流风之回雪,mmp,最后竟然没有嫁给顾老二!", "tags" : [ "闭月", "羞花" ] } }, { "_index" : "lqz", "_type" : "doc", "_id" : "1", "_score" : 0.2876821, "_source" : { "name" : "顾老二", "age" : 30, "from" : "gu", "desc" : "皮肤黑、武器长、性格直", "tags" : [ "黑", "长", "直" ] } }, { "_index" : "lqz", "_type" : "doc", "_id" : "3", "_score" : 0.2876821, "_source" : { "name" : "龙套偏房", "age" : 22, "from" : "gu", "desc" : "mmp,没怎么看,不知道怎么形容", "tags" : [ "造数据", "真", "难" ] } } ] } }
# Find documents whose `from` is not `gu`, whose `tags` do not match `可爱`, and whose `age` is not `18`
GET lqz/doc/_search
{
  "query": {
    "bool": {
      "must_not": [
        { "match": { "from": "gu" } },
        { "match": { "tags": "可爱" } },
        { "match": { "age": 18 } }
      ]
    }
  }
}
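Filter queries
A filter restricts the results by a condition. A range condition is expressed with `range`; for example, `gt` means greater than. Clauses under `filter` must match, but they do not contribute to `_score`.
gt: greater than; lt: less than; gte: greater than or equal to; lte: less than or equal to
# Find documents whose `from` is `gu` and whose `age` is greater than `25`
GET lqz/doc/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "from": "gu" } }
      ],
      "filter": {
        "range": {
          "age": { "gt": 25 }
        }
      }
    }
  }
}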
{ "took" : 2, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : 2, "max_score" : 0.6931472, "hits" : [ { "_index" : "lqz", "_type" : "doc", "_id" : "4", "_score" : 0.6931472, "_source" : { "name" : "石头", "age" : 29, "from" : "gu", "desc" : "粗中有细,狐假虎威", "tags" : [ "粗", "大", "猛" ] } }, { "_index" : "lqz", "_type" : "doc", "_id" : "1", "_score" : 0.2876821, "_source" : { "name" : "顾老二", "age" : 30, "from" : "gu", "desc" : "皮肤黑、武器长、性格直", "tags" : [ "黑", "长", "直" ] } } ] } }
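Summary:
- `must`: AND relation, equivalent to `and` in a relational database.
- `should`: OR relation, equivalent to `or` in a relational database.
- `must_not`: NOT relation, equivalent to `not` in a relational database.
- `filter`: a filtering condition.
- `range`: a range condition.
- `gt`: greater than, equivalent to `>` in a relational database.
- `gte`: greater than or equal to, equivalent to `>=` in a relational database.
- `lt`: less than, equivalent to `<` in a relational database.
- `lte`: less than or equal to, equivalent to `<=` in a relational database.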
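The clauses summarized above all live side by side under one `bool` object, so a search body can be assembled programmatically. The following Python sketch shows one way to do that; `bool_query` is a hypothetical helper for illustration and only builds the JSON body, it does not contact Elasticsearch.

```python
import json


def bool_query(must=None, should=None, must_not=None, filters=None):
    """Assemble a bool query body, leaving out clause types that are empty."""
    clauses = {
        "must": must,
        "should": should,
        "must_not": must_not,
        "filter": filters,
    }
    return {"query": {"bool": {name: value for name, value in clauses.items() if value}}}


# `from` is `gu` and `age` is greater than 25, as in the filter example.
body = bool_query(
    must=[{"match": {"from": "gu"}}],
    filters=[{"range": {"age": {"gt": 25}}}],
)
print(json.dumps(body, indent=2))
```

Omitting empty clause types keeps the generated body identical in shape to the hand-written requests, which makes it easy to compare the two.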
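1. Introduction
In practice, a document may have many fields, and by default every query returns all of them, which is wasteful when the data volume is large. If I only want someone's phone number, what is the point of also handing me their hobbies and measurements? So we filter the results and tell Elasticsearch plainly which fields we want.
PUT lqz/doc/1
{ "name": "顾老二", "age": 30, "from": "gu", "desc": "皮肤黑、武器长、性格直", "tags": ["黑", "长", "直"] }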
3. Filtering results: _source
Now, out of everything in the result, I only want to see the `name` and `age` fields and nothing else. How do we do that?
GET lqz/doc/_search
{
  "query": {
    "match": { "name": "顾老二" }
  },
  "_source": ["name", "age"]
}
{ "took" : 8, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : 1, "max_score" : 0.8630463, "hits" : [ { "_index" : "lqz", "_type" : "doc", "_id" : "1", "_score" : 0.8630463, "_source" : { "name" : "顾老二", "age" : 30 } } ] } }
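When the data volume is large, return only the fields that are actually needed; this shrinks the response and improves query efficiency.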
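The effect of `_source` filtering on a single hit can be imitated in Python. This is a rough sketch under assumptions: `filter_source` is a hypothetical helper that handles only exact top-level field names, whereas the real `_source` option also accepts wildcards and nested paths.

```python
def filter_source(source, fields):
    """Keep only the requested top-level fields of a document."""
    return {name: source[name] for name in fields if name in source}


doc = {
    "name": "顾老二",
    "age": 30,
    "from": "gu",
    "desc": "皮肤黑、武器长、性格直",
    "tags": ["黑", "长", "直"],
}
print(filter_source(doc, ["name", "age"]))
```

The difference in practice is where the trimming happens: with `_source` in the request the unwanted fields never leave the server, so the saving applies to network transfer as well as to client-side parsing.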