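Elasticsearch Study Notes: DSL Queries
Based on Elasticsearch 7.8.0; the requests are taken from the official documentation. Personal notes, kept for reference.
Bulk operations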
Endpoints: POST /_bulk and POST /<index>/_bulk. The body is newline-delimited JSON: each action and each document sits on its own line, and the last line must also end with a newline.
POST _bulk
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_id" : "2" } }
{ "create" : { "_index" : "test", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }
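Since the bulk body is newline-delimited JSON, it is easy to get the line layout wrong when building it by hand. The following is a minimal sketch, using only the standard library, of how such a body could be assembled; `to_bulk_body` is a hypothetical helper name, not part of any Elasticsearch client.

```python
import json


def to_bulk_body(operations):
    """Serialize (action, document) pairs into an NDJSON bulk body.

    `document` is None for actions such as delete that carry no source line.
    """
    lines = []
    for action, document in operations:
        lines.append(json.dumps(action))
        if document is not None:
            lines.append(json.dumps(document))
    # The bulk API expects the final line to end with a newline as well.
    return "\n".join(lines) + "\n"


body = to_bulk_body([
    ({"index": {"_index": "test", "_id": "1"}}, {"field1": "value1"}),
    ({"delete": {"_index": "test", "_id": "2"}}, None),
    ({"create": {"_index": "test", "_id": "3"}}, {"field1": "value3"}),
    ({"update": {"_id": "1", "_index": "test"}}, {"doc": {"field2": "value2"}}),
])
print(body, end="")
```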
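General search
- A search returns 10 hits by default. match_all: match every document; sort: sort order; from and size: offset of the first hit and number of hits to return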
GET /bank/_search { "query": { "match_all": {} }, "sort": [ { "account_number": "asc" } ], "from": 10, "size": 10 }
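The from/size pair in the request above corresponds to "page 2 with 10 hits per page". A small sketch of that arithmetic, assuming 1-based page numbers; `page_to_from_size` and `build_search` are illustrative names only.

```python
def page_to_from_size(page, page_size=10):
    """Translate a 1-based page number into the `from`/`size` pair."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {"from": (page - 1) * page_size, "size": page_size}


def build_search(page, page_size=10):
    """Build the match_all request body from the notes for a given page."""
    body = {
        "query": {"match_all": {}},
        "sort": [{"account_number": "asc"}],
    }
    body.update(page_to_from_size(page, page_size))
    return body
```

Calling `build_search(2)` yields the same from and size values as the request above.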
- match: full-text search on a field
GET /bank/_search { "query": { "match": { "address": "mill lane" } } }
- match_phrase: phrase query
GET /bank/_search { "query": { "match_phrase": { "address": "mill lane" } } }
- bool: combines several query clauses
- must: the clause must match
- should: the clause should match
- must_not: the clause must not match
GET /bank/_search { "query": { "bool": { "must": [ { "match": { "age": "40" } } ], "must_not": [ { "match": { "state": "ID" } } ] } } }
- filter: filtering; commonly used for range conditions
GET /bank/_search { "query": { "bool": { "must": { "match_all": {} }, "filter": { "range": { "balance": { "gte": 20000, "lte": 30000 } } } } } }
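To make the matching rules of bool concrete, here is a rough in-memory sketch in which every clause is a plain Python predicate. It only models whether a document matches, not how it is scored, and the sample predicates and the function name `bool_matches` are made up for illustration.

```python
def bool_matches(doc, must=(), filters=(), should=(), must_not=()):
    """Return True when `doc` satisfies the clause lists of a bool query."""
    if not all(clause(doc) for clause in list(must) + list(filters)):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    # With no must or filter clauses, at least one should clause has to match.
    if should and not must and not filters:
        return any(clause(doc) for clause in should)
    return True


# Hypothetical predicates mirroring the two requests above.
is_40 = lambda d: d["age"] == 40
in_idaho = lambda d: d["state"] == "ID"
balance_in_range = lambda d: 20000 <= d["balance"] <= 30000

account = {"age": 40, "state": "TX", "balance": 25000}
print(bool_matches(account, must=[is_40], must_not=[in_idaho]))
print(bool_matches(account, filters=[balance_in_range]))
```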
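- terms/term: exact matching without analysis; commonly used in aggregations
- group_by_state: a user-chosen name for the aggregation. An aggregation needs a grouping condition (here, terms on state.keyword); the document count of each bucket (doc_count) is always returned, whether or not sub-aggregations are defined
- aggs: at the top level it declares that aggregations are requested; nested inside an aggregation it defines sub-aggregations, whose results appear under the custom names given to them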
GET /bank/_search { "size": 0, "aggs": { "group_by_state": { "terms": { "field": "state.keyword" } } } }
- avg: average value
GET /bank/_search { "size":0, "aggs":{ "group_by_state":{ "terms":{ "field":"state.keyword" }, "aggs":{ "average_balance":{ "avg":{ "field":"balance" } } } } } }
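average_balance is a custom, user-chosen name for the result.
- order: sort order of the buckets; the buckets can be sorted by a custom field computed in a sub-aggregation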
GET /bank/_search { "size":0, "aggs":{ "group_by_state":{ "terms":{ "field":"state.keyword", "order":{ "average_balance":"desc" } }, "aggs":{ "average_balance":{ "avg":{ "field":"balance" } } } } } }
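What the terms + avg + order combination computes can be imitated on a small list in memory. This is only a sketch of the result shape under simplified assumptions (no shard-level approximation, no bucket size limit); the function name and the sample accounts are invented.

```python
from collections import defaultdict


def avg_balance_by_state(accounts, descending=True):
    """Group accounts by state, average the balance, sort buckets by that average."""
    balances = defaultdict(list)
    for account in accounts:
        balances[account["state"]].append(account["balance"])
    buckets = [
        {
            "key": state,
            "doc_count": len(values),  # always present, even without sub-aggregations
            "average_balance": {"value": sum(values) / len(values)},
        }
        for state, values in balances.items()
    ]
    buckets.sort(key=lambda b: b["average_balance"]["value"], reverse=descending)
    return buckets


sample_accounts = [
    {"state": "ID", "balance": 100},
    {"state": "ID", "balance": 300},
    {"state": "TX", "balance": 500},
]
for bucket in avg_balance_by_state(sample_accounts):
    print(bucket)
```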
- _source: selects which fields are returned in the search results
- includes: fields of _source to include; wildcards are allowed
- excludes: fields to exclude
GET /_search { "_source": { "includes": [ "obj1.*", "obj2.*" ], "excludes": [ "*.description" ] }, "query": { "term": { "user.id": "kimchy" } } }
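- docvalue_fields: returns values directly from doc values, that is, from the index itself rather than by loading _source, which is more efficient
- format: custom format for the returned doc value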
GET /_search { "query": { "match_all": {} }, "docvalue_fields": [ "my_ip*", { "field": "my_keyword_field" }, { "field": "*_date_field", "format": "epoch_millis" } ] }
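Pagination
- collapse: collapses the results on a field. A second level of collapse can be nested inside inner_hits; that second level cannot define inner_hits of its own
- inner_hits: expands each collapsed hit to show the matching documents within its group; several inner_hits can be defined
- max_concurrent_group_searches: number of concurrent requests allowed to retrieve the inner_hits per group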
GET /twitter/_search
{ "query": { "match": { "message": "elasticsearch" } }, "collapse": { "field": "user", "inner_hits": { "name": "last_tweets", "size": 5, "sort": [ { "date": "asc" } ] }, "max_concurrent_group_searches": 4 }, "sort": [ "likes" ] }
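The effect of collapse with inner_hits can be pictured as "keep the best hit per group, and attach a short sorted list of that group's hits". A simplified in-memory sketch follows; the real response nests inner hits more deeply, and `collapse_hits` and its parameters are hypothetical names.

```python
def collapse_hits(sorted_hits, field, inner_name, inner_size, inner_sort_field):
    """Collapse hits on `field`; `sorted_hits` must already be in the outer sort order."""
    groups = {}
    for hit in sorted_hits:
        groups.setdefault(hit[field], []).append(hit)
    collapsed = []
    # Dicts keep insertion order, so groups stay ranked by their best hit.
    for members in groups.values():
        top = dict(members[0])
        inner = sorted(members, key=lambda h: h[inner_sort_field])[:inner_size]
        top["inner_hits"] = {inner_name: inner}
        collapsed.append(top)
    return collapsed


tweets = [  # made-up hits, already sorted by likes ascending
    {"user": "a", "likes": 1, "date": 3},
    {"user": "b", "likes": 2, "date": 1},
    {"user": "a", "likes": 5, "date": 1},
]
for hit in collapse_hits(tweets, "user", "last_tweets", 5, "date"):
    print(hit)
```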
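- scroll: scrolling search, similar to a cursor in a relational database; it helps when a query returns a large amount of data. Just append ?scroll=xxx to the initial search, where xxx is how long the scroll context stays alive; the response then contains a _scroll_id. To delete a scroll, send DELETE /_search/scroll with the scroll_id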
POST /saltfish_index/_search?scroll=1m { "size": 1, "query":{ "match_all": {} } }
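- scroll_id: the id returned by the first scroll request, used to fetch the next page of the same query; the scroll request must not include {index} in its path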
POST /_search/scroll { "scroll" : "1m", "scroll_id" : "FGluY2x1ZGVfY29udGV4dF91dWlkDXF1ZXJ5QW5kRmV0Y2gBFFVMdFRybk1CbFljakZVWDIzZ1Q0AAAAAAAAA_MWZkpPRlYzMWNTdkdKbnE0V01LUjhQdw==" }
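The scroll workflow (initial search, repeated scroll requests, final cleanup) can be sketched as a generator. To keep the example independent of any client library, the three HTTP calls are passed in as callables; those parameter names are assumptions for illustration, and only the response keys `_scroll_id` and `hits.hits` come from Elasticsearch itself.

```python
def scroll_all(start_search, next_page, clear_scroll):
    """Yield every hit of a scrolling search.

    start_search()        -> response of the initial search sent with ?scroll=...
    next_page(scroll_id)  -> response of POST /_search/scroll
    clear_scroll(scroll_id) -> performs DELETE /_search/scroll
    """
    response = start_search()
    scroll_id = response["_scroll_id"]
    try:
        # An empty page signals that the scroll is exhausted.
        while response["hits"]["hits"]:
            yield from response["hits"]["hits"]
            response = next_page(scroll_id)
            # The id may change between calls, so always use the latest one.
            scroll_id = response["_scroll_id"]
    finally:
        clear_scroll(scroll_id)
```

Releasing the scroll in a finally block frees the search context as soon as iteration ends instead of waiting for the keep-alive to expire.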
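- slice: for queries over large amounts of data, a scroll can be split into slices. Each slice can be consumed independently; one request is needed per slice (as many as the configured max), and each returns its own _scroll_id. Default slice computation: slice(doc) = floorMod(hashCode(doc._id), max); documents are assigned to the requested slices automatically. When the number of slices is larger than the number of shards, performance degrades; for details see the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/paginate-search-results.html#slice-scroll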
GET /twitter/_search?scroll=1m
{
"slice": {
"id": 0,
"max": 2
},
"query": {
"match": {
"title": "elasticsearch"
}
}
}
GET /twitter/_search?scroll=1m
{
"slice": {
"id": 1,
"max": 2
},
"query": {
"match": {
"title": "elasticsearch"
}
}
}
The value of max (the number of slices) must be greater than 1.
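The floorMod formula above can be tried out on a handful of ids. In this sketch a Java-style 32-bit string hash stands in for hashCode; this is an assumption made purely for illustration, since the hash Elasticsearch applies internally may differ. The point is only that floorMod maps every document to exactly one slice in the range 0..max-1.

```python
def java_style_hash(text):
    """Signed 32-bit hash in the style of Java's String.hashCode (stand-in hash)."""
    h = 0
    for ch in text:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h


def slice_of(doc_id, max_slices):
    """slice(doc) = floorMod(hash(doc._id), max)."""
    if max_slices < 2:
        raise ValueError("max must be greater than 1")
    # For a positive divisor, Python's % already behaves like floorMod.
    return java_style_hash(doc_id) % max_slices


doc_ids = [f"doc-{i}" for i in range(50)]
slices = [[d for d in doc_ids if slice_of(d, 2) == s] for s in range(2)]
print(len(slices[0]), len(slices[1]))
```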
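- search_after: returns the hits that come after a given set of sort values, much like sorting by primary key in a relational database and fetching the rows after a given key. The sort should include a unique tiebreaker field so that documents with equal sort values are neither skipped nor repeated. Because scroll is expensive, search_after is the better fit for real-time queries.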
GET /twitter/_search
{ "size": 10, "query": { "match" : { "title" : "elasticsearch" } }, "search_after": [1463538857, "654323"], "sort": [ {"date": "asc"}, {"tie_breaker_id": "asc"} ] }
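The paging logic behind search_after can be imitated on a list: sort by the sort fields, then keep only the entries whose sort values come strictly after the last hit of the previous page. A minimal sketch for ascending sorts only, with invented sample data; `search_after` here is a local function, not a client call.

```python
def search_after(docs, sort_fields, after=None, size=10):
    """Return the next `size` docs whose sort values come after `after`."""
    def sort_key(doc):
        return tuple(doc[field] for field in sort_fields)

    ordered = sorted(docs, key=sort_key)
    if after is not None:
        ordered = [doc for doc in ordered if sort_key(doc) > tuple(after)]
    return ordered[:size]


tweets_by_date = [  # made-up documents; tie_breaker_id makes each sort key unique
    {"date": 1, "tie_breaker_id": "a"},
    {"date": 2, "tie_breaker_id": "a"},
    {"date": 1, "tie_breaker_id": "b"},
]
fields = ["date", "tie_breaker_id"]
first_page = search_after(tweets_by_date, fields, size=2)
last = first_page[-1]
second_page = search_after(tweets_by_date, fields, after=[last[f] for f in fields], size=2)
print(first_page, second_page)
```

Without the tiebreaker field, both documents with date 1 would share a sort key, and the one not yet returned would be skipped on the next page.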
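Sorting
- sort: sort order
- mode: which value to sort by when the field holds an array of values
- min: smallest value; array fields only
- max: largest value; array fields only
- sum: sum of the values; numeric array fields only
- avg: average of the values; numeric array fields only
- median: median of the values; numeric array fields only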
POST /_search { "query" : { "term" : { "product" : "chocolate" } }, "sort" : [ {"price" : {"order" : "asc", "mode" : "avg"}} ] }
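How mode reduces an array field to a single sort value can be shown in a few lines. This is a sketch over in-memory data; the product list and function names are made up.

```python
import statistics

# Each sort mode reduces the array of values to one number.
MODES = {
    "min": min,
    "max": max,
    "sum": sum,
    "avg": statistics.mean,
    "median": statistics.median,
}


def sort_value(values, mode):
    """Pick the value a multi-valued field is sorted by."""
    return MODES[mode](values)


def sort_by_price(products, mode="avg", order="asc"):
    return sorted(
        products,
        key=lambda p: sort_value(p["price"], mode),
        reverse=(order == "desc"),
    )


chocolates = [
    {"name": "a", "price": [20, 4]},
    {"name": "b", "price": [10]},
]
print([p["name"] for p in sort_by_price(chocolates, mode="avg")])
print([p["name"] for p in sort_by_price(chocolates, mode="min")])
```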
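- numeric_type: casts the values of a numeric field to another numeric type for sorting; useful when sorting across indices in which the same field is mapped with different types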
POST /index_long,index_double/_search
{
"sort" : [
{
"field" : {
"numeric_type" : "double"
}
}
]
}
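- nested: sorts by a field inside nested objects; the sort field must be a nested field. Filters in the outer query have no effect on the inner objects, so the filter must be written again in the sort clause; it only selects which nested objects are considered for sorting and does not reduce the result set
- path: the nested object to sort on
- filter: filter that the nested objects must match to be considered for sorting
- max_children: maximum number of children to consider per root document when picking the sort value; unlimited by default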
POST /_search { "query": { "nested": { "path": "parent", "query": { "bool": { "must": {"range": {"parent.age": {"gte": 21}}}, "filter": { "nested": { "path": "parent.child", "query": {"match": {"parent.child.name": "matt"}} } } } } } }, "sort" : [ { "parent.child.age" : { "mode" : "min", "order" : "asc", "nested": { "path": "parent", "filter": { "range": {"parent.age": {"gte": 21}} }, "nested": { "path": "parent.child", "filter": { "match": {"parent.child.name": "matt"} } } } } } ] }
- missing: how to treat documents that lack the sort field; can be set to _last, _first, or a custom value. The default is _last
GET /_search { "sort" : [ { "price" : {"missing" : "_last"} } ], "query" : { "term" : { "product" : "chocolate" } } }
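The behaviour of missing can be sketched by sorting the documents that have the field and then placing the rest at the front or the back. The helper name and the sample documents are hypothetical, and custom missing values are left out.

```python
def sort_with_missing(docs, field, order="asc", missing="_last"):
    """Sort docs by `field`, placing docs without the field first or last."""
    present = [d for d in docs if d.get(field) is not None]
    absent = [d for d in docs if d.get(field) is None]
    present.sort(key=lambda d: d[field], reverse=(order == "desc"))
    # The placement of missing docs does not depend on the sort order.
    return absent + present if missing == "_first" else present + absent


items = [{"price": 3}, {"name": "no price"}, {"price": 1}]
print(sort_with_missing(items, "price"))
print(sort_with_missing(items, "price", missing="_first"))
```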
- unmapped_type: when the field has no mapping, treat it as the specified type; without this, sorting on an unmapped field fails
GET /_search { "sort" : [ { "price" : {"unmapped_type" : "long"} } ], "query" : { "term" : { "product" : "chocolate" } } }
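- _geo_distance: sorts by geographic distance from a reference point given as latitude and longitude; applies to geo_point fields
- distance_type: how the distance is computed. Either arc (the default) or plane (faster, but inaccurate over long distances and near the poles).
- mode: min, max, median, or avg
- unit: the unit used when computing sort values. Defaults to m (meters).
- ignore_unmapped: whether an unmapped field should be treated as a missing value. Setting it to true is equivalent to specifying an unmapped_type in a field sort. Defaults to false (an unmapped field causes the search to fail).
- pin.location: the geo_point field, paired with the reference point to measure from. The point can be written as a geohash, a [lon, lat] array, an object with lat and lon, and so on; several reference points can be given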
GET /_search { "sort" : [ { "_geo_distance" : { "pin.location" : [-70, 40], "order" : "asc", "unit" : "km", "mode" : "min", "distance_type" : "arc", "ignore_unmapped": true } } ], "query" : { "term" : { "user" : "kimchy" } } }
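For intuition about the arc distance type, the great-circle distance can be computed with the haversine formula. This is a sketch of the general technique, not Elasticsearch's own implementation; the Earth radius constant and the function names are assumptions. Points use the [lon, lat] array order, as in the request above.

```python
import math
import statistics

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in metres (assumed constant)


def arc_distance(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def geo_sort_value(doc_points, origin, mode="min", unit="km"):
    """Reduce the distances from `origin` to each [lon, lat] point of a document."""
    divisor = {"m": 1.0, "km": 1000.0}[unit]
    distances = [
        arc_distance(lat, lon, origin[1], origin[0]) / divisor
        for lon, lat in doc_points
    ]
    reducers = {"min": min, "max": max, "avg": statistics.mean, "median": statistics.median}
    return reducers[mode](distances)


print(geo_sort_value([[-70, 41], [-70, 45]], origin=[-70, 40], mode="min"))
```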
- script (_script): sorts by the value a script computes, for custom ordering
GET /_search { "query": { "term": { "user": "kimchy" } }, "sort": { "_script": { "type": "number", "script": { "lang": "painless", "source": "doc['field_name'].value * params.factor", "params": { "factor": 1.1 } }, "order": "asc" } } }
- track_scores: when sorting on a field, scores are normally not computed; enabling this option computes and tracks them anyway
GET /_search { "track_scores": true, "sort" : [ { "post_date" : {"order" : "desc"} }, { "name" : "desc" }, { "age" : "desc" } ], "query" : { "term" : { "user" : "kimchy" } } }
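Cross-cluster search
- cluster: the remote clusters must first be registered in the cluster settings. A remote index is written as cluster_name:index_name; indices on several clusters are separated by commas, e.g. cluster_one:twitter,cluster_two:twitter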
PUT _cluster/settings { "persistent": { "cluster": { "remote": { "cluster_one": { "seeds": [ "127.0.0.1:9300" ] }, "cluster_two": { "seeds": [ "127.0.0.1:9301" ] }, "cluster_three": { "seeds": [ "127.0.0.1:9302" ] } } } } }
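The cluster_name:index_name syntax is simple enough to generate. A tiny sketch that builds the index part of a search path from (cluster, index) pairs follows; the helper name is hypothetical, and passing None as the cluster is used here to mean an index on the local cluster.

```python
def remote_index_expression(targets):
    """Join (cluster, index) pairs into a comma-separated index expression."""
    parts = []
    for cluster, index in targets:
        # Local indices carry no cluster prefix.
        parts.append(f"{cluster}:{index}" if cluster else index)
    return ",".join(parts)


expression = remote_index_expression([
    ("cluster_one", "twitter"),
    ("cluster_two", "twitter"),
])
print(f"GET /{expression}/_search")
```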
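- skip_unavailable: skip unavailable clusters. By default, if any cluster targeted by the search is unavailable, the search returns an error; this setting makes the search skip that cluster instead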
PUT _cluster/settings { "persistent": { "cluster.remote.cluster_two.skip_unavailable": true } }
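Query tuning
- boosting: adjusts the relevance score of matching documents
- positive: the query that documents must match
- negative: documents that match positive and also match this query have their score scaled by negative_boost. The official documentation describes it as a way to lower scores, but negative_boost can in practice be set above 1, which makes documents matching this query score higher
- negative_boost: the factor applied to the score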
GET /_search { "query": { "boosting": { "positive": { "term": { "text": "apple" } }, "negative": { "term": { "text": "pie tart fruit crumble tree" } }, "negative_boost": 0.5 } } }
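The score adjustment described above is a single multiplication. A sketch of the arithmetic with made-up scores; `boosting_score` is an illustrative function, and rejecting negative factors is an assumption of this sketch.

```python
def boosting_score(positive_score, matches_negative, negative_boost):
    """Final score of a document that matched the positive query."""
    if negative_boost < 0:
        raise ValueError("negative_boost must not be negative")
    if matches_negative:
        return positive_score * negative_boost
    return positive_score


print(boosting_score(2.0, True, 0.5))   # demoted
print(boosting_score(2.0, False, 0.5))  # unchanged
print(boosting_score(2.0, True, 1.5))   # factor above 1 promotes instead
```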
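- constant_score: constant score. Normally a search computes a relevance score for each matching document, and the more often a search term occurs in a document, the higher the score. Wrapping a filter in constant_score ignores term frequency and gives every matching document the same score, which in theory should also be more efficient
- boost: the constant score assigned to every match; default 1.0
- filter: the condition documents must satisfy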
GET /_search { "query": { "constant_score": { "filter": { "term": { "user.id": "kimchy" } }, "boost": 1.2 } } }
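- dis_max: takes the highest score among several query clauses; a document's score is that of its best-matching clause
- queries: the list of query clauses
- tie_breaker: a number between 0.0 and 1.0 that weights the scores of the other matching clauses, which are added to the best score; default 0.0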
GET /_search { "query": { "dis_max": { "queries": [ { "term": { "title": "Quick pets" } }, { "term": { "body": "Quick pets" } } ], "tie_breaker": 0.7 } } }
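The way tie_breaker enters the dis_max score can be written out as a small calculation: take the best clause score and add the other matching clause scores multiplied by tie_breaker. The sketch below uses invented clause scores, with None standing for a clause that did not match.

```python
def dis_max_score(clause_scores, tie_breaker=0.0):
    """Combine per-clause scores the way dis_max does; None means no match."""
    matching = [score for score in clause_scores if score is not None]
    if not matching:
        return None  # the document matches no clause at all
    best = max(matching)
    return best + tie_breaker * (sum(matching) - best)


print(dis_max_score([1.0, 0.5], tie_breaker=0.7))
print(dis_max_score([1.0, 0.5]))  # default tie_breaker: only the best score counts
```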
Custom scoring
There are too many options to write down here; see the official documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html
posted on 2020-08-01 20:21 SaltFishYe