Elasticsearch Study Notes: DSL Queries

Based on Elasticsearch 7.8.0; the requests come from the official documentation. Personal notes, kept for reference.

Bulk operations

POST /_bulk
POST /<index>/_bulk

POST _bulk
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_id" : "2" } }
{ "create" : { "_index" : "test", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }
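
The bulk body above is newline-delimited JSON: each action line is optionally followed by a source line, and the last line must also end with a newline. Below is a minimal Python sketch of assembling such a body with the standard library only. The helper name build_bulk_body is made up for illustration, and actually sending the request is left out.

```python
import json

def build_bulk_body(operations):
    """Serialize (action, source) pairs into an NDJSON bulk body.

    `operations` is a list of (action_dict, source_dict_or_None) tuples;
    delete actions carry no source line, so their source is None.
    """
    lines = []
    for action, source in operations:
        lines.append(json.dumps(action))
        if source is not None:
            lines.append(json.dumps(source))
    # The final line of a bulk body has to end with a newline as well.
    return "\n".join(lines) + "\n"

operations = [
    ({"index": {"_index": "test", "_id": "1"}}, {"field1": "value1"}),
    ({"delete": {"_index": "test", "_id": "2"}}, None),
    ({"create": {"_index": "test", "_id": "3"}}, {"field1": "value3"}),
    ({"update": {"_index": "test", "_id": "1"}}, {"doc": {"field2": "value2"}}),
]
body = build_bulk_body(operations)
```

A body like this is typically sent with the header Content-Type: application/x-ndjson.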

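General search

A search returns 10 hits by default. match_all: matches all documents; sort: sorts the results; from and size page through them.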

GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ],
  "from": 10,
  "size": 10
}
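
Note that from is an offset into the result list, not a page number, so the request above returns the second page of ten hits. A small Python sketch of that arithmetic follows; page_window and search_body are hypothetical helper names, and pages are assumed to be 1-based.

```python
def page_window(page, size=10):
    """Return the from/size pair for a 1-based page number."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    return {"from": (page - 1) * size, "size": size}

def search_body(page, size=10):
    """Build the match_all request body for the given page."""
    body = {"query": {"match_all": {}}, "sort": [{"account_number": "asc"}]}
    body.update(page_window(page, size))
    return body
```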

match: full-text search on a field

GET /bank/_search
{
  "query": { "match": { "address": "mill lane" } }
}

match_phrase: phrase query

GET /bank/_search
{
  "query": { "match_phrase": { "address": "mill lane" } }
}

bool: combines multiple query clauses
must: clauses that must match
should: clauses that should match
must_not: clauses that must not match

GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "age": "40" } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ]
    }
  }
}

filter: filters documents without contributing to the score; commonly used for range filtering

GET /bank/_search
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "range": {
          "balance": {
            "gte": 20000,
            "lte": 30000
          }
        }
      }
    }
  }
}

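terms / term: exact matching; the search term is not analyzed. Commonly used in aggregations
group_by_state: a custom aggregation name. Each aggregation needs a definition; whether or not sub-aggregations are written, every bucket automatically reports its document count (doc_count)
aggs: at the top level it declares that aggregations are requested; nested inside an aggregation, it computes a sub-aggregation whose result is shown under a custom name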

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      }
    }
  }
}

avg: average value

GET /bank/_search
{
  "size":0,
  "aggs":{
    "group_by_state":{
      "terms":{
        "field":"state.keyword"
      },
      "aggs":{
        "average_balance":{
          "avg":{
            "field":"balance"
          }
        }
      }
    }
  }
}

average_balance is a custom name for the sub-aggregation result

order: sorts the buckets; it can sort on a custom sub-aggregation result

GET /bank/_search
{
  "size":0,
  "aggs":{
    "group_by_state":{
      "terms":{
        "field":"state.keyword",
        "order":{
          "average_balance":"desc"
        }
      },
      "aggs":{
        "average_balance":{
          "avg":{
            "field":"balance"
          }
        }
      }
    }
  }
}
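
To make the shape of the result concrete, here is a rough Python emulation of what the request above computes: bucket by state, count documents, average the balance per bucket, and order buckets by that average. It runs on a few made-up in-memory accounts and ignores details of the real aggregation such as the default bucket limit and per-shard approximation.

```python
from collections import defaultdict

def group_by_state(accounts, descending=True):
    """Bucket accounts by state and order buckets by average balance."""
    balances = defaultdict(list)
    for account in accounts:
        balances[account["state"]].append(account["balance"])
    buckets = [
        {
            "key": state,
            "doc_count": len(values),
            "average_balance": {"value": sum(values) / len(values)},
        }
        for state, values in balances.items()
    ]
    # Mirrors "order": {"average_balance": "desc"} in the request.
    buckets.sort(key=lambda b: b["average_balance"]["value"], reverse=descending)
    return buckets

# Made-up sample data, not the real bank index.
accounts = [
    {"state": "TX", "balance": 100},
    {"state": "TX", "balance": 300},
    {"state": "ID", "balance": 500},
]
buckets = group_by_state(accounts)
```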

_source: selects which source fields are returned in the search results
includes: fields to include from _source; wildcards are allowed
excludes: fields to exclude

GET /_search
{
  "_source": {
    "includes": [ "obj1.*", "obj2.*" ],
    "excludes": [ "*.description" ]
  },
  "query": {
    "term": {
      "user.id": "kimchy"
    }
  }
}

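docvalue_fields: returns values directly from doc values, that is, from the index structures instead of loading _source, which can be more efficient
format: custom format for the returned doc values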

GET /_search
{
  "query": {
    "match_all": {}
  },
  "docvalue_fields": [
    "my_ip*",                     
    {
      "field": "my_keyword_field" 
    },
    {
      "field": "*_date_field",
      "format": "epoch_millis"    
    }
  ]
}

Pagination

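collapse: collapses results on a field value. A second level of collapse can be nested inside inner_hits, but that second level cannot use inner_hits itself
inner_hits: expands each collapsed group to show the hits inside it; several inner_hits can be specified
max_concurrent_group_searches: the number of concurrent requests allowed for retrieving inner_hits per group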

GET /twitter/_search
{
  "query": {
    "match": {
      "message": "elasticsearch"
    }
  },
  "collapse": {
    "field": "user",                    
    "inner_hits": {
      "name": "last_tweets",            
      "size": 5,                        
      "sort": [ { "date": "asc" } ]     
    },
    "max_concurrent_group_searches": 4  
  },
  "sort": [ "likes" ]
}

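scroll: scrolling search, similar to a cursor in a relational database; more efficient when retrieving a large number of results. Append ?scroll=xxx to the initial search request, where xxx is how long to keep the scroll alive; the response then contains a _scroll_id. To clear a scroll, send DELETE /_search/scroll with the scroll_id.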

POST /saltfish_index/_search?scroll=1m
{
  "size": 1,
  "query":{
    "match_all": {}
  }
}

scroll_id: the ID returned by the initial scroll request, used to fetch the next page of the same query. The {index} must not be specified in this request

POST /_search/scroll
{
  "scroll" : "1m",                                                                 
  "scroll_id" : "FGluY2x1ZGVfY29udGV4dF91dWlkDXF1ZXJ5QW5kRmV0Y2gBFFVMdFRybk1CbFljakZVWDIzZ1Q0AAAAAAAAA_MWZkpPRlYzMWNTdkdKbnE0V01LUjhQdw==" 
}
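
The two requests above form a loop: open the scroll, keep asking for the next page until a page comes back empty, then clear the scroll. A minimal Python sketch of that loop is shown below. It assumes a hypothetical send(method, path, body) callable that performs the HTTP call and returns the decoded JSON; the fake transport exists only so the sketch runs without a cluster.

```python
def scroll_all(send, index, body, keep_alive="1m"):
    """Yield every hit of a scrolled search, then clear the scroll.

    `send(method, path, body)` is a hypothetical callable that performs the
    HTTP request and returns the decoded JSON response as a dict.
    """
    response = send("POST", f"/{index}/_search?scroll={keep_alive}", body)
    scroll_id = response.get("_scroll_id")
    try:
        while response["hits"]["hits"]:
            yield from response["hits"]["hits"]
            response = send(
                "POST",
                "/_search/scroll",
                {"scroll": keep_alive, "scroll_id": scroll_id},
            )
            # Always continue with the most recently returned id.
            scroll_id = response.get("_scroll_id", scroll_id)
    finally:
        if scroll_id is not None:
            send("DELETE", "/_search/scroll", {"scroll_id": scroll_id})

# Fake transport for demonstration: two one-hit pages, then an empty page.
pages = [[{"_id": "1"}], [{"_id": "2"}], []]
calls = []

def fake_send(method, path, body):
    calls.append((method, path))
    if method == "DELETE":
        return {"succeeded": True}
    return {"_scroll_id": "abc", "hits": {"hits": pages.pop(0)}}

request_body = {"size": 1, "query": {"match_all": {}}}
hits = list(scroll_all(fake_send, "saltfish_index", request_body))
```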

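slice: for queries over large amounts of data, a scroll can be split with slice. Slices can be consumed independently; each slice needs its own request, with max set to the total number of slices, and each returns its own _scroll_id. Default slice computation: slice(doc) = floorMod(hashCode(doc._id), max); the slices are distributed across the shards automatically. When the number of slices exceeds the number of shards, the first calls become slow. For how to handle this, see the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/paginate-search-results.html#slice-scroll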

GET /twitter/_search?scroll=1m
{
  "slice": {
    "id": 0,                      
    "max": 2                      
  },
  "query": {
    "match": {
      "title": "elasticsearch"
    }
  }
}
GET /twitter/_search?scroll=1m
{
  "slice": {
    "id": 1,
    "max": 2
  },
  "query": {
    "match": {
      "title": "elasticsearch"
    }
  }
}

max, the number of slices, must be greater than 1
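
The floorMod formula for default slicing can be illustrated in a few lines of Python. This is only a sketch: a Java-style string hash stands in for the hash Elasticsearch uses internally, which is an assumption, so the concrete slice numbers will not match a real cluster. The point is that every document lands in exactly one slice in the range 0 to max - 1.

```python
def java_string_hash(text):
    """Java-style String.hashCode(), wrapped to a signed 32-bit integer.

    Stand-in hash for illustration; exact only for ASCII input.
    """
    h = 0
    for ch in text:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h

def slice_of(doc_id, max_slices):
    """floorMod(hash, max): Python's % already floors, so the result is >= 0."""
    return java_string_hash(doc_id) % max_slices

# Hypothetical document ids, partitioned into two slices.
doc_ids = [f"doc-{i}" for i in range(20)]
slices = {0: [], 1: []}
for doc_id in doc_ids:
    slices[slice_of(doc_id, 2)].append(doc_id)
```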

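search_after: returns the hits that come after a given set of sort values, similar to querying the rows after a given primary key value in a relational database once sorted by that key. The sort should include a field with one unique value per document as a tiebreaker, since documents sharing identical sort values may otherwise be missed or repeated. Because scroll is costly, search_after is the better fit for real-time queries.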

GET /twitter/_search
{
  "size": 10,
  "query": {
    "match" : {
      "title" : "elasticsearch"
    }
  },
  "search_after": [1463538857, "654323"],
  "sort": [
    {"date": "asc"},
    {"tie_breaker_id": "asc"}
  ]
}
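
The idea behind search_after can be emulated on an in-memory list: sort by the same keys as the request, then keep only the documents whose sort key comes strictly after the given values. The Python sketch below does this with made-up documents; search_after_page is a hypothetical helper, not an Elasticsearch API.

```python
def search_after_page(docs, size, after=None):
    """Return the next `size` docs whose sort key is strictly after `after`.

    Docs are ordered by (date, tie_breaker_id) ascending, matching the sort
    clause in the request above.
    """
    def sort_key(doc):
        return (doc["date"], doc["tie_breaker_id"])

    ordered = sorted(docs, key=sort_key)
    if after is not None:
        ordered = [doc for doc in ordered if sort_key(doc) > tuple(after)]
    return ordered[:size]

# Made-up documents; three share the same date, so the tiebreaker decides.
docs = [
    {"date": 1463538857, "tie_breaker_id": "654322"},
    {"date": 1463538857, "tie_breaker_id": "654323"},
    {"date": 1463538857, "tie_breaker_id": "654324"},
    {"date": 1463538900, "tie_breaker_id": "100000"},
]
page = search_after_page(docs, size=10, after=[1463538857, "654323"])
```

To fetch the following page, pass the sort values of the last hit of the current page as the next after value.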

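Sorting

sort: sorting
mode: sort mode, which picks the sort value for array or multi-valued fields

min: the lowest value
max: the highest value
sum: the sum of all values; number-based array fields only
avg: the average of all values; number-based array fields only
median: the median of all values; number-based array fields only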

POST /_search
{
   "query" : {
      "term" : { "product" : "chocolate" }
   },
   "sort" : [
      {"price" : {"order" : "asc", "mode" : "avg"}}
   ]
}
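
The effect of mode is easiest to see on a field holding several values. The Python sketch below reduces a multi-valued price to a single sort value per mode and sorts two made-up products. It only illustrates the idea and makes no claim about how Elasticsearch rounds or stores these values.

```python
import statistics

SORT_MODES = {
    "min": min,
    "max": max,
    "sum": sum,
    "avg": statistics.mean,
    "median": statistics.median,
}

def sort_value(values, mode):
    """Reduce a multi-valued numeric field to the value used for sorting."""
    return SORT_MODES[mode](values)

# Made-up products, each with two prices.
products = [
    {"name": "a", "price": [20, 4]},
    {"name": "b", "price": [10, 11]},
]
by_avg = sorted(products, key=lambda p: sort_value(p["price"], "avg"))
by_min = sorted(products, key=lambda p: sort_value(p["price"], "min"))
```

With these values the two modes disagree: by average, b comes first; by minimum, a does.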

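numeric_type: casts the values of a field to another numeric type for sorting (double, long, date, or date_nanos); useful when sorting across indices that map the same field differently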

POST /index_long,index_double/_search
{
   "sort" : [
      {
        "field" : {
            "numeric_type" : "date_nanos"
        }
      }
   ]
}

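nested: sorts by fields inside nested objects; the sort field must be a nested field. Filters in the outer query do not apply to the nested sort, so the inner filter has to be written again in the sort clause; it only selects which nested objects supply the sort value and does not reduce the result set

path: the nested object to sort on
filter: a filter the nested objects must match
max_children: the maximum number of children to consider per root document when picking the sort value; unlimited by default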

POST /_search
{
   "query": {
      "nested": {
         "path": "parent",
         "query": {
            "bool": {
                "must": {"range": {"parent.age": {"gte": 21}}},
                "filter": {
                    "nested": {
                        "path": "parent.child",
                        "query": {"match": {"parent.child.name": "matt"}}
                    }
                }
            }
         }
      }
   },
   "sort" : [
      {
         "parent.child.age" : {
            "mode" :  "min",
            "order" : "asc",
            "nested": {
               "path": "parent",
               "filter": {
                  "range": {"parent.age": {"gte": 21}}
               },
               "nested": {
                  "path": "parent.child",
                  "filter": {
                     "match": {"parent.child.name": "matt"}
                  }
               }
            }
         }
      }
   ]
}

missing: how documents lacking the sort field are treated; can be set to _last, _first, or a custom value. Defaults to _last

GET /_search
{
  "sort" : [
    { "price" : {"missing" : "_last"} }
  ],
  "query" : {
    "term" : { "product" : "chocolate" }
  }
}

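unmapped_type: the type to assume for the sort field when it has no mapping; by default, sorting on a field with no mapping makes the search request fail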

GET /_search
{
  "sort" : [
    { "price" : {"unmapped_type" : "long"} }
  ],
  "query" : {
    "term" : { "product" : "chocolate" }
  }
}

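_geo_distance: sorts by geographic distance from a point given as latitude and longitude; for geo_point fields

distance_type: how the distance is computed. Either arc (default) or plane (faster, but inaccurate over long distances and close to the poles).
mode: min, max, median, or avg
unit: the unit used when computing sort values. Defaults to m (meters).
ignore_unmapped: whether an unmapped field is treated as a missing value. Setting it to true is equivalent to specifying an unmapped_type in a field sort. Defaults to false (an unmapped field makes the search fail).
pin.location: the geo_point field in this example; its value is the reference point, which can be written as a geohash, an array, an object, and so on. Several reference points can be given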

GET /_search
{
  "sort" : [
    {
      "_geo_distance" : {
          "pin.location" : [-70, 40],
          "order" : "asc",
          "unit" : "km",
          "mode" : "min",
          "distance_type" : "arc",
          "ignore_unmapped": true
      }
    }
  ],
  "query" : {
    "term" : { "user" : "kimchy" }
  }
}
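
The difference between arc and plane can be sketched with two textbook formulas: the haversine great-circle distance and a flat equirectangular approximation. The Python below is an illustration under those assumptions, not a copy of Elasticsearch's internal math, and the Earth radius constant is also an assumption. Points are written as (lon, lat), the same order as the array in the request above.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumption of this sketch

def arc_distance_km(lon1, lat1, lon2, lat2):
    """Great-circle distance via the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def plane_distance_km(lon1, lat1, lon2, lat2):
    """Flat (equirectangular) approximation: cheaper, less accurate."""
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return EARTH_RADIUS_KM * math.hypot(x, y)

pin = (-70.0, 40.0)  # (lon, lat) reference point from the request above
points = [(-71.0, 41.0), (-70.1, 40.1), (-75.0, 45.0)]  # made-up locations
nearest_first = sorted(points, key=lambda p: arc_distance_km(*pin, *p))
```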

script: sorts by the value of a custom script, given under the _script key

GET /_search
{
  "query": {
    "term": { "user": "kimchy" }
  },
  "sort": {
    "_script": {
      "type": "number",
      "script": {
        "lang": "painless",
        "source": "doc['field_name'].value * params.factor",
        "params": {
          "factor": 1.1
        }
      },
      "order": "asc"
    }
  }
}

track_scores: when sorting on a field, scores are normally not computed; setting this to true still computes and tracks them

GET /_search
{
  "track_scores": true,
  "sort" : [
    { "post_date" : {"order" : "desc"} },
    { "name" : "desc" },
    { "age" : "desc" }
  ],
  "query" : {
    "term" : { "user" : "kimchy" }
  }
}

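Cross-cluster search

cluster: remote clusters must first be registered in the cluster settings. An index on a remote cluster is written as cluster_name:index_name; separate several with commas, for example cluster_one:twitter,cluster_two:twitter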

PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "cluster_one": {
          "seeds": [
            "127.0.0.1:9300"
          ]
        },
        "cluster_two": {
          "seeds": [
            "127.0.0.1:9301"
          ]
        },
        "cluster_three": {
          "seeds": [
            "127.0.0.1:9302"
          ]
        }
      }
    }
  }
}

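skip_unavailable: skips unavailable clusters. By default, if any cluster in the search is unavailable an error is returned; this setting lets the search skip such clusters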

PUT _cluster/settings
{
  "persistent": {
    "cluster.remote.cluster_two.skip_unavailable": true
  }
}

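Query tuning

boosting: customizes the score of matching documents

positive: the query documents must match
negative: when a hit also matches this query, its score is scaled proportionally. The official documentation describes it as a way to lower scores, but negative_boost can in practice be set above 1, which raises the score of documents matching this query instead
negative_boost: the factor by which the score is scaled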

GET /_search
{
  "query": {
    "boosting": {
      "positive": {
        "term": {
          "text": "apple"
        }
      },
      "negative": {
        "term": {
          "text": "pie tart fruit crumble tree"
        }
      },
      "negative_boost": 0.5
    }
  }
}
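
The scoring rule described above fits in a few lines. In this Python sketch the positive score is taken as given, since computing real relevance scores is out of scope, and boosting_score is a hypothetical helper name.

```python
def boosting_score(positive_score, matches_negative, negative_boost=0.5):
    """Final score of a document that already matched the positive query."""
    if matches_negative:
        # A hit on the negative query scales the score by negative_boost.
        return positive_score * negative_boost
    return positive_score
```

With the request's negative_boost of 0.5, a document that also matches the negative query keeps half of its original score.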

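constant_score: constant score. A search normally computes a relevance score, and the more often a term occurs in a document, the higher that score. With this query, term frequency is ignored and every matching document gets the same score, which in theory should improve efficiency

boost: the constant score to assign; defaults to 1.0
filter: the condition documents must satisfy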

GET /_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": { "user.id": "kimchy" }
      },
      "boost": 1.2
    }
  }
}

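dis_max: takes the highest score among several matching clauses. queries can hold several query clauses, and the best-matching clause's score is used

queries: the list of query clauses
tie_breaker: a number between 0.0 and 1.0 that weights the scores of the other matching clauses; defaults to 0.0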

GET /_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "term": { "title": "Quick pets" } },
        { "term": { "body": "Quick pets" } }
      ],
      "tie_breaker": 0.7
    }
  }
}
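
How tie_breaker enters the final score can be written out directly: take the best clause score, then add tie_breaker times the scores of the other matching clauses. The Python sketch below takes per-clause scores as input, with None marking a clause that did not match; dis_max_score is a hypothetical helper name.

```python
def dis_max_score(clause_scores, tie_breaker=0.0):
    """Best clause score plus tie_breaker times the other matching scores."""
    matching = [s for s in clause_scores if s is not None]
    if not matching:
        return None  # no clause matched, so the document is not a hit
    best = max(matching)
    return best + tie_breaker * (sum(matching) - best)
```

With tie_breaker left at 0.0 only the best clause counts; raising it lets documents that match several clauses rank above those matching only one.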

Custom scoring

There are too many options to write down here; see the official documentation

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html

Posted on 2020-08-01 20:21 by SaltFishYe
