Elasticsearch Study Notes: DSL Queries

Based on Elasticsearch 7.8.0; the commands come from the official documentation. Personal notes, kept for reference.

Bulk operations

POST /_bulk
POST /<index>/_bulk

POST _bulk
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_id" : "2" } }
{ "create" : { "_index" : "test", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }
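The newline-delimited body above can also be assembled programmatically. The following is a minimal sketch using only Python's standard json module; build_bulk_body is a hypothetical helper name, not part of any Elasticsearch client, and it only builds the request body as a string (sending it is left out).

```python
import json

def build_bulk_body(operations):
    """Serialize (action, metadata, source) tuples into a _bulk NDJSON body.

    source is None for actions that carry no document line (delete).
    """
    lines = []
    for action, metadata, source in operations:
        # Action line, e.g. {"index": {"_index": "test", "_id": "1"}}
        lines.append(json.dumps({action: metadata}))
        if source is not None:
            # Document line (or the {"doc": ...} wrapper for update)
            lines.append(json.dumps(source))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"

operations = [
    ("index", {"_index": "test", "_id": "1"}, {"field1": "value1"}),
    ("delete", {"_index": "test", "_id": "2"}, None),
    ("create", {"_index": "test", "_id": "3"}, {"field1": "value3"}),
    ("update", {"_index": "test", "_id": "1"}, {"doc": {"field2": "value2"}}),
]
body = build_bulk_body(operations)
print(body, end="")
```

When such a body is sent over HTTP, the Content-Type header should be application/x-ndjson.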

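General search

  • A search returns 10 hits by default. match_all: matches every document; sort: sort order; from/size: offset and number of hits to return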
GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ],
  "from": 10,
  "size": 10
}
  • match: full-text search on a field
GET /bank/_search
{
  "query": { "match": { "address": "mill lane" } }
}
  • match_phrase: phrase query
GET /bank/_search
{
  "query": { "match_phrase": { "address": "mill lane" } }
}
  • bool: combines several query clauses
  • must: the clause must match
  • should: the clause should match
  • must_not: the clause must not match
GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "age": "40" } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ]
    }
  }
}
  • filter: filtering, commonly used for range conditions
GET /bank/_search
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "range": {
          "balance": {
            "gte": 20000,
            "lte": 30000
          }
        }
      }
    }
  }
}
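  • terms/term: exact match, the search value is not analyzed; terms is also commonly used for aggregations
  • group_by_state: a user-chosen aggregation name; it needs an aggregation definition. Whether or not sub-aggregations are written, the document count of each bucket (doc_count) is returned automatically
  • aggs: at the top level it declares that aggregations should be computed; nested inside an aggregation, its result is reported under the custom name you give it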
GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      }
    }
  }
}
  • avg: average value
GET /bank/_search
{
  "size":0,
  "aggs":{
    "group_by_state":{
      "terms":{
        "field":"state.keyword"
      },
      "aggs":{
        "average_balance":{
          "avg":{
            "field":"balance"
          }
        }
      }
    }
  }
}
average_balance is a user-defined name.
  • order: sorting; buckets can be sorted by a custom sub-aggregation computed for them
GET /bank/_search
{
  "size":0,
  "aggs":{
    "group_by_state":{
      "terms":{
        "field":"state.keyword",
        "order":{
          "average_balance":"desc"
        }
      },
      "aggs":{
        "average_balance":{
          "avg":{
            "field":"balance"
          }
        }
      }
    }
  }
}
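To make the bucket ordering concrete, here is a rough pure-Python equivalent of what the request above computes: group by state, average the balance per group, and sort the groups by that average in descending order. The sample accounts are invented for illustration, the bucket shape is only loosely modelled on the real response, and details such as the default limit of 10 buckets are ignored.

```python
from collections import defaultdict

# Invented sample documents standing in for the bank index.
accounts = [
    {"state": "ID", "balance": 30000},
    {"state": "ID", "balance": 10000},
    {"state": "TX", "balance": 45000},
    {"state": "WA", "balance": 5000},
    {"state": "WA", "balance": 15000},
]

def group_by_state_avg(docs):
    """Group docs by state and return buckets sorted by average_balance, descending."""
    balances = defaultdict(list)
    for doc in docs:
        balances[doc["state"]].append(doc["balance"])
    buckets = [
        {
            "key": state,
            "doc_count": len(values),
            "average_balance": sum(values) / len(values),
        }
        for state, values in balances.items()
    ]
    # Equivalent of "order": {"average_balance": "desc"}
    return sorted(buckets, key=lambda b: b["average_balance"], reverse=True)

buckets = group_by_state_avg(accounts)
for bucket in buckets:
    print(bucket)
```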
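  • _source: selects which fields are returned in the search results
  • includes: inside _source, the fields to include; wildcards are allowed
  • excludes: the fields to leave out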
GET /_search
{
  "_source": {
    "includes": [ "obj1.*", "obj2.*" ],
    "excludes": [ "*.description" ]
  },
  "query": {
    "term": {
      "user.id": "kimchy"
    }
  }
}
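  • docvalue_fields: returns values straight from doc values, i.e. read from the index rather than by loading _source, which can be more efficient
  • format: custom format for the returned doc value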
GET /_search
{
  "query": {
    "match_all": {}
  },
  "docvalue_fields": [
    "my_ip*",                     
    {
      "field": "my_keyword_field" 
    },
    {
      "field": "*_date_field",
      "format": "epoch_millis"    
    }
  ]
}

 

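Pagination

  • collapse: collapses results on a field. A second-level collapse can be nested inside inner_hits; that second level cannot use inner_hits itself
  • inner_hits: expands each collapsed group to show the matching documents inside it; several inner_hits can be specified
  • max_concurrent_group_searches: the number of concurrent requests allowed to retrieve the inner_hits per group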
GET /twitter/_search
{
  "query": {
    "match": { "message": "elasticsearch" }
  },
  "collapse": {
    "field": "user",
    "inner_hits": {
      "name": "last_tweets",
      "size": 5,
      "sort": [ { "date": "asc" } ]
    },
    "max_concurrent_group_searches": 4
  },
  "sort": [ "likes" ]
}
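  • scroll: scrolling search, similar to a cursor in a relational database; it makes retrieving large result sets more efficient. Append ?scroll=xxx to the initial search request, where xxx is how long the scroll context stays alive; the response then contains a _scroll_id. To delete a scroll, send DELETE /_search/scroll with the scroll_id in the request body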
POST /saltfish_index/_search?scroll=1m
{
  "size": 1,
  "query":{
    "match_all": {}
  }
}
  • scroll_id: the id returned by the initial scroll request, used to fetch the next page of the same search; this request must not include {index} in the URL
POST /_search/scroll
{
  "scroll" : "1m",                                                                 
  "scroll_id" : "FGluY2x1ZGVfY29udGV4dF91dWlkDXF1ZXJ5QW5kRmV0Y2gBFFVMdFRybk1CbFljakZVWDIzZ1Q0AAAAAAAAA_MWZkpPRlYzMWNTdkdKbnE0V01LUjhQdw==" 
}
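  • slice: when a scroll returns a large amount of data, it can be split into slices. Slices can be consumed independently; each slice needs its own request (as many requests as the configured max), and each request returns its own scroll_id. Default slicing: slice(doc) = floorMod(hashCode(doc._uid), max); the slices are distributed across the shards automatically. When the number of slices is larger than the number of shards, the first calls are slow; for how to deal with that, see the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/paginate-search-results.html#slice-scroll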
GET /twitter/_search?scroll=1m
{
  "slice": {
    "id": 0,
    "max": 2
  },
  "query": {
    "match": { "title": "elasticsearch" }
  }
}
GET /twitter/_search?scroll=1m
{
  "slice": {
    "id": 1,
    "max": 2
  },
  "query": {
    "match": { "title": "elasticsearch" }
  }
}
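The slice formula can be illustrated without a cluster. The sketch below mimics slice(doc) = floorMod(hashCode(doc._uid), max) in Python. Note the assumptions: zlib.crc32 is only a stand-in for the hash Elasticsearch uses internally, and the document ids are made up, so the assignment on a real cluster will differ. What it shows is the property that matters: every document lands in exactly one slice.

```python
import zlib

def slice_of(doc_id, max_slices):
    """Assign a document to a slice; crc32 is a stand-in for Elasticsearch's internal hash."""
    # Python's % already behaves like floorMod for a positive divisor.
    return zlib.crc32(doc_id.encode("utf-8")) % max_slices

doc_ids = [f"tweet-{n}" for n in range(20)]  # hypothetical document ids
max_slices = 2

# One list of ids per slice, as if each slice were scrolled by its own request.
slices = {
    slice_id: [d for d in doc_ids if slice_of(d, max_slices) == slice_id]
    for slice_id in range(max_slices)
}

for slice_id, members in slices.items():
    print(slice_id, len(members))
```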

max (the number of slices) must be greater than 1.

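  • search_after: fetches the hits that come after a given set of sort values, similar to sorting by primary key in a relational database and then reading the rows after a given key. Documents whose first sort value equals the given one can still match; their order is decided by the tie-breaker field, which should be unique per document. Because scroll is expensive, search_after is the better fit for real-time queries.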
GET /twitter/_search
{
  "size": 10,
  "query": {
    "match" : { "title" : "elasticsearch" }
  },
  "search_after": [1463538857, "654323"],
  "sort": [
    {"date": "asc"},
    {"tie_breaker_id": "asc"}
  ]
}

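The paging behaviour of search_after can be sketched in memory. The following is an illustration under stated assumptions, not how Elasticsearch implements it: the documents are invented, search is a hypothetical helper, and hits are assumed to be those whose (date, tie_breaker_id) pair sorts strictly after the supplied values.

```python
# Invented documents; tie_breaker_id is unique per document.
docs = [
    {"date": 1463538855, "tie_breaker_id": "654321"},
    {"date": 1463538857, "tie_breaker_id": "654322"},
    {"date": 1463538857, "tie_breaker_id": "654323"},
    {"date": 1463538857, "tie_breaker_id": "654324"},
    {"date": 1463538860, "tie_breaker_id": "654325"},
]

def sort_key(doc):
    """Sort values of a hit: date first, then the unique tie-breaker."""
    return (doc["date"], doc["tie_breaker_id"])

def search(docs, size, search_after=None):
    """Return the next `size` docs whose sort key is strictly greater than search_after."""
    ordered = sorted(docs, key=sort_key)
    if search_after is not None:
        ordered = [d for d in ordered if sort_key(d) > tuple(search_after)]
    return ordered[:size]

# Same sort values as in the request above.
page = search(docs, size=2, search_after=[1463538857, "654323"])
print(page)

# Walk through all pages by feeding the last hit's sort values back in.
seen, after = [], None
while True:
    hits = search(docs, size=2, search_after=after)
    if not hits:
        break
    seen.extend(hits)
    after = list(sort_key(hits[-1]))
print(len(seen))
```

The first page shows why the tie-breaker matters: a document with the same date as the search_after value is still returned because its tie_breaker_id sorts later.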
 

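Sorting

  • sort: sorting
  • mode: sort mode, i.e. which value of a multi-valued field is used for sorting
  1. min: the lowest value; only applies to array fields
  2. max: the highest value; only applies to array fields
  3. sum: the sum of all values; only applies to numeric array fields
  4. avg: the average of all values; only applies to numeric array fields
  5. median: the median of all values; only applies to numeric array fields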
POST /_search
{
   "query" : {
      "term" : { "product" : "chocolate" }
   },
   "sort" : [
      {"price" : {"order" : "asc", "mode" : "avg"}}
   ]
}
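What mode does can be sketched in plain Python: each multi-valued field is reduced to a single value, and the documents are sorted by that value. The products and the sort_value helper below are invented for illustration.

```python
import statistics

# Reduction applied to the values of a multi-valued field, per sort mode.
MODES = {
    "min": min,
    "max": max,
    "sum": sum,
    "avg": statistics.mean,
    "median": statistics.median,
}

def sort_value(values, mode):
    """Reduce a multi-valued field to the single value used for sorting."""
    return MODES[mode](values)

# Invented products with multi-valued price fields.
products = [
    {"name": "a", "price": [4, 10]},
    {"name": "b", "price": [5, 6, 7]},
    {"name": "c", "price": [1, 20, 21]},
]

# Equivalent of {"price": {"order": "asc", "mode": "avg"}}
by_avg = sorted(products, key=lambda p: sort_value(p["price"], "avg"))
# Equivalent of {"price": {"order": "asc", "mode": "min"}}
by_min = sorted(products, key=lambda p: sort_value(p["price"], "min"))

print([p["name"] for p in by_avg])
print([p["name"] for p in by_min])
```

The same documents come back in a different order depending on the mode, which is why the mode should be chosen deliberately for array fields.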
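  • numeric_type: casts the values of a numeric field from one type to another for sorting; useful when sorting a search across several indices in which the field is mapped differently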
POST /index_long,index_double/_search
{
   "sort" : [
      {
        "field" : {
            "numeric_type" : "date_nanos"
        }
      }
   ]
}
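  • nested: sorting by fields inside nested objects; the sort field must be a nested field. Filters in the outer query have no effect on the nested sort, so the inner filter must be written out again explicitly; it only decides which nested objects are considered for sorting and does not reduce the hits
  1. path: the nested object to sort on
  2. filter: a filter the nested objects must match to be considered for sorting
  3. max_children: the maximum number of children to consider per root document when picking the sort value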
POST /_search
{
   "query": {
      "nested": {
         "path": "parent",
         "query": {
            "bool": {
                "must": {"range": {"parent.age": {"gte": 21}}},
                "filter": {
                    "nested": {
                        "path": "parent.child",
                        "query": {"match": {"parent.child.name": "matt"}}
                    }
                }
            }
         }
      }
   },
   "sort" : [
      {
         "parent.child.age" : {
            "mode" :  "min",
            "order" : "asc",
            "nested": {
               "path": "parent",
               "filter": {
                  "range": {"parent.age": {"gte": 21}}
               },
               "nested": {
                  "path": "parent.child",
                  "filter": {
                     "match": {"parent.child.name": "matt"}
                  }
               }
            }
         }
      }
   ]
}
  • missing: how to treat documents that lack the sort field; can be set to _last, _first, or a custom value. The default is _last
GET /_search
{
  "sort" : [
    { "price" : {"missing" : "_last"} }
  ],
  "query" : {
    "term" : { "product" : "chocolate" }
  }
}
  • unmapped_type: when the sort field has no mapping, treat it as the specified type; without it, sorting on an unmapped field fails
GET /_search
{
  "sort" : [
    { "price" : {"unmapped_type" : "long"} }
  ],
  "query" : {
    "term" : { "product" : "chocolate" }
  }
}
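  • _geo_distance: sorting by geographic distance from latitude/longitude data; used with geo_point fields
  1. distance_type: how the distance is computed. Either arc (default) or plane (faster, but inaccurate over long distances and close to the poles).
  2. mode: min, max, median and avg
  3. unit: the unit used when computing the sort values. The default is m (meters).
  4. ignore_unmapped: whether an unmapped field should be treated as a missing value. Setting it to true is equivalent to specifying an unmapped_type in a field sort. The default is false (an unmapped field causes the search to fail).
  5. pin.location: the geo_point field in this example; its value marks the reference point as coordinates, which can be given as a geohash, an array, an object (map) and so on, and several points can be supplied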
GET /_search
{
  "sort" : [
    {
      "_geo_distance" : {
          "pin.location" : [-70, 40],
          "order" : "asc",
          "unit" : "km",
          "mode" : "min",
          "distance_type" : "arc",
          "ignore_unmapped": true
      }
    }
  ],
  "query" : {
    "term" : { "user" : "kimchy" }
  }
}
  • _script: script-based sorting, for custom sort logic
GET /_search
{
  "query": {
    "term": { "user": "kimchy" }
  },
  "sort": {
    "_script": {
      "type": "number",
      "script": {
        "lang": "painless",
        "source": "doc['field_name'].value * params.factor",
        "params": {
          "factor": 1.1
        }
      },
      "order": "asc"
    }
  }
}
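  • track_scores: score tracking; when sorting on a field, scores are normally not computed, and enabling this setting computes them anyway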
GET /_search
{
  "track_scores": true,
  "sort" : [
    { "post_date" : {"order" : "desc"} },
    { "name" : "desc" },
    { "age" : "desc" }
  ],
  "query" : {
    "term" : { "user" : "kimchy" }
  }
}

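Cross-cluster search

  • cluster: the remote clusters must first be registered through the cluster settings. A remote index is written as cluster_name:index_name; separate several of them with commas, for example: cluster_one:twitter,cluster_two:twitter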
PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "cluster_one": {
          "seeds": [
            "127.0.0.1:9300"
          ]
        },
        "cluster_two": {
          "seeds": [
            "127.0.0.1:9301"
          ]
        },
        "cluster_three": {
          "seeds": [
            "127.0.0.1:9302"
          ]
        }
      }
    }
  }
}
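  • skip_unavailable: skips unavailable clusters. By default, if any cluster involved in the search is unavailable, an error is returned; this setting lets the search skip such a cluster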
PUT _cluster/settings
{
  "persistent": {
    "cluster.remote.cluster_two.skip_unavailable": true
  }
}

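Query optimization

  • boosting: adjusts the relevance score of hits
  1. positive: the search condition that documents must match
  2. negative: when a hit also matches this condition, its score is scaled proportionally. The official documentation describes this as a way to lower scores and gives 0 to 1.0 as the range for negative_boost, but in practice negative_boost can be set above 1, which makes documents matching this condition score higher
  3. negative_boost: the factor by which the score is scaled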
GET /_search
{
  "query": {
    "boosting": {
      "positive": {
        "term": {
          "text": "apple"
        }
      },
      "negative": {
        "term": {
          "text": "pie tart fruit crumble tree"
        }
      },
      "negative_boost": 0.5
    }
  }
}
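The scoring rule of the boosting query is small enough to write down. The sketch below assumes the documented behaviour (a hit that also matches the negative query has its positive score multiplied by negative_boost); boosting_score is a hypothetical helper and the scores are made-up numbers.

```python
def boosting_score(positive_score, matches_negative, negative_boost):
    """Final score of a hit under a boosting query.

    positive_score: score produced by the positive query.
    matches_negative: whether the hit also matches the negative query.
    negative_boost: factor applied to hits that match the negative query.
    """
    if matches_negative:
        return positive_score * negative_boost
    return positive_score

# A hit that only matches the positive query keeps its score.
print(boosting_score(2.0, False, 0.5))
# A hit that also matches the negative query is scaled down.
print(boosting_score(2.0, True, 0.5))
```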
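  • constant_score: constant score. A search normally computes a relevance score for each document, and the more often a search term appears in a document, the higher that score. With this query the term frequency is ignored and every matching document gets the same score, which in theory improves efficiency
  1. boost: the constant score value, set by the user; the default is 1.0
  2. filter: the condition documents must satisfy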
GET /_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": { "user.id": "kimchy" }
      },
      "boost": 1.2
    }
  }
}
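  • dis_max: takes the highest score among several matching clauses; queries can hold several query clauses, and the best-matching clause determines the score
  1. queries: the list of query clauses
  2. tie_breaker: a number between 0.0 and 1.0 used to raise the score of documents that match more than one clause; the default is 0.0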
GET /_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "term": { "title": "Quick pets" } },
        { "term": { "body": "Quick pets" } }
      ],
      "tie_breaker": 0.7
    }
  }
}
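How tie_breaker enters the score can be sketched as follows, assuming the documented rule: take the highest score among the matching clauses, multiply the scores of the other matching clauses by tie_breaker, and add them up. dis_max_score is a hypothetical helper and the clause scores are made-up numbers.

```python
def dis_max_score(clause_scores, tie_breaker=0.0):
    """Combine the scores of the matching clauses of a dis_max query.

    clause_scores: scores of the clauses that matched (non-matching clauses omitted).
    tie_breaker: weight given to every matching clause other than the best one.
    """
    if not clause_scores:
        return 0.0
    best = max(clause_scores)
    others = sum(clause_scores) - best
    return best + tie_breaker * others

# With the default tie_breaker only the best clause counts.
print(dis_max_score([1.0, 2.0]))
# With tie_breaker 0.7 the other matching clause contributes 70% of its score.
print(dis_max_score([1.0, 2.0], tie_breaker=0.7))
```

With tie_breaker at 0.0 a document matching both title and body scores the same as one matching only its best field; raising it rewards documents that match in several fields.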

Custom scoring

There is too much to note down here; see the official documentation.

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html

posted on 2020-08-01 20:21  SaltFishYe