Elasticsearch Summary

1. Differences between matchQuery and termQuery

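  matchQuery: analyzes the search text into tokens and matches them against the target field; with the default OR operator, a document is returned if any one of the tokens matches.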

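  matchPhrasePrefix: analyzes the search text and matches it against the target field; it matches when all tokens are found in the same order and positions as in the search text, with the last token treated as a prefix.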

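  termQuery: does not analyze the search text; it is compared as a whole against the terms indexed for the target field, and the document is returned only on an exact match.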

  wildcardQuery: pattern matching; a term-level query that supports wildcards, e.g. QueryBuilders.wildcardQuery("content", "?全*"), where ? stands for exactly one character and * for zero or more characters.

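  FuzzyQuery: fuzzy matching, e.g. Term t = new Term("content", "work"); FuzzyQuery query = new FuzzyQuery(t, 0.1f, 1); (this is the older Lucene signature with a float minimum similarity; newer Lucene versions take an integer maximum edit count instead). The first argument is the term object. The second is the minimum similarity for the Levenshtein algorithm (the default is 0.5; the smaller the value, the less closely the matched documents have to match and the more documents are returned, and vice versa). The third is the number of leading characters that must match exactly.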
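The wildcard and fuzzy rules above can be tried out without a cluster. The following is a minimal Python sketch, not how Lucene implements either query: `wildcard_match`, `levenshtein` and `fuzzy_match` are hypothetical helper names, and the fuzzy check uses a maximum edit count rather than the float similarity of the old constructor.

```python
from fnmatch import fnmatchcase


def wildcard_match(term: str, pattern: str) -> bool:
    """Term-level wildcard check: ? is exactly one character, * is zero or more.

    fnmatchcase also gives [..] a special meaning, which wildcardQuery does not.
    """
    return fnmatchcase(term, pattern)


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost))
        prev = cur
    return prev[-1]


def fuzzy_match(term: str, query: str, max_edits: int = 1, prefix_length: int = 1) -> bool:
    """The first prefix_length characters must be identical, the rest may differ by max_edits."""
    if term[:prefix_length] != query[:prefix_length]:
        return False
    return levenshtein(term, query) <= max_edits
```

With these helpers, `fuzzy_match("word", "work")` holds (one substitution, same first letter), while `fuzzy_match("fork", "work")` fails on the prefix check.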

For example, a match_phrase_prefix query:

GET /test_index/_search
{
    "from": 0,
    "size": 100,
    "timeout": "60s",
    "query": {
        "bool": {
            "must": [{
                "match_phrase_prefix": {
                    "deviceUuidFristLogin": {
                        "query": "XXXXXXXXXXXXXXXXXXXXX",
                        "slop": 0,
                        "max_expansions": 50,
                        "boost": 1.0
                    }
                }
            }],
            "adjust_pure_negative": true,
            "boost": 1.0
        }
    }
}
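The matching rules described above for matchQuery, termQuery and matchPhrasePrefix can be mimicked on plain strings. This is a toy sketch under a strong assumption: `analyze` (a hypothetical helper) only lowercases and splits on whitespace, which is far simpler than a real analyzer, and slop is fixed at 0.

```python
def analyze(text):
    """Toy analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def match_query(query, field_text):
    """match: analyze the query, succeed if ANY query token is in the field."""
    doc_tokens = set(analyze(field_text))
    return any(token in doc_tokens for token in analyze(query))


def term_query(query, field_text):
    """term: the query is NOT analyzed; it must equal one indexed token exactly."""
    return query in analyze(field_text)


def match_phrase_prefix(query, field_text):
    """All tokens in order and adjacent; the last query token is a prefix."""
    q, d = analyze(query), analyze(field_text)
    if not q:
        return False
    for start in range(len(d) - len(q) + 1):
        window = d[start:start + len(q)]
        if window[:-1] == q[:-1] and window[-1].startswith(q[-1]):
            return True
    return False
```

Note how `term_query("Quick", "Quick Brown Fox")` is false in this sketch: the indexed tokens were lowercased, but the term itself was not analyzed.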

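2. Difference between must and should

  must: equivalent to AND in MySQL

  should: equivalent to OR in MySQL (less efficient). Note that when the same bool query also contains must or filter clauses, the should clauses become optional and only affect scoring unless minimum_should_match is set.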
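To make the AND/OR analogy concrete, here is a small Python sketch that assembles a bool query body as a plain dict. `bool_query` is a hypothetical helper, and the field values (seller "1", status 3000) are made up for illustration; only the field names come from the examples in this post.

```python
import json


def bool_query(must=None, should=None, minimum_should_match=None):
    """Build a bool query body; empty clause lists are left out."""
    body = {}
    if must:
        body["must"] = list(must)
    if should:
        body["should"] = list(should)
    if minimum_should_match is not None:
        body["minimum_should_match"] = minimum_should_match
    return {"query": {"bool": body}}


# Roughly: sellerUserId = '1' AND (orderTradeStatus = 2000 OR orderTradeStatus = 3000)
query = bool_query(
    must=[{"term": {"sellerUserId": "1"}}],
    should=[
        {"term": {"orderTradeStatus": 2000}},
        {"term": {"orderTradeStatus": 3000}},
    ],
    minimum_should_match=1,  # without this the should clauses would only affect scoring
)
print(json.dumps(query, indent=2))
```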

3. How to see how a string is analyzed

GET /test_index/_analyze
{
  "field": "deviceUuidFristLogin",
  "text": "xxxxxxxxxxxxxxxxxxxxxxxxxxxx"
}
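The _analyze response is a list of tokens with offsets and positions. As a rough offline illustration of that shape, the sketch below uses a regular expression as a stand-in for the analyzer. This is an assumption, not the standard analyzer: `toy_analyze` is a hypothetical name, and real analyzers handle CJK text and punctuation differently.

```python
import re


def toy_analyze(text):
    """Return tokens in the same general shape as an _analyze response."""
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+", text)):
        tokens.append({
            "token": match.group().lower(),   # lowercased like most analyzers do
            "start_offset": match.start(),    # character offsets into the input
            "end_offset": match.end(),
            "position": position,             # used by phrase queries
        })
    return {"tokens": tokens}
```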

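4. Since Elasticsearch 5.0, the string type has been split into two new data types: text, which is analyzed and used for full-text search, and keyword, which is not analyzed and is used for exact-value search. For string fields, dynamic mapping also generates a keyword sub-field by default for exact matching. The default mapping looks like this:

"mappings": {
    "properties": {
      "id": {
        "type": "long"
      },
      "searchField": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
}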
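With that mapping, full-text searches go to the text field and exact matches go to its keyword sub-field. A minimal sketch of the two request bodies follows; the helper names are hypothetical, and `is_keyword_indexed` only illustrates the ignore_above rule (values longer than the limit are not indexed in the keyword sub-field).

```python
def full_text_search(field, text):
    """match query against the analyzed text field."""
    return {"query": {"match": {field: text}}}


def exact_search(field, value):
    """term query against the un-analyzed keyword sub-field."""
    return {"query": {"term": {field + ".keyword": value}}}


def is_keyword_indexed(value, ignore_above=256):
    """Values longer than ignore_above characters are skipped by the keyword sub-field."""
    return len(value) <= ignore_above
```

So `exact_search("searchField", "abc")` targets `searchField.keyword`, and a 300-character value would not be findable through that sub-field at all.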

  

 

5. fielddata

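Doc values are a forward (document-to-value) index used by operations such as sorting and aggregations, which need to go from a document to its field values. They are available for most field types and are stored on disk.

fielddata applies only to text fields, i.e. analyzed fields. When such a field is used for sorting or aggregations, a forward index is built in memory; this is memory-hungry, so it is disabled by default.

So sorting on a text field is not recommended, and range queries should not target text fields either (they would compare the analyzed terms lexicographically).

ES aggregations (max, min, avg, sum, and bucket aggregations such as terms/range) should not use text fields either.

In ES 5.x+, always check whether a numeric-looking field really needs range queries. A field that looks numeric but is only ever used in exact matches such as term or terms should be mapped as keyword rather than long, for example userId, buyerId, sellerId.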
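One practical consequence of the rules above is choosing which field name to put in a sort or aggregation. The sketch below picks it from the mapping properties; `sortable_field` is a hypothetical helper, and the policy of refusing text fields without a keyword sub-field is an assumption of this sketch, not something ES does for you.

```python
def sortable_field(properties, field):
    """Return the field name to use for sort/aggregations, based on the mapping."""
    spec = properties[field]
    if spec.get("type") != "text":
        return field  # keyword, long, date, ... are backed by doc values
    for sub_name, sub_spec in spec.get("fields", {}).items():
        if sub_spec.get("type") == "keyword":
            return f"{field}.{sub_name}"  # use the un-analyzed sub-field
    raise ValueError(f"{field} is text without a keyword sub-field; sorting would need fielddata")


properties = {
    "id": {"type": "long"},
    "searchField": {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
    },
    "content": {"type": "text"},
}
```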

 

 

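Common ES pagination approaches:

Approach 1: from/size can reach at most 10,000 records (the default index.max_result_window), and users rarely look at the later pages anyway, so the maximum page number can simply be capped, for example at 100 pages.

Approach 2: no cap on the page number, but no jumping to arbitrary pages (as with Baidu or Google); only "next page" is allowed. This can be done with search_after, for example: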

 First query, showing 5 records:

GET /test_index/_search
{
    "size": 5,
    "query": {
        "match" : {
            "sellerUserId": "xxxxx"
        }
    },
    
    "sort": [
        {"orderAddtime": "desc"},
        {"_id": "desc"}
    ]

}

Response:

{
  "took" : 58,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 108783,
    "max_score" : null,
    "hits" : [
      {
        "_index" : "test_index",
        "_type" : "elasticsearch",
        "_id" : "xxxxxxxxxxxxxxx",
        "_score" : null,
        "_source" : {
          "orderSubTypeId" : 0,
          "productId" : 22166,
          "orderId" : xxxxxxxxxxxxxxx,
          "orderAddtime" : "2020-04-02 10:25:32",
          "productSize" : "",
          "orderTradeStatus" : 2000,
          "poundage" : 60.33,
          "sellerName" : "大眼睛潮品代购",
          "orderNum" : "xxxxxxxxxxxxxxx",
          "orderTypeId" : 0,
          "sellerUserId" : xxxxxxx,
          "sku_id" : xxxxxxx,
          "buyerName" : "单身A锥195",
          "productName" : "Champion 袖口单标 加绒 男女款 带帽卫衣 美版 天蓝色",
          "paidMoney" : 219.0,
          "price" : 239.0,
          "freightCost" : 18.0,
          "buyerUserId" : xxxxxxx,
          "buyerReceiveAddress" : "河南省周口市淮阳县润德第一城c栋",
          "buyerReceiveMobile" : "xxxxxxxxxxxxx",
          "discountInfo" : "包邮券:18.0元,优惠券:20.0元",
          "discountValue" : 38.0,
          "buyerPayAccountId" : "-1"
        },
        "sort" : [
          "2020-04-02 10:25:32",
          "xxxxxxxxxxxxx"
        ]
      },
      {
        "_index" : "test_index",
        "_type" : "elasticsearch",
        "_id" : "xxxxxxxxxxxxx",
        "_score" : null,
        "_source" : {
          "orderSubTypeId" : 0,
          "productId" : 30739,
          "orderId" : xxxxxxxxxxxxx,
          "orderAddtime" : "2020-04-02 10:22:15",
          "productSize" : "",
          "orderTradeStatus" : 2000,
          "poundage" : 59.23,
          "sellerName" : "大眼睛潮品代购",
          "orderNum" : "xxxxxxxxxxxxx",
          "orderTypeId" : 0,
          "sellerUserId" : xxxxxxxxxxxxx,
          "sku_id" : 184429239,
          "buyerName" : "发呆影子灰qRr",
          "productName" : "Champion 圆领 薄长袖 T恤  美版 黑色",
          "paidMoney" : 123.0,
          "price" : 129.0,
          "freightCost" : 14.0,
          "buyerUserId" : xxxxxxxxxxxxx,
          "discountInfo" : "优惠券:20.0元",
          "discountValue" : 20.0,
          "buyerReceiveAddress" : "北京市北京市顺义区旺泉街道 石门苑25栋3门302",
          "buyerReceiveMobile" : "xxxxxxxxxxxxx",
          "buyerPayAccount" : "152****0708",
          "buyerPayAccountId" : "-1"
        },
        "sort" : [
          "2020-04-02 10:22:15",
          "xxxxxxxxxxxxx"
        ]
      },
      {
        "_index" : "test_index",
        "_type" : "elasticsearch",
        "_id" : "xxxxxxxxxxxxx",
        "_score" : null,
        "_source" : {
          "buyerReceiveAddress" : "贵州省遵义市正安县安场镇播州大道安页四井",
          "buyerReceiveMobile" : "xxxxxxxxxxxxx",
          "orderSubTypeId" : 0,
          "productId" : 27657,
          "orderId" : xxxxxxxxxxxxx,
          "orderAddtime" : "2020-04-02 10:15:01",
          "productSize" : "",
          "orderTradeStatus" : 2000,
          "poundage" : 60.63,
          "sellerName" : "大眼睛潮品代购",
          "orderNum" : "xxxxxxxxxxxxx",
          "orderTypeId" : 0,
          "sellerUserId" : xxxxxxxxxxxxx,
          "sku_id" : xxxxxxxxxxxxx,
          "buyerName" : "鹤鹤有鸣",
          "productName" : "Champion 半拉链刺绣小Logo草写 冲锋衣 美版 藏青色",
          "paidMoney" : 287.0,
          "price" : 269.0,
          "freightCost" : 18.0,
          "buyerUserId" : xxxxxxxxxxxxx,
          "buyerPayAccount" : "132****7962",
          "buyerPayAccountId" : "-1"
        },
        "sort" : [
          "2020-04-02 10:15:01",
          "xxxxxxxxxxxxx"
        ]
      },
      {
        "_index" : "test_index",
        "_type" : "elasticsearch",
        "_id" : "xxxxxxxxxxxxx",
        "_score" : null,
        "_source" : {
          "orderSubTypeId" : 0,
          "productId" : 29148,
          "orderId" : xxxxxxxxxxxxx,
          "orderAddtime" : "2020-04-02 10:13:33",
          "productSize" : "",
          "orderTradeStatus" : 2000,
          "poundage" : 59.33,
          "sellerName" : "大眼睛潮品代购",
          "orderNum" : "xxxxxxxxxxxxx",
          "orderTypeId" : 0,
          "sellerUserId" : xxxxxxxxxxxxx,
          "sku_id" : xxxxxxxxxxxxx,
          "buyerName" : "重情义绿龙虾nJt",
          "productName" : "champion  冠军 腿标 短裤 黑色",
          "paidMoney" : 162.0,
          "price" : 139.0,
          "freightCost" : 23.0,
          "buyerUserId" : xxxxxxxxxxxxx,
          "buyerReceiveAddress" : "云南省玉溪市澄江县龙街镇高西村委会小官庄102号",
          "buyerReceiveMobile" : "xxxxxxxxxxxxx",
          "buyerPayAccount" : "187****7958",
          "buyerPayAccountId" : "-1"
        },
        "sort" : [
          "2020-04-02 10:13:33",
          "xxxxxxxxxxxxx"
        ]
      },
      {
        "_index" : "test_index",
        "_type" : "elasticsearch",
        "_id" : "xxxxxxxxxxxxx",
        "_score" : null,
        "_source" : {
          "orderSubTypeId" : 0,
          "productId" : 33977,
          "orderId" : xxxxxxxxxxxxx,
          "orderAddtime" : "2020-04-02 10:09:56",
          "productSize" : "",
          "orderTradeStatus" : 2000,
          "poundage" : 59.03,
          "sellerName" : "大眼睛潮品代购",
          "orderNum" : "xxxxxxxxxxxxx",
          "orderTypeId" : 0,
          "sellerUserId" : xxxxxxxxxxxxx,
          "sku_id" : xxxxxxxxxxxxx,
          "buyerName" : "AAA送你到楼梯",
          "productName" : "Champion 袖口单标基础款打底衫短袖T恤  美版 白色",
          "paidMoney" : 132.0,
          "price" : 109.0,
          "freightCost" : 23.0,
          "buyerUserId" : xxxxxxxxxxxxx,
          "buyerReceiveAddress" : "四川省乐山市市中区八仙洞17号",
          "buyerReceiveMobile" : "xxxxxxxxxxxxx",
          "buyerPayAccount" : "182****3677",
          "buyerPayAccountId" : "xxxxxxxxxxxxx"
        },
        "sort" : [
          "2020-04-02 10:09:56",
          "xxxxxxxxxxxxx"
        ]
      }
    ]
  }
}

The _id and orderAddtime values of the returned data; these 5 records are:

xxxxxxxxxxxxxxx
2020-04-02 10:13:33

xxxxxxxxxxxxxxx
2020-04-02 10:09:56

xxxxxxxxxxxxxxx
2020-04-02 10:04:58

xxxxxxxxxxxxxxx
2020-04-02 10:03:26

xxxxxxxxxxxxxxx
2020-04-02 10:02:58

Using search_after:

GET /test_index/_search
{
    "size": 2,
    "query": {
        "match" : {
            "sellerUserId": "xxxx"
        }
    },
    "search_after": ["2020-04-02 10:09:56", "xxxxxxx"],
    "sort": [
        {"orderAddtime": "desc"},
        {"_id": "desc"}
    ]

}

Response:

{
  "took" : 16,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 108784,
    "max_score" : null,
    "hits" : [
      {
        "_index" : "test_index",
        "_type" : "elasticsearch",
        "_id" : "xxxxxxxxxxxxx",
        "_score" : null,
        "_source" : {
          "orderSubTypeId" : 0,
          "productId" : 27027,
          "orderId" : xxxxxxxxxxxxx,
          "orderAddtime" : "2020-04-02 10:04:58",
          "productSize" : "",
          "orderTradeStatus" : 2000,
          "poundage" : 59.33,
          "sellerName" : "大眼睛潮品代购",
          "orderNum" : "xxxxxxxxxxxxx",
          "orderTypeId" : 0,
          "sellerUserId" : xxxxxxxxxxxxx,
          "sku_id" : xxxxxxxxxxxxx,
          "buyerName" : "可能是快跑",
          "productName" : "Champion 纯棉短裤 黑色",
          "paidMoney" : 157.0,
          "price" : 139.0,
          "freightCost" : 18.0,
          "buyerUserId" : xxxxxxxxxxxxx,
          "buyerReceiveAddress" : "辽宁省丹东市元宝区宗裕城c区天天超市",
          "buyerReceiveMobile" : "xxxxxxxxxxxxx",
          "buyerPayAccount" : "151****3091",
          "buyerPayAccountId" : "-1"
        },
        "sort" : [
          "2020-04-02 10:04:58",
          "xxxxxxxxxxxxx"
        ]
      },
      {
        "_index" : "test_index",
        "_type" : "elasticsearch",
        "_id" : "xxxxxxxxxxxxx",
        "_score" : null,
        "_source" : {
          "orderSubTypeId" : 0,
          "productId" : xxxxxxxxxxxxx,
          "orderId" : xxxxxxxxxxxxx,
          "orderAddtime" : "2020-04-02 10:03:26",
          "productSize" : "",
          "orderTradeStatus" : 2000,
          "poundage" : 59.43,
          "sellerName" : "大眼睛潮品代购",
          "orderNum" : "xxxxxxxxxxxxx",
          "orderTypeId" : 0,
          "sellerUserId" : xxxxxxxxxxxxx,
          "sku_id" : xxxxxxxxxxxxx,
          "buyerName" : "可能是快跑",
          "productName" : "Champion  圆领 薄长袖 T恤  美版 白色",
          "paidMoney" : 167.0,
          "price" : 149.0,
          "freightCost" : 18.0,
          "buyerUserId" : xxxxxxxxxxxxx,
          "buyerReceiveAddress" : "辽宁省丹东市元宝区宗裕城c区天天超市",
          "buyerReceiveMobile" : "xxxxxxxxxxxxx",
          "buyerPayAccount" : "151****3091",
          "buyerPayAccountId" : "-1"
        },
        "sort" : [
          "2020-04-02 10:03:26",
          "xxxxxxxxxxxxx"
        ]
      }
    ]
  }
}

 

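In summary:

Common approaches to paging through large result sets in ES:
1. Normal pagination with from/size: from + size cannot go beyond 10,000 records by default (index.max_result_window), and the hits are first loaded into memory, so it is memory-hungry.
2. Deep pagination, of which there are two kinds. scroll: each request returns a scroll_id, and the next request uses that scroll_id to continue with the next batch (say 10 hits). search_after: sort on a unique field and use the last hit's sort values to fetch the next batch (again, say 10 hits).
Differences:

  scroll creates a snapshot with a specified keep-alive time; data written after the snapshot was taken is not in it and cannot be found.
  search_after sorts on a field with unique values (usually _id) and passes the sort values of the last returned hit as the search_after value of the next request.
    Pros: avoids the performance problems of deep pagination and sees up-to-date data when fetching the next page.
    Cons: cannot jump to a given page number; pages can only be turned one at a time.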
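The search_after loop described above can be simulated in memory. The sketch below is a toy model, not a client for a real cluster: `search` and `iterate_all` are hypothetical names, the document ids are made up, and the sort is the same (orderAddtime desc, _id desc) pair used in the requests above.

```python
def sort_key(doc):
    return (doc["orderAddtime"], doc["_id"])


def search(docs, size, search_after=None):
    """Return up to `size` hits, sorted descending, strictly after the cursor."""
    ordered = sorted(docs, key=sort_key, reverse=True)
    if search_after is not None:
        cursor = tuple(search_after)
        ordered = [d for d in ordered if sort_key(d) < cursor]
    return [{"_id": d["_id"], "sort": list(sort_key(d))} for d in ordered[:size]]


def iterate_all(docs, size):
    """Page through everything by feeding the last hit's sort values back in."""
    cursor = None
    while True:
        hits = search(docs, size, cursor)
        if not hits:
            return
        yield from hits
        cursor = hits[-1]["sort"]


DOCS = [  # ids are hypothetical; timestamps follow the sample response above
    {"_id": "o3", "orderAddtime": "2020-04-02 10:15:01"},
    {"_id": "o1", "orderAddtime": "2020-04-02 10:25:32"},
    {"_id": "o5", "orderAddtime": "2020-04-02 10:09:56"},
    {"_id": "o2", "orderAddtime": "2020-04-02 10:22:15"},
    {"_id": "o4", "orderAddtime": "2020-04-02 10:13:33"},
]
```

The _id tiebreaker matters: without a unique last sort key, two documents with the same orderAddtime on a page boundary could be skipped or repeated.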

 

scroll is also recommended for full exports; it is roughly twice as efficient as from/size.
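The snapshot behaviour of scroll, where documents written after the scroll starts are never returned, can be illustrated with a toy class. `ToyScroll` is a hypothetical name and the model ignores keep-alive expiry entirely.

```python
class ToyScroll:
    """Freeze the result set when the scroll is opened, then hand out batches."""

    def __init__(self, index, size):
        self._snapshot = list(index)  # copy: later writes to `index` are invisible
        self._size = size
        self._pos = 0

    def next_batch(self):
        batch = self._snapshot[self._pos:self._pos + self._size]
        self._pos += len(batch)
        return batch  # an empty list means the scroll is exhausted


index = ["doc1", "doc2", "doc3"]
scroll = ToyScroll(index, size=2)
index.append("doc4")  # written after the snapshot was taken
```

Draining `scroll` yields doc1 to doc3 only; doc4 would show up in a new scroll or in a search_after request.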

 

6. ES performance tuning:

https://blog.csdn.net/hellozhxy/article/details/90938381

 

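7. ES search types:

1) query and fetch

The query is sent to all shards of the index, and each shard returns the documents together with the computed ranking information. This is the fastest search type, because unlike the ones below it only needs to query the shards once. However, the total number of results returned by the shards may be n times the size the user requested.

2) query then fetch (the default search type)

This is what is used when no search type is specified. It works in roughly two steps. First, the request is sent to all shards, and each shard returns only the information needed for sorting and ranking (not the documents themselves); the results are then re-sorted and re-ranked by the scores returned from the shards, and the top size documents are selected. Second, the documents are fetched from the relevant shards. The number of documents returned equals the size the user requested.

3) DFS query and fetch

Compared with the first type, this adds an initial scatter step, which is said to give more accurate scoring and ranking.

4) DFS query then fetch

Compared with the second type, this adds an initial scatter step.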
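The two steps of query then fetch can be sketched with in-memory "shards". This is a simplified model under stated assumptions: each shard is a dict from a made-up doc id to a (score, source) pair, and `query_phase` and `query_then_fetch` are hypothetical names.

```python
import heapq


def query_phase(shard, size):
    """Step 1 on one shard: return only (score, doc_id) for its local top hits."""
    return heapq.nlargest(size, ((score, doc_id) for doc_id, (score, _source) in shard.items()))


def query_then_fetch(shards, size):
    # The coordinating node merges the lightweight per-shard results...
    candidates = []
    for shard_no, shard in enumerate(shards):
        for score, doc_id in query_phase(shard, size):
            candidates.append((score, doc_id, shard_no))
    top = heapq.nlargest(size, candidates)
    # ...and only then fetches the documents of the global top `size` (step 2).
    return [shards[shard_no][doc_id][1] for _score, doc_id, shard_no in top]


shards = [
    {"a": (3.0, {"name": "a"}), "b": (1.0, {"name": "b"})},
    {"c": (2.5, {"name": "c"}), "d": (0.5, {"name": "d"})},
]
```

Here `query_then_fetch(shards, 2)` returns exactly two documents, whereas a query-and-fetch style search would have shipped up to two documents per shard.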

 

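What does DFS stand for? What does the initial scatter step do?

According to the official ES documentation, the initial scatter collects the term frequencies and document frequencies from every shard before the real query runs; each shard then scores and ranks using these global frequencies. Clearly DFS_QUERY_THEN_FETCH is the least efficient search type, since one search may have to hit the shards three times. On the other hand, scoring with DFS should be the most accurate.

As for the abbreviation, I found no explicit definition. The official documentation describes the extra phase as computing distributed term frequencies, so D is presumably Distributed and F Frequency, and S may stand for Scatter; the whole thing may mean something like "distributed scatter of term and document frequencies".

To sum up, in terms of performance QUERY_AND_FETCH is the fastest and DFS_QUERY_THEN_FETCH is the slowest; in terms of accuracy, DFS is more accurate than non-DFS. Note that recent Elasticsearch versions only accept query_then_fetch and dfs_query_then_fetch as search_type.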
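Why global frequencies matter can be seen from the inverse document frequency (IDF) term of a score. The sketch below uses the BM25 IDF formula on made-up shard statistics; the numbers are purely illustrative, and real scoring involves more than IDF.

```python
import math


def bm25_idf(doc_count, doc_freq):
    """BM25 inverse document frequency for a term found in doc_freq of doc_count docs."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical statistics for the same term on two unevenly filled shards.
shard_stats = [
    {"doc_count": 1000, "doc_freq": 10},  # term is rare on this shard
    {"doc_count": 10, "doc_freq": 9},     # term is common on this shard
]

# Without DFS: each shard scores with its own local statistics.
local_idf = [bm25_idf(s["doc_count"], s["doc_freq"]) for s in shard_stats]

# With DFS: the initial scatter sums the statistics, every shard uses the same IDF.
global_idf = bm25_idf(
    sum(s["doc_count"] for s in shard_stats),
    sum(s["doc_freq"] for s in shard_stats),
)
```

With local statistics the same term weighs very differently on the two shards, so scores from different shards are not directly comparable; the global value removes that skew at the cost of an extra round trip.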

 

 

References:

Operating ES through the Kibana API console: https://www.cnblogs.com/xll970105/p/11561537.html

ES query types: https://www.colabug.com/2018/0902/4334463/

 

8. Principles for creating ES mappings:

  1) Turn off dynamic mapping

    

 

 

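    The dynamic setting accepts true (the default), false and strict. Both true and false accept documents that contain unknown fields, but they map them differently: with true, a mapping is generated automatically for each unknown field, whereas with false the unknown field gets no mapping at all (it stays in _source but is not searchable). strict rejects documents that contain unknown fields outright.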

 

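  2) The refresh interval defaults to 1s; change it to 60s

  3) With bulk writes, ES can sustain 10,000+ QPS without problems

  4) Lifting the read-only block on indices:

           PUT _settings { "index": { "blocks": { "read_only_allow_delete": "false" } } }

  5) analyzer: when none is specified, the default standard analyzer is used

  6) When creating an index, set the number of shards and replicas explicitly rather than relying on the default (the default is 5 primary shards before 7.0 and 1 from 7.0 on, and some ES 6.7 deployments were seen to default to 1 shard). Our cluster has 3 nodes, so 3 shards with 1 replica is recommended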

"settings": {
      "number_of_shards": 3,
      "number_of_replicas": 1
 },

  7) Time fields: use the date type throughout, with format yyyy-MM-dd HH:mm:ss (ES stores JSON, and JSON has no date type, so dates are passed as strings when writing)

"auditTime": {
    "type": "date",
    "format": "yyyy-MM-dd HH:mm:ss"
}
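The three values of the dynamic setting from principle 1) behave differently when a document carries a field the mapping does not know. The following toy model shows the difference; `index_document` and `guess_mapping` are hypothetical names, and the type guessing is a simplification of what dynamic mapping does.

```python
def guess_mapping(value):
    """Very rough stand-in for dynamic field type detection."""
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    return {"type": "text", "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}}


def index_document(properties, doc, dynamic="true"):
    """Return the mapping properties after indexing `doc` under the given dynamic mode."""
    unknown = [name for name in doc if name not in properties]
    if unknown and dynamic == "strict":
        raise ValueError(f"strict mapping rejects unknown fields: {unknown}")
    updated = dict(properties)
    if dynamic == "true":
        for name in unknown:
            updated[name] = guess_mapping(doc[name])  # mapping grows automatically
    return updated  # with "false" the document is accepted but the mapping is unchanged
```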

 

 

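    Field type design principles:

  1) Prefer the keyword type for fields: fast to query and sortable

  2) The content field should be a text field: analyzed, but not sortable

  3) Store times as long: supports range queries; rounding to the minute is recommended and improves query efficiency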
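Putting the settings and field-type principles of this section together, a create-index body might look like the sketch below. Several things here are assumptions: `build_index_body` is a hypothetical helper, the mappings use the typeless layout of ES 7.x (on 6.x the properties sit one level deeper, under the mapping type name), and the field list is only an example built from names used elsewhere in this post.

```python
import json


def build_index_body():
    """Request body for PUT /<index>, following the principles listed above."""
    return {
        "settings": {
            "number_of_shards": 3,      # one primary per node on a 3-node cluster
            "number_of_replicas": 1,
            "refresh_interval": "60s",  # instead of the 1s default
        },
        "mappings": {
            "dynamic": "strict",        # reject documents with unmapped fields
            "properties": {
                "userId": {"type": "keyword"},  # id used only for exact matches
                "content": {"type": "text"},    # analyzed, full-text searchable
                "auditTime": {"type": "date", "format": "yyyy-MM-dd HH:mm:ss"},
                "publishTime": {"type": "long"},  # epoch value for range queries
            },
        },
    }


print(json.dumps(build_index_body(), indent=2))
```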

9. Common ES statements:

Query:

GET /trend_reply/_search
{"query":{
  "bool" : {
    "must" : [
      {
        "range":{
            "publishTime":{
                "gte":1577808000,
                "lt":1590940800
                
            }
        }
      }
      
    ],
    "adjust_pure_negative" : true,
    "boost" : 1.0
  }
}
}
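The epoch literals in the range query above are hard to read. A small Python sketch can build them from dates instead. It assumes that publishTime holds epoch seconds and that the boundaries are midnights in UTC+8, which is what the two literals correspond to; `epoch_seconds` and `publish_time_range` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

CST = timezone(timedelta(hours=8))  # assumption: boundaries are given in UTC+8


def epoch_seconds(text, tz=CST):
    """Parse 'yyyy-MM-dd HH:mm:ss' in the given time zone into epoch seconds."""
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=tz)
    return int(parsed.timestamp())


def publish_time_range(start, end):
    """Half-open range [start, end) on publishTime, as in the query above."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"range": {"publishTime": {"gte": epoch_seconds(start), "lt": epoch_seconds(end)}}}
                ]
            }
        }
    }
```

`publish_time_range("2020-01-01 00:00:00", "2020-06-01 00:00:00")` reproduces the gte/lt values of the request above.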


GET /risk_order_his/_search
{"query":{
  "bool" : {
    "must" : [
      {
        "term" : {
          "orderNum" : {
            "value" : 1120010815637847,
            "boost" : 1.0
          }
        }
      }
      
    ],
    "adjust_pure_negative" : true,
    "boost" : 1.0
  }
}
}

GET /trend_reply/_search
{
    "from":0,
    "size":10000,
    "timeout":"60s",
    "query":{
        "bool":{
            "filter":[
                {
                    "match_phrase_prefix":{
                        "content":{
                            "query":"c",
                            "slop":0,
                            "max_expansions":50,
                            "boost":1
                        }
                    }
                },
                {
                    "range":{
                        "result":{
                            "from":"0",
                            "to":null,
                            "include_lower":true,
                            "include_upper":true,
                            "boost":1
                        }
                    }
                },
                {
                    "exists":{
                        "field":"auditResultId",
                        "boost":1
                    }
                }
            ],
            "must_not":[
                {
                    "term":{
                        "auditResultId":{
                            "value":"",
                            "boost":1
                        }
                    }
                }
            ],
            "adjust_pure_negative":true,
            "boost":1
        }
    }
}

Update:

POST /trend_reply/_update_by_query
{
    "script": {
    "source": "ctx._source['result']=8" 
  },
  "query":{
    "term":{
      "_id":"F-n4XnIB5Q-NfXXmj7pa"
    }
  }
  
}
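When the value to write comes from application code, passing it through script params avoids building the Painless source by string concatenation. The sketch below assembles such an _update_by_query body; `set_field_by_id` is a hypothetical helper and the id in the usage note is a placeholder.

```python
def set_field_by_id(doc_id, field, value):
    """Body for POST /<index>/_update_by_query that sets one field on one document."""
    return {
        "script": {
            "lang": "painless",
            # field name and value are passed as params, not spliced into the source
            "source": "ctx._source[params.field] = params.value",
            "params": {"field": field, "value": value},
        },
        "query": {"term": {"_id": doc_id}},
    }
```

For instance, `set_field_by_id("some-id", "result", 8)` expresses the same update as the request above, with the script source staying constant across calls.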

Delete:

POST /trend_reply/_delete_by_query?wait_for_completion=false
{"query":{
  "bool" : {
    "must" : [
      {
        "range":{
            "publishTime":{
                "gte":1575302400,
                "lt":1590940800
                
            }
        }
      }
      
    ],
    "adjust_pure_negative" : true,
    "boost" : 1.0
  }
}
}

Find records where two fields are equal:

# Compare two strings
GET xxx_index/_search
{
  "query": {
        "bool": {
            "must": [{
                "match_all": {}
            }],
            "filter": [{
                "script": {
                    "script": {
                        "source": "String.valueOf(doc['dataId'].value) == doc['_id'].value",
                        "lang": "painless"
                    }
                }
            }],
            "must_not": [],
            "should": []
        }
    }
}
 
# Compare two long values
GET xxx_index/_search
{
  "query": {
        "bool": {
            "must": [{
                "match_all": {}
            }],
            "filter": [{
                "script": {
                    "script": {
                        "source": "doc['dataId'].value - doc['userId'].value == 0",
                        "lang": "painless"
                    }
                }
            }],
            "must_not": [],
            "should": []
        }
    }
}

 

 

 

 

posted @ 2020-02-25 13:36  乌瑟尔