Es分页查询:from+size

{ "query": { "bool": { "must": [ { "term": { "architect.keyword": { "value": "郭锋" } } }, { "range": { "NRunTime": { "lte": 100 } } } ] } }, "size": 10, "from": 100 }

from 相当于offset,size相当于每页多少个,上边例子中代表从第100个数据开始(第11页),查询出10条数据

 

Es多字段查询  multi_match

{

"query": {
"bool": {
"must": {
"multi_match" : {
"query" : "search_key",
"type" : "best_fields",
"fields" : ["column1", "column2"],   //字段column1、column2模糊匹配search_key
"analyzer" : "ik_smart"     //汉字按ik_smart分词
}
},
"filter": {    //filter range lte(小于) hte(大于)
"range":{
"column3":{
"lte":1//小于
}
}

}
}
},
"stored_fields": ["column1", "column2", "column3", "column4","column5"],
"highlight" : {//高亮显示
"fields" : {
"column1" : {},
"column2" : {}
}
}
}

 

Es match match_phrase查询

 match:会将查询字段分隔,比如查询java spark,采用match会分词 java/spark,将es中包含java、spark、以及java***spark的查询出来

 match_phrase:不会讲查询字段分隔,比如查询java spark,采用match_phrase会将es中包含 ***java spark***的内容查询出来

 match提高查询召回率,match_phrase提高查询精度

match查询例子:

1、match查询
GET /forum/article/_search 
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "content": "java spark"
          }
        }
      ]
    }
  }
}

查询结果
{
  "took": 54,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.68640786,
    "hits": [
      {
        "_index": "forum",
        "_type": "article",
        "_id": "2",
        "_score": 0.68640786,
        "_source": {
          "articleID": "KDKE-B-9947-#kL5",
          "userID": 1,
          "hidden": false,
          "postDate": "2017-01-02",
          "tag": [
            "java"
          ],
          "tag_cnt": 1,
          "view_cnt": 50,
          "title": "this is java blog",
          "content": "i think java is the best programming language",
          "sub_title": "learned a lot of course",
          "author_first_name": "Smith",
          "author_last_name": "Williams",
          "new_author_last_name": "Williams",
          "new_author_first_name": "Smith"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "5",
        "_score": 0.68324494,
        "_source": {
          "articleID": "DHJK-B-1395-#Ky5",
          "userID": 3,
          "hidden": false,
          "postDate": "2017-03-01",
          "tag": [
            "elasticsearch"
          ],
          "tag_cnt": 1,
          "view_cnt": 10,
          "title": "this is spark blog",
          "content": "spark is best big data solution based on scala ,an programming language similar to java spark",
          "sub_title": "haha, hello world",
          "author_first_name": "Tonny",
          "author_last_name": "Peter Smith",
          "new_author_last_name": "Peter Smith",
          "new_author_first_name": "Tonny"
        }
      }
    ]
  }
}
View Code

 

match_phrase查询例子:

GET /forum/article/_search 
{
  "query": {
    "match_phrase": {
      "content": {
        "query": "java spark",
        "slop" : 50
      }
    }
  }
}
结果:
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "forum",
        "_type": "article",
        "_id": "5",
        "_score": 0.5753642,
        "_source": {
          "articleID": "DHJK-B-1395-#Ky5",
          "userID": 3,
          "hidden": false,
          "postDate": "2017-03-01",
          "tag": [
            "elasticsearch"
          ],
          "tag_cnt": 1,
          "view_cnt": 10,
          "title": "this is spark blog",
          "content": "spark is best big data solution based on scala ,an programming language similar to java spark",
          "sub_title": "haha, hello world",
          "author_first_name": "Tonny",
          "author_last_name": "Peter Smith",
          "new_author_last_name": "Peter Smith",
          "new_author_first_name": "Tonny"
        }
      }
    ]
  }
}
View Code

 

结果发现match_phrase只会返回既包含java又包含spark的数据,召回率降低

参考:https://www.jianshu.com/p/18d80cafe145

分词器

 通过analyzer指定分词器类型

 系统默认分词器:

   ① standard分词器:单词切分,将词汇转为小写、去掉标点符号  ------按单词分

POST _analyze
{
  "analyzer": "standard",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}

text分词结果:
[ the, 2, quick, brown, foxes, jumped, over, the, lazy, dog's, bone ]
View Code

 

  ② whitespace分词器:空格分词,对字符没有lowcase化   -----按空格分

POST _analyze
{
  "analyzer": "whitespace",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
分词结果:
[ The, 2, QUICK, Brown-Foxes, jumped, over, the, lazy, dog's, bone. ]
View Code

 

  ③simple分词器:通过非字母字符来分隔文本信息,有lowcase化,该分词器去掉数字类型字符  ----按非首字母分  

POST _analyze
{
  "analyzer": "simple",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
分词结果:
[ the, quick, brown, foxes, jumped, over, the, lazy, dog, s, bone ]
View Code

 

  ④stop分词器:通过非字母字符来分隔文本信息,同时去掉英文中a、an、the等常用字符,通过stopwords也可以自己设置需要过滤掉的单词, 该分词器去掉数字类型字符  ----按非首字母分 ,去a、an、the 

PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_stop_analyzer": {
          "type": "stop",
          "stopwords": ["the", "over"]
        }
      }
    }
  }
}

POST my_index/_analyze
{
  "analyzer": "my_stop_analyzer",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
结果:
[ quick, brown, foxes, jumped, lazy, dog, s, bone ]

 

 

中文分词器:

  ① ik-max-world:会将文本做最细粒度的拆分;尽可能多的拆分出词语

   ik-smart:做最粗粒度的拆分;已被拆出词语将不会再次被其他词语占用

   ik分词器热刺更新配置:

      修改 IK 的配置文件 :ES 目录/plugins/ik/config/ik/IKAnalyzer.cfg.xml

   

  ik-max-world VS ik-smart例子;   

curl -XGET 'http://localhost:9200/_analyze?pretty&analyzer=ik_max_word' -d '联想是全球最大的笔记本厂商'
#返回

{
  "tokens" : [
    {
      "token" : "联想",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "是",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "全球",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "最大",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "的",
      "start_offset" : 7,
      "end_offset" : 8,
      "type" : "CN_CHAR",
      "position" : 4
    },
    {
      "token" : "笔记本",
      "start_offset" : 8,
      "end_offset" : 11,
      "type" : "CN_WORD",
      "position" : 5
    },
    {
      "token" : "笔记",
      "start_offset" : 8,
      "end_offset" : 10,
      "type" : "CN_WORD",
      "position" : 6
    },
    {
      "token" : "本厂",
      "start_offset" : 10,
      "end_offset" : 12,
      "type" : "CN_WORD",
      "position" : 7
    },
    {
      "token" : "厂商",
      "start_offset" : 11,
      "end_offset" : 13,
      "type" : "CN_WORD",
      "position" : 8
    }
  ]
}


# ik_smart

curl -XGET 'http://localhost:9200/_analyze?pretty&analyzer=ik_smart' -d '联想是全球最大的笔记本厂商'

# 返回

{
  "tokens" : [
    {
      "token" : "联想",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "是",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "全球",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "最大",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "的",
      "start_offset" : 7,
      "end_offset" : 8,
      "type" : "CN_CHAR",
      "position" : 4
    },
    {
      "token" : "笔记本",
      "start_offset" : 8,
      "end_offset" : 11,
      "type" : "CN_WORD",
      "position" : 5
    },
    {
      "token" : "厂商",
      "start_offset" : 11,
      "end_offset" : 13,
      "type" : "CN_WORD",
      "position" : 6
    }
  ]
}
View Code

 

 

  

 

参考:https://segmentfault.com/a/1190000012553894

es聚合查询

aggs

"aggs": {
    "NAME": {# 指定结果的名称
      "AGG_TYPE": {# 指定具体的聚合方法,
        TODO: # 聚合体内制定具体的聚合字段
      }
    }
    TODO: # 该处可以嵌套聚合
  }

例子:

{
  "size": 0,
  "aggs": {
    "sum_install": {
      "date_histogram": {
        "field": "yyyymmdd",
        "interval": "day"
      },
      "aggs": {
        "types": {
          "terms": {
            "field": "type.keyword",
            "size": 10
          },
          "aggs": {
            "install": {
              "sum": {
                "field": "install"
              }
            }
          }
        }
      }
    }
  }
}
View Code

 

作用查询每天,不同type对应install总量

range用法 && must should

range :field:lte gte

must:and操作  should:or操作

{
  "query": {
    "bool": {
      "must": [
        {"range": {
          "recive_time": {
            "gte": "2017-12-25T01:00:00.000Z",
            "lte": "2017-12-25T02:10:00.000Z"
          }
        }},
        {
          "bool": {
            "should": [
              {"range": {
                "live_delay": {
                  "gte": 1500
                }
              }},
              {
                "range": {
                  "stream_break_count.keyword": {
                    "gte": 1
                  }
                }
              }
            ]
          }
        }
      ]
    }
  }
}
must的两个条件都必须满足,should中的两个条件至少满足一个就可以

 

Es 原来索引中添加字段

PUT /my_index/_mapping/my_type

{ "properties": { "new_field_name": { "type": "string" } } }

 

给新添加字段赋值

POST my_index/_update_by_query    //批量更新用_update_by_query语法

{ "script": { "lang": "painless", "inline": "ctx._source.new_field_name= '02'" } }   //通过painless更新对象值

 

参考:https://blog.csdn.net/qq_36330643/article/details/79771652

杂乱查询语法:

aggs+avg+max+min+order+cardinality(等价于count(distinct(a)))+filter+sort

terms相当于sql中的groupby

aggs:基于搜索查询,可以嵌套组合复杂查询语法

"aggs": {
    "NAME": {# 指定结果的名称
      "AGG_TYPE": {# 指定具体的聚合方法,
        TODO: # 聚合体内制定具体的聚合字段
      }
    }
    TODO: # 该处可以嵌套聚合
  }

 

一、聚合起步
1、创建索引
1.1 创建索引DSL实现
put cars
POST /cars/transactions/_bulk
{ "index": {}}
{ "price" : 10000, "color" : "red", "make" : "honda", "sold" : "2014-10-28" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 30000, "color" : "green", "make" : "ford", "sold" : "2014-05-18" }
{ "index": {}}
{ "price" : 15000, "color" : "blue", "make" : "toyota", "sold" : "2014-07-02" }
{ "index": {}}
{ "price" : 12000, "color" : "green", "make" : "toyota", "sold" : "2014-08-19" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 80000, "color" : "red", "make" : "bmw", "sold" : "2014-01-01" }
{ "index": {}}
{ "price" : 25000, "color" : "blue", "make" : "ford", "sold" : "2014-02-12" }
1.2 创建mysql库表sql实现
CREATE TABLE `cars` (
`id` int(11) NOT NULL,
`price` int(11) DEFAULT NULL,
`color` varchar(255) DEFAULT NULL,
`make` varchar(255) DEFAULT NULL,
`sold` date DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
2、统计不同颜色车的数目
2.1 统计不同颜色车的DSL实现
GET /cars/transactions/_search
{
"size": 0,
"aggs": {
"popular_colors": {
"terms": {
"field": "color.keyword"
}
}
}
}
返回结果:

lve

2.2 统计不同颜色的mysql实现
select color, count(color) as cnt from cars group by color order by cnt desc;
返回结果:

red 4
green 2
blue 2
3、统计不同颜色车的平均价格
3.1 统计不同颜色车的平均价格DSL实现:

terms 相当于groupby
GET /cars/transactions/_search
{
"size": 0,
"aggs": {
"colors": {
"terms": {
"field": "color.keyword"
},
"aggs": {
"avg_price": {
"avg": {
"field": "price"
}
}
}
}
}
}
返回聚合结果:

lve

3.2 统计不同颜色车的平均价格sql实现:
select color, count(color) as cnt, avg(price) as avg_price from cars group by color order by cnt desc;

color cnt avg_price
red 4 32500.0000
green 2 21000.0000
blue 2 20000.0000
4、每种颜色汽车制造商的分布
4.1 统计每种颜色汽车制造商的分布dsl实现
GET /cars/transactions/_search
{
"size": 0,
"aggs": {
"colors": {
"terms": {
"field": "color.keyword"
},
"aggs": {
"make": {
"terms": {
"field": "make.keyword"
}
}
}
}
}
}
返回结果:

4.2 统计每种颜色汽车制造商的分布sql实现
说明:和dsl的实现不严格对应

select color, make from cars order by color;

color make
blue toyota
blue ford
green ford
green toyota
red bmw
red honda
red honda
red honda
5、统计每个制造商的最低价格、最高价格
5.1 统计每个制造商的最低、最高价格的DSL实现
GET /cars/transactions/_search
{
"size": 0,
"aggs": {
"make_class": {
"terms": {
"field": "make.keyword"
},
"aggs": {
"min_price": {
"min": {
"field": "price"
}
},
"max_price": {
"max": {
"field": "price"
}
}
}
}
}
}
聚合结果:

5.2 统计每个制造商的最低、最高价格的sql实现
select make, min(price) as min_price, max(price) as max_price from cars group by make;

make min_price max_price
bmw 80000 80000
ford 25000 30000
honda 10000 20000
toyota 12000 15000
二、聚合进阶
1、条形图聚合
1.1 分段统计每个区间的汽车销售价格总和
GET /cars/transactions/_search
{
"size": 0,
"aggs": {
"price": {
"histogram": {
"field": "price",
"interval": 20000
},
"aggs": {
"revenue": {
"sum": {
"field": "price"
}
}
}
}
}
}
汽车销售价格区间:定义为20000;
分段统计price和用sum统计。

1.2 多维度度量不同制造商的汽车指标
GET /cars/transactions/_search
{
"size" : 0,
"aggs": {
"makes": {
"terms": {
"field": "make.keyword",
"size": 10
},
"aggs": {
"stats": {
"extended_stats": {
"field": "price"
}
}
}
}
}
}
输出截取片段:

{
"key": "ford",
"doc_count": 2,
"stats": {
"count": 2,
"min": 25000,
"max": 30000,
"avg": 27500,
"sum": 55000,
"sum_of_squares": 1525000000,
"variance": 6250000,
"std_deviation": 2500,
"std_deviation_bounds": {
"upper": 32500,
"lower": 22500
}
}
}
2、按时间统计聚合
2.1 按月份统计制造商汽车销量dsl实现
GET /cars/transactions/_search
{
"size" : 0,
"aggs": {
"sales":{
"date_histogram":{
"field":"sold",
"interval":"month",
"format":"yyyy-MM-dd"
}
}
}
}
返回结果:

2.2 按月份统计制造商汽车销量sql实现
SELECT make, count(make) as cnt, CONCAT(YEAR(sold),',',MONTH(sold)) AS data_time
FROM `cars`
GROUP BY YEAR(sold) DESC,MONTH(sold)

查询结果如下:
make cnt data_time
bmw 1 2014,1
ford 1 2014,2
ford 1 2014,5
toyota 1 2014,7
toyota 1 2014,8
honda 1 2014,10
honda 2 2014,11
2.3 包含12月份的处理DSL实现
以上2.1 中没有12月份的统计结果显示。

GET /cars/transactions/_search
{
"size" : 0,
"aggs": {
"sales":{
"date_histogram":{
"field":"sold",
"interval":"month",
"format":"yyyy-MM-dd",
"min_doc_count": 0,
"extended_bounds":{
"min":"2014-01-01",
"max":"2014-12-31"
}
}
}
}
}
2.4 以季度为单位统计DSL实现
GET /cars/transactions/_search
{
"size" : 0,
"aggs": {
"sales":{
"date_histogram":{
"field":"sold",
"interval":"quarter",
"format":"yyyy-MM-dd",
"min_doc_count": 0,
"extended_bounds":{
"min":"2014-01-01",
"max":"2014-12-31"
}
},
"aggs":{
"per_make_sum":{
"terms":{
"field": "make.keyword"
},
"aggs":{
"sum_price":{
"sum":{ "field": "price"}
}
}
},
"top_sum": {
"sum": {"field":"price"}
}
}
}
}
}
2.5 基于搜索的(范围限定)聚合操作
2.5.1 基础查询聚合

GET /cars/transactions/_search
{
"query" : {
"match" : {
"make.keyword" : "ford"
}
},
"aggs" : {
"colors" : {
"terms" : {
"field" : "color.keyword"
}
}
}
}
对应的sql实现:

select make, color from cars
where make = "ford";

结果返回如下:
make color
ford green
ford blue
三、过滤聚合
1. 过滤操作
统计全部汽车的平均价钱以及单品平均价钱;

GET /cars/transactions/_search
{
"size" : 0,
"query" : {
"match" : {
"make.keyword" : "ford"
}
},
"aggs" : {
"single_avg_price": {
"avg" : { "field" : "price" }
},
"all": {
"global" : {},
"aggs" : {
"avg_price": {
"avg" : { "field" : "price" }
}

}
}
}
}
等价于:

select make, color, avg(price) from cars
where make = "ford" ;
select avg(price) from cars;
2、范围限定过滤(过滤桶)
我们可以指定一个过滤桶,当文档满足过滤桶的条件时,我们将其加入到桶内。

GET /cars/transactions/_search
{
"size" : 0,
"query":{
"match": {
"make": "ford"
}
},
"aggs":{
"recent_sales": {
"filter": {
"range": {
"sold": {
"from": "now-100M"
}
}
},
"aggs": {
"average_price":{
"avg": {
"field": "price"
}
}
}
}
}
}
mysql的实现如下:

select *, avg(price) from cars where period_diff(date_format(now() , '%Y%m') , date_format(sold, '%Y%m')) > 30
and make = "ford";


mysql查询结果如下:
id price color make sold avg
3 30000 green ford 2014-05-18 27500.0000
3、后过滤器
只过滤搜索结果,不过滤聚合结果——post_filter实现

GET /cars/transactions/_search
{
"query": {
"match": {
"make": "ford"
}
},
"post_filter": {
"term" : {
"color.keyword" : "green"
}
},
"aggs" : {
"all_colors": {
"terms" : { "field" : "color.keyword" }
}
}
}
post_filter 会过滤搜索结果,只展示绿色 ford 汽车。这在查询执行过 后 发生,所以聚合不受影响。

小结
选择合适类型的过滤(如:搜索命中、聚合或两者兼有)通常和我们期望如何表现用户交互有关。选择合适的过滤器(或组合)取决于我们期望如何将结果呈现给用户。

在 filter 过滤中的 non-scoring 查询,同时影响搜索结果和聚合结果。
filter 桶影响聚合。
post_filter 只影响搜索结果。
四、多桶排序
4.1 内置排序
GET /cars/transactions/_search
{
"size" : 0,
"aggs" : {
"colors" : {
"terms" : {
"field" : "color.keyword",
"order": {
"_count" : "asc"
}
}
}
}
}
4.2 按照度量排序
以下是按照汽车平均售价的升序进行排序。
过滤条件:汽车颜色;
聚合条件:平均价格;
排序条件:汽车的平均价格升序。

GET /cars/transactions/_search
{
"size": 0,
"aggs": {
"colors": {
"terms": {
"field": "color.keyword",
"order": {
"avg_price": "asc"
}
},
"aggs": {
"avg_price": {
"avg": {
"field": "price"
}
}
}
}
}
}
多条件聚合后排序如下所示:

GET /cars/transactions/_search
{
"size": 0,
"aggs": {
"colors": {
"terms": {
"field": "color.keyword",
"order": {
"stats.variance": "asc"
}
},
"aggs": {
"stats": {
"extended_stats": {
"field": "price"
}
}
}
}
}
}
4.3 基于“深度”的度量排序
太复杂,不推荐!

五、近似聚合
cardinality的含义是“基数”;

5.1 统计去重后的数量
GET /cars/transactions/_search
{
"size": 0,
"aggs": {
"distinct_colors": {
"cardinality": {
"field": "color.keyword"
}
}
}
}
类似于:

SELECT COUNT(DISTINCT color) FROM cars;
以下:
以月为周期统计;

GET /cars/transactions/_search
{
"size": 0,
"aggs": {
"months": {
"date_histogram": {
"field": "sold",
"interval": "month"
},
"aggs": {
"distinct_colors": {
"cardinality": {
"field": "color.keyword"
}
}
}
}
}
}

posted on 2019-07-29 14:53  colorfulworld  阅读(26377)  评论(1编辑  收藏  举报