Getting Started with ELK and Common Commands

Elasticsearch resources:

https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started.html
https://www.elastic.co/webinars/getting-started-kibana?baymax=rtp&elektra=docs&storm=top-video&iesrc=ctr
https://www.elastic.co/webinars/getting-started-logstash?baymax=rtp&elektra=docs&storm=top-video&iesrc=ctr
Elasticsearch listens on port 9200 by default; opening the URL below shows basic information about the node.
http://localhost:9200/

Elasticsearch: The Definitive Guide (the second link is the master-branch version of the guide)
https://www.elastic.co/guide/en/elasticsearch/guide/index.html
https://www.elastic.co/guide/en/elasticsearch/guide/master/index.html

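A shard is one partition of an index: the index is split into N pieces (each piece is stored as its own set of files and can live on a different node) because a single, very large index would be slow to query and hard to fit on one machine. In other words, sharding is horizontal partitioning.
A replica is a copy of a shard. Replicas exist mainly for high availability, so that losing a single node does not lose data; they can also serve search requests.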
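
To make the partitioning idea concrete, here is a minimal Python sketch of how a document ID could be mapped to one of N primary shards. In its simplest documented form, Elasticsearch routes a document by hashing the routing value (the document ID by default) and taking it modulo the number of primary shards. The real implementation uses a Murmur3 hash; this sketch substitutes zlib.crc32 purely for illustration, and pick_shard is a hypothetical helper name, not an Elasticsearch API.

```python
import zlib


def pick_shard(doc_id: str, num_primary_shards: int) -> int:
    """Map a document ID to a shard number in [0, num_primary_shards)."""
    # crc32 is a stand-in hash for illustration; Elasticsearch itself uses Murmur3.
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


if __name__ == "__main__":
    for doc_id in ["1", "2", "3", "4"]:
        print(doc_id, "->", pick_shard(doc_id, 5))
```

Because the shard number depends on the shard count, this also suggests why the number of primary shards is fixed when an index is created.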

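List the indices (_cat has nothing to do with the animal; the official documentation expands CAT as "compact and aligned text")
GET /_cat/indices?v
Create an index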
PUT /customer?pretty
GET /_cat/indices?v

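Create a document. PUT creates it under the ID you specify; POST without an ID creates it with an auto-generated ID. The pretty parameter stands for pretty-print and makes Elasticsearch return human-readable, indented JSON.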

PUT /customer/_doc/1?pretty
    {
      "name": "John Doe"
    }
GET /customer/_doc/1?pretty

POST /customer/_doc?pretty
    {
      "name": "Jane Doe"
    }

 


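Update a document (internally the old version is marked as deleted and a new version is indexed). The /_doc/1/_update path used below is the 6.x form; from 7.0 the endpoint is POST /customer/_update/1.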

POST /customer/_doc/1/_update?pretty
    {
      "doc": { "name": "Jane Doe" }
    }

POST /customer/_doc/1/_update?pretty
    {
      "doc": { "name": "Jane Doe", "age": 20 }
    }

POST /customer/_doc/1/_update?pretty
    {
      "script" : "ctx._source.age += 5"
    }
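
The three update requests above can be mimicked on a plain dictionary. This is only a sketch of the semantics, assuming a shallow merge of top-level fields (Elasticsearch merges nested objects recursively) and a Python callable standing in for the Painless script; partial_update, scripted_update and add_five_years are hypothetical names.

```python
import copy


def partial_update(stored, partial_doc):
    """Merge partial_doc into a copy of the stored document and bump its version."""
    merged = {**stored["_source"], **partial_doc}  # shallow merge of top-level fields
    if merged == stored["_source"]:
        return stored  # nothing changed: treated as a no-op, version stays the same
    return {**stored, "_source": merged, "_version": stored["_version"] + 1}


def scripted_update(stored, script):
    """Apply a callable (standing in for a Painless script) to a copy of _source."""
    source = copy.deepcopy(stored["_source"])
    script(source)
    return {**stored, "_source": source, "_version": stored["_version"] + 1}


def add_five_years(source):
    source["age"] += 5  # plays the role of "ctx._source.age += 5"


v1 = {"_id": "1", "_version": 1, "_source": {"name": "John Doe"}}
v2 = partial_update(v1, {"name": "Jane Doe", "age": 20})
v3 = scripted_update(v2, add_five_years)
print(v3)
```

Each call returns a new document rather than editing the old one in place, which mirrors the "delete the old version, index a new one" behaviour described above.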

 


Delete a document
 DELETE /customer/_doc/2?pretty 

Bulk operations (indexing, updating and deleting several documents in one request)

POST /customer/_doc/_bulk?pretty
    {"index":{"_id":"1"}}
    {"name": "John Doe" }
    {"index":{"_id":"2"}}
    {"name": "Jane Doe" }

POST /customer/_doc/_bulk?pretty
    {"update":{"_id":"1"}}
    {"doc": { "name": "John Doe becomes Jane Doe" } }
    {"delete":{"_id":"2"}}
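
The bulk body is newline-delimited JSON: one action line, followed by a source line for actions that carry a payload (index, update) and nothing for delete. The sketch below builds such a body in Python; build_bulk_body is a hypothetical helper, and sending the result to a cluster is left out on purpose.

```python
import json


def build_bulk_body(operations):
    """Serialize (action, metadata, source) triples as newline-delimited JSON.

    source is None for actions without a payload line, such as delete.
    """
    lines = []
    for action, metadata, source in operations:
        lines.append(json.dumps({action: metadata}))
        if source is not None:
            lines.append(json.dumps(source))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body([
    ("update", {"_id": "1"}, {"doc": {"name": "John Doe becomes Jane Doe"}}),
    ("delete", {"_id": "2"}, None),
])
print(body, end="")
```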

 


Bulk-import data from a file

curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_doc/_bulk?pretty&refresh" --data-binary "@accounts.json"

 


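Searching. Note the _search endpoint here, in the same position where updates use _update. q=* matches all documents, sort orders the results by account_number ascending (asc), and pretty was explained above. In the response, hits contains the matching documents and hits.total reports how many documents matched in total, not how many were returned: only 10 hits are returned by default, which can be changed with the size parameter.
GET /bank/_search?q=*&sort=account_number:asc&pretty
Equivalent query using a request body:

GET /bank/_search
    {
      "query": { "match_all": {} },
      "sort": [
        { "account_number": "asc" }
      ]
    }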

 


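To return a slice from the middle of the result set, set from, a zero-based offset into the hits (from: n skips the first n hits). The author observed that a fractional value such as 5.98 is rounded down to 5. Also note that max_score was empty in the previous queries because they sort on a field, so no scores are computed; this query has no sort clause, so max_score is populated (1.0 for match_all).

GET /bank/_search
    {
      "query": { "match_all": {} },
      "from": 10, # zero-based offset: skip the first 10 hits (this is not a document ID)
      "size": 10
    }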
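
Conceptually, from and size behave like a list slice over the ordered hits. A minimal sketch, where paginate is a hypothetical helper and the integers stand in for already-sorted matching documents:

```python
def paginate(hits, from_=0, size=10):
    """Return the slice of hits that from/size would select."""
    return hits[from_:from_ + size]


all_hits = list(range(25))  # stand-in for 25 matching documents, already sorted
page = paginate(all_hits, from_=10, size=10)
print(page)  # [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
```

With the defaults (from_=0, size=10) the helper returns the first ten hits, which matches the default behaviour of _search.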

 


Return only selected fields (like SELECT col1, col2 ...)

GET /bank/_search
    {
      "query": { "match_all": {} },
      "_source": ["account_number", "balance"]
    }

 


Search on a specific field (like WHERE)

GET /bank/_search
    {
      "query": { "match": { "account_number": 20 } }
    }

 


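Note the difference between the next two queries, match and match_phrase. match returns every document that contains at least one of the query terms and ranks the results by score. match_phrase requires the whole phrase to match, with the terms in the same relative positions (mill must come immediately before lane). In practice the author found that a document matching every term scored only about 13.2. Matching works on whole tokens: with "198 Mill2 Lane", which differs only in Mill2, the score drops to 8.3, the same as for other documents that match only Lane, because Mill2 does not match Mill at all.

GET /bank/_search
    {
      "query": { "match": { "address": "198 Mill Lane" } }
    }

GET /bank/_search
    {
      "query": { "match_phrase": { "address": "198 Mill Lane" } }
    }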
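
The difference can be illustrated with a toy matcher. This is a simplified sketch, not how Elasticsearch is implemented: it assumes a lowercase whitespace tokenizer in place of the real analyzer, ignores scoring entirely, and uses hypothetical helper names and addresses.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a crude stand-in for the standard analyzer)."""
    return text.lower().split()


def match(field_value, query):
    """True if at least one query term occurs in the field (default OR behaviour)."""
    field_terms = set(tokenize(field_value))
    return any(term in field_terms for term in tokenize(query))


def match_phrase(field_value, query):
    """True if the query terms occur consecutively and in order in the field."""
    field_terms, query_terms = tokenize(field_value), tokenize(query)
    n = len(query_terms)
    return any(field_terms[i:i + n] == query_terms
               for i in range(len(field_terms) - n + 1))


# Hypothetical addresses, for illustration only.
for address in ["198 Mill Lane", "198 Mill2 Lane", "12 Oak Lane", "5 Elm Street"]:
    print(address, match(address, "198 Mill Lane"), match_phrase(address, "198 Mill Lane"))
```

Only the first address satisfies match_phrase, while the first three satisfy match, because each shares at least one whole token with the query.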

 


bool query with must: equivalent to AND in a WHERE clause (every clause has to match)

GET /bank/_search
    {
      "query": {
        "bool": {
          "must": [
            { "match": { "address": "mill" } },
            { "match": { "address": "lane" } }
          ]
        }
      }
    }

 


bool query with should: equivalent to OR in a WHERE clause (at least one clause has to match)

GET /bank/_search
    {
      "query": {
        "bool": {
          "should": [
            { "match": { "address": "mill" } },
            { "match": { "address": "lane" } }
          ]
        }
      }
    }

 


bool query with must_not: negation (a document is returned only if it matches none of the clauses)

GET /bank/_search
    {
      "query": {
        "bool": {
          "must_not": [
            { "match": { "address": "mill" } },
            { "match": { "address": "lane" } }
          ]
        }
      }
    }

 


The clauses can also be combined in one bool query

GET /bank/_search
    {
      "query": {
        "bool": {
          "must": [
            { "match": { "age": "40" } }
          ],
          "must_not": [
            { "match": { "state": "ID" } }
          ]
        }
      }
    }
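
The must / should / must_not semantics shown in the queries above can be sketched as a small predicate evaluator. It only decides whether a document matches (no scoring), assumes whole-token lowercase matching, and treats should as required only when no must clause is present; term_in, bool_query and the sample addresses are hypothetical.

```python
def term_in(field, term):
    """Build a predicate: does doc[field] contain term as a whole lowercase token?"""
    return lambda doc: term.lower() in str(doc[field]).lower().split()


def bool_query(doc, must=(), should=(), must_not=()):
    """Evaluate bool-style clauses against one document (matching only, no scoring)."""
    if not all(clause(doc) for clause in must):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    # With no must clauses, at least one should clause has to match.
    if should and not must:
        return any(clause(doc) for clause in should)
    return True


docs = [  # hypothetical sample documents
    {"address": "198 Mill Lane"},
    {"address": "990 Mill Road"},
    {"address": "12 Oak Lane"},
    {"address": "5 Elm Street"},
]
clauses = [term_in("address", "mill"), term_in("address", "lane")]

both = [d["address"] for d in docs if bool_query(d, must=clauses)]
either = [d["address"] for d in docs if bool_query(d, should=clauses)]
neither = [d["address"] for d in docs if bool_query(d, must_not=clauses)]
print(both, either, neither, sep="\n")
```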

 


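Filters
A filter clause lives inside the bool query, but it only includes or excludes documents and does not contribute to the score. The score of 1 shown for this query comes from the match_all clause under must, not from the filter.

GET /bank/_search
{
  "query":{
    "bool":{
      "must":{"match_all":{}},
      "filter":{
        "range":{
          "balance":{
            "gte":2000,
            "lte":3000
          }
        }
      }
    }
  }
}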
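
A rough way to picture this: the must clause decides the score, and the filter is a yes/no check applied on top. The sketch below assumes match_all gives every document a constant score of 1.0; filtered_search and the sample balances are hypothetical.

```python
def filtered_search(docs, gte, lte):
    """Mimic bool { must: match_all, filter: range on balance } over a list of dicts."""
    hits = []
    for doc in docs:
        score = 1.0  # contributed by match_all in the must clause
        if gte <= doc["balance"] <= lte:  # filter: include or exclude, score untouched
            hits.append({"_score": score, "_source": doc})
    return hits


accounts = [{"balance": 1500}, {"balance": 2000}, {"balance": 2750}, {"balance": 3001}]
hits = filtered_search(accounts, gte=2000, lte=3000)
print(hits)
```

Both bounds are inclusive here, matching gte and lte in the query.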

 


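Grouping
An aggregation is the equivalent of GROUP BY. The example below groups on the value of the state field and counts the documents in each group. group_by_state is just a name chosen for the aggregation; the terms aggregation itself returns a doc_count per bucket, much like COUNT(*).
size is set to 0 because only the aggregation results are wanted, not the search hits; with size > 0 the matching documents would also be included in the response.

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      }
    }
  }
}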

 


A slightly more complex example: besides counting per group, it also computes the average of the balance field. Note that the avg aggregation is nested inside group_by_state, and that the buckets are then ordered by that average, descending.

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword",
        "order": {
          "average_balance": "desc"
        }
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}
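
For readers more used to code than to SQL, the same grouping can be sketched in plain Python: bucket by state, compute doc_count and the average balance per bucket, then order by the average. The function name, the top_n parameter and the sample accounts are hypothetical; the default of 10 mirrors the default bucket count of a terms aggregation.

```python
from collections import defaultdict


def group_by_state(accounts, top_n=10):
    """Bucket accounts by state, then order buckets by average balance, descending."""
    balances = defaultdict(list)
    for account in accounts:
        balances[account["state"]].append(account["balance"])
    buckets = [
        {"key": state, "doc_count": len(values),
         "average_balance": sum(values) / len(values)}
        for state, values in balances.items()
    ]
    buckets.sort(key=lambda b: b["average_balance"], reverse=True)
    return buckets[:top_n]  # a terms aggregation returns 10 buckets by default


sample = [  # hypothetical accounts
    {"state": "ID", "balance": 1000},
    {"state": "ID", "balance": 3000},
    {"state": "TX", "balance": 5000},
]
buckets = group_by_state(sample)
print(buckets)
```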

 


An even more complex example: documents are bucketed by age range (each range includes from and excludes to), and inside each range a second-level aggregation groups by gender and averages the balance.

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_age": {
      "range": {
        "field": "age",
        "ranges": [
          {
            "from": 20,
            "to": 30
          },
          {
            "from": 30,
            "to": 40
          },
          {
            "from": 40,
            "to": 50
          }
        ]
      },
      "aggs": {
        "group_by_gender": {
          "terms": {
            "field": "gender.keyword"
          },
          "aggs": {
            "average_balance": {
              "avg": {
                "field": "balance"
              }
            }
          }
        }
      }
    }
  }
}

 


Excerpt of the response

"aggregations": {
    "group_by_age": {
      "buckets": [
        {
          "key": "20.0-30.0", # first-level bucket (age range)
          "from": 20,
          "to": 30,
          "doc_count": 451,
          "group_by_gender": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [ # second-level buckets (gender)
              {
                "key": "M",
                "doc_count": 232,
                "average_balance": {
                  "value": 27374.05172413793
                }
              },
              {
                "key": "F",
                "doc_count": 219,
                "average_balance": {
                  "value": 25341.260273972603
                }
              }
            ]
          }
        },
... ...
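
The nesting in this response can be reproduced with a short Python sketch: first bucket by age range, then by gender within each range, computing the average balance per gender. It assumes half-open ranges (from inclusive, to exclusive) and orders the gender buckets by document count; range_buckets and the sample accounts are hypothetical, not taken from the bank data set.

```python
def range_buckets(accounts, ranges):
    """Bucket accounts by age range [low, high), then by gender with average balance."""
    result = []
    for low, high in ranges:
        in_range = [a for a in accounts if low <= a["age"] < high]
        by_gender = {}
        for account in in_range:
            by_gender.setdefault(account["gender"], []).append(account["balance"])
        gender_buckets = [
            {"key": gender, "doc_count": len(values),
             "average_balance": sum(values) / len(values)}
            for gender, values in by_gender.items()
        ]
        gender_buckets.sort(key=lambda b: b["doc_count"], reverse=True)
        result.append({
            "key": f"{float(low)}-{float(high)}",
            "doc_count": len(in_range),
            "group_by_gender": gender_buckets,
        })
    return result


sample = [  # hypothetical accounts
    {"age": 25, "gender": "M", "balance": 100},
    {"age": 29, "gender": "M", "balance": 300},
    {"age": 22, "gender": "F", "balance": 500},
    {"age": 30, "gender": "F", "balance": 700},
]
result = range_buckets(sample, [(20, 30), (30, 40)])
print(result)
```

The account with age 30 lands in the second bucket, which shows the effect of the upper bound being exclusive.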

 

Posted on 2018-08-12 22:27 by 张叫兽的技术研究院