Getting Started with ELK and Common Commands

Elasticsearch resources:

https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started.html
https://www.elastic.co/webinars/getting-started-kibana?baymax=rtp&elektra=docs&storm=top-video&iesrc=ctr
https://www.elastic.co/webinars/getting-started-logstash?baymax=rtp&elektra=docs&storm=top-video&iesrc=ctr
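Elasticsearch listens on port 9200 by default. Opening the URL below returns basic information about the node and cluster: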
http://localhost:9200/

Elasticsearch: The Definitive Guide (the second link is the master-branch version of the guide)
https://www.elastic.co/guide/en/elasticsearch/guide/index.html
https://www.elastic.co/guide/en/elasticsearch/guide/master/index.html

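shard: an index is split into N primary shards, each of which is a self-contained Lucene index that can be placed on a different node. A single index that grows too large becomes slow to query and may not fit on one machine, so its data is partitioned across shards (in other words, horizontal splitting).
replica: a replica is a copy of a primary shard. Its main purpose is high availability, that is, avoiding a single point of failure; replicas can also serve search requests.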
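
To make the idea of splitting an index concrete, here is a minimal Python sketch of how a document could be assigned to one of N primary shards. Elasticsearch routes on a hash of the routing value (the document _id by default) modulo the number of primary shards; the real hash is murmur3, and zlib.crc32 below is only a stand-in so the sketch needs no extra packages. The names pick_shard and spread are made up for illustration.

```python
import zlib


def pick_shard(doc_id: str, num_primary_shards: int) -> int:
    """Map a document ID to a primary shard number (simplified model)."""
    # Assumption: crc32 replaces the murmur3 hash that Elasticsearch really uses.
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def spread(doc_ids, num_primary_shards):
    """Count how many of the given IDs land on each shard."""
    counts = [0] * num_primary_shards
    for doc_id in doc_ids:
        counts[pick_shard(doc_id, num_primary_shards)] += 1
    return counts


if __name__ == "__main__":
    # 1000 hypothetical document IDs spread over 5 primary shards.
    print(spread([str(i) for i in range(1000)], 5))
```

Because the shard is derived from the hash modulo the shard count, the same ID always maps to the same shard, which is also why the number of primary shards cannot simply be changed after the index has been created.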

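List the indices (_cat does not stand for "category"; the cat APIs return compact, aligned plain text meant for humans, in the spirit of the Unix cat command)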
GET /_cat/indices?v
Create an index
PUT /customer?pretty
GET /_cat/indices?v

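Create a document. PUT takes an explicit ID; POST without an ID creates a document whose ID is auto-generated (a random string, not a number). The pretty parameter stands for pretty-print: the response comes back as indented, human-friendly JSON.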

PUT /customer/_doc/1?pretty
    {
      "name": "John Doe"
    }
GET /customer/_doc/1?pretty

POST /customer/_doc?pretty
    {
      "name": "Jane Doe"
    }

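Update a document (internally the old document is marked as deleted and a new version is indexed). The requests below use the Elasticsearch 6.x URL form; from 7.0 on, the typeless form POST /customer/_update/1 is used instead.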

POST /customer/_doc/1/_update?pretty
    {
      "doc": { "name": "Jane Doe" }
    }

POST /customer/_doc/1/_update?pretty
    {
      "doc": { "name": "Jane Doe", "age": 20 }
    }

POST /customer/_doc/1/_update?pretty
    {
      "script" : "ctx._source.age += 5"
    }
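
The three updates above can be mimicked locally to see what ends up in _source. This is only a sketch of the merge behaviour, assuming that a partial doc is merged recursively into the existing source (objects merged, other values replaced); apply_partial_update is a hypothetical helper, not part of any Elasticsearch client.

```python
import copy


def apply_partial_update(source: dict, partial: dict) -> dict:
    """Return a new _source with the fields of `partial` merged in."""
    merged = copy.deepcopy(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested objects are merged field by field.
            merged[key] = apply_partial_update(merged[key], value)
        else:
            # Scalars and arrays simply replace the old value.
            merged[key] = value
    return merged


doc = {"name": "John Doe"}
doc = apply_partial_update(doc, {"name": "Jane Doe"})
doc = apply_partial_update(doc, {"name": "Jane Doe", "age": 20})
doc["age"] += 5  # local equivalent of the script ctx._source.age += 5
print(doc)  # {'name': 'Jane Doe', 'age': 25}
```

Note that the script version only works once the age field exists, which is why it comes after the update that adds age.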

Delete a document
DELETE /customer/_doc/2?pretty

Bulk operations (bulk indexing, then bulk update and delete)

POST /customer/_doc/_bulk?pretty
    {"index":{"_id":"1"}}
    {"name": "John Doe" }
    {"index":{"_id":"2"}}
    {"name": "Jane Doe" }

POST /customer/_doc/_bulk?pretty
    {"update":{"_id":"1"}}
    {"doc": { "name": "John Doe becomes Jane Doe" } }
    {"delete":{"_id":"2"}}
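
The bulk body is newline-delimited JSON: one action line, optionally followed by one source line, and the whole body has to end with a newline. Below is a small Python sketch that assembles such a body; build_bulk_body and its tuple layout are my own illustration, not an API of any client library.

```python
import json


def build_bulk_body(actions):
    """Serialize (action, metadata, source-or-None) tuples as NDJSON."""
    lines = []
    for action, meta, source in actions:
        lines.append(json.dumps({action: meta}))
        if source is not None:  # a delete action has no source line
            lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


body = build_bulk_body([
    ("update", {"_id": "1"}, {"doc": {"name": "John Doe becomes Jane Doe"}}),
    ("delete", {"_id": "2"}, None),
])
print(body, end="")
```

The resulting string can be sent as the request body of the _bulk endpoint, for example saved to a file and passed to curl with --data-binary.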

Bulk-import data from a file

curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_doc/_bulk?pretty&refresh" --data-binary "@accounts.json"

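Search. Note that the endpoint here is _search, in the position where updates used _update. q=* matches all documents, sort=account_number:asc sorts by account_number in ascending order, and pretty was explained above. In the response, hits holds the matching documents and hits.total is the total number of matches, not the number of documents returned; only 10 documents are returned by default, which can be changed with the size parameter.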
GET /bank/_search?q=*&sort=account_number:asc&pretty
The equivalent query with a request body

GET /bank/_search
    {
      "query": { "match_all": {} },
      "sort": [
        { "account_number": "asc" }
      ]
    }

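To fetch a slice from the middle of the result list, set from, the zero-based offset of the first hit to return. In testing, a fractional value such as 5.98 was rounded down to 5. Note also that the earlier queries returned no usable max_score because they sorted by a field, which skips scoring; this query has no sort clause, so every hit gets the constant match_all score of 1.0 and max_score has a value.

GET /bank/_search
    {
      "query": { "match_all": {} },
      "from": 10, # zero-based offset: skip the first 10 hits (not a document ID)
      "size": 10
    }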
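
In application code, from and size usually come from a page number. A minimal sketch, assuming 1-based page numbers; page_params and search_body are hypothetical helper names.

```python
def page_params(page: int, page_size: int = 10) -> dict:
    """Translate a 1-based page number into from/size."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return {"from": (page - 1) * page_size, "size": page_size}


def search_body(page: int, page_size: int = 10) -> dict:
    """Build a match_all request body for the given page."""
    body = {"query": {"match_all": {}}}
    body.update(page_params(page, page_size))
    return body


print(search_body(2))  # {'query': {'match_all': {}}, 'from': 10, 'size': 10}
```

Page 2 with the default page size gives exactly the from: 10, size: 10 request shown above. Deep paging this way is limited: by default from + size may not exceed 10000 (index.max_result_window).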

Return only selected fields (like SELECT col1, col2, ...)

GET /bank/_search
    {
      "query": { "match_all": {} },
      "_source": ["account_number", "balance"]
    }

Restrict by a field value (like WHERE)

GET /bank/_search
    {
      "query": { "match": { "account_number": 20 } }
    }

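Note the difference between the next two queries, match and match_phrase. match returns every document that contains at least one of the query terms, ranked by score. match_phrase requires the whole phrase to match, with the terms in the same order and in adjacent positions (mill immediately before lane). In practice, even a document whose address matches all of 198 Mill Lane scored only about 13.2 with match. Matching works on whole terms, so an address such as 198 Mill2 Lane, which differs only in Mill2, scored 8.3, the same as other documents in which only Lane matches and Mill does not match at all.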

GET /bank/_search
    {
      "query": { "match": { "address": "198 Mill Lane" } }
    }

GET /bank/_search
    {
      "query": { "match_phrase": { "address": "198 Mill Lane" } }
    }
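
The difference between the two queries can be reproduced on a few strings. The sketch below is a deliberately simplified model: tokenize only approximates the standard analyzer, there is no scoring, and the sample addresses are made up for illustration.

```python
import re


def tokenize(text):
    """Lowercase and split on non-alphanumerics (rough stand-in for the standard analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match(field_value, query):
    """True if the field shares at least one term with the query (match with default OR)."""
    return bool(set(tokenize(field_value)) & set(tokenize(query)))


def match_phrase(field_value, query):
    """True if the query terms occur consecutively and in order (no slop)."""
    doc, q = tokenize(field_value), tokenize(query)
    return any(doc[i:i + len(q)] == q for i in range(len(doc) - len(q) + 1))


# Hypothetical sample addresses.
addresses = ["198 Mill Lane", "198 Mill2 Lane", "12 Lane Road", "5 Oak Street"]
for address in addresses:
    print(address, match(address, "198 Mill Lane"), match_phrase(address, "198 Mill Lane"))
```

Only the first address satisfies match_phrase, while the first three satisfy match, because Mill2 is a different term from mill and a single shared term such as lane is enough for match.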

bool query with must, equivalent to AND in a WHERE clause

GET /bank/_search
    {
      "query": {
        "bool": {
          "must": [
            { "match": { "address": "mill" } },
            { "match": { "address": "lane" } }
          ]
        }
      }
    }

bool with should, equivalent to OR in a WHERE clause

GET /bank/_search
    {
      "query": {
        "bool": {
          "should": [
            { "match": { "address": "mill" } },
            { "match": { "address": "lane" } }
          ]
        }
      }
    }

To negate conditions and return only documents that match none of them, use must_not

GET /bank/_search
    {
      "query": {
        "bool": {
          "must_not": [
            { "match": { "address": "mill" } },
            { "match": { "address": "lane" } }
          ]
        }
      }
    }

The clauses can also be combined

GET /bank/_search
    {
      "query": {
        "bool": {
          "must": [
            { "match": { "age": "40" } }
          ],
          "must_not": [
            { "match": { "state": "ID" } }
          ]
        }
      }
    }
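
The selection logic of must, should and must_not can be written out as plain predicates. This is a sketch of the matching rules only (no scoring, no filter clause), assuming the default behaviour that should clauses become mandatory when there is no must clause; term_in, bool_query and the three sample documents are hypothetical.

```python
def term_in(field, word):
    """Predicate: the field contains `word` as a whole lowercase token."""
    return lambda doc: word.lower() in str(doc.get(field, "")).lower().split()


def bool_query(must=(), should=(), must_not=()):
    """Combine predicates the way a bool query selects documents."""
    def matches(doc):
        if not all(p(doc) for p in must):      # AND
            return False
        if any(p(doc) for p in must_not):      # NOT
            return False
        if should and not must:                # OR, only required without must
            return any(p(doc) for p in should)
        return True
    return matches


docs = [
    {"address": "990 Mill Road", "age": 40, "state": "ID"},
    {"address": "171 Putnam Lane", "age": 40, "state": "CA"},
    {"address": "198 Mill Lane", "age": 33, "state": "NY"},
]
mill, lane = term_in("address", "mill"), term_in("address", "lane")

both = bool_query(must=[mill, lane])
either = bool_query(should=[mill, lane])
neither = bool_query(must_not=[mill, lane])
combined = bool_query(must=[term_in("age", "40")], must_not=[term_in("state", "ID")])

for name, query in [("must", both), ("should", either), ("must_not", neither), ("combined", combined)]:
    print(name, [d["address"] for d in docs if query(d)])
```

With this sample data, must selects only 198 Mill Lane, should selects all three, must_not selects none, and the combined query selects 171 Putnam Lane.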

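Filters
The filter clause lives inside the bool query, but it does not contribute to document scoring. The score of 1 shown for this query comes from the match_all clause in must, not from the filter.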

GET /bank/_search
{
  "query":{
    "bool":{
      "must":{"match_all":{}},
      "filter":{
        "range":{
          "balance":{
            "gte":2000,
            "lte":3000
          }
        }
      }
    }
  }
}

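Aggregations (grouping)
A terms aggregation is the equivalent of GROUP BY. The example below groups documents by the value of the state field and counts them. group_by_state is just a name chosen for the aggregation; the terms aggregation itself returns a doc_count per bucket, by default the top 10 buckets ordered by count descending.
size is set to 0 because only the aggregation result is wanted, not the search hits; with size > 0 the hits would also appear in the response.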

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      }
    }
  }
}

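A slightly more complex example: besides the count per group, it also computes the average of the balance field. Note that the avg aggregation is nested inside group_by_state, and that the buckets are ordered by that average in descending order.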

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword",
        "order": {
          "average_balance": "desc"
        }
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}
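
For readers who think in SQL, this aggregation is roughly SELECT state, COUNT(*), AVG(balance) ... GROUP BY state ORDER BY AVG(balance) DESC. The Python sketch below computes the same thing on an in-memory list to show what the buckets contain; group_avg and the three sample accounts are made up, and the output shape only imitates the response format.

```python
from collections import defaultdict


def group_avg(docs, group_field, value_field, size=10):
    """Group by one field, count, average another, order by the average descending."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[group_field]].append(doc[value_field])
    buckets = [
        {
            "key": key,
            "doc_count": len(values),
            "average_balance": {"value": sum(values) / len(values)},
        }
        for key, values in groups.items()
    ]
    buckets.sort(key=lambda b: b["average_balance"]["value"], reverse=True)
    return buckets[:size]  # terms returns only the top `size` buckets


# Hypothetical sample data.
accounts = [
    {"state": "TX", "balance": 1000},
    {"state": "TX", "balance": 3000},
    {"state": "CA", "balance": 5000},
]
for bucket in group_avg(accounts, "state", "balance"):
    print(bucket)
```

CA comes first here even though it has fewer documents, because the order is by average_balance rather than by doc_count.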

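An even more complex example: group by age ranges (in each range, from is inclusive and to is exclusive), then by gender as a second-level aggregation, and compute the average balance for each gender within each range.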

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_age": {
      "range": {
        "field": "age",
        "ranges": [
          {
            "from": 20,
            "to": 30
          },
          {
            "from": 30,
            "to": 40
          },
          {
            "from": 40,
            "to": 50
          }
        ]
      },
      "aggs": {
        "group_by_gender": {
          "terms": {
            "field": "gender.keyword"
          },
          "aggs": {
            "average_balance": {
              "avg": {
                "field": "balance"
              }
            }
          }
        }
      }
    }
  }
}

A fragment of the response

"aggregations": {
    "group_by_age": {
      "buckets": [
        {
          "key": "20.0-30.0", # first-level aggregation bucket (age range)
          "from": 20,
          "to": 30,
          "doc_count": 451,
          "group_by_gender": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [ # second-level aggregation buckets (gender)
              {
                "key": "M",
                "doc_count": 232,
                "average_balance": {
                  "value": 27374.05172413793
                }
              },
              {
                "key": "F",
                "doc_count": 219,
                "average_balance": {
                  "value": 25341.260273972603
                }
              }
            ]
          }
        },
... ...
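
Nested buckets like these are often easier to work with as flat rows. A minimal sketch, assuming the response has already been parsed into a Python dict; flatten_buckets is a hypothetical helper, and the sample dict reuses only the first bucket from the fragment above.

```python
def flatten_buckets(aggregations):
    """Turn range -> gender buckets into (age_range, gender, doc_count, avg_balance) rows."""
    rows = []
    for age_bucket in aggregations["group_by_age"]["buckets"]:
        for gender_bucket in age_bucket["group_by_gender"]["buckets"]:
            rows.append((
                age_bucket["key"],
                gender_bucket["key"],
                gender_bucket["doc_count"],
                gender_bucket["average_balance"]["value"],
            ))
    return rows


sample = {
    "group_by_age": {
        "buckets": [
            {
                "key": "20.0-30.0",
                "from": 20,
                "to": 30,
                "doc_count": 451,
                "group_by_gender": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                        {"key": "M", "doc_count": 232,
                         "average_balance": {"value": 27374.05172413793}},
                        {"key": "F", "doc_count": 219,
                         "average_balance": {"value": 25341.260273972603}},
                    ],
                },
            }
        ]
    }
}

for row in flatten_buckets(sample):
    print(row)
```

The two gender counts (232 and 219) add up to the doc_count of the enclosing age bucket (451), which is a quick sanity check when reading such responses.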

Posted on 2018-08-12 22:27 by 张叫兽的技术研究院
