[elk] How logstash grok works
logstash syntax
http://www.ttlsa.com/elk/elk-logstash-configuration-syntax/
https://www.elastic.co/guide/en/logstash/current/event-dependent-configuration.html
How logstash grok works
References:
https://www.kancloud.cn/hanxt/elk/155901
https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html
Regular expression reference:
https://github.com/kkos/oniguruma/blob/master/doc/RE
The meaning of grok (to perceive by feel rather than by conscious thought): to understand sth completely using your feelings rather than considering the facts
- The predefined patterns live in this directory:
/usr/local/logstash/vendor/bundle/jruby/1.9/gems/logstash-patterns-core-4.1.2/patterns
Or view them directly here:
https://github.com/elastic/logstash/blob/v1.4.2/patterns/grok-patterns
$ ls /usr/local/logstash/vendor/bundle/jruby/1.9/gems/logstash-patterns-core-4.1.2/patterns/
aws bind exim grok-patterns httpd junos maven mcollective-patterns nagios rails ruby
bacula bro firewalls haproxy java linux-syslog mcollective mongodb postgresql redis squid
Example: parsing Apache logs with a logstash grok filter
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}
Logstash's built-in patterns are defined in terms of each other (nested references).
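For instance, in the grok-patterns file, IP and COMBINEDAPACHELOG are built out of other patterns (excerpt; exact definitions vary slightly between versions):

```
NUMBER (?:%{BASE10NUM})
IP (?:%{IPV6}|%{IPV4})
COMBINEDAPACHELOG %{COMMONAPACHELOG} %{QS:referrer} %{QS:agent}
```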
Another example:
%{IP:client} means: match the log content with the IP pattern and store the matched text under the key client.
input {
  file {
    path => "/var/log/http.log"
  }
}
filter {
  grok {
    match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" }
  }
}
output {
  stdout { codec => rubydebug }
}
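Conceptually, grok compiles that match string into one big regex with named capture groups. A minimal Python sketch, with the built-in IP, WORD, URIPATHPARAM, and NUMBER patterns replaced by simplified stand-ins (the sample line is the one used in the Elastic grok docs):

```python
import re

# Simplified stand-ins for %{IP}, %{WORD}, %{URIPATHPARAM}, %{NUMBER}.
pat = re.compile(
    r"(?P<client>\d{1,3}(?:\.\d{1,3}){3}) "
    r"(?P<method>\w+) "
    r"(?P<request>\S+) "
    r"(?P<bytes>\d+) "
    r"(?P<duration>\d+(?:\.\d+)?)"
)

line = "55.3.244.1 GET /index.html 15824 0.043"
fields = pat.match(line).groupdict()
print(fields)
# {'client': '55.3.244.1', 'method': 'GET', 'request': '/index.html',
#  'bytes': '15824', 'duration': '0.043'}
```

Each `%{PATTERN:name}` becomes one named group, and the matched substrings become event fields.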
grok's remove_field
References:
https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html
https://doc.yonyoucloud.com/doc/logstash-best-practice-cn/filter/grok.html
We only want the request_time field. By default, match merely copies a piece of message into the new field while message itself is kept, so the data is duplicated. To fix that, drop the message field.
input { stdin {} }
filter {
  grok {
    match => {
      "message" => "\s+(?<request_time>\d+(?:\.\d+)?)\s+"
    }
  }
}
output { stdout { codec => rubydebug } }
begin 123.456 end
{
"@version" => "1",
"host" => "ip-70.32.1.32.hosted.by.gigenet.com",
"@timestamp" => 2017-11-29T03:47:15.377Z,
"request_time" => "123.456",
"message" => "begin 123.456 end"
}
input { stdin {} }
filter {
  grok {
    match => {
      "message" => "\s+(?<request_time>\d+(?:\.\d+)?)\s+"
    }
    remove_field => ["message"]
  }
}
output { stdout { codec => rubydebug } }
begin 123.456 end
{
"@version" => "1",
"host" => "ip-70.32.1.32.hosted.by.gigenet.com",
"@timestamp" => 2017-11-29T03:51:01.135Z,
"request_time" => "123.456"
}
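Note that `(?<request_time>...)` is Oniguruma's named-capture syntax; Python's `re` spells the same thing `(?P<request_time>...)`. A quick sketch of the same extraction:

```python
import re

# Same pattern as the grok config, with Python-style named group syntax.
pat = re.compile(r"\s+(?P<request_time>\d+(?:\.\d+)?)\s+")
m = pat.search("begin 123.456 end")
print(m.group("request_time"))  # 123.456
```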
Custom patterns
Reference: https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html
Patterns can live in a file, or be defined inline as in the previous example.
$ cat /var/sample.log
Jan 1 06:25:43 mailserver14 postfix/cleanup[21403]: BEF25A72965: message-id=<20130101142543.5828399CCAF@mailserver14.example.com>
$ cat ./patterns/postfix
POSTFIX_QUEUEID [0-9A-F]{10,11}
input {
  file {
    path => "/var/sample.log"
  }
}
filter {
  grok {
    patterns_dir => ["./patterns"]
    match => { "message" => "%{SYSLOGBASE} %{POSTFIX_QUEUEID:queue_id}: %{GREEDYDATA:syslog_message}" }
  }
}
output {
  stdout { codec => rubydebug }
}
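The custom POSTFIX_QUEUEID pattern is just a regex. A Python sketch of what it picks out of the sample line (word boundaries added here so it cannot match inside longer tokens):

```python
import re

# POSTFIX_QUEUEID is [0-9A-F]{10,11}; \b guards are an assumption made
# for this sketch to avoid partial matches inside longer tokens.
queue_id = re.compile(r"\b([0-9A-F]{10,11})\b")

line = ("Jan  1 06:25:43 mailserver14 postfix/cleanup[21403]: "
        "BEF25A72965: message-id=<20130101142543.5828399CCAF@mailserver14.example.com>")
print(queue_id.search(line).group(1))  # BEF25A72965
```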
Parsing Apache logs with grok, and reformatting the date
Reference: http://blog.51cto.com/irow10/1828077 (the formatting there is broken; fixed below)
input {
  stdin {}
}
filter {
  grok {
    match => { "message" => "%{IPORHOST:addre} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] \"%{WORD:http_method} %{NOTSPACE:request} HTTP/%{NUMBER:httpversion}\" %{NUMBER:status} (?:%{NUMBER:bytes}|-) \"(?:%{URI:http_referer}|-)\" \"%{GREEDYDATA:User_Agent}\"" }
    remove_field => ["message"]
  }
  date {
    match => [ "timestamp", "dd/MMM/YYYY:HH:mm:ss Z" ]
  }
}
output {
  stdout { codec => rubydebug }
}
192.168.10.97 - - [19/Jul/2016:16:28:52 +0800] "GET / HTTP/1.1" 200 23 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36"
{
"request" => "/",
"auth" => "-",
"ident" => "-",
"User_Agent" => "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36",
"addre" => "192.168.10.97",
"@timestamp" => 2016-07-19T08:28:52.000Z,
"http_method" => "GET",
"bytes" => "23",
"@version" => "1",
"host" => "no190.pp100.net",
"httpversion" => "1.1",
"timestamp" => "19/Jul/2016:16:28:52 +0800",
"status" => "200"
}
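The date filter's `dd/MMM/YYYY:HH:mm:ss Z` is a Joda-Time format; the strftime equivalent is `%d/%b/%Y:%H:%M:%S %z`. A sketch of why `@timestamp` comes out as 08:28 UTC when the log says 16:28 +0800:

```python
from datetime import datetime, timezone

# Parse the HTTPDATE string, then normalize to UTC as Logstash does
# for @timestamp.
ts = datetime.strptime("19/Jul/2016:16:28:52 +0800", "%d/%b/%Y:%H:%M:%S %z")
print(ts.astimezone(timezone.utc).isoformat())  # 2016-07-19T08:28:52+00:00
```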
Testing grok patterns online
Reference: http://grokdebug.herokuapp.com/
192.168.10.97 - - [19/Jul/2016:16:28:52 +0800] "GET / HTTP/1.1" 200 23 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36"
%{IPORHOST:addre} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] \"%{WORD:http_method} %{NOTSPACE:request} HTTP/%{NUMBER:httpversion}\" %{NUMBER:status} (?:%{NUMBER:bytes}|-) \"(?:%{URI:http_referer}|-)\" \"%{GREEDYDATA:User_Agent}\"
logstash mutate plugin: adding a field to every event
Reference: https://www.elastic.co/guide/en/logstash/current/event-dependent-configuration.html
input { stdin {} }
filter {
  mutate {
    add_field => {
      "show" => "This data will be in the output"
      "[@metadata][test]" => "Hello"
    }
  }
}
output {
  if [@metadata][test] == "Hello" {
    stdout { codec => rubydebug }
  }
}
sdf
{
"@version" => "1",
"host" => "ip-70.32.1.32.hosted.by.gigenet.com",
"show" => "This data will be in the output",
"@timestamp" => 2017-11-29T09:23:44.160Z,
"message" => "sdf"
}
logstash input: adding fields with add_field
Reference: http://www.21yunwei.com/archives/5296
input {
  file {
    path => "/logs/nginx/access.log"
    type => "nginx"
    start_position => "beginning"
    add_field => { "key" => "value" }
    codec => "json"
  }
}
output {
  stdout {
    codec => rubydebug {}
  }
}
The five main logstash plugins -- to explore
References:
http://blog.51cto.com/irow10/1828077
https://segmentfault.com/a/1190000011721483
https://www.elastic.co/guide/en/logstash/current/plugins-filters-kv.html
The date plugin normalizes date formats.
The mutate plugin can add and remove fields and rewrite field values.
The kv plugin can...
Using the log above as an example, the mutate plugin's lowercase option converts the "log-level" field to lowercase:
filter {
  grok { ... }
  mutate { lowercase => [ "log-level" ] }
}
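In effect, lowercase rewrites the named fields in place on each event. Roughly, in Python (the sample event is made up):

```python
# Rough Python equivalent of mutate { lowercase => [ "log-level" ] }.
event = {"message": "...", "log-level": "WARNING"}
for field in ["log-level"]:
    if field in event and isinstance(event[field], str):
        event[field] = event[field].lower()
print(event["log-level"])  # warning
```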
Use the kv filter to tell Logstash how to process it; the kv plugin splits key=value pairs:
filter {
  kv {
    source => "metadata"
    trim => "\""
    include_keys => [ "level", "service", "customerid", "queryid" ]
    target => "kv"
  }
}
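Roughly, kv splits the source string into key=value pairs, keeps only include_keys, strips the trim characters from values, and nests the result under target. A Python sketch (the sample metadata string is made up for illustration):

```python
def kv_parse(text, include_keys, trim='"', target="kv"):
    """Rough Python equivalent of the kv filter config above."""
    fields = {}
    for token in text.split():
        if "=" not in token:
            continue
        key, value = token.split("=", 1)
        if key in include_keys:
            fields[key] = value.strip(trim)
    return {target: fields}

metadata = 'level=INFO service="payments" customerid=42 queryid=q1 other=x'
print(kv_parse(metadata, {"level", "service", "customerid", "queryid"}))
# {'kv': {'level': 'INFO', 'service': 'payments', 'customerid': '42', 'queryid': 'q1'}}
```

Note how `other=x` is dropped (not in include_keys) and the quotes around "payments" are trimmed.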