As nginx log files are collected through filebeat and kafka, each component adds some metadata fields of its own to every event.

First, let's switch the logstash output to stdout so we can debug:

vim /etc/logstash/conf.d/nginx.conf

output {
        #elasticsearch {
        #       hosts => "192.168.1.7:9200"
        #       index => "nginx1_log-%{+YYYY.MM.dd}"
        #}
        stdout {
                codec => rubydebug
        }
}

 

We can check the output format of the events with the following command: /usr/share/logstash/bin/logstash -rf /etc/logstash/conf.d/nginx.conf

-r: watch nginx.conf for changes and reload the pipeline automatically

-f: specify the configuration file to load
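If you only want to validate the configuration syntax without starting the pipeline, logstash also supports a test-only run (a quick check, using the same paths as above):

/usr/share/logstash/bin/logstash -t -f /etc/logstash/conf.d/nginx.conf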

{
      "@version" => "1",
    "@timestamp" => 2020-02-29T09:16:03.569Z,
       "message" => "{\"@timestamp\":\"2020-02-29T09:15:55.324Z\",
\"@metadata\":{\"beat\":\"filebeat\",\"type\":\"doc\",\"version\":\"6.8.6\",\"topic\":\"nginx\"},
\"source\":\"/var/log/nginx/access.log\",
\"offset\":3694893,
\"message\":\"192.168.1.8 - - [29/Feb/2020:17:15:51 +0800] \\\"GET /cc HTTP/1.0\\\" 404 3650 \\\"-\\\" \\\"ApacheBench/2.3\\\" \\\"-\\\"\",
\"prospector\":{\"type\":\"log\"},
\"input\":{\"type\":\"log\"},
\"fields\":{\"log_topics\":\"nginx\"},
\"beat\":{\"name\":\"kafka03\",\"hostname\":\"kafka03\",\"version\":\"6.8.6\"},
\"host\":{\"name\":\"kafka03\"},
\"log\":{\"file\":{\"path\":\"/var/log/nginx/access.log\"}}
}
", "tags" => [ [0] "_grokparsefailure" ] }

1. Edit the filebeat.yml file

# Remove the extra fields that filebeat adds by default when writing to kafka. Fields starting with @ cannot be removed.

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/messages
  fields:
    log_topics: messages
##
- type: log
  enabled: true
  paths:
    - /var/log/nginx/access.log
  fields:
    log_topics: nginx
## drop these fields
processors:
- drop_fields:
    fields: ["beat","input","source","offset","topicname","timestamp","@metadata"]
output.kafka:
    enabled: true
    hosts: ["192.168.1.7:9092","192.168.1.8:9092","192.168.1.9:9092"]
    topic: '%{[fields][log_topics]}'
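Before restarting, filebeat can verify both the edited configuration and the connection to the kafka output (a quick sanity check, assuming the default config path /etc/filebeat/filebeat.yml):

filebeat test config -c /etc/filebeat/filebeat.yml
filebeat test output -c /etc/filebeat/filebeat.yml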

Restart filebeat:

systemctl restart filebeat
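To confirm that events are actually arriving in the nginx topic, you can consume it directly from one of the brokers (the path to the kafka scripts depends on your installation; shown here only as a sketch):

/opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server 192.168.1.7:9092 --topic nginx --from-beginning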

You will see that many of the fields are now gone:

{
      "@version" => "1",
    "@timestamp" => 2020-02-29T09:35:58.899Z,
       "message" => "{\"@timestamp\":\"2020-02-29T09:35:54.026Z\",
\"@metadata\":{\"beat\":\"filebeat\",
\"type\":\"doc\",
\"version\":\"6.8.6\",
\"topic\":\"nginx\"},
\"fields\":{\"log_topics\":\"nginx\"},
\"host\":{\"name\":\"kafka03\"},
\"log\":{\"file\":{\"path\":\"/var/log/nginx/access.log\"}},
\"message\":\"192.168.1.8 - - [29/Feb/2020:17:35:52 +0800] \\\"GET /cc HTTP/1.0\\\" 404 3650 \\\"-\\\" \\\"ApacheBench/2.3\\\" \\\"-\\\"\",
\"prospector\":{\"type\":\"log\"}}
", "tags" => [ [0] "_grokparsefailure" ] }

Every middleware the log passes through adds some fields of its own, so we still need to configure filter rules in logstash to strip out the fields we do not want.

The final configuration of the nginx.conf file:

input {
        kafka {
                bootstrap_servers => "192.168.1.7:9092,192.168.1.8:9092,192.168.1.9:9092"
                group_id => "logstash"
                topics => "nginx"
                consumer_threads => 5
        }
}

filter {
        # parse the JSON data out of the message field
        json {
                source => "message"
        }
        # remove the unwanted extra fields
        mutate {
                remove_field => ["@version","fields","prospector","host","log"]
        }
        grok {
                match => { "message" => "%{NGINXACCESS}" }
        }
}

output {
        elasticsearch {
                hosts => "192.168.1.7:9200"
                index => "nginx1_log-%{+YYYY.MM.dd}"
        }
        #stdout {
        #       codec => rubydebug
        #}
}
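After reloading logstash with this configuration, a quick way to confirm that documents are reaching elasticsearch is to list the indices on the node configured above (a simple check, not part of the original setup):

curl -XGET 'http://192.168.1.7:9200/_cat/indices?v' | grep nginx1_log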

[root@localhost patterns]# pwd
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-patterns-core-4.1.2/patterns

Upload the nginx_access file you created to this directory:

vim nginx_access

URIPARAM1 [A-Za-z0-9$.+!*'|(){},~@#%&/=:;_?\-\[\]]*
NGINXACCESS %{IPORHOST:client_ip} (%{USER:ident}|-) (%{USER:auth}|-) \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} (%{NOTSPACE:request}|-)(?: HTTP/%{NUMBER:http_version})?|-)" %{NUMBER:status} (?:%{NUMBER:bytes}|-) "(?:%{URI:referrer}|-)" "%{GREEDYDATA:agent}"
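To verify that the pattern matches an access-log line before pointing logstash back at kafka, you can run a throwaway pipeline reading from stdin (just a quick test; grok picks up nginx_access automatically because it sits in the default patterns directory):

/usr/share/logstash/bin/logstash -e 'input { stdin {} } filter { grok { match => { "message" => "%{NGINXACCESS}" } } } output { stdout { codec => rubydebug } }'

Paste a line such as:

192.168.1.8 - - [29/Feb/2020:17:15:51 +0800] "GET /cc HTTP/1.0" 404 3650 "-" "ApacheBench/2.3" "-"

and the parsed client_ip, verb, request, status and agent fields should show up in the rubydebug output.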