After nginx log entries have been collected through filebeat and Kafka, each stage has tacked on metadata fields of its own.
To see exactly what arrives, first switch the Logstash output to stdout for debugging:
vim /etc/logstash/conf.d/nginx.conf
output {
  #elasticsearch {
  #  hosts => "192.168.1.7:9200"
  #  index => "nginx1_log-%{+YYYY.MM.dd}"
  #}
  stdout {
    codec => rubydebug
  }
}
Inspect the event format by running Logstash in the foreground:

/usr/share/logstash/bin/logstash -rf /etc/logstash/conf.d/nginx.conf

-r: watch nginx.conf for changes and reload the pipeline automatically
-f: the configuration file to load
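Before attaching it to the live pipeline, the file can also be syntax-checked; --config.test_and_exit is a standard Logstash flag:

/usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/nginx.conf --config.test_and_exit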
{ "@version" => "1", "@timestamp" => 2020-02-29T09:16:03.569Z, "message" => "{\"@timestamp\":\"2020-02-29T09:15:55.324Z\",
\"@metadata\":{\"beat\":\"filebeat\",\"type\":\"doc\",\"version\":\"6.8.6\",\"topic\":\"nginx\"},
\"source\":\"/var/log/nginx/access.log\",
\"offset\":3694893,
\"message\":\"192.168.1.8 - - [29/Feb/2020:17:15:51 +0800] \\\"GET /cc HTTP/1.0\\\" 404 3650 \\\"-\\\" \\\"ApacheBench/2.3\\\" \\\"-\\\"\",
\"prospector\":{\"type\":\"log\"},
\"input\":{\"type\":\"log\"},
\"fields\":{\"log_topics\":\"nginx\"},
\"beat\":{\"name\":\"kafka03\",\"hostname\":\"kafka03\",\"version\":\"6.8.6\"},
\"host\":{\"name\":\"kafka03\"},
\"log\":{\"file\":{\"path\":\"/var/log/nginx/access.log\"}}
}", "tags" => [ [0] "_grokparsefailure" ] }
1. Edit the filebeat.yml file

filebeat adds these metadata fields by default when writing to Kafka, so drop them at the source with a drop_fields processor (fields starting with @ cannot be removed):

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/messages
  fields:
    log_topics: messages

- type: log
  enabled: true
  paths:
    - /var/log/nginx/access.log
  fields:
    log_topics: nginx

# drop the fields filebeat adds on its own
processors:
- drop_fields:
    fields: ["beat","input","source","offset","topicname","timestamp","@metadata"]

output.kafka:
  enabled: true
  hosts: ["192.168.1.7:9092","192.168.1.8:9092","192.168.1.9:9092"]
  topic: '%{[fields][log_topics]}'
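Before restarting, let filebeat validate the edited file itself; the test subcommands are built into filebeat 6.x:

filebeat test config -c /etc/filebeat/filebeat.yml
filebeat test output -c /etc/filebeat/filebeat.yml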
Restart filebeat:
systemctl restart filebeat
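If the Logstash output does not change, first check that the service actually came back up:

systemctl status filebeat
journalctl -u filebeat -f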
With the processor in place, the event now carries far fewer fields:
{ "@version" => "1", "@timestamp" => 2020-02-29T09:35:58.899Z, "message" => "{\"@timestamp\":\"2020-02-29T09:35:54.026Z\",
\"@metadata\":{\"beat\":\"filebeat\",
\"type\":\"doc\",
\"version\":\"6.8.6\",
\"topic\":\"nginx\"},
\"fields\":{\"log_topics\":\"nginx\"},
\"host\":{\"name\":\"kafka03\"},
\"log\":{\"file\":{\"path\":\"/var/log/nginx/access.log\"}},
\"message\":\"192.168.1.8 - - [29/Feb/2020:17:35:52 +0800] \\\"GET /cc HTTP/1.0\\\" 404 3650 \\\"-\\\" \\\"ApacheBench/2.3\\\" \\\"-\\\"\",
\"prospector\":{\"type\":\"log\"}}", "tags" => [ [0] "_grokparsefailure" ] }
Every middleware hop adds fields of its own, so the Logstash filter section needs rules to strip out the unwanted ones. The _grokparsefailure tag in the output above is a symptom of the same problem: grok is being run against the raw JSON envelope rather than the nginx access line inside it, so the payload has to be parsed out with the json filter first.
The final nginx.conf:
input {
  kafka {
    bootstrap_servers => ["192.168.1.7:9092,192.168.1.8:9092,192.168.1.9:9092"]
    group_id => "logstash"
    topics => "nginx"
    consumer_threads => 5
  }
}

filter {
  # parse the JSON document out of the message field
  json {
    source => "message"
  }
  # remove the leftover metadata fields
  mutate {
    remove_field => ["@version","fields","prospector","host","log"]
  }
  # parse the nginx access line with the custom NGINXACCESS pattern
  grok {
    match => { "message" => "%{NGINXACCESS}" }
  }
}

output {
  elasticsearch {
    hosts => "192.168.1.7:9200"
    index => "nginx1_log-%{+YYYY.MM.dd}"
  }
  #stdout {
  #  codec => rubydebug
  #}
}
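After restarting Logstash with this file, confirm that parsed events are reaching Elasticsearch; _cat/indices and _search are standard ES APIs, and the index name follows the output section above:

curl -s 'http://192.168.1.7:9200/_cat/indices?v' | grep nginx1_log
curl -s 'http://192.168.1.7:9200/nginx1_log-*/_search?size=1&pretty'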
[root@localhost patterns]# pwd
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-patterns-core-4.1.2/patterns
Place the prepared nginx_access pattern file in this directory so that the grok filter can resolve %{NGINXACCESS}:
vim nginx_access
URIPARAM1 [A-Za-z0-9$.+!*'|(){},~@#%&/=:;_?\-\[\]]*
NGINXACCESS %{IPORHOST:client_ip} (%{USER:ident}|-) (%{USER:auth}|-) \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} (%{NOTSPACE:request}|-)(?: HTTP/%{NUMBER:http_version})?|-)" %{NUMBER:status} (?:%{NUMBER:bytes}|-) "(?:%{URI:referrer}|-)" "%{GREEDYDATA:agent}"
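The pattern can be smoke-tested in isolation before the Kafka pipeline is involved. A throwaway stdin pipeline (a sketch; the /tmp/grok-test.conf name is arbitrary) fed with the sample access line from above:

# /tmp/grok-test.conf
input { stdin { } }
filter {
  grok { match => { "message" => "%{NGINXACCESS}" } }
}
output { stdout { codec => rubydebug } }

echo '192.168.1.8 - - [29/Feb/2020:17:15:51 +0800] "GET /cc HTTP/1.0" 404 3650 "-" "ApacheBench/2.3" "-"' | \
    /usr/share/logstash/bin/logstash -f /tmp/grok-test.conf

If the pattern matches, the rubydebug output shows client_ip, verb, status and the other named captures instead of a _grokparsefailure tag.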