Logstash 7.9.1 - Official Tutorial 1: Getting Started
This article is a hands-on walkthrough and translation of the official tutorial.
Machine configuration
CentOS 7.6, 64-bit, 16 cores, 16 GB RAM
[sysoper@10-99-10-31 ~]$ cat /etc/redhat-release
CentOS Linux release 7.6.1810 (Core)
[sysoper@10-99-10-31 ~]$ cat /proc/cpuinfo | grep name | cut -f2 -d: | uniq -c
16 Intel(R) Xeon(R) Gold 5217 CPU @ 3.00GHz
[sysoper@10-99-10-31 ~]$ cat /proc/cpuinfo| grep "physical id"| sort| uniq| wc -l
16
[sysoper@10-99-10-31 ~]$ cat /proc/cpuinfo| grep "cpu cores"| uniq
cpu cores : 1
[sysoper@10-99-10-31 ~]$ cat /proc/meminfo |grep MemTotal
MemTotal: 16264896 kB
[sysoper@10-99-10-31 logstashdir]$ uname -i
x86_64
Installing the JDK
Omitted.
Installing Logstash
- Download and extract
wget https://artifacts.elastic.co/downloads/logstash/logstash-7.9.1.tar.gz
tar -zxvf logstash-7.9.1.tar.gz
- Run the most basic Logstash pipeline to verify the installation
cd logstash-7.9.1
bin/logstash -e 'input { stdin { } } output { stdout {} }'
# type "hello" and check the output; press Ctrl+D to exit Logstash
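As a small aside (not an official tutorial step), the stdout output also accepts a codec; rubydebug, which this tutorial uses later, pretty-prints each event:
bin/logstash -e 'input { stdin { } } output { stdout { codec => rubydebug } }'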
Parsing logs with Logstash
First, download the sample log file provided by the official tutorial:
wget https://download.elastic.co/demos/logstash/gettingstarted/logstash-tutorial.log.gz
gzip -d logstash-tutorial.log.gz
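The file contains Apache combined-format access log lines. If you want to see what the data looks like before processing it, take a quick peek (output omitted here):
head -n 3 logstash-tutorial.log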
Using Filebeat
Before creating the Logstash pipeline, we configure Filebeat to send log lines to Logstash.
- Download and extract
wget https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.9.1-linux-x86_64.tar.gz
tar -zxvf filebeat-7.9.1-linux-x86_64.tar.gz
- Modify/replace the contents of filebeat.yml:
# back up the original first
mv filebeat.yml filebeat.yml.bak
vim filebeat.yml
New contents:
filebeat.inputs:
- type: log
  paths:
    - /path/to/file/logstash-tutorial.log
output.logstash:
  hosts: ["localhost:5044"]
- Start Filebeat
- Run
sudo ./filebeat -e -c filebeat.yml -d "publish"
- You may run into the following problem:
Exiting: error loading config file: config file ("filebeat.yml") must be owned by the user identifier (uid=0) or root
- Fix:
sudo su - root
chown root filebeat.yml
chmod go-w /etc/{beatname}/{beatname}.yml
- Official reference:
https://www.elastic.co/guide/en/beats/libbeat/5.3/config-file-permissions.html
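Alternatively, Beats documents a command-line switch that relaxes the ownership/permission check; verify it applies to your exact version before relying on it:
sudo ./filebeat -e -c filebeat.yml -d "publish" --strict.perms=false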
- Run again:
2020-09-18T09:19:37.870+0800 INFO [publisher] pipeline/retry.go:219 retryer: send unwait signal to consumer
2020-09-18T09:19:37.871+0800 INFO [publisher] pipeline/retry.go:223 done
2020-09-18T09:19:40.650+0800 ERROR [publisher_pipeline_output] pipeline/output.go:154 Failed to connect to backoff(async(tcp://localhost:5044)): dial tcp 127.0.0.1:5044: connect: connection refused
- Filebeat will try to connect on port 5044. Until Logstash starts with an active Beats plugin, there won't be any answer on that port, so any messages you see about failing to connect on that port are normal for now.
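Before moving on, you can sanity-check the Filebeat side with its built-in test subcommands (available in Filebeat 7.x; test output will keep failing until Logstash is actually listening on 5044):
./filebeat test config -c filebeat.yml
./filebeat test output -c filebeat.yml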
Configuring the Logstash Beats input plugin
Next, create a Logstash pipeline configuration that uses the Beats input plugin to receive events from Beats.
- Create a simple pipeline configuration file, first-pipeline.conf:
input {
    beats {
        port => "5044"
    }
}
# The filter part of this file is commented out to indicate that it is
# optional.
# filter {
#
# }
output {
    stdout { codec => rubydebug }
}
- In the configuration above, the input uses the beats plugin and the output prints to the console (we will redirect the output to ES later).
- Validate the configuration file
bin/logstash -f first-pipeline.conf --config.test_and_exit
- If validation passes, start a Logstash instance with this configuration
bin/logstash -f first-pipeline.conf --config.reload.automatic
--config.reload.automatic enables automatic config reloading, so you do not have to stop and restart Logstash every time you modify the configuration file.
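By default, Logstash polls the config file for changes every 3 seconds; if you want a different interval, there is a companion flag (shown here with an illustrative 5s value):
bin/logstash -f first-pipeline.conf --config.reload.automatic --config.reload.interval 5s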
- Once Logstash starts successfully, you can immediately see the earlier Filebeat terminal connect to port 5044:
2020-09-18T09:43:11.326+0800 INFO [publisher_pipeline_output] pipeline/output.go:151 Connection to backoff(async(tcp://localhost:5044)) established
On the Logstash side, the log events come through as well (one of them shown):
{
      "@version" => "1",
    "@timestamp" => 2020-09-18T01:19:35.207Z,
         "input" => {
        "type" => "log"
    },
           "ecs" => {
        "version" => "1.5.0"
    },
          "host" => {
        "name" => "10-99-10-31"
    },
           "log" => {
          "file" => {
            "path" => "/app/sysoper/logstashdir/logstash-tutorial.log"
        },
        "offset" => 21199
    },
          "tags" => [
        [0] "beats_input_codec_plain_applied"
    ],
       "message" => "218.30.103.62 - - [04/Jan/2015:05:27:36 +0000] \"GET /projects/xdotool/xdotool.xhtml HTTP/1.1\" 304 - \"-\" \"Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)\"",
         "agent" => {
                "type" => "filebeat",
             "version" => "7.9.1",
            "hostname" => "10-99-10-31",
        "ephemeral_id" => "aad5f124-e37a-474a-9e50-c4317229df4b",
                  "id" => "d7a76fd8-db13-45c8-99bd-3ae4dc3a3f92",
                "name" => "10-99-10-31"
    }
}
Structuring the data with the grok filter plugin
The grok filter plugin is one of several plugins available in Logstash by default; see Logstash plugin management for details.
The grok filter plugin lets you parse unstructured log data into structured, queryable data. To use it, you specify a parsing pattern, that is, the pattern that the regular text data should follow when it is structured.
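As an illustrative sketch (the log line and field names here are hypothetical, not from the tutorial), a grok filter combines predefined patterns such as %{TIMESTAMP_ISO8601} and %{LOGLEVEL} with field names of your choosing:
filter {
    grok {
        # would parse a line like "2020-09-18T10:00:00 INFO something happened"
        # into the fields "ts", "level" and "msg"
        match => { "message" => "%{TIMESTAMP_ISO8601:ts} %{LOGLEVEL:level} %{GREEDYDATA:msg}" }
    }
}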
- Modify first-pipeline.conf to add a filter configuration. Here we use the built-in %{COMBINEDAPACHELOG} grok pattern, which parses lines from Apache logs according to the HTTP request format:
filter {
    grok {
        match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
}
- Automatic config reloading was enabled earlier, so Logstash does not need to be restarted:
[2020-09-18T10:23:52,212][INFO ][logstash.pipelineaction.reload] Reloading pipeline {"pipeline.id"=>:main}
- However, because Filebeat stores the state of each file it harvests in a registry, deleting the registry file forces Filebeat to read all harvested files from the beginning. Go back to the Filebeat session and run:
sudo rm -fr data/registry
sudo ./filebeat -e -c filebeat.yml -d "publish"
- Back in the Logstash console, you can see that the JSON output has changed:
{ "@version" => "1", "log" => { "file" => { "path" => "/app/sysoper/logstashdir/logstash-tutorial.log" }, "offset" => 19617 }, "ecs" => { "version" => "1.5.0" }, "host" => { "name" => "10-99-10-31" }, "response" => "200", "verb" => "GET", "tags" => [ [0] "beats_input_codec_plain_applied" ], "httpversion" => "1.1", "@timestamp" => 2020-09-18T02:26:47.845Z, "input" => { "type" => "log" }, "auth" => "-", "clientip" => "218.30.103.62", "request" => "/projects/fex/", "bytes" => "14352", "timestamp" => "04/Jan/2015:05:27:15 +0000", "ident" => "-", "message" => "218.30.103.62 - - [04/Jan/2015:05:27:15 +0000] \"GET /projects/fex/ HTTP/1.1\" 200 14352 \"-\" \"Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)\"", "agent" => { "type" => "filebeat", "version" => "7.9.1", "hostname" => "10-99-10-31", "id" => "d7a76fd8-db13-45c8-99bd-3ae4dc3a3f92", "ephemeral_id" => "90901fd8-81aa-4271-ad11-1ca77cb455e5", "name" => "10-99-10-31" }, "referrer" => "\"-\"" }
Enriching the output with the geoip filter plugin
Besides parsing log data for better search, filter plugins can derive supplementary information from existing data. For example, the geoip plugin looks up IP addresses, derives geographic location information from the addresses, and adds that location information to the logs.
- Modify the pipeline configuration to add the geoip filter
filter {
    grok {
        match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
    geoip {
        source => "clientip"
    }
}
- Then, just as before, clear the Filebeat registry and restart Filebeat, and you can see the Logstash output change (excerpt of the new fields):
"httpversion" => "1.0", "geoip" => { "postal_code" => "32963", "region_name" => "Florida", "location" => { "lon" => -80.3757, "lat" => 27.689799999999998 },
Indexing the data into Elasticsearch
We have now broken the web logs down into specific fields and printed them to the console. The data can now be directed to Elasticsearch.
You can run Elasticsearch on your own hardware or use our hosted Elasticsearch Service that is available on AWS, GCP, and Azure. Try the Elasticsearch Service for free.
- Quick local installation of ES.
- Download
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.9.1-linux-x86_64.tar.gz
- Extract
tar -xvf elasticsearch-7.9.1-linux-x86_64.tar.gz
- Run
./elasticsearch-7.9.1/bin/elasticsearch
- Edit the first-pipeline.conf file and replace the entire output section with the following text:
output {
    elasticsearch {
        hosts => [ "localhost:9200" ]
    }
}
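If you still want to watch events on the console while indexing, a Logstash output section may contain several outputs side by side (a sketch; the tutorial itself switches entirely to elasticsearch here):
output {
    elasticsearch {
        hosts => [ "localhost:9200" ]
    }
    # console output kept purely for debugging
    stdout { codec => rubydebug }
}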
- Restart Filebeat as before.
At this point, the Logstash pipeline is configured to index the data into an Elasticsearch cluster (here, a local single node), and we can now query Elasticsearch.
- List the existing indices in ES
curl 'localhost:9200/_cat/indices?v'
Result:
health status index                      uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   logstash-2020.09.18-000001 7OxpUa9EQ9yv4w-_CqOYtw   1   1        100            0    335.7kb        335.7kb
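The yellow health status is expected for a single-node setup: the replica shard (rep 1) has no second node to be assigned to. The standard cluster health API confirms this:
curl 'localhost:9200/_cluster/health?pretty'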
- Query the target data in ES by index
curl -XGET 'localhost:9200/logstash-2020.09.18-000001/_search?pretty&q=geoip.city_name=Buffalo'
- OK
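Other fields extracted by grok can be queried the same way, for example by HTTP response code (same query-string style as above):
curl -XGET 'localhost:9200/logstash-2020.09.18-000001/_search?pretty&q=response=200'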
Visualizing the data with Kibana
- Download and extract
wget https://artifacts.elastic.co/downloads/kibana/kibana-7.9.1-linux-x86_64.tar.gz
tar -xvf kibana-7.9.1-linux-x86_64.tar.gz
- Edit config/kibana.yml and set elasticsearch.hosts to point at the running ES instance:
# The URLs of the Elasticsearch instances to use for all your queries.
elasticsearch.hosts: ["http://localhost:9200"]
- Start Kibana
./kibana-7.9.1-linux-x86_64/bin/kibana
- Visit http://localhost:5601 to view the data.
- Issue 1: the Kibana page cannot be reached. Fix: in kibana.yml, set server.host: "0.0.0.0" so that Kibana listens on all interfaces instead of localhost only.
- Issue 2: the Kibana process cannot be found by name. Kibana runs on Node.js, so use ps -elf|grep node or netstat -tunlp|grep 5601.
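For this kind of local experiment, a plain nohup background start keeps Kibana alive after the terminal closes (a sketch; a production setup would use systemd or similar):
nohup ./kibana-7.9.1-linux-x86_64/bin/kibana > kibana.out 2>&1 &
tail -f kibana.out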