Logstash 7.9.1 - Official Tutorial Part 1 - Getting Started

This post is a hands-on walkthrough plus translation of the official tutorial.

Machine specs

CentOS 7.6, 64-bit, 16 cores, 16 GB RAM

[sysoper@10-99-10-31 ~]$ cat /etc/redhat-release
CentOS Linux release 7.6.1810 (Core) 
[sysoper@10-99-10-31 ~]$ cat /proc/cpuinfo | grep name | cut -f2 -d: | uniq -c
16  Intel(R) Xeon(R) Gold 5217 CPU @ 3.00GHz
[sysoper@10-99-10-31 ~]$ cat /proc/cpuinfo| grep "physical id"| sort| uniq| wc -l
16
[sysoper@10-99-10-31 ~]$ cat /proc/cpuinfo| grep "cpu cores"| uniq
cpu cores       : 1
[sysoper@10-99-10-31 ~]$ cat /proc/meminfo |grep MemTotal
MemTotal:       16264896 kB
[sysoper@10-99-10-31 logstashdir]$ uname -i
x86_64

Install a JDK
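
Logstash 7.9 requires Java 8 or Java 11. Before installing Logstash, a quick sanity check (a minimal sketch, assuming a JDK was already installed via yum or a tarball):

java -version      # should report 1.8.x or 11.x
echo $JAVA_HOME    # if unset, point it at your JDK install directory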

Install Logstash

  • Download and extract
    wget https://artifacts.elastic.co/downloads/logstash/logstash-7.9.1.tar.gz
    tar -zxvf logstash-7.9.1.tar.gz
    
  • Run the most basic Logstash pipeline to verify the installation
    cd logstash-7.9.1
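    # the -e flag lets you define the pipeline configuration directly on the command line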
    bin/logstash -e 'input { stdin { } } output { stdout {} }'
    # type "hello" and watch the event echoed back
    # press Ctrl+D to exit Logstash
    

Parsing logs with Logstash

First, download the sample log file provided by the official tutorial:

wget https://download.elastic.co/demos/logstash/gettingstarted/logstash-tutorial.log.gz
gunzip logstash-tutorial.log.gz
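
Optionally peek at the file to confirm the download; each line is an Apache combined-format access log entry (a quick check; the line count should match the 100 documents indexed into ES later in this post):

head -n 2 logstash-tutorial.log    # show the first two raw log lines
wc -l logstash-tutorial.log        # 100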

Using Filebeat

Before creating the Logstash pipeline for this exercise, we configure Filebeat to send the log lines to Logstash.

  • Download and extract
    wget https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.9.1-linux-x86_64.tar.gz
    tar -zxvf filebeat-7.9.1-linux-x86_64.tar.gz 
    
  • Modify/replace the contents of filebeat.yml
    # back up the original first
    mv filebeat.yml filebeat.yml.bak
    vim filebeat.yml
    
    # contents:
    filebeat.inputs:
    - type: log
      paths:
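        # NOTE: this must be the absolute path to where you saved logstash-tutorial.log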
        - /path/to/file/logstash-tutorial.log 
    output.logstash:
      hosts: ["localhost:5044"]
    
  • Start Filebeat
    • Run sudo ./filebeat -e -c filebeat.yml -d "publish"
    • You may hit: Exiting: error loading config file: config file ("filebeat.yml") must be owned by the user identifier (uid=0) or root
    • Fix (from the Beats docs; in this tarball install the config is the local filebeat.yml):
      sudo su - root
      chown root filebeat.yml
      chmod go-w filebeat.yml    # the docs' generic form is: chmod go-w /etc/{beatname}/{beatname}.yml
      
    • Reference: https://www.elastic.co/guide/en/beats/libbeat/5.3/config-file-permissions.html
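    • Alternatively, for quick tests, you can leave ownership alone and disable the permission check via the strict.perms setting (a sketch based on the Beats docs; prefer the ownership fix for real deployments):
      sudo ./filebeat -e -c filebeat.yml -d "publish" --strict.perms=false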
    • Run it again:
      2020-09-18T09:19:37.870+0800    INFO    [publisher]     pipeline/retry.go:219   retryer: send unwait signal to consumer
      2020-09-18T09:19:37.871+0800    INFO    [publisher]     pipeline/retry.go:223     done
      2020-09-18T09:19:40.650+0800    ERROR   [publisher_pipeline_output]     pipeline/output.go:154  Failed to connect to backoff(async(tcp://localhost:5044)): dial tcp 127.0.0.1:5044: connect: connection refused
      
    • Filebeat will try to connect on port 5044. Until Logstash starts with an active Beats input plugin, there will be no answer on that port, so any connection-failure messages you see for it at this stage are normal.

Configuring the Logstash beats input plugin

Next, create a Logstash configuration pipeline that uses the beats input plugin to receive events from Beats.

  • Create a simple pipeline configuration file, first-pipeline.conf
    input {
        beats {
            port => "5044"
        }
    }
    # The filter part of this file is commented out to indicate that it is
    # optional.
    # filter {
    #
    # }
    output {
        stdout { codec => rubydebug }
    }
    
  • In the config above, the input uses the beats plugin and the output prints to the console via the rubydebug codec (we'll change the output to ES later).
  • Validate the config file: bin/logstash -f first-pipeline.conf --config.test_and_exit
  • If validation passes, start a Logstash instance with that config: bin/logstash -f first-pipeline.conf --config.reload.automatic
    • --config.reload.automatic enables automatic config reloading, so you don't have to stop and restart Logstash every time you modify the configuration file.
  • Once Logstash is up, you can immediately see the earlier Filebeat terminal connect to port 5044: 2020-09-18T09:43:11.326+0800 INFO [publisher_pipeline_output] pipeline/output.go:151 Connection to backoff(async(tcp://localhost:5044)) established, and Logstash starts printing the log events (one of them shown below):
    {
      "@version" => "1",
        "@timestamp" => 2020-09-18T01:19:35.207Z,
             "input" => {
            "type" => "log"
        },
               "ecs" => {
            "version" => "1.5.0"
        },
              "host" => {
            "name" => "10-99-10-31"
        },
               "log" => {
              "file" => {
                "path" => "/app/sysoper/logstashdir/logstash-tutorial.log"
            },
            "offset" => 21199
        },
              "tags" => [
            [0] "beats_input_codec_plain_applied"
        ],
           "message" => "218.30.103.62 - - [04/Jan/2015:05:27:36 +0000] \"GET /projects/xdotool/xdotool.xhtml HTTP/1.1\" 304 - \"-\" \"Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)\"",
             "agent" => {
                    "type" => "filebeat",
                 "version" => "7.9.1",
                "hostname" => "10-99-10-31",
            "ephemeral_id" => "aad5f124-e37a-474a-9e50-c4317229df4b",
                      "id" => "d7a76fd8-db13-45c8-99bd-3ae4dc3a3f92",
                    "name" => "10-99-10-31"
        }
    }
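  • From another shell you can also confirm that the beats input is listening on 5044 (a quick check; ss ships with CentOS 7):
    ss -lnt | grep 5044    # expect a LISTEN entry for port 5044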
    

Structuring the data with the grok filter plugin

The grok filter plugin is one of several plugins available by default in Logstash (see Logstash plugin management; a quick check is shown below).
The grok filter plugin lets you parse unstructured log data into something structured and queryable. To use it, you must supply a match pattern, i.e. the pattern the regular text data follows, which grok uses to structure it.
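
Since grok ships with the default Logstash distribution, you can confirm it is installed (and browse the other bundled plugins) with the plugin manager. A quick check, run from the Logstash install directory:

bin/logstash-plugin list | grep grok
# logstash-filter-grok should appear in the output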

  • Modify first-pipeline.conf to add a filter section. Here we use the built-in %{COMBINEDAPACHELOG} grok pattern, which parses lines in the Apache combined log format into HTTP request fields:
    filter {
        grok {
            match => { "message" => "%{COMBINEDAPACHELOG}" }
        }
    }
    
  • Automatic config reload was enabled earlier, so Logstash does not need a restart: [2020-09-18T10:23:52,212][INFO ][logstash.pipelineaction.reload] Reloading pipeline {"pipeline.id"=>:main}
  • However, Filebeat stores the state of each file it harvests in a registry, so deleting the registry file forces Filebeat to read all the files it harvests from the beginning. Go back to the Filebeat session and run:
    sudo rm -fr data/registry
    sudo ./filebeat -e -c filebeat.yml -d "publish"
    
  • Back in the Logstash console, you can see that the printed JSON data has changed:
    {
       "@version" => "1",
            "log" => {
              "file" => {
                "path" => "/app/sysoper/logstashdir/logstash-tutorial.log"
            },
            "offset" => 19617
        },
                "ecs" => {
            "version" => "1.5.0"
        },
               "host" => {
            "name" => "10-99-10-31"
        },
           "response" => "200",
               "verb" => "GET",
               "tags" => [
            [0] "beats_input_codec_plain_applied"
        ],
        "httpversion" => "1.1",
         "@timestamp" => 2020-09-18T02:26:47.845Z,
              "input" => {
            "type" => "log"
        },
               "auth" => "-",
           "clientip" => "218.30.103.62",
            "request" => "/projects/fex/",
              "bytes" => "14352",
          "timestamp" => "04/Jan/2015:05:27:15 +0000",
              "ident" => "-",
            "message" => "218.30.103.62 - - [04/Jan/2015:05:27:15 +0000] \"GET /projects/fex/ HTTP/1.1\" 200 14352 \"-\" \"Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)\"",
              "agent" => {
                    "type" => "filebeat",
                 "version" => "7.9.1",
                "hostname" => "10-99-10-31",
                      "id" => "d7a76fd8-db13-45c8-99bd-3ae4dc3a3f92",
            "ephemeral_id" => "90901fd8-81aa-4271-ad11-1ca77cb455e5",
                    "name" => "10-99-10-31"
        },
           "referrer" => "\"-\""
    }
    

Enriching the data with the geoip filter plugin

Besides parsing log data for better searching, filter plugins can derive supplementary information from existing data. For example, the geoip plugin looks up IP addresses, derives geographic location information from them, and adds that location information to the log.

  • Modify the pipeline config to add the geoip filter
    filter {
        grok {
            match => { "message" => "%{COMBINEDAPACHELOG}"}
        }
        geoip {
            source => "clientip"
        }
    }
    
  • Then, just as before, clear the Filebeat registry and restart it; you can then see the Logstash output change (excerpt below):
    "httpversion" => "1.0",
          "geoip" => {
           "postal_code" => "32963",
           "region_name" => "Florida",
              "location" => {
            "lon" => -80.3757,
            "lat" => 27.689799999999998
        },
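  • Note: geoip only resolves public IP addresses; for private or reserved ranges the lookup fails and the event is tagged with _geoip_lookup_failure, which is worth remembering when testing against your own logs.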
    

Indexing the data into Elasticsearch

The web logs are now broken down into specific fields and printed to the console. Time to direct the output into Elasticsearch.

You can run Elasticsearch on your own hardware or use our hosted Elasticsearch Service that is available on AWS, GCP, and Azure. Try the Elasticsearch Service for free.

  • Quick local ES install
    • Download: wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.9.1-linux-x86_64.tar.gz
    • Extract: tar -xvf elasticsearch-7.9.1-linux-x86_64.tar.gz
    • Run: ./elasticsearch-7.9.1/bin/elasticsearch
  • Edit first-pipeline.conf and replace the entire output section with the text below (the complete file as of this step appears at the end of this section):
    output {
        elasticsearch {
            hosts => [ "localhost:9200" ]
        }
    }
    
  • Restart Filebeat as before.
    At this point the Logstash pipeline is configured to index the data into an Elasticsearch cluster (here a local single node), and Elasticsearch can be queried.
  • List the indices already in ES with curl 'localhost:9200/_cat/indices?v'; result:
    health status index                      uuid                   pri rep docs.count docs.deleted store.size pri.store.size
    yellow open   logstash-2020.09.18-000001 7OxpUa9EQ9yv4w-_CqOYtw   1   1        100            0    335.7kb        335.7kb
    
  • Query the target data in ES by index: curl -XGET 'localhost:9200/logstash-2020.09.18-000001/_search?pretty&q=geoip.city_name=Buffalo'
  • The matching documents come back, confirming the pipeline works end to end.
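
For reference, the complete first-pipeline.conf at the end of this step, assembled from the snippets above:

input {
    beats {
        port => "5044"
    }
}
filter {
    grok {
        match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
    geoip {
        source => "clientip"
    }
}
output {
    elasticsearch {
        hosts => [ "localhost:9200" ]
    }
}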

Visualizing the data with Kibana

  • Download and extract
    wget https://artifacts.elastic.co/downloads/kibana/kibana-7.9.1-linux-x86_64.tar.gz
    tar -xvf kibana-7.9.1-linux-x86_64.tar.gz
    
  • Edit config/kibana.yml and set elasticsearch.hosts to point at the running ES instance:
    # The URLs of the Elasticsearch instances to use for all your queries.
    elasticsearch.hosts: ["http://localhost:9200"]
    
  • Start Kibana: ./kibana-7.9.1-linux-x86_64/bin/kibana
  • Open http://localhost:5601 to explore the data
  • Issue 1: the Kibana page cannot be reached
    Set server.host: "0.0.0.0" in kibana.yml.
  • Issue 2: cannot find a Kibana process
    Kibana runs on Node.js, so look for the node process with ps -elf|grep node, or check the port with netstat -tunlp|grep 5601
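  • You can also verify Kibana from the shell through its status endpoint (a quick check; /api/status is part of Kibana's HTTP API):
    curl -s http://localhost:5601/api/status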