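ELK + Filebeat Deployment and Installation

I. Introduction to ELK + Filebeat

ELK is an acronym formed from the initials of three open-source projects: Elasticsearch, Logstash and Kibana. Filebeat (one of the Beats), which appeared later, is a lightweight alternative to Logstash for the data-collection part. The product family as a whole is also known as the Elastic Stack.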

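Elasticsearch is the distributed search and analytics engine at the core of the Elastic Stack. It is built on Lucene, is distributed, is accessed through a RESTful interface, and provides near-real-time search and analysis for all types of data. Whether the data is structured or unstructured text, numeric data or geospatial data, Elasticsearch stores and indexes it in a way that supports fast search.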

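Filebeat is a lightweight shipper for forwarding and centralizing log data. It monitors the log files or locations you specify, collects log events, and forwards them to Elasticsearch or Logstash for indexing. It works as follows: when Filebeat starts, it launches one or more inputs that look in the locations configured for log data. For every log file it finds, Filebeat starts a harvester. Each harvester reads a single log file for new content and sends the new log data to libbeat, which aggregates the events and sends the aggregated data to the output configured for Filebeat.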
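The harvester idea described above can be illustrated with a small sketch. This is a toy model written for this article, not Filebeat's actual implementation: the `Harvester` class and its method names are hypothetical, and it only shows the core idea of remembering a read offset so that each pass returns just the newly appended lines.

```python
import os
import tempfile


class Harvester:
    """Toy stand-in for a Filebeat harvester: reads one file and remembers its offset."""

    def __init__(self, path):
        self.path = path
        self.offset = 0  # byte position of the next unread data

    def read_new_lines(self):
        """Return the complete lines appended since the previous call."""
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            chunk = f.read()
        # Consume only up to the last newline, so a half-written line is
        # left in place and picked up on a later call.
        end = chunk.rfind(b"\n") + 1
        self.offset += end
        return chunk[:end].decode("utf-8").splitlines()


def demo():
    """Write a log file in two steps and harvest it after each step."""
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "access.log")
        with open(log_path, "w", encoding="utf-8") as f:
            f.write("line 1\nline 2\n")
        harvester = Harvester(log_path)
        first = harvester.read_new_lines()
        with open(log_path, "a", encoding="utf-8") as f:
            f.write("line 3\npartial")
        second = harvester.read_new_lines()
        return first, second


if __name__ == "__main__":
    print(demo())
```

The second call returns only `line 3`; the unfinished `partial` text stays unread until a newline completes it.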

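Logstash is a free and open server-side data processing pipeline that ingests data from many sources, transforms it, and sends it to the "stash" of your choice. Logstash ingests, transforms and ships data dynamically, regardless of format or complexity. It can derive structure from unstructured data with grok, decode geographic coordinates from IP addresses, anonymize or exclude sensitive fields, and simplify overall processing.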

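Kibana is an open-source analytics and visualization platform for Elasticsearch, used to search, view and interact with the data stored in Elasticsearch indices. With Kibana you can run advanced analysis and present the results in a variety of charts. It gives Logstash and Elasticsearch a friendly web interface for log analysis, where important log data can be aggregated, analyzed and searched, and it makes large volumes of data easier to understand. It is simple to operate: the browser-based interface lets you quickly build dashboards that show Elasticsearch query results in real time.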

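The four components work together as follows:

1. Filebeat collects the logs that applications write to disk and sends them to Logstash.
2. Logstash processes the logs received from Filebeat and saves the result to an Elasticsearch index.
3. Elasticsearch stores the logs sent by Logstash.
4. Kibana searches the logs in Elasticsearch and displays them in the browser.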

II. Deploying ELK

Environment overview:
The stack version is pinned to 7.16.1.
Create a .env file with the following content:

ELK_VERSION=7.16.1

Load it into the current shell (docker-compose also reads .env from the project directory on its own):

source .env

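The example IP of the host used throughout is 192.168.1.240.

Deploying Elasticsearch

Elasticsearch is a real-time distributed search and analytics engine that can be used for full-text search, structured search and analytics. It is built on top of the Apache Lucene full-text search library and is written in Java.

1. Add the Elasticsearch service to docker-compose.yml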

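version: '3'
services:
  elasticsearch:                    # service name
    image: "docker.elastic.co/elasticsearch/elasticsearch:${ELK_VERSION}"      # image to use
    container_name: elasticsearch   # container name
    restart: always                 # restart policy: always restart the container when it stops
    environment:
      - node.name=node-1                   # node name; must be unique per node in a cluster
      - network.publish_host=192.168.1.240  # address advertised to other nodes and to clients, normally the IP of the host machine
      - network.host=0.0.0.0                # bind address (IPv4 or IPv6); 0.0.0.0 listens on all interfaces
      - discovery.seed_hosts=192.168.1.240          # introduced in ES 7.0: addresses of the master-eligible nodes, i.e. the candidates that can be elected if the master goes down
      - cluster.initial_master_nodes=192.168.1.240  # introduced in ES 7.0: needed to elect a master when a brand-new cluster is bootstrapped
      - cluster.name=es-cluster     # cluster name; all nodes of one cluster must use the same name
      - bootstrap.memory_lock=true  # lock the process memory to prevent swapping, as the official docs recommend
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m" # JVM heap size; lower it if the host is short on memory
    ulimits:        # memlock is the locked-memory limit, needed by bootstrap.memory_lock
      memlock:
        soft: -1    # unlimited
        hard: -1    # unlimited
    volumes:
      - $PWD/elasticsearch/config:/usr/share/elasticsearch/config  # ES config directory mounted from the host; it must exist beforehand (see step 2). CORS is enabled there, otherwise the head plugin cannot connect to this node
      - $PWD/elasticsearch/data:/usr/share/elasticsearch/data  # data directory
      - $PWD/elasticsearch/plugins:/usr/share/elasticsearch/plugins  # plugin directory
      - /etc/localtime:/etc/localtime:ro
    ports:
      - 9200:9200    # HTTP port, can be opened directly in a browser
      - 9300:9300    # transport port used for TCP communication between cluster nodes (and by Java transport clients)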

2. Create the elasticsearch.yml file on the host, or copy it out of the image

mkdir -p elasticsearch/{data,plugins}
docker run -d --rm --name elasticsearch  docker.elastic.co/elasticsearch/elasticsearch:7.16.1
docker cp elasticsearch:/usr/share/elasticsearch/config/ elasticsearch/
docker stop elasticsearch
chown -R 1000:root elasticsearch/config/
chown 1000:root elasticsearch/data/
chmod 775 elasticsearch/data/
chown 1000:root elasticsearch/plugins/

3. Adjust the kernel memory-map setting

Add the following to /etc/sysctl.conf:

vm.max_map_count=655360

After saving, reload the settings so they take effect:

sysctl -p
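To confirm that the new value is active, the kernel setting can be read back from /proc. The sketch below is only an illustration; the helper names are made up for this article, and the threshold of 262144 is the minimum that the Elasticsearch documentation asks for, which the value above comfortably exceeds.

```python
MIN_MAP_COUNT = 262144  # minimum vm.max_map_count required by Elasticsearch


def parse_map_count(text):
    """Parse the content of /proc/sys/vm/max_map_count into an integer."""
    return int(text.strip())


def is_sufficient(value, minimum=MIN_MAP_COUNT):
    """True when the kernel setting is high enough for Elasticsearch."""
    return value >= minimum


def read_current(path="/proc/sys/vm/max_map_count"):
    """Read the live value on a Linux host."""
    with open(path) as f:
        return parse_map_count(f.read())


if __name__ == "__main__":
    current = read_current()
    print(current, "ok" if is_sufficient(current) else "too low")
```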

4. Add the following to elasticsearch/config/elasticsearch.yml:

http.cors.enabled: true        # enable cross-origin requests
http.cors.allow-origin: "*"    # allow requests from any origin

5. Start Elasticsearch

docker-compose up -d

6. Check the status

Open http://192.168.1.240:9200/_cluster/health?pretty in a browser.

A "status" of "green" means the cluster is healthy.

{
"cluster_name": "es-cluster",
"status": "green",
"timed_out": false,
"number_of_nodes": 1,
"number_of_data_nodes": 1,
"active_primary_shards": 0,
"active_shards": 0,
"relocating_shards": 0,
"initializing_shards": 0,
"unassigned_shards": 0,
"delayed_unassigned_shards": 0,
"number_of_pending_tasks": 0,
"number_of_in_flight_fetch": 0,
"task_max_waiting_in_queue_millis": 0,
"active_shards_percent_as_number": 100
}
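The same check can be scripted. The following is a minimal sketch using only the Python standard library; `fetch_health` and `summarize` are names chosen for this example, and the base URL passed to `fetch_health` is assumed to be the address used in this article. A trimmed copy of the response above is embedded so the sketch runs without a cluster.

```python
import json
from urllib.request import urlopen


def fetch_health(base_url, timeout=5):
    """GET <base_url>/_cluster/health and decode the JSON body."""
    with urlopen(f"{base_url}/_cluster/health", timeout=timeout) as response:
        return json.load(response)


def summarize(health):
    """Return (status, primaries_ok).

    green  = all primary and replica shards are assigned
    yellow = all primaries are assigned, some replicas are not
    red    = at least one primary shard is unassigned
    """
    status = health["status"]
    return status, status in ("green", "yellow")


# Trimmed copy of the response shown above.
SAMPLE = '{"cluster_name": "es-cluster", "status": "green", "number_of_nodes": 1}'

if __name__ == "__main__":
    print(summarize(json.loads(SAMPLE)))
    # Against a live node: summarize(fetch_health("http://192.168.1.240:9200"))
```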

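Deploying Filebeat

Filebeat is a collector for local log files. It watches log directories or specific log files (tail file) and forwards them to Elasticsearch or Logstash for indexing, or to Kafka and other outputs. It ships with built-in modules (auditd, Apache, Nginx, System and MySQL) that simplify collecting, parsing and visualizing common log formats with a single command.

1. Create the directories

mkdir -p filebeat/{conf,data,logs}

2. Create the custom configuration file

Create filebeat/conf/filebeat.yml (it points at the nginx log directory as seen inside the container) with the following content: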

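filebeat.inputs:
- type: log              # input type: log (files at specific paths), stdin (keyboard input), redis, udp, docker, tcp, syslog; several inputs can be configured, including several of the same type
  enabled: true          # whether this input is active
  scan_frequency: 120s   # how often to scan the paths for new files
  paths:                 # files to scan and collect
  - /var/log/nginx/*.log # path inside the container, mapped from the real host path in docker-compose
  fields:                # extra fields sent with every event, handy for searching
    log_source: nginx

output.logstash:         # send events to Logstash
  hosts: ["192.168.1.240:5044"]  # Logstash address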

Annotated sample template:

###################### Filebeat Configuration Example #########################
# This file is an example configuration file highlighting only the most common
# options. The filebeat.reference.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/filebeat/index.html
# For more available modules and options, please see the filebeat.reference.yml sample
# configuration file.
#=========================== Filebeat inputs =============================
#====================== Filebeat input configuration ======================
filebeat.inputs:
# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.
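# Input types include log (files at specific paths), stdin (keyboard input), redis, udp, docker, tcp and syslog; several inputs can be configured at once, including several of the same type.
# The options for each type are documented at: https://www.elastic.co/guide/en/beats/filebeat/current/configuration-filebeat-options.html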
- type: log
  # Change to true to enable this input configuration.
  # Whether this input is enabled
  enabled: true
  # Paths that should be crawled and fetched. Glob based paths.
  # Logs to monitor; specific files or directories (via globs) can be listed
  paths:
    #- /var/log/*.log (the default; change as needed)
    - /usr/local/tomcat/logs/catalina.out
  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
  # Drop input lines that match any regular expression in the list.
  #exclude_lines: ['^DBG']
  # Include lines. A list of regular expressions to match. It exports the lines that are
  # matching any regular expression from the list.
  # Keep only input lines that match a regular expression in the list (all lines by default); include_lines is applied first, then exclude_lines
  #include_lines: ['^ERR', '^WARN']
  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  # Ignore files that match any regular expression in the list
  #exclude_files: ['.gz$']
  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering
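  # Adds extra information to every exported log line, e.g. "level: debug", which makes later grouping and statistics easier.
  # By default the custom fields are placed under a "fields" sub-dictionary of the event, e.g. fields.level
  # In other words, Elasticsearch receives one more field of the form "fields": {"level": "debug"}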
  #fields:
  #  level: debug
  #  review: 1
  #  module: mock 
  ### Multiline options
  ### A log entry often spans several physical lines that logically belong together; the multiline options handle this case.
  # Multiline can be used for log messages spanning multiple lines. This is common
  # for Java Stack Traces or C-Line Continuation
  # The regexp Pattern that has to be matched. The example pattern matches all lines starting with [
  # Multiline regex, e.g. lines starting with whitespace (^[[:space:]]) or with [ (^\[). For the supported regex syntax see: https://www.elastic.co/guide/en/beats/filebeat/current/regexp-support.html
  multiline.pattern: ^\[
  # Defines if the pattern set under pattern should be negated or not. Default is false.
  # Whether the pattern match is negated.
  #multiline.negate: false
  # Match can be set to "after" or "before". It is used to define if lines should be append to a pattern
  # that was (not) matched before or after or as long as a pattern is not matched based on negate.
  # Note: After is the equivalent to previous and before is the equivalent to to next in Logstash
  # Either after or before; works together with the pattern and negate settings above:
  # ------------------------------------------------------------------------------------------------------------------
  #| line matches multiline.pattern | multiline.negate | multiline.match | Result                                        |
  # ------------------------------------------------------------------------------------------------------------------
  #| yes | true  | before | the matching line ends an event; the non-matching lines before it are joined to it      |
  # ------------------------------------------------------------------------------------------------------------------
  #| yes | true  | after  | the matching line starts an event; the non-matching lines after it are joined to it     |
  # ------------------------------------------------------------------------------------------------------------------
  #| yes | false | before | consecutive matching lines are joined to the non-matching line that follows them        |
  # ------------------------------------------------------------------------------------------------------------------
  #| yes | false | after  | consecutive matching lines are joined to the non-matching line that precedes them       |
  # ------------------------------------------------------------------------------------------------------------------
  multiline.match: after
  # Specifies a regular expression, in which the current multiline will be flushed from memory, ending the multiline-message.
  # A line matching this regex ends the current multiline event, which is then flushed from memory and sent on.
  #multiline.flush_pattern
  # The maximum number of lines that can be combined into one event.
  # If the multiline message contains more than max_lines, any additional lines are discarded. The default is 500.
  # Lines beyond this number in one multiline event are discarded. The default is 500.
  #multiline.max_lines: 500
  # After the specified timeout, Filebeat sends the multiline event even if no new pattern is found to start a new event. The default is 5s.
  # If no new line arrives within the timeout (in seconds), the current multiline event is closed and sent as one log entry
  #multiline.timeout: 5
#============================= Filebeat modules ===============================
# Load the Filebeat module configuration
filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml
  # Set to true to enable config reloading
  # Whether configuration reloading is allowed
  reload.enabled: false
  # Period on which files under path should be checked for changes
  # Interval between reload checks
  #reload.period: 10s
#==================== Elasticsearch template setting ==========================
# Elasticsearch index template settings
setup.template.settings:
  # Number of primary shards
  index.number_of_shards: 3
  # Number of replicas per shard
  #index.number_of_replicas: 1
  #index.codec: best_compression
  #_source.enabled: false
#================================ General =====================================
# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
# Name of this Filebeat instance; if left empty, the hostname of the server is used
#name:
# The tags of the shipper are included in their own field with each
# transaction published.
# Extra tags to attach
#tags: ["service-X", "web-tier"]
# Optional fields that you can specify to add additional information to the
# output.
# Extra fields and values to attach
#fields:
#  env: staging
#============================== Dashboards =====================================
# Dashboard settings
# These settings control loading the sample dashboards to the Kibana index. Loading
# the dashboards is disabled by default and can be enabled either by setting the
# options here, or by using the `-setup` CLI flag or the `setup` command.
# Whether to load the dashboards
#setup.dashboards.enabled: false
# The URL from where to download the dashboards archive. By default this URL
# has a value which is computed based on the Beat name and version. For released
# versions, this URL points to the dashboard archive on the artifacts.elastic.co
# website.
# Dashboard archive URL
#setup.dashboards.url:
#============================== Kibana =====================================
# Kibana settings
# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
setup.kibana:
  # Kibana Host
  # Scheme and port can be left out and will be set to the default (http and 5601)
  # In case you specify and additional path, the scheme is required: http://localhost:5601/path
  # IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
  # Kibana address
  #host: "localhost:5601"
  # Kibana Space ID
  # ID of the Kibana Space into which the dashboards should be loaded. By default,
  # the Default Space will be used.
  # Kibana space ID
  #space.id:
#============================= Elastic Cloud ==================================
# These settings simplify using filebeat with the Elastic Cloud (https://cloud.elastic.co/).
# The cloud.id setting overwrites the `output.elasticsearch.hosts` and
# `setup.kibana.host` options.
# You can find the `cloud.id` in the Elastic Cloud web UI.
#cloud.id:
# The cloud.auth setting overwrites the `output.elasticsearch.username` and
# `output.elasticsearch.password` settings. The format is `<user>:<pass>`.
#cloud.auth:
#================================ Outputs =====================================
# Output configuration
# Configure what output to use when sending the data collected by the beat.
#-------------------------- Elasticsearch output ------------------------------
# Send events to Elasticsearch
#output.elasticsearch:
  # Array of hosts to connect to.
  # Elasticsearch address
  # hosts: ["localhost:9200"]
  # Elasticsearch index
  # index: "filebeat-%{[agent.version]}-%{+yyyy.MM.dd}"
  # Optional protocol and basic auth credentials.
  # Protocol
  #protocol: "https"
  # Elasticsearch username
  #username: "elastic"
  # Elasticsearch password
  #password: "changeme"
#----------------------------- Logstash output --------------------------------
# Send events to Logstash
output.logstash:
  # The Logstash hosts
  # Logstash address
  hosts: ["localhost:5044"]
  # Optional SSL. By default is off.
  # List of root certificates for HTTPS server verifications
  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]
  # Certificate for SSL client authentication
  #ssl.certificate: "/etc/pki/client/cert.pem"
  # Client Certificate Key
  #ssl.key: "/etc/pki/client/cert.key"
#================================ Processors =====================================
# Configure processors to enhance or manipulate events generated by the beat.
processors:
  # Host metadata
  - add_host_metadata: ~
  # Cloud instance metadata, e.g. Alibaba Cloud ECS, Tencent Cloud (QCloud) and AWS EC2
  - add_cloud_metadata: ~
  # Kubernetes metadata
  #- add_kubernetes_metadata: ~
  # Docker metadata
  #- add_docker_metadata: ~
  # Metadata about the running process
  #- add_process_metadata: ~
#================================ Logging =====================================
# Sets log level. The default log level is info.
# Available log levels are: error, warning, info, debug
#logging.level: debug
# At debug level, you can selectively enable logging only for some components.
# To enable all selectors use ["*"]. Examples of other selectors are "beat",
# "publish", "service".
#logging.selectors: ["*"]
#============================== Xpack Monitoring ===============================
# filebeat can export internal metrics to a central Elasticsearch monitoring
# cluster.  This requires xpack monitoring to be enabled in Elasticsearch.  The
# reporting is disabled by default.
# Set to true to enable the monitoring reporter.
#xpack.monitoring.enabled: false
# Uncomment to send the metrics to Elasticsearch. Most settings from the
# Elasticsearch output are accepted here as well. Any setting that is not set is
# automatically inherited from the Elasticsearch output configuration, so if you
# have the Elasticsearch output configured, you can simply uncomment the
# following line.
#xpack.monitoring.elasticsearch:
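The interaction of multiline.pattern, multiline.negate and multiline.match in the template above is easy to get wrong, so here is a small simulation. It is a sketch of the documented behaviour, not Filebeat's own code; the function name `group_multiline` and the sample log lines are invented for illustration, and max_lines and timeout are not modelled.

```python
import re


def group_multiline(lines, pattern, negate=False, match="after"):
    """Group physical lines into events, mimicking pattern/negate/match."""
    regex = re.compile(pattern)
    events, pending = [], []
    for line in lines:
        # A line is a continuation when the pattern matches it, or, with
        # negate=True, when the pattern does not match it.
        continuation = bool(regex.search(line)) != negate
        if match == "after":
            if continuation and events:
                events[-1].append(line)   # glue onto the previous event
            else:
                events.append([line])     # this line starts a new event
        else:  # match == "before"
            pending.append(line)
            if not continuation:          # this line closes the event
                events.append(pending)
                pending = []
    if pending:
        events.append(pending)
    return ["\n".join(event) for event in events]


java_log = [
    "[2022-05-23 09:30:34] ERROR request failed",
    "java.lang.IllegalStateException: boom",
    "    at com.example.Demo.run(Demo.java:42)",
    "[2022-05-23 09:30:35] INFO recovered",
]

# Every line that does NOT start with "[" belongs to the preceding "[" line.
events = group_multiline(java_log, r"^\[", negate=True, match="after")

if __name__ == "__main__":
    for event in events:
        print(repr(event))
```

With negate left at its default of false, the same pattern would instead glue each line starting with [ onto the line before it, which is rarely what a timestamp-prefixed log needs.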


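3. Add the Filebeat service to docker-compose.yml

  filebeat:
    image: "docker.elastic.co/beats/filebeat:${ELK_VERSION}"
    container_name: filebeat
    user: root # must be root, otherwise Filebeat fails to start for lack of permissions
    restart: always
    depends_on: # elasticsearch and logstash have to be started first
      - elasticsearch
      - logstash
    command: ["filebeat", "-e", "--strict.perms=false"]  # disable the strict ownership/permission check on the config file; this is a command-line flag, not an environment variable
    volumes:
      - $PWD/filebeat/conf/filebeat.yml:/usr/share/filebeat/filebeat.yml
      # Filebeat's own log directory, mounted from the host
      - $PWD/filebeat/logs:/usr/share/filebeat/logs:rw
      #- $PWD/filebeat/data:/usr/share/filebeat/data:rw
      - /etc/localtime:/etc/localtime:ro
      - /usr/local/nginx/logs/:/var/log/nginx/  # host directory containing the log files to collect
    # Link the logstash container into this one (an alias can be set) so it can be reached by name even if its IP changes after a restart
    links:
      - logstash

This maps the nginx log directory on the host, /usr/local/nginx/logs/, to /var/log/nginx/ inside the container.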

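Deploying Logstash

Logstash is a data collection engine with real-time pipelining capabilities. It is mainly used to collect and parse logs and store them in Elasticsearch.

1. Create the directory:

mkdir -p logstash/conf.d/

2. Create the Logstash configuration files

Create logstash/logstash.yml with the following content:

path.config: /usr/share/logstash/conf.d/*.conf   # path (glob) of the pipeline configuration files
path.logs: /var/log/logstash  # Logstash log directory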

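A Logstash pipeline definition consists of three sections, input, filter and output, each of which supports a large number of plugins.

a. input: where Logstash gets its data, e.g. files, Kafka, RabbitMQ or sockets.

b. filter: data received from the input passes through the filter, which converts types, adds, removes or modifies fields, and applies other logic. The filter section is optional, but because it structures the collected logs, well-chosen field types and helper fields make later queries and statistics easier and faster. That makes it the most important part of a Logstash configuration.

c. output: sends the filtered result to a destination such as a file, Elasticsearch or Kafka.

Whether the collected logs deliver their full value is decided in the filter, especially for fields you will later aggregate on, such as user ID, device information and IP address. Fields that should be numeric are best converted here as well, since numeric fields can then be used in calculations such as the mean or the 90th percentile.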
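The point about numeric types can be seen in a few lines. In the sketch below the duration values are made-up samples and `percentile` is a helper written for this example using the nearest-rank method; Elasticsearch computes percentiles with its own algorithms, so this only illustrates why the strings have to become numbers first.

```python
import math
import statistics


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# As extracted from a log line, every field is a string ...
raw_durations = ["0.043", "0.120", "0.010", "0.500", "0.087"]  # made-up sample values

# ... and only after conversion can it be used in arithmetic.
durations = [float(d) for d in raw_durations]
mean = statistics.mean(durations)
p90 = percentile(durations, 90)

if __name__ == "__main__":
    print(mean, p90)
```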

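nginx logs are usually parsed with the grok plugin. The basic syntax of a grok expression is:

%{SYNTAX:SEMANTIC}

Everything wrapped in %{} is one regex matching rule. SYNTAX is the name of a regular expression predefined in grok, and SEMANTIC is the name of the field that receives the matched text.

Below is an example using NUMBER. NUMBER is the alias of a regular expression predefined in grok, and the name after the colon is the output field, so duration becomes a field that can be analyzed at search time.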

%{NUMBER:duration}

In the grok-patterns file, NUMBER is defined by the following regular expressions and builds on the definition of BASE10NUM.

BASE10NUM (?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))
NUMBER (?:%{BASE10NUM})

Here is a minimal nginx log fragment:

55.3.244.1 GET /index.html 15824 0.043

The following combination of patterns matches that fragment:

%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}
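To make the %{SYNTAX:SEMANTIC} mechanism concrete, the sketch below expands a grok expression into an ordinary regular expression with named groups. It is a simplified illustration, not how Logstash implements grok: the `PATTERNS` table and `grok_to_regex` are invented for this example, and the pattern bodies are deliberately reduced (IP covers IPv4 only without range checks, URIPATHPARAM is any run of non-space characters).

```python
import re

# Simplified stand-ins for grok's predefined patterns (assumptions, see above).
PATTERNS = {
    "IP": r"(?:\d{1,3}\.){3}\d{1,3}",
    "WORD": r"\b\w+\b",
    "URIPATHPARAM": r"\S+",
    "NUMBER": r"[+-]?(?:\d+(?:\.\d+)?|\.\d+)",
}


def grok_to_regex(expression):
    """Replace every %{SYNTAX:SEMANTIC} with a named group (?P<SEMANTIC>...)."""

    def expand(match):
        syntax, semantic = match.group(1), match.group(2)
        return f"(?P<{semantic}>{PATTERNS[syntax]})"

    return re.compile(re.sub(r"%\{(\w+):(\w+)\}", expand, expression))


GROK = "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}"
line = "55.3.244.1 GET /index.html 15824 0.043"
fields = grok_to_regex(GROK).match(line).groupdict()

if __name__ == "__main__":
    print(fields)
```

Every captured value is a string at this point, which is why numeric fields still need an explicit type conversion.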

grok ships with many useful predefined regular expressions. A small excerpt:

# Networking
MAC (?:%{CISCOMAC}|%{WINDOWSMAC}|%{COMMONMAC})
CISCOMAC (?:(?:[A-Fa-f0-9]{4}\.){2}[A-Fa-f0-9]{4})
WINDOWSMAC (?:(?:[A-Fa-f0-9]{2}-){5}[A-Fa-f0-9]{2})
COMMONMAC (?:(?:[A-Fa-f0-9]{2}:){5}[A-Fa-f0-9]{2})
IPV6 ((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)))(%.+)?
IPV4 (?<![0-9])(?:(?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5]))(?![0-9])
IP (?:%{IPV6}|%{IPV4})
HOSTNAME \b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)
IPORHOST (?:%{IP}|%{HOSTNAME})
HOSTPORT %{IPORHOST}:%{POSINT}

For more of grok's predefined patterns, see the grok-patterns file that ships with Logstash.

A real nginx log line looks like this:

192.168.1.137 - - [23/May/2022:14:33:18 +0800] "GET /index.php?m=message&f=ajaxGetMessage&t=html&windowBlur=1 HTTP/1.1" 200 5 "http://192.168.1.240:10003/index.php?m=bug&f=browse&productID=2" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36 Edg/100.0.1185.50"

The corresponding grok expression is:

%{IPORHOST:remote_ip} - %{DATA:user_name} \[%{HTTPDATE:time}\] \"%{WORD:method} %{DATA:url} HTTP/%{NUMBER:http_version}\" %{NUMBER:response_code} %{NUMBER:body_sent:bytes} \"%{DATA:referrer}\" \"%{DATA:user_agent}\"

Take one log line first and try it in the online Grok Debugger tool to check that the grok expression is correct.
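If no Grok Debugger is at hand, the field layout can also be sanity-checked offline with a plain regular expression. The sketch below is a hand-written Python approximation of the grok expression above, not a translation produced by Logstash: the group names mirror the grok field names, while the sub-patterns are simplified assumptions.

```python
import re

# Hand-written approximation of the grok expression for the nginx access log.
NGINX_ACCESS = re.compile(
    r'(?P<remote_ip>\S+) - (?P<user_name>.*?) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\w+) (?P<url>.*?) HTTP/(?P<http_version>[\d.]+)" '
    r'(?P<response_code>\d+) (?P<body_sent>\d+) '
    r'"(?P<referrer>.*?)" "(?P<user_agent>.*?)"'
)


def parse_access_line(line):
    """Return the parsed fields as a dict, or None when the line does not match."""
    match = NGINX_ACCESS.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields so they can be aggregated later.
    fields["response_code"] = int(fields["response_code"])
    fields["body_sent"] = int(fields["body_sent"])
    return fields


sample = (
    '192.168.1.137 - - [23/May/2022:14:33:18 +0800] '
    '"GET /index.php?m=message&f=ajaxGetMessage&t=html&windowBlur=1 HTTP/1.1" 200 5 '
    '"http://192.168.1.240:10003/index.php?m=bug&f=browse&productID=2" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
    'Chrome/100.0.4896.127 Safari/537.36 Edg/100.0.1185.50"'
)
parsed = parse_access_line(sample)

if __name__ == "__main__":
    for name, value in parsed.items():
        print(f"{name}: {value}")
```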

Once the expression works, create logstash/conf.d/nginx.conf with the following content:

input {       # data source
    beats {
        port => 5044
        # no codec here: nginx access lines are plain text, and a json codec would tag every event with _jsonparsefailure
    }
}
filter {     # data transformation (the core part)
    grok {
        match => { "message" => '%{IPORHOST:remote_ip} - %{DATA:user_name} \[%{HTTPDATE:time}\] \"%{WORD:method} %{DATA:url} HTTP/%{NUMBER:http_version}\" %{NUMBER:response_code} %{NUMBER:body_sent:bytes} \"%{DATA:referrer}\" \"%{DATA:user_agent}\"' }
        remove_field => "message"    # drops the message field, which holds the whole raw nginx log line by default; removing it saves storage in Elasticsearch
    }
}
output {     # data output
    elasticsearch {
        hosts => ["192.168.1.240:9200"]
        index => "nginx-access-log-%{+YYYY.MM.dd}"   # one Elasticsearch index per day
    }
    stdout { codec => rubydebug }
}

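Note: the field is named user_agent and must not be renamed to agent. As the Logstash output further down shows, Filebeat already sends a field called agent with every event. A grok field of the same name would collide with it, and the value captured by grok would be lost.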

"agent" => {
              "id" => "dbfdd21f-edf6-4e04-beca-78d20dbea316",
            "type" => "filebeat",
         "version" => "7.16.1",
    "ephemeral_id" => "35990523-3475-4235-bbae-a577c583b46e",
            "name" => "69760ea94693",
        "hostname" => "69760ea94693"
},
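One detail of the output section is worth spelling out: the %{+YYYY.MM.dd} part of the index name is filled in from the event's @timestamp, which Logstash formats in UTC. Since the pipeline here has no date filter, @timestamp is the time the line was read rather than the time nginx logged it. The sketch below mimics the naming rule in Python; `daily_index` is a name made up for this example.

```python
from datetime import datetime, timedelta, timezone


def daily_index(timestamp, prefix="nginx-access-log-"):
    """Build the daily index name for an event, using the UTC date of its timestamp."""
    utc = timestamp.astimezone(timezone.utc)
    return prefix + utc.strftime("%Y.%m.%d")


# 07:30 on 24 May in UTC+8 is still 23 May in UTC.
beijing = timezone(timedelta(hours=8))
event_time = datetime(2022, 5, 24, 7, 30, tzinfo=beijing)
index_name = daily_index(event_time)

if __name__ == "__main__":
    print(index_name)
```

Events read shortly after local midnight in UTC+8 therefore still land in the index of the previous day.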

Logstash can now be started.

3. Add the Logstash service to docker-compose.yml

  logstash:
    image: "docker.elastic.co/logstash/logstash:${ELK_VERSION}"
    container_name: logstash
    restart: always
    volumes:
      - $PWD/logstash/logstash.yml:/usr/share/logstash/config/logstash.yml
      - $PWD/logstash/conf.d/:/usr/share/logstash/conf.d/
      - /etc/localtime:/etc/localtime:ro
    depends_on:
      - elasticsearch # start logstash after elasticsearch has started
    links:
      - elasticsearch
    ports:
      - "5044:5044"

4. Start the containers

docker-compose up -d

5. Test the Logstash configuration

Enter the logstash container with docker exec and run logstash -f /usr/share/logstash/conf.d/nginx.conf --config.test_and_exit to check that the configuration is valid.

# docker exec -it logstash bash
bash-4.2$ logstash -f /usr/share/logstash/conf.d/nginx.conf --config.test_and_exit
Using bundled JDK: /usr/share/logstash/jdk
OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.

Sending Logstash logs to /var/log/logstash which is now configured via log4j2.properties
[2022-05-23T09:30:34,217][INFO ][logstash.runner          ] Log4j configuration path used is: /usr/share/logstash/config/log4j2.properties
[2022-05-23T09:30:34,225][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"7.16.1", "jruby.version"=>"jruby 9.2.20.1 (2.5.8) 2021-11-30 2a2962fbd1 OpenJDK 64-Bit Server VM 11.0.13+8 on 11.0.13+8 +indy +jit [linux-x86_64]"}
[2022-05-23T09:30:35,233][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
[2022-05-23T09:30:36,796][INFO ][org.reflections.Reflections] Reflections took 154 ms to scan 1 urls, producing 119 keys and 417 values 
[2022-05-23T09:30:38,043][WARN ][deprecation.logstash.codecs.json] Relying on default value of `pipeline.ecs_compatibility`, which may change in a future major release of Logstash. To avoid unexpected changes when upgrading Logstash, please explicitly declare your desired ECS Compatibility mode.
[2022-05-23T09:30:38,114][WARN ][deprecation.logstash.inputs.beats] Relying on default value of `pipeline.ecs_compatibility`, which may change in a future major release of Logstash. To avoid unexpected changes when upgrading Logstash, please explicitly declare your desired ECS Compatibility mode.
[2022-05-23T09:30:38,314][WARN ][deprecation.logstash.codecs.plain] Relying on default value of `pipeline.ecs_compatibility`, which may change in a future major release of Logstash. To avoid unexpected changes when upgrading Logstash, please explicitly declare your desired ECS Compatibility mode.
[2022-05-23T09:30:38,350][WARN ][deprecation.logstash.outputs.elasticsearch] Relying on default value of `pipeline.ecs_compatibility`, which may change in a future major release of Logstash. To avoid unexpected changes when upgrading Logstash, please explicitly declare your desired ECS Compatibility mode.
Configuration OK
[2022-05-23T09:30:38,526][INFO ][logstash.runner          ] Using config.test_and_exit mode. Config Validation Result: OK. Exiting Logstash

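The final line, Using config.test_and_exit mode. Config Validation Result: OK. Exiting Logstash, shows that the test passed.

6. View the Logstash logs

Run the following command to view the logs:

docker-compose logs -f logstash

Output like the following shows that the nginx access logs sent by Filebeat are arriving. The _jsonparsefailure tag appears when the beats input is given codec => "json", because nginx access lines are plain text; without that codec the tag is absent.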

logstash         |{
logstash         |               "log" => {
logstash         |         "offset" => 2323638,
logstash         |           "file" => {
logstash         |             "path" => "/var/log/nginx/zentaopms.access.log"
logstash         |         }
logstash         |     },
logstash         |      "http_version" => "1.1",
logstash         |         "body_sent" => "0",
logstash         |             "input" => {
logstash         |         "type" => "log"
logstash         |     },
logstash         |         "remote_ip" => "192.168.1.137",
logstash         |             "agent" => {
logstash         |                   "id" => "dbfdd21f-edf6-4e04-beca-78d20dbea316",
logstash         |                 "type" => "filebeat",
logstash         |              "version" => "7.16.1",
logstash         |         "ephemeral_id" => "35990523-3475-4235-bbae-a577c583b46e",
logstash         |                 "name" => "69760ea94693",
logstash         |             "hostname" => "69760ea94693"
logstash         |     },
logstash         |              "host" => {
logstash         |         "name" => "69760ea94693"
logstash         |     },
logstash         |         "user_name" => "-",
logstash         |     "response_code" => "304",
logstash         |          "referrer" => "http://192.168.1.240:10003/js/kindeditor//kindeditor.min.css",
logstash         |            "method" => "GET",
logstash         |               "ecs" => {
logstash         |         "version" => "1.12.0"
logstash         |     },
logstash         |        "@timestamp" => 2022-05-23T09:30:54.385Z,
logstash         |               "url" => "/js/kindeditor//themes/default/default.png",
logstash         |          "@version" => "1",
logstash         |              "tags" => [
logstash         |         [0] "_jsonparsefailure",
logstash         |         [1] "beats_input_codec_json_applied"
logstash         |     ],
logstash         |            "fields" => {
logstash         |         "log_source" => "nginx"
logstash         |     },
logstash         |              "time" => "20/Oct/2021:17:45:41 +0800",
logstash         |        "user_agent" => "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36 Edg/94.0.992.47"
logstash         | }

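Deploying Kibana

Kibana is a web interface for analyzing the logs handled by Logstash and Elasticsearch. It can be used to search, visualize and analyze logs efficiently.

1. Create the directory

mkdir kibana

Copy the configuration file out of the image: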

docker run -d --rm --name kibana docker.elastic.co/kibana/kibana:7.16.1
docker cp kibana:/usr/share/kibana/config/kibana.yml kibana/
docker stop kibana

2. Add the Kibana service to docker-compose.yml

  kibana:
    image: "docker.elastic.co/kibana/kibana:${ELK_VERSION}"
    container_name: kibana
    restart: always
    depends_on:
      - elasticsearch
    #environment:
    #  - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    ports:
      - "5601:5601"
    volumes:
      - $PWD/kibana/kibana.yml:/usr/share/kibana/config/kibana.yml
      - /etc/localtime:/etc/localtime:ro

3. Edit the configuration file

Edit kibana/kibana.yml and set elasticsearch.hosts to the address of Elasticsearch:

server.host: "0.0.0.0"
server.shutdownTimeout: "5s"
server.publicBaseUrl: "http://192.168.1.240:5601"
elasticsearch.hosts: [ "http://192.168.1.240:9200" ]
monitoring.ui.container.elasticsearch.enabled: true
#i18n.locale: "zh-CN"   # Chinese UI

4. Start the containers

docker-compose up -d

This completes the ELK + Filebeat deployment. The complete docker-compose.yml is shown below:

version: '3'
services:
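  elasticsearch:                    # service name
    image: "docker.elastic.co/elasticsearch/elasticsearch:${ELK_VERSION}"      # image to use
    container_name: elasticsearch   # container name
    restart: always                 # restart policy: always restart the container when it stops
    environment:
      - node.name=node-1                   # node name; must be unique per node in a cluster
      - network.publish_host=192.168.1.240  # address advertised to other nodes and to clients, normally the IP of the host machine
      - network.host=0.0.0.0                # bind address (IPv4 or IPv6); 0.0.0.0 listens on all interfaces
      - discovery.seed_hosts=192.168.1.240          # introduced in ES 7.0: addresses of the master-eligible nodes, i.e. the candidates that can be elected if the master goes down
      - cluster.initial_master_nodes=192.168.1.240  # introduced in ES 7.0: needed to elect a master when a brand-new cluster is bootstrapped
      - cluster.name=es-cluster     # cluster name; all nodes of one cluster must use the same name
      - bootstrap.memory_lock=true  # lock the process memory to prevent swapping, as the official docs recommend
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m" # JVM heap size; lower it if the host is short on memory
    ulimits:        # memlock is the locked-memory limit, needed by bootstrap.memory_lock
      memlock:
        soft: -1    # unlimited
        hard: -1    # unlimited
    volumes:
      - $PWD/elasticsearch/config:/usr/share/elasticsearch/config  # ES config directory mounted from the host; it must exist beforehand. CORS is enabled there, otherwise the head plugin cannot connect to this node
      - $PWD/elasticsearch/data:/usr/share/elasticsearch/data  # data directory
      - $PWD/elasticsearch/plugins:/usr/share/elasticsearch/plugins  # plugin directory
      - /etc/localtime:/etc/localtime:ro
    ports:
      - "9200:9200"    # HTTP port, can be opened directly in a browser
      - "9300:9300"    # transport port used for TCP communication between cluster nodes (and by Java transport clients)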


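  filebeat:
    image: "docker.elastic.co/beats/filebeat:${ELK_VERSION}"
    container_name: filebeat
    user: root # must be root, otherwise Filebeat fails to start for lack of permissions
    restart: always
    depends_on:
      - elasticsearch
      - logstash
    command: ["filebeat", "-e", "--strict.perms=false"]  # disable the strict ownership/permission check on the config file; this is a command-line flag, not an environment variable
    volumes:
      - $PWD/filebeat/conf/filebeat.yml:/usr/share/filebeat/filebeat.yml
      # Filebeat's own log directory, mounted from the host
      - $PWD/filebeat/logs:/usr/share/filebeat/logs:rw
      #- $PWD/filebeat/data:/usr/share/filebeat/data:rw
      - /etc/localtime:/etc/localtime:ro
      - /data/chandao/nginx/logs/:/var/log/nginx/  # host directory containing the log files to collect
    # Link the logstash container into this one (an alias can be set) so it can be reached by name even if its IP changes after a restart
    links:
      - logstash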

  logstash:
    image: "docker.elastic.co/logstash/logstash:${ELK_VERSION}"
    container_name: logstash
    restart: always
    volumes:
      - $PWD/logstash/logstash.yml:/usr/share/logstash/config/logstash.yml
      - $PWD/logstash/conf.d/:/usr/share/logstash/conf.d/
      - /etc/localtime:/etc/localtime:ro
    depends_on:
      - elasticsearch # start logstash after elasticsearch has started
    links:
      - elasticsearch
    ports:
      - "5044:5044"


  kibana:
    image: "docker.elastic.co/kibana/kibana:${ELK_VERSION}"
    container_name: kibana
    restart: always
    depends_on:
      - elasticsearch
    #environment:
    #  - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    ports:
      - "5601:5601"
    volumes:
      - $PWD/kibana/kibana.yml:/usr/share/kibana/config/kibana.yml
      - /etc/localtime:/etc/localtime:ro