ElastAlert, a log alerting plugin for ELK, with DingTalk alerts
Reposted from: https://www.cnblogs.com/uglyliu/p/13118386.html
ElastAlert: a log alerting plugin for ELK
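ElastAlert works by combining Elasticsearch with two types of components: rule types and alerts. It queries Elasticsearch periodically and passes the data to the rule type, which determines when a match is found. When a match occurs, it is handed to one or more alerts, which take action based on the match.
This is configured by a set of rules, each of which defines a query, a rule type and a set of alerts.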
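To make the flow from query to rule type to alerts concrete, here is a minimal Python model of it. Everything in it (the Rule tuple, run_once, and the lambdas standing in for an Elasticsearch query and for alerts) is hypothetical and only mirrors the description above; it is not ElastAlert's internal code.

```python
# Hypothetical, simplified model of ElastAlert's processing loop (not its real code):
# each rule bundles a query, a rule type and a list of alerts.
from typing import Callable, Dict, List, NamedTuple

Event = Dict[str, object]


class Rule(NamedTuple):
    name: str
    query: Callable[[], List[Event]]                 # stands in for the Elasticsearch query
    rule_type: Callable[[List[Event]], List[Event]]  # decides which events are matches
    alerts: List[Callable[[Event], None]]            # each alert acts on every match


def run_once(rules: List[Rule]) -> int:
    """Run each rule once and return the total number of matches handed to alerts."""
    total = 0
    for rule in rules:
        events = rule.query()
        matches = rule.rule_type(events)
        for match in matches:
            for alert in rule.alerts:
                alert(match)
        total += len(matches)
    return total


if __name__ == "__main__":
    sent: List[Event] = []
    demo = Rule(
        name="nginx-5xx-demo",
        query=lambda: [{"status": 200}, {"status": 502}],
        rule_type=lambda events: [e for e in events if 500 <= e["status"] <= 599],
        alerts=[sent.append],
    )
    print(run_once([demo]), sent)  # 1 [{'status': 502}]
```

In the real tool, the scheduler repeats this every run_every, and the alerts are classes such as the DingTalk alerter configured later.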
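ElastAlert includes several rule types with common monitoring paradigms:
Match where there are at least X events in Y time (frequency type)
Match when the rate of events increases or decreases (spike type)
Match when there are fewer than X events in Y time (flatline type)
Match when a certain field matches a blacklist/whitelist (blacklist and whitelist types)
Match on any event matching a given filter (any type)
Match when a field has two different values within some time (change type)
Match when a never-before-seen term appears in a field (new_term type)
Match when the number of unique values for a field is above or below a threshold (cardinality type)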
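The frequency type boils down to a sliding-window count. The sketch below illustrates that idea under simplifying assumptions; it is not ElastAlert's implementation, frequency_matches is a hypothetical name, and resetting the count after each match is a choice made for the sketch.

```python
# Simplified sketch of the frequency rule idea (not ElastAlert's implementation):
# match when at least num_events events fall inside a sliding window of length timeframe.
from collections import deque
from datetime import datetime, timedelta
from typing import Deque, Iterable, List


def frequency_matches(timestamps: Iterable[datetime], num_events: int,
                      timeframe: timedelta) -> List[datetime]:
    """Return the timestamp of the event that completes each match."""
    window: Deque[datetime] = deque()
    matches: List[datetime] = []
    for ts in sorted(timestamps):
        window.append(ts)
        # Forget events older than `timeframe` relative to the newest event.
        while ts - window[0] > timeframe:
            window.popleft()
        if len(window) >= num_events:
            matches.append(ts)
            window.clear()  # start counting afresh after a match
    return matches


if __name__ == "__main__":
    t0 = datetime(2020, 6, 13, 10, 0, 0)
    five_in_a_minute = [t0 + timedelta(seconds=s) for s in (0, 10, 20, 30, 40)]
    print(frequency_matches(five_in_a_minute, 5, timedelta(minutes=1)))
```

The flatline type is the mirror image: it fires when the count in the window stays below the threshold instead of reaching it.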
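Alerts can be sent in many ways, including email, DingTalk, WeChat and custom alerters, and they can flexibly include content queried from Elasticsearch.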
Install Python 3.6
yum -y install wget git unzip sqlite-devel xz gcc automake zlib-devel openssl-devel epel-release
wget https://www.python.org/ftp/python/3.6.8/Python-3.6.8.tar.xz
tar xf Python-3.6.8.tar.xz
cd Python-3.6.8/
./configure && make && make install
mkdir -p /app/elastalert/rule
Install ElastAlert
cd /app/elastalert && git clone https://github.com/Yelp/elastalert.git
cd /app/elastalert/elastalert && pip3 install -r requirements.txt
pip3 uninstall elasticsearch
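pip3 install "elasticsearch>=6.0.0,<7.0.0"
#Watch the elasticsearch version here: with Elasticsearch 6, the latest elasticsearch package installed by pip may not work. Uninstall it and install a client whose major version matches the cluster (for Elasticsearch 6, pip3 install "elasticsearch>=6.0.0,<7.0.0"; a bare "elasticsearch>=5.0.0" would pull the latest release again), or change the elasticsearch version in requirements.txt before installing.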
python3 setup.py install
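Since the version pitfall above comes down to the client and the cluster having different major versions, a quick check can save debugging time. The helper below is a hypothetical sketch, not part of ElastAlert: it reads version.number from the Elasticsearch root endpoint (GET /) and compares major versions, assuming plain HTTP without authentication, as in the config.yaml used here.

```python
# Hypothetical helper (not part of ElastAlert): compare the major version of the
# installed elasticsearch-py client with the cluster's, since elasticsearch-py asks
# you to use the client major version that matches your Elasticsearch major version.
import json
import urllib.request


def major(version: str) -> int:
    """Major version number, e.g. '6.8.0' -> 6."""
    return int(version.split(".")[0])


def client_matches_server(client_version: str, server_version: str) -> bool:
    """True when client and cluster share the same major version."""
    return major(client_version) == major(server_version)


def fetch_server_version(host: str, port: int = 9200, timeout: float = 5.0) -> str:
    """Read version.number from the Elasticsearch root endpoint (GET /).
    Assumes plain HTTP without authentication."""
    url = "http://{}:{}/".format(host, port)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["version"]["number"]
```

To use it on the ElastAlert host, build the client version string from the installed package (elasticsearch.VERSION is a tuple, so ".".join(map(str, elasticsearch.VERSION)) gives something like 6.8.2) and pass it to client_matches_server together with fetch_server_version for your es_host.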
Configure ElastAlert
cp /app/elastalert/elastalert/config.yaml.example /app/elastalert/elastalert/config.yaml
Modify it as needed:
# This is the folder that contains the rule yaml files
# Any .yaml file will be loaded as a rule
rules_folder: /app/elastalert/rule
# How often ElastAlert will query Elasticsearch
# The unit can be anything from weeks to seconds
run_every:
  minutes: 1
# ElastAlert will buffer results from the most recent
# period of time, in case some log sources are not in real time
buffer_time:
  minutes: 1
# The Elasticsearch hostname for metadata writeback
# Note that every rule can have its own Elasticsearch host
es_host: <IP of any one Elasticsearch node>
# The Elasticsearch port
es_port: 9200
# The AWS region to use. Set this when using AWS-managed elasticsearch
#aws_region: us-east-1
# The AWS profile to use. Use this if you are using an aws-cli profile.
# See http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html
# for details
#profile: test
# Optional URL prefix for Elasticsearch
#es_url_prefix: elasticsearch
# Connect with TLS to Elasticsearch
#use_ssl: True
# Verify TLS certificates
#verify_certs: True
# GET request with body is the default option for Elasticsearch.
# If it fails for some reason, you can pass 'GET', 'POST' or 'source'.
# See http://elasticsearch-py.readthedocs.io/en/master/connection.html?highlight=send_get_body_as#transport
# for details
#es_send_get_body_as: GET
# Option basic-auth username and password for Elasticsearch
#es_username: someusername
#es_password: somepassword
# Use SSL authentication with client certificates client_cert must be
# a pem file containing both cert and key for client
#verify_certs: True
#ca_certs: /path/to/cacert.pem
#client_cert: /path/to/client_cert.pem
#client_key: /path/to/client_key.key
# The index on es_host which is used for metadata storage
# This can be an unmapped index, but it is recommended that you run
# elastalert-create-index to set a mapping
writeback_index: elastalert_status
writeback_alias: elastalert_alerts
# If an alert fails for some reason, ElastAlert will retry
# sending the alert until this time period has elapsed
alert_time_limit:
  days: 1
# Custom logging configuration
# If you want to setup your own logging configuration to log into
# files as well or to Logstash and/or modify log levels, use
# the configuration below and adjust to your needs.
# Note: if you run ElastAlert with --verbose/--debug, the log level of
# the "elastalert" logger is changed to INFO, if not already INFO/DEBUG.
#logging:
#  version: 1
#  incremental: false
#  disable_existing_loggers: false
#  formatters:
#    logline:
#      format: '%(asctime)s %(levelname)+8s %(name)+20s %(message)s'
#
#  handlers:
#    console:
#      class: logging.StreamHandler
#      formatter: logline
#      level: DEBUG
#      stream: ext://sys.stderr
#
#    file:
#      class: logging.FileHandler
#      formatter: logline
#      level: DEBUG
#      filename: elastalert.log
#
#  loggers:
#    elastalert:
#      level: WARN
#      handlers: []
#      propagate: true
#
#    elasticsearch:
#      level: WARN
#      handlers: []
#      propagate: true
#
#    elasticsearch.trace:
#      level: WARN
#      handlers: []
#      propagate: true
#
#    '':  # root logger
#      level: WARN
#      handlers:
#        - console
#        - file
#      propagate: false
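#Field explanations
rules_folder: the location from which ElastAlert loads rule configuration files. It tries to load every .yaml file in the folder; without any valid rules, ElastAlert will not start. As files in this folder change, ElastAlert loads new rules, stops running rules whose files were removed, and reloads modified rules.
run_every: how often ElastAlert queries Elasticsearch.
buffer_time: the size of the query window, i.e. the range of the time field in each request, stretching back from the time each query runs.
es_host: the host address of Elasticsearch.
es_port: the corresponding Elasticsearch port.
writeback_index: the name of the index in which ElastAlert stores its own data.
writeback_alias: the alias used for the writeback index.
alert_time_limit: the retry window for failed alerts.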
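How run_every and buffer_time interact is easiest to see by listing the time ranges queried on successive runs. This is a simplified sketch that treats buffer_time as a window stretching back from each run; ElastAlert's real start-time logic has more cases (for example after a restart), which are ignored here, and query_windows and has_gaps are hypothetical names.

```python
# Simplified sketch (not ElastAlert's code) of the time ranges queried on successive
# runs, treating buffer_time as a window that stretches back from each run.
from datetime import datetime, timedelta
from typing import List, Tuple

Window = Tuple[datetime, datetime]


def query_windows(first_run: datetime, runs: int, run_every: timedelta,
                  buffer_time: timedelta) -> List[Window]:
    """(start, end) of the time-field range queried on each of `runs` runs."""
    windows = []
    for i in range(runs):
        now = first_run + run_every * i
        windows.append((now - buffer_time, now))
    return windows


def has_gaps(windows: List[Window]) -> bool:
    """True if a window starts after the previous one ended, so events in between are never queried."""
    return any(cur[0] > prev[1] for prev, cur in zip(windows, windows[1:]))


if __name__ == "__main__":
    start = datetime(2020, 6, 13, 10, 0)
    # run_every and buffer_time are both 1 minute, as in the config.yaml above
    print(has_gaps(query_windows(start, 3, timedelta(minutes=1), timedelta(minutes=1))))  # False
    print(has_gaps(query_windows(start, 3, timedelta(minutes=5), timedelta(minutes=1))))  # True
```

Under this model, keeping buffer_time at least as long as run_every avoids blind spots; a larger buffer_time also helps when logs arrive late.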
After configuring, run elastalert-create-index --config config.yaml in /app/elastalert/elastalert to create the writeback index.
Install the DingTalk alert plugin
wget -O elastalert-dingtalk-plugin-master.zip https://github.com/xuyaoqiang/elastalert-dingtalk-plugin/archive/master.zip
unzip elastalert-dingtalk-plugin-master.zip
cd elastalert-dingtalk-plugin-master
pip3 install pyOpenSSL==16.2.0
pip3 install setuptools==46.1.3
cp -r elastalert_modules /app/elastalert/
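The plugin's alerter ultimately posts a message to the DingTalk robot webhook configured as dingtalk_webhook. The sketch below shows what a "text" message looks like and how to send one by hand, which is useful for checking the webhook before wiring it into ElastAlert. It is a hypothetical illustration based on DingTalk's custom robot message format, not the plugin's code; build_text_payload and post_to_dingtalk are made-up names.

```python
# Hypothetical sketch, not the dingtalk plugin's code: a DingTalk custom robot
# accepts a JSON "text" message POSTed to its webhook URL (dingtalk_webhook).
import json
import urllib.request
from typing import Any, Dict, List, Optional


def build_text_payload(content: str, at_mobiles: Optional[List[str]] = None,
                       at_all: bool = False) -> Dict[str, Any]:
    """Body of a DingTalk robot message with msgtype "text" (dingtalk_msgtype: "text")."""
    return {
        "msgtype": "text",
        "text": {"content": content},
        "at": {"atMobiles": at_mobiles or [], "isAtAll": at_all},
    }


def post_to_dingtalk(webhook: str, payload: Dict[str, Any],
                     timeout: float = 10.0) -> Dict[str, Any]:
    """POST the payload; DingTalk answers with JSON such as {"errcode": 0, "errmsg": "ok"}."""
    req = urllib.request.Request(
        webhook,
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For a manual test, call post_to_dingtalk with the webhook URL from the robot settings (it looks like https://oapi.dingtalk.com/robot/send?access_token=...) and check that errcode is 0. If the robot uses a keyword security setting, the message content must contain that keyword or DingTalk rejects it.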
Rule examples
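#Examples of the different rule types can be found in example_rules/.
example_spike.yaml
is an example of the "spike" rule type, which lets you alert when the average rate of events over a time period increases by a given factor. This example sends an email alert when there are 3 times more events matching a filter in the past 2 hours than in the 2 hours before that.
example_frequency.yaml
is an example of the "frequency" rule type, which alerts when a given number of events occur within a time period. This example sends an email when 50 documents matching the given filter occur within a 4-hour timeframe.
example_change.yaml
is an example of the "change" rule type, which alerts when a certain field changes between two documents. In this example, an alert email is sent when two documents within 24 hours of each other have the same "username" field but different values of the "country_name" field.
example_new_term.yaml
is an example of the "new term" rule type, which alerts when one or more new values appear in one or more fields. In this example, an email is sent when a new value of ("username", "computer") is encountered in the example login logs.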
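The spike example compares two adjacent windows. The sketch below approximates that comparison; it is not ElastAlert's code, is_spike is a hypothetical name, and only spike_height and spike_type are borrowed from the rule's option names. ElastAlert has further options (such as minimum thresholds) that this sketch leaves out.

```python
# Simplified approximation of the spike rule's comparison (not ElastAlert's code):
# compare the event count in the current window with the count in the reference
# window just before it. spike_height and spike_type are named after the rule options.
def is_spike(ref_count: int, cur_count: int, spike_height: float,
             spike_type: str = "up") -> bool:
    if spike_type not in ("up", "down", "both"):
        raise ValueError("spike_type must be 'up', 'down' or 'both'")
    if ref_count == 0:
        # No baseline to compare against: this sketch reports no spike.
        return False
    went_up = cur_count > ref_count * spike_height
    went_down = cur_count < ref_count / spike_height
    if spike_type == "up":
        return went_up
    if spike_type == "down":
        return went_down
    return went_up or went_down


if __name__ == "__main__":
    # 100 matching events in the previous 2 hours, 301 in the last 2 hours, factor 3
    print(is_spike(100, 301, 3))  # True
```

With spike_type "both", the same factor catches drops as well, e.g. 90 events falling to 20.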
Configure the alert rule
Check nginx 5XX status codes and send a DingTalk alert when there are 5 or more within one minute
cat /app/elastalert/rule/nginx.yaml
name: the count of nginx log entries with a 5xx response status code is at least 5 in a 1-minute period
index: nginx-*
type: frequency
num_events: 5
timeframe: {minutes: 1}
filter:
- range:
    status:
      from: 500
      to: 599
alert_text: "
  Domain: {}\n
  Method: {}\n
  Request: {}\n
  Status code: {}\n
  Upstream server: {}\n
  Count: {}
  "
alert_text_type: alert_text_only
alert_text_args:
- host
- method
- request
- status
- upstream
- num_hits
alert:
- "elastalert_modules.dingtalk_alert.DingTalkAlerter"
dingtalk_webhook: "XXXXXX"
dingtalk_msgtype: "text"
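To preview what this rule looks at, it can help to see its filter as part of a search body. The sketch below is an approximation under stated assumptions: the rule's filter list is ANDed with a range on the timestamp field, "@timestamp" is assumed to be the timestamp field, and the exact query ElastAlert builds differs by version. build_query and NGINX_5XX_FILTER are hypothetical names.

```python
# Approximate sketch of the search body behind nginx.yaml: the rule's filter list
# ANDed with a range on the timestamp field. The nesting and "@timestamp" are
# assumptions; ElastAlert's real query varies by version.
import json
from datetime import datetime, timedelta
from typing import Any, Dict, List


def build_query(filters: List[Dict[str, Any]], start: datetime, end: datetime,
                timestamp_field: str = "@timestamp") -> Dict[str, Any]:
    """Combine a time range and the rule's filters into one bool/filter query."""
    time_range = {"range": {timestamp_field: {"gt": start.isoformat(),
                                              "lte": end.isoformat()}}}
    return {
        "query": {"bool": {"filter": [time_range] + filters}},
        "sort": [{timestamp_field: {"order": "asc"}}],
    }


# The filter from nginx.yaml above, as a Python structure.
NGINX_5XX_FILTER = [{"range": {"status": {"from": 500, "to": 599}}}]

if __name__ == "__main__":
    end = datetime(2020, 6, 13, 10, 1, 0)
    body = build_query(NGINX_5XX_FILTER, end - timedelta(minutes=1), end)
    print(json.dumps(body, indent=2))
```

The printed JSON can be pasted into Kibana Dev Tools under GET nginx-*/_search to see which documents fall into one window. Naive timestamps like these are interpreted as UTC by Elasticsearch.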
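#Field explanations
name: the unique name of this rule. If two rules share the same name, ElastAlert will not start.
type: each rule has a type, and different types may take different parameters. The frequency type means "alert when at least num_events events occur within timeframe".
index: the name of the index (or index pattern) to query.
num_events: a parameter specific to the frequency type; it is the threshold at which an alert is triggered.
timeframe: the time period within which num_events must occur.
filter: a list of Elasticsearch filters used to filter the results.
alert_text: custom content to send in the alert.
alert_text_args: the fields whose values fill the {} placeholders in alert_text, in order.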
#For details, see
https://elastalert.readthedocs.io/en/latest/recipes/writing_filters.html#writingfilters
alert: the list of alerts to run on each match.
#For details, see
https://elastalert.readthedocs.io/en/latest/ruletypes.html#alerts
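The pairing of alert_text and alert_text_args is positional string formatting. The sketch below is a simplified model of that behaviour, not ElastAlert's code: each argument is looked up in the match (dotted names walk nested fields), a missing value becomes a placeholder (in ElastAlert the alert_missing_value option controls this text), and the values fill the {} slots in order. lookup and render_alert_text are hypothetical names, and the match document in the usage example is made up.

```python
# Simplified model (not ElastAlert's code) of how alert_text and alert_text_args combine.
from typing import Any, Dict, List, Optional


def lookup(doc: Dict[str, Any], dotted: str) -> Optional[Any]:
    """Follow a dotted field name such as 'geo.city' through nested dicts."""
    value: Any = doc
    for part in dotted.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def render_alert_text(alert_text: str, alert_text_args: List[str],
                      match: Dict[str, Any], missing: str = "<MISSING VALUE>") -> str:
    """Fill the {} slots of alert_text with the match's values, in order."""
    values = [lookup(match, arg) for arg in alert_text_args]
    return alert_text.format(*[missing if v is None else v for v in values])


if __name__ == "__main__":
    match = {"host": "example.com", "status": 502, "num_hits": 7}
    print(render_alert_text("Domain: {}\nStatus code: {}\nCount: {}",
                            ["host", "status", "num_hits"], match))
```

Because the mapping is by position, the order of alert_text_args must follow the order of the {} slots in alert_text, and the number of args should equal the number of slots.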
Test the rule
elastalert-test-rule example_rules/my_rule.yaml
Debug run
/app/elastalert/bin/python3 -m elastalert.elastalert --config /app/elastalert/elastalert/config.yaml --verbose --rule /app/elastalert/rule/nginx.yaml
Production run
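The official recommendation is to run it under supervisord, but during testing it kept failing to read the configuration, so I gave up on that.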
nohup /app/elastalert/bin/python3 -m elastalert.elastalert --config /app/elastalert/elastalert/config.yaml --verbose >>/app/elastalert/nohup.out 2>&1 &