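ELK alert monitoring with Sentinl: DingTalk + email alerts
Note: my ELK stack and Sentinl are all version 6.5.1.
Prerequisites: Elasticsearch query syntax and how to use an ES watcher.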
https://www.cnblogs.com/pilihaotian/p/5830754.html
https://www.cnblogs.com/ghj1976/p/5293250.html
https://www.cnblogs.com/wihainan/p/7064943.html
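DingTalk alert setup
1. Create a group in DingTalk, add a robot to the group, then log in to the DingTalk desktop client to get the robot's webhook address.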
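Before wiring the robot into Sentinl, it can help to confirm the webhook address works by itself. Below is a minimal Python sketch that posts one text message to the robot. The WEBHOOK_URL value and the function names are placeholders of mine, not part of Sentinl or DingTalk tooling; the message shape is the same msgtype/text/content structure used in the watcher body later in this article.

```python
import json
import urllib.request

# Placeholder: substitute the webhook address copied from your own robot.
WEBHOOK_URL = "https://oapi.dingtalk.com/robot/send?access_token=*********"


def build_text_message(content: str) -> bytes:
    """Encode a DingTalk text message as a UTF-8 JSON request body."""
    payload = {"msgtype": "text", "text": {"content": content}}
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")


def send_text_message(webhook_url: str, content: str) -> dict:
    """POST the message to the robot webhook and return the decoded JSON reply."""
    request = urllib.request.Request(
        webhook_url,
        data=build_text_message(content),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a real HTTP request, so it is left commented out):
# print(send_text_message(WEBHOOK_URL, "Sentinl test message"))
```

If the message shows up in the group, the address is fine and any later problem is on the Sentinl side.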
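2. Install Sentinl
It can be installed online:
    ./kibana-plugin install https://github.com/sirensolutions/sentinl/releases/download/tag-6.5.0-0/sentinl-v6.5.1.zip
or offline (the file: prefix must not be left out):
    ./kibana-plugin install file:../../sentinl-v6.5.1.zip
3. After installation, restart Kibana. Sentinl then appears when you open the Kibana page.
4. Configure Sentinl
Add a watcher in Sentinl. I used the advanced configuration.
The configuration is as follows: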
{
  "actions": {
    "Webhook_683bd385-86b3-46ba-8e1b-f89cccccbbec": {
      "name": "Tomcat exception alert",
      "throttle_period": "1m",
      "webhook": {
        "priority": "high",
        "stateless": false,
        "method": "POST",
        "host": "oapi.dingtalk.com",
        "port": "443",
        "path": "/robot/send?access_token=*********",
        "body": " {\"msgtype\": \"text\",\r\n \"text\": {\r\n \"content\":\" An exception occurred, please handle it. \r\n Host: {{payload.hits.hits.0._index}} \r\n IP: {{payload.hits.hits.0._source.type}} \r\n Alert content: {{payload.hits.hits.0._source.message}} \r\n Occurrences in the last minute: {{payload.hits.total}}\"\r\n } \r\n }",
        "params": {
          "watcher": "{{watcher.title}}",
          "payload_count": "{{payload.hits.total}}"
        },
        "headers": {
          "Content-Type": "application/json"
        },
        "auth": "dingtalk-account:dingtalk-password",
        "message": "Business function alert",
        "use_https": true
      }
    }
  },
  "input": {
    "search": {
      "request": {
        "index": [
          "*-tomcat"
        ],
        "body": {
          "query": {
            "bool": {
              "must": [
                {
                  "match": {
                    "level": "ERROR"
                  }
                },
                {
                  "range": {
                    "@timestamp": {
                      "gte": "now-1m",
                      "lte": "now",
                      "format": "epoch_millis"
                    }
                  }
                }
              ],
              "must_not": []
            }
          }
        }
      }
    }
  },
  "condition": {
    "script": {
      "script": "payload.hits.total >=1"
    }
  },
  "trigger": {
    "schedule": {
      "later": "every 1 minutes"
    }
  },
  "disable": true,
  "report": false,
  "title": "DingTalk alert",
  "save_payload": false,
  "spy": true,
  "impersonate": false
}
Notes: in path, use the address of your own DingTalk robot (the access_token part). The auth entry is optional; deleting it does no harm.
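actions defines what happens when an alert fires, that is, which channel delivers it. I use DingTalk here; email works as well.
The values in the DingTalk message can be taken from the fields of the documents in ELK, for example:
payload.hits.hits.0._index
payload.hits.hits is the list of all matching records returned by the query, and 0 selects the first one.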
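To make the dotted paths concrete, here is a small Python sketch that walks such a path through a search response. It is only an illustration of how the path reads, not how Sentinl resolves templates internally; resolve_path and the sample data (index name, IP, messages) are made up for the example.

```python
from typing import Any


def resolve_path(data: Any, path: str) -> Any:
    """Walk a dotted path; a numeric segment indexes into a list."""
    current = data
    for segment in path.split("."):
        if isinstance(current, list):
            current = current[int(segment)]
        else:
            current = current[segment]
    return current


# Made-up search response, trimmed to the fields the alert text uses.
sample = {
    "payload": {
        "hits": {
            "total": 2,
            "hits": [
                {"_index": "app1-tomcat",
                 "_source": {"type": "10.0.0.1", "message": "first error"}},
                {"_index": "app1-tomcat",
                 "_source": {"type": "10.0.0.2", "message": "second error"}},
            ],
        }
    }
}

print(resolve_path(sample, "payload.hits.hits.0._index"))
print(resolve_path(sample, "payload.hits.hits.1._source.message"))
```

Changing the 0 to 1 picks the second record, which is handy when you want to show more than one hit in the message.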
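input is the query that fetches data from ES. For usage details see the links at the top of this article; what follows is only a brief explanation.
index specifies which ES indices to query, and wildcards are allowed (for example *-tomcat). The configuration above fetches all documents from the last minute whose level is ERROR.
condition evaluates the query result: payload.hits.total >=1 triggers the alert when the number of hits is greater than or equal to 1.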
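When a watcher does not fire as expected, it can be useful to run the same query by hand and apply the same condition. The following Python sketch does that under a few assumptions: the ES address passed to run_search (for instance http://localhost:9200) is a placeholder, the function names are mine, and hits.total is read as a plain integer, which matches ES 6.x.

```python
import json
import urllib.request


def build_query(level: str = "ERROR", window: str = "now-1m") -> dict:
    """Build the same bool query as the watcher input."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"level": level}},
                    {"range": {"@timestamp": {
                        "gte": window, "lte": "now", "format": "epoch_millis"}}},
                ],
                "must_not": [],
            }
        }
    }


def run_search(es_url: str, index: str, body: dict) -> dict:
    """POST the query to <es_url>/<index>/_search and return the JSON reply."""
    request = urllib.request.Request(
        f"{es_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def should_alert(response: dict, threshold: int = 1) -> bool:
    """Mirror the condition script: hits.total >= threshold (integer in ES 6.x)."""
    return response["hits"]["total"] >= threshold


# Example call (needs a reachable ES node, so it is left commented out):
# print(should_alert(run_search("http://localhost:9200", "*-tomcat", build_query())))
```

Note that the raw ES reply has hits at the top level; inside the watcher the same data is reached through payload.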
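trigger is the query frequency: "later": "every 1 minutes" means the query runs once every minute.
spy indicates whether the watcher keeps running after the web page is closed. By default, periodic alerts only fire while the page is open.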
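5. Verify
If there is data, the page shows watcher executed; otherwise it shows no data.
When watcher executed is shown, DingTalk receives the alert message.
If watcher executed is shown but DingTalk receives nothing, check the log for errors. A no transform found message means there is a syntax error in the body inside actions, so check the syntax there.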
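One way to catch body syntax problems before saving the watcher is to fill the placeholders with sample values and see whether the result still parses as JSON. The Python sketch below does this with a very simplified stand-in for the template engine: it does not reproduce Sentinl's actual rendering or escaping, and the function names and sample values are mine. strict=False is used because the body in this article places raw line breaks inside the content string.

```python
import json
import re


def render(template: str, values: dict) -> str:
    """Replace each {{name}} placeholder with the matching value (simplified)."""
    return re.sub(
        r"\{\{\s*([^}]+?)\s*\}\}",
        lambda match: str(values.get(match.group(1), "")),
        template,
    )


def body_is_valid_json(template: str, values: dict) -> bool:
    """Return True if the rendered body parses as JSON."""
    try:
        json.loads(render(template, values), strict=False)
        return True
    except json.JSONDecodeError:
        return False


template = (
    '{"msgtype": "text", "text": {"content": '
    '"Alert content: {{payload.hits.hits.0._source.message}}"}}'
)

# A plain message keeps the body valid.
print(body_is_valid_json(
    template, {"payload.hits.hits.0._source.message": "NullPointerException"}))

# A message containing backslashes (for example a Windows path) breaks it.
print(body_is_valid_json(
    template, {"payload.hits.hits.0._source.message": "cannot open C:\\logs\\app"}))
```

If the second kind of message is common in your logs, a shorter or cleaner field may be a safer choice for the alert text.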
Email alert setup
The prerequisite is that the server is able to send mail; see https://www.cnblogs.com/abkn/p/9720143.html
1. Configure kibana.yml, then restart Kibana
sentinl:
  settings:
    email:
      active: true
      user: ****@****.com
      password: *******
      host: smtp.exmail.qq.com
      port: 465
      ssl: true   # add as your environment requires
    report:
      active: true
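If mail does not arrive later, it helps to know whether the SMTP account itself works. Here is a minimal Python sketch that logs in over SSL with the same host, port, user and password as in kibana.yml and sends one HTML message. The function names and the example addresses are placeholders of mine; it assumes an SSL port such as 465, as in the configuration above.

```python
import smtplib
from email.message import EmailMessage


def build_test_email(sender: str, recipients: list, subject: str, html: str) -> EmailMessage:
    """Build a message with a plain-text fallback and an HTML part."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = ", ".join(recipients)
    message["Subject"] = subject
    message.set_content("This message requires an HTML-capable mail client.")
    message.add_alternative(html, subtype="html")
    return message


def send_test_email(host: str, port: int, user: str, password: str,
                    message: EmailMessage) -> None:
    """Connect over SSL, log in and send the message."""
    with smtplib.SMTP_SSL(host, port, timeout=10) as server:
        server.login(user, password)
        server.send_message(message)


# Example call (sends real mail, so it is left commented out):
# msg = build_test_email("me@example.com", ["me@example.com"],
#                        "Sentinl SMTP test", "<p>SMTP settings work.</p>")
# send_test_email("smtp.exmail.qq.com", 465, "me@example.com", "password", msg)
```

A login error here points at the account settings rather than at Sentinl.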
2. Configure Sentinl
{
  "actions": {
    "email_html_alarm_9c8f6d7f-55c7-49f0-863d-ad3363726978": {
      "name": "api tomcat exception",
      "throttle_period": "1m",
      "email_html": {
        "from": "*****@tan66.com",
        "to": [
          "*****@tan66.com",
          "*****@tan66.com"
        ],
        "stateless": false,
        "subject": "api tomcat exception",
        "priority": "high",
        "html": "<p>An exception occurred, please handle it. </p> <br> Host: {{payload.hits.hits.0._index}} <br> IP: {{payload.hits.hits.0._source.type}} <br> Alert content: {{payload.hits.hits.0._source.message}} <br> Occurrences in the last minute: {{payload.hits.total}}"
      }
    }
  },
  "input": {
    "search": {
      "request": {
        "index": [
          "kyb-api-tomcat"
        ],
        "body": {
          "query": {
            "bool": {
              "must": [
                {
                  "match": {
                    "level": "ERROR"
                  }
                },
                {
                  "range": {
                    "@timestamp": {
                      "gte": "now-1m",
                      "lte": "now",
                      "format": "epoch_millis"
                    }
                  }
                }
              ],
              "must_not": []
            }
          }
        }
      }
    }
  },
  "condition": {
    "script": {
      "script": "payload.hits.total >= 1"
    }
  },
  "trigger": {
    "schedule": {
      "later": "every 30 seconds"
    }
  },
  "disable": false,
  "report": false,
  "title": "api tomcat exception",
  "save_payload": false,
  "spy": false,
  "impersonate": false
}
3. Alert result