Prometheus alerting rule configuration

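This post covers the Prometheus alerting rule configuration. To actually deliver alerts, the rules have to be used together with Alertmanager; see the next post.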

Rule collection
https://samber.github.io/awesome-prometheus-alerts/rules

JVM example
wget https://raw.githubusercontent.com/samber/awesome-prometheus-alerts/master/dist/rules/jvm/jvm-exporter.yml

Example rule file contents


groups:
- name: exceptionRule
  rules:
  - alert: exceptionAlert
    expr: application_exception{application="userDemo"} > 10
    for: 1m
    labels:
      severity: warning
      team: frontend
    annotations:
      summary: "Server is reporting errors frequently"
      description: "Error frequency reached (current value: {{ $value }}%)"
- name: ckExceptionRule
  rules:
  - alert: ckExceptionAlert
    expr: sum(increase(bbc_request_timer_ID_seconds_count{}[5m])) by (business_name) > 10
    for: 2m
    labels:
      severity: warning
      app: "gateway"
    annotations:
      summary: "test system: service errors in the last 5 minutes"
      description: "Error frequency reached (current value: {{ $value }})"
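
Prometheus only evaluates rule files that are listed under rule_files in prometheus.yml, and it only forwards firing alerts when an alerting section points at Alertmanager. Here is a minimal sketch that writes such a configuration to a scratch file. The file name prometheus.example.yml, the 15s evaluation interval, and the Alertmanager address localhost:9093 are assumptions for illustration, not values from the setup described here.

```shell
# Write a minimal example config to a scratch file (hypothetical name),
# so the real prometheus.yml is not overwritten.
cat > prometheus.example.yml <<'EOF'
global:
  evaluation_interval: 15s        # assumed: how often rules are evaluated
rule_files:
  - "first_rules.yml"
  - "jvm-exporter.yml"
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]   # assumed Alertmanager address
EOF
```

Running ./promtool check config prometheus.example.yml should validate both the configuration and the rule files it references.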

Validate the rule files
./promtool check rules first_rules.yml
./promtool check rules jvm-exporter.yml
Stop (send SIGTERM so Prometheus can shut down cleanly, instead of kill -9)
pkill -TERM -x prometheus
Start
nohup ./prometheus --config.file=./prometheus.yml --web.enable-lifecycle --storage.tsdb.retention.time=20d --web.external-url=http://8.219.198.22:9090 > server_prometheus.log 2>&1 &
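
Because the server is started in the background with nohup, the shell returns before Prometheus is actually able to serve requests. A small sketch that polls the /-/ready endpoint can confirm the start succeeded. The function name wait_ready, the PROM_URL variable, and the default of 30 attempts are assumptions for this example.

```shell
# Poll the readiness endpoint until Prometheus answers or the attempts run out.
# PROM_URL and the default of 30 attempts are assumptions for this sketch.
PROM_URL="${PROM_URL:-http://localhost:9090}"

wait_ready() {
  attempts="${1:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # /-/ready answers with HTTP 200 once Prometheus is ready to serve traffic
    if curl -fsS -o /dev/null "$PROM_URL/-/ready" 2>/dev/null; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

Usage: wait_ready 60 || tail server_prometheus.log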

Reload the configuration (requires --web.enable-lifecycle)
curl -X POST http://localhost:9090/-/reload
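
The validation and reload steps can be combined so that a broken rule file never triggers a reload. This is a sketch under assumptions: the function name reload_if_valid and the PROMTOOL and PROM_URL variables are hypothetical helpers, with defaults matching the commands used in this post.

```shell
# Validate rule files first; only ask Prometheus to reload if every file passes.
# PROMTOOL and PROM_URL defaults are assumptions matching the commands in this post.
PROMTOOL="${PROMTOOL:-./promtool}"
PROM_URL="${PROM_URL:-http://localhost:9090}"

reload_if_valid() {
  # promtool exits non-zero if any of the given rule files fails validation
  "$PROMTOOL" check rules "$@" || return 1
  # -f makes curl return non-zero on an HTTP error response
  curl -fsS -X POST "$PROM_URL/-/reload"
}
```

Usage: reload_if_valid first_rules.yml jvm-exporter.yml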

posted @ kunchengs