Integrating Prometheus Alertmanager with DingTalk Alerts
Install Prometheus and Alertmanager
1. Run Prometheus and Alertmanager with docker-compose
cat docker-compose.yml
| version: "3" |
| services: |
|   prometheus: |
|     image: prom/prometheus:v2.35.0 |
|     container_name: prometheus |
|     hostname: prometheus |
|     volumes: |
|       - "./prometheus:/etc/prometheus" |
|       - "/etc/localtime:/etc/localtime" |
|       - "./data:/prometheus" |
|     restart: on-failure |
|     network_mode: "host" |
|     logging: |
|       driver: "json-file" |
|       options: |
|         tag: prometheus |
|     cap_add: |
|       - ALL |
|     command: |
|       - '--config.file=/etc/prometheus/prometheus.yml' |
|       - '--web.enable-admin-api' |
|       - '--web.enable-lifecycle' |
|   alertmanager: |
|     image: prom/alertmanager:v0.24.0 |
|     container_name: alertmanager |
|     hostname: alertmanager |
|     restart: on-failure |
|     network_mode: "host" |
|     logging: |
|       driver: "json-file" |
|       options: |
|         tag: prometheus |
|     cap_add: |
|       - ALL |
|     volumes: |
|       - ./alertmanager/:/etc/alertmanager/ |
|     command: |
|       - '--config.file=/etc/alertmanager/config.yml' |
|       - '--storage.path=/alertmanager' |
2. Prepare the Prometheus configuration files
prometheus.yml
cat prometheus.yml
| global: |
|   scrape_interval: 15s |
|   evaluation_interval: 15s |
| |
| alerting: |
|   alertmanagers: |
|     - static_configs: |
|         - targets: |
|             # both containers use host networking, so Alertmanager is reachable on localhost |
|             - localhost:9093 |
| |
| rule_files: |
|   - "/etc/prometheus/rules/*.yml" |
| |
| scrape_configs: |
|   - job_name: "prometheus" |
|     static_configs: |
|       - targets: ["localhost:9090"] |
| |
|   - job_name: "actuator_health" |
|     metrics_path: '/actuator/prometheus' |
|     file_sd_configs: |
|       - refresh_interval: 1m |
|         files: |
|           - "./service_endpoint*.yml" |
| |
|   - job_name: "docker" |
|     file_sd_configs: |
|       - refresh_interval: 1m |
|         files: |
|           - "./docker_endpoint*.yml" |
| |
|   - job_name: "node-exporter" |
|     file_sd_configs: |
|       - refresh_interval: 1m |
|         files: |
|           - "./node-exporter*.yml" |
service_endpoint_all.yml
cat service_endpoint_all.yml
| - targets: |
|     - ip:20006 |
|   labels: |
|     servicename: sname01 |
| - targets: |
|     - ip:20005 |
|   labels: |
|     servicename: sname01 |
node-exporter-all.yml
cat node-exporter-all.yml
| - targets: ['ip:7100'] |
|   labels: |
|     hostname: "node-01" |
| - targets: ['ip:7100'] |
|   labels: |
|     hostname: "node-02" |
| - targets: ['ip:7100'] |
|   labels: |
|     hostname: "node-03" |
docker_endpoint_all.yml
cat docker_endpoint_all.yml
| - targets: ['ip:7080'] |
|   labels: |
|     hostname: "env-mid" |
| - targets: ['ip:7080'] |
|   labels: |
|     hostname: "env-ap-02" |
| - targets: ['ip:7080'] |
|   labels: |
|     hostname: "env-ap-01" |
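The target files all share one shape: a list of entries, each with a targets list and a labels map. When the host list grows, such a file can be generated rather than typed by hand. The following is only a minimal sketch, assuming bash; the function name gen_node_targets and the 10.0.0.x addresses are hypothetical and not part of the setup above.

```shell
#!/usr/bin/env bash
# gen_node_targets PORT HOSTNAME=IP [HOSTNAME=IP ...]
# Prints one file_sd entry per HOSTNAME=IP pair (hypothetical helper).
gen_node_targets() {
  local port=$1
  shift
  local pair
  for pair in "$@"; do
    # ${pair#*=} is the part after "=", ${pair%%=*} the part before it
    printf -- "- targets: ['%s:%s']\n  labels:\n    hostname: \"%s\"\n" \
      "${pair#*=}" "$port" "${pair%%=*}"
  done
}

# Example with made-up addresses; redirect the output to node-exporter-all.yml
gen_node_targets 7100 node-01=10.0.0.11 node-02=10.0.0.12
```

Because the scrape jobs use file_sd_configs, Prometheus picks up a regenerated file on its own; no restart should be needed.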
3. Prepare the alerting rules
service_alter.yml
cat service_alter.yml # the labels and annotations of this rule are used by the alert template below
| groups: |
|   - name: Service_Down |
|     rules: |
|       - alert: ServiceDown |
|         expr: up{job="actuator_health",servicename!="iot-aircraft_192.168.0.22"} == 0 |
|         for: 10s |
|         labels: |
|           user: prometheus |
|           severity: warning |
|           env: "prod" |
|           sname: "{{ $labels.servicename }}" |
|         annotations: |
|           summary: "{{ $labels.servicename }} service is down" |
|           description: "{{ $labels.servicename }} of job {{ $labels.job }} has been Down." |
|           title: "{{ $labels.servicename }} service status alert" |
4. Prepare the Alertmanager configuration file
config.yml
cat config.yml
| global: |
|   resolve_timeout: 1m |
| |
| templates: |
|   - '/etc/alertmanager/dingtalk.tmpl' |
| |
| route: |
|   receiver: 'devops' |
|   # group_by takes label names, e.g. alertname |
|   group_by: ['alertname'] |
|   group_wait: 10s |
|   group_interval: 10s |
|   repeat_interval: 1h |
|   routes: |
|     - receiver: devops |
|       group_wait: 10s |
|       match: |
|         team: DevOps |
| |
| receivers: |
|   - name: 'devops' |
|     webhook_configs: |
|       - url: http://192.168.0.28:8060/dingtalk/devops/send |
|         send_resolved: true |
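Start the containers from the directory that contains docker-compose.yml:
| docker-compose up -d |
Install the DingTalk webhook (prometheus-webhook-dingtalk)
1. Download the release package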
| cd /opt |
| wget https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v2.1.0/prometheus-webhook-dingtalk-2.1.0.linux-amd64.tar.gz |
| tar xvf prometheus-webhook-dingtalk-2.1.0.linux-amd64.tar.gz |
| mv prometheus-webhook-dingtalk-2.1.0.linux-amd64 prometheus-webhook-dingtalk |
2. Run it as a systemd service
Create a unit file named dingtalk.service in a systemd unit directory such as /etc/systemd/system/:
| [Unit] |
| Description=dingtalk |
| Documentation=https://github.com/timonwong/prometheus-webhook-dingtalk/ |
| After=network.target |
| |
| [Service] |
| Restart=on-failure |
| WorkingDirectory=/opt/prometheus-webhook-dingtalk |
| ExecStart=/opt/prometheus-webhook-dingtalk/prometheus-webhook-dingtalk --config.file=/opt/prometheus-webhook-dingtalk/config.yml --web.enable-ui |
| |
| [Install] |
| WantedBy=multi-user.target |
Reload systemd, enable and start the service, then confirm that port 8060 is listening:
| systemctl daemon-reload |
| systemctl enable dingtalk.service |
| systemctl start dingtalk.service |
| systemctl status dingtalk.service |
| ss -tnl | grep 8060 |
3. Prepare the configuration file
Path of the example configuration:
| /opt/prometheus-webhook-dingtalk/config.example.yml |
Copy the example:
| cp /opt/prometheus-webhook-dingtalk/config.example.yml /opt/prometheus-webhook-dingtalk/config.yml |
Edit the copied file. The target name devops must match the path segment of the webhook URL in the Alertmanager config (/dingtalk/devops/send): cat config.yml
| templates: |
|   - /opt/prometheus/alertmanager/dingtalk.tmpl |
| |
| targets: |
|   devops: |
|     url: https://oapi.dingtalk.com/robot/send?access_token=631dbf86f484df72d92311e1664d08feef84334b8a668535f0bc8e7cce91a718 |
|     secret: <DingTalk robot secret> |
|     message: |
|       title: '{{ template "ops.title" . }}' |
|       text: '{{ template "ops.content" . }}' |
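4. Prepare the custom message template
Note: the variables in this template are tied to the labels and annotations of the alerting rule above. Save it at the path listed under templates (/opt/prometheus/alertmanager/dingtalk.tmpl).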
| {{ define "__subject" }} |
| [{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] |
| {{ end }} |
| |
| |
| {{ define "__alert_list" }}{{ range . }} |
| --- |
| **Alert name**: {{ index .Annotations "title" }} |
| |
| **Environment**: {{ .Labels.env }} |
| |
| **Severity**: {{ .Labels.severity }} |
| |
| **Host**: {{ .Labels.instance }} |
| |
| **Service**: {{ .Labels.sname }} |
| |
| **Details**: {{ index .Annotations "description" }} |
| |
| **Fired at**: {{ (.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }} |
| {{ end }}{{ end }} |
| |
| {{ define "__resolved_list" }}{{ range . }} |
| --- |
| **Alert name**: {{ index .Annotations "title" }} |
| |
| **Environment**: {{ .Labels.env }} |
| |
| **Severity**: {{ .Labels.severity }} |
| |
| **Host**: {{ .Labels.instance }} |
| |
| **Service**: {{ .Labels.sname }} |
| |
| **Details**: {{ index .Annotations "description" }} |
| |
| **Fired at**: {{ (.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }} |
| |
| **Resolved at**: {{ (.EndsAt.Add 28800e9).Format "2006-01-02 15:04:05" }} |
| {{ end }}{{ end }} |
| |
| |
| {{ define "ops.title" }} |
| {{ template "__subject" . }} |
| {{ end }} |
| |
| {{ define "ops.content" }} |
| {{ if gt (len .Alerts.Firing) 0 }} |
| **====Detected {{ .Alerts.Firing | len }} firing alert(s)====** |
| {{ template "__alert_list" .Alerts.Firing }} |
| --- |
| {{ end }} |
| |
| {{ if gt (len .Alerts.Resolved) 0 }} |
| **====Resolved {{ .Alerts.Resolved | len }} alert(s)====** |
| {{ template "__resolved_list" .Alerts.Resolved }} |
| {{ end }} |
| {{ end }} |
| |
| {{ define "ops.link.title" }}{{ template "ops.title" . }}{{ end }} |
| {{ define "ops.link.content" }}{{ template "ops.content" . }}{{ end }} |
| {{ template "ops.title" . }} |
| {{ template "ops.content" . }} |
| |
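The template calls .StartsAt.Add 28800e9 and .EndsAt.Add 28800e9 because Alertmanager reports timestamps in UTC, and 28800e9 nanoseconds is 8 hours, the offset for Beijing time (UTC+8). The arithmetic can be checked with a small sketch; it assumes bash and GNU date, and the function name utc_to_beijing is made up for illustration.

```shell
#!/usr/bin/env bash
# 28800e9 is a Go duration expressed in nanoseconds.
offset_ns=28800000000000
offset_hours=$(( offset_ns / 1000000000 / 3600 ))
echo "offset: ${offset_hours}h"

# Apply the same shift to a Unix timestamp (seconds since the epoch, UTC).
# Requires GNU date for the -d "@SECONDS" form.
utc_to_beijing() {
  date -u -d "@$(( $1 + 28800 ))" '+%Y-%m-%d %H:%M:%S'
}

# Midnight UTC on 1970-01-01 is 08:00 in Beijing.
utc_to_beijing 0
```

The format string "2006-01-02 15:04:05" in the template is Go's reference time layout, which produces the same year-month-day hour:minute:second shape as the date format used here.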
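The template can be debugged in the web UI of the dingtalk plugin at http://altermanager:8060/ui. The UI is enabled by adding --web.enable-ui to the startup arguments.

5. After the configuration is complete, restart the service and check its status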
| systemctl restart dingtalk.service |
| systemctl status dingtalk.service |
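Once the service is back up, the whole chain (Alertmanager, webhook, DingTalk) can be exercised without waiting for a real outage by posting a hand-made alert to Alertmanager's POST /api/v2/alerts endpoint. The sketch below is an assumption-laden example rather than part of the original setup: the function names, the alert name ServiceDownTest, and the instance value 127.0.0.1:20006 are hypothetical. The labels and annotations are chosen to match the fields the message template reads (env, severity, instance, sname, title, description).

```shell
#!/usr/bin/env bash
# Build a JSON body for Alertmanager's /api/v2/alerts endpoint.
# $1 = service name used for the sname label (hypothetical test value).
build_test_alert() {
  local sname=$1
  cat <<EOF
[
  {
    "labels": {
      "alertname": "ServiceDownTest",
      "severity": "warning",
      "env": "prod",
      "instance": "127.0.0.1:20006",
      "sname": "${sname}"
    },
    "annotations": {
      "title": "${sname} service status alert",
      "description": "${sname} test alert sent by hand."
    }
  }
]
EOF
}

# Post the test alert. $1 = Alertmanager base URL, $2 = service name.
send_test_alert() {
  build_test_alert "$2" |
    curl -sS -X POST -H 'Content-Type: application/json' \
      --data-binary @- "$1/api/v2/alerts"
}

# Usage, run on the host where Alertmanager listens on port 9093:
# send_test_alert http://localhost:9093 sname01
```

If everything is wired correctly, a message should reach the DingTalk group after the configured group_wait; because no end time is set, Alertmanager treats the test alert as resolved once it stops being re-sent.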