kube-prometheus-stack: pushing alerts to a webhook with a custom Alertmanager configuration
Create an AlertmanagerConfig resource
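Without prometheus-operator, you have to write alertmanager.yaml by hand to route and send the alerts that Alertmanager receives from Prometheus.
With prometheus-operator, things get simpler. You only create AlertmanagerConfig resources; prometheus-operator automatically merges all of them to generate or update alertmanager.yaml, and then tells Alertmanager to reload its configuration.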
By default, prometheus-operator watches every AlertmanagerConfig in every namespace, as the empty selectors on the Alertmanager resource show:
kubectl get -n kube-prom alertmanagers
kubectl get -n kube-prom alertmanagers/kube-promethues-stack-kube-alertmanager -o yaml
# spec.alertmanagerConfigNamespaceSelector: {}   (empty selector: no filtering by namespace)
# spec.alertmanagerConfigSelector: {}            (empty selector: no filtering by label)
Create a simple alert routing rule (saved here as alertmanager-config.yaml):
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: testwebhook
  namespace: kube-prom
spec:
  route:
    receiver: webhook
    groupBy: ["instance", "job"]
    groupWait: "10s"
    groupInterval: "20s"
    repeatInterval: "30s"
  receivers:
    - name: webhook
      webhookConfigs:
        - url: "http://10.0.2.11:8080/webhook/send"
          sendResolved: true
  inhibitRules:
    - sourceMatch:
        - name: severity
          value: 'critical'
      targetMatch:
        - name: severity
          value: 'warning'
      equal: ['instance']
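The inhibitRules entry in the manifest above can be hard to picture, so here is a small Java sketch of what it expresses: a warning alert is muted while a critical alert with the same instance label is firing. This is a simplified model under that reading, not Alertmanager's actual implementation; the class and method names (InhibitionSketch, isInhibited) and the sample label values are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Simplified model of the inhibit rule above; not Alertmanager's real code. */
final class InhibitionSketch {

    /**
     * Returns true when target is a warning alert and some firing alert is
     * critical and has the same value for the "instance" label.
     */
    static boolean isInhibited(Map<String, String> target, List<Map<String, String>> firing) {
        if (!"warning".equals(target.get("severity"))) {
            return false; // targetMatch: severity = warning
        }
        for (Map<String, String> source : firing) {
            // sourceMatch: severity = critical
            boolean isCritical = "critical".equals(source.get("severity"));
            // equal: ['instance']
            boolean sameInstance = Objects.equals(source.get("instance"), target.get("instance"));
            if (isCritical && sameInstance) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical alerts, reduced to the labels the rule looks at.
        Map<String, String> critical = Map.of("severity", "critical", "instance", "node-1");
        Map<String, String> warnSame = Map.of("severity", "warning", "instance", "node-1");
        Map<String, String> warnOther = Map.of("severity", "warning", "instance", "node-2");
        List<Map<String, String>> firing = List.of(critical, warnSame, warnOther);

        System.out.println(isInhibited(warnSame, firing));  // true: critical alert on node-1
        System.out.println(isInhibited(warnOther, firing)); // false: nothing critical on node-2
    }
}
```

In other words, the webhook would still receive the critical alert for node-1, while the warning for the same instance is suppressed.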
References:
https://github.com/prometheus-community/helm-charts/issues/2224
https://kkgithub.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#alertmanagerconfig
kubectl apply -f alertmanager-config.yaml
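To reach the Alertmanager UI from outside the cluster, edit its Service (for example, switch it to a NodePort) and check the assigned port:
kubectl edit svc kube-promethues-stack-kube-alertmanager -n kube-prom
kubectl get svc kube-promethues-stack-kube-alertmanager -n kube-prom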
After creating the resource, open the Alertmanager status page at http://10.0.2.12:32466/#/status and confirm that the Config section now contains the new settings (this may take a moment).
For details on the AlertmanagerConfig resource, see: https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/api.md#alertmanagerconfig
Create a PrometheusRule resource
As with AlertmanagerConfig, you define alerting rules by creating PrometheusRule resources; prometheus-operator automatically merges all selected rules into the rule configuration that Prometheus loads.
By default, prometheus-operator watches the PrometheusRule resources in all namespaces that carry the label release=kube-prometheus-stack:
kubectl get -n kube-prom prometheuses
kubectl get -n kube-prom prometheuses/kube-promethues-stack-kube-prometheus -o yaml
# spec.ruleNamespaceSelector: {}   (empty selector: no filtering by namespace)
# spec.ruleSelector:
#   matchLabels:
#     release: kube-prometheus-stack
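The difference between an empty selector ({}) and a matchLabels selector is easy to get wrong, so the following Java sketch models the matchLabels semantics: every key/value pair in the selector must be present on the object, and an empty selector matches everything. It is only an illustration (matchExpressions are not modeled), and the names LabelSelectorSketch and matches are hypothetical.

```java
import java.util.Map;

/** Simplified model of Kubernetes matchLabels selection; matchExpressions are not covered. */
final class LabelSelectorSketch {

    /** True when every selector pair is present, with the same value, on the object. */
    static boolean matches(Map<String, String> matchLabels, Map<String, String> objectLabels) {
        for (Map.Entry<String, String> wanted : matchLabels.entrySet()) {
            if (!wanted.getValue().equals(objectLabels.get(wanted.getKey()))) {
                return false;
            }
        }
        return true; // also reached for an empty selector, which selects everything
    }

    public static void main(String[] args) {
        Map<String, String> selector = Map.of("release", "kube-prometheus-stack");
        // Hypothetical label sets on two PrometheusRule objects.
        Map<String, String> labeled = Map.of("release", "kube-prometheus-stack", "role", "alert-rules");
        Map<String, String> unlabeled = Map.of("role", "alert-rules");

        System.out.println(matches(selector, labeled));   // true
        System.out.println(matches(selector, unlabeled)); // false: release label missing
        System.out.println(matches(Map.of(), unlabeled)); // true: {} does no filtering
    }
}
```

A practical consequence is that a PrometheusRule without the release label would be ignored by a Prometheus whose ruleSelector looks like the output above.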
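Create an alerting rule (saved here as prometheus-rule.yaml). The example fires once a filesystem's disk usage has stayed above 80% for one minute: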
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: k8s
    role: alert-rules
    # needed when spec.ruleSelector matches on this label, as in the output above
    release: kube-prometheus-stack
  name: kube-prom-kube-prom-stack-kube-prome-prometheus.rules
  namespace: kube-prom
spec:
  groups:
    - name: disk
      rules:
        - alert: diskFree
          annotations:
            value: "{{$value}}"
            summary: "Job {{ $labels.job }}, instance {{ $labels.instance }}: disk usage above 80%"
            description: "{{ $labels.instance }} {{ $labels.mountpoint }} disk usage is above 80% (current value: {{ $value }}%), please handle it promptly"
          expr: |
            (1-(node_filesystem_free_bytes{fstype=~"ext4|xfs",mountpoint!="/boot"} / node_filesystem_size_bytes{fstype=~"ext4|xfs",mountpoint!="/boot"}) )*100 > 80
          for: 1m
          labels:
            severity: warning
kubectl apply -f prometheus-rule.yaml
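To make the threshold in the rule's expr concrete, the same arithmetic can be written out in Java for a single filesystem. This is only a worked example of the formula (1 - free / size) * 100 > 80; the byte counts are made up, and the names DiskUsageSketch, usedPercent and exceedsThreshold are hypothetical.

```java
/** Mirrors the arithmetic of the diskFree expression for one filesystem. */
final class DiskUsageSketch {

    /** (1 - free / size) * 100, the left-hand side of the rule expression. */
    static double usedPercent(long freeBytes, long sizeBytes) {
        if (sizeBytes <= 0) {
            throw new IllegalArgumentException("sizeBytes must be positive");
        }
        return (1.0 - (double) freeBytes / sizeBytes) * 100.0;
    }

    /** True when the sample satisfies the "> 80" comparison. */
    static boolean exceedsThreshold(long freeBytes, long sizeBytes) {
        return usedPercent(freeBytes, sizeBytes) > 80.0;
    }

    public static void main(String[] args) {
        long size = 100L * 1024 * 1024 * 1024; // hypothetical 100 GiB filesystem
        long free = 15L * 1024 * 1024 * 1024;  // hypothetical 15 GiB still free
        System.out.printf("used: %.1f%%, condition met: %b%n",
                usedPercent(free, size), exceedsThreshold(free, size));
    }
}
```

With 15 GiB free out of 100 GiB, usage is about 85%, so the condition holds. Because of for: 1m, Prometheus only fires the alert after the condition has held continuously for one minute.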
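Note: the severity: warning label matches the inhibitRules configured in the AlertmanagerConfig created earlier. Why does namespace: kube-prom matter? prometheus-operator forcibly adds this label to the matchers generated from an AlertmanagerConfig, so only alerts that carry namespace: kube-prom are handled by it. Issue discussion: https://github.com/prometheus-operator/prometheus-operator/issues/3737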
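The effect of the enforced namespace matcher can be modeled in a few lines of Java. This is a simplified illustration of the behavior described in the note, not operator code; the names NamespaceMatcherSketch and reachesRoute are hypothetical.

```java
import java.util.Map;

/** Simplified model of the namespace matcher added to an AlertmanagerConfig route. */
final class NamespaceMatcherSketch {

    /**
     * An alert reaches the route generated from an AlertmanagerConfig only if
     * its "namespace" label equals the namespace of that AlertmanagerConfig.
     */
    static boolean reachesRoute(Map<String, String> alertLabels, String configNamespace) {
        return configNamespace.equals(alertLabels.get("namespace"));
    }

    public static void main(String[] args) {
        Map<String, String> withNamespace = Map.of("alertname", "diskFree", "namespace", "kube-prom");
        Map<String, String> withoutNamespace = Map.of("alertname", "diskFree");

        System.out.println(reachesRoute(withNamespace, "kube-prom"));    // true
        System.out.println(reachesRoute(withoutNamespace, "kube-prom")); // false
    }
}
```

If the webhook never receives an alert that is visibly firing in Prometheus, a missing or different namespace label on that alert is one thing worth checking.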
kubectl edit svc kube-promethues-stack-kube-prometheus -n kube-prom
kubectl get svc kube-promethues-stack-kube-prometheus -n kube-prom
After creating the resource, open the Prometheus rules page at http://10.0.2.12:30133/rules and search for diskFree to confirm that the new rule is listed (this may take a moment).
For details on the PrometheusRule resource, see: https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/api.md#prometheusrule
Implement the /webhook/send endpoint
Create a Spring Boot project and add the following dependencies:
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.olive</groupId>
  <artifactId>test-promethues</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <dependencies>
    <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-starter-web</artifactId>
      <version>3.2.0</version>
    </dependency>
    <dependency>
      <groupId>com.alibaba.fastjson2</groupId>
      <artifactId>fastjson2</artifactId>
      <version>2.0.49</version>
    </dependency>
  </dependencies>
</project>
Create the controller:
package com.olive;

import java.time.LocalDateTime;
import java.util.HashMap;
import java.util.Map;

import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

import com.alibaba.fastjson2.JSON;

@RestController
public class RevcController {

    // Receives the JSON body that Alertmanager posts to the webhook URL.
    @PostMapping("/webhook/send")
    public Map<String, String> create(@RequestBody Map<String, Object> entity) {
        // Log the arrival time and the raw payload for inspection.
        System.out.println(LocalDateTime.now());
        System.out.println(JSON.toJSONString(entity));

        // Reply with a small JSON acknowledgement.
        Map<String, String> result = new HashMap<>();
        result.put("code", "success");
        return result;
    }
}
Create the Spring Boot bootstrap class:
package com.olive;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class App {

    public static void main(String[] args) {
        SpringApplication.run(App.class, args);
    }
}
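The controller above only prints the raw payload. As a possible next step, the sketch below turns the deserialized body into one line per alert. It assumes the usual shape of an Alertmanager webhook body (a top-level alerts array whose entries carry status and labels); the class name WebhookPayloadSketch, the method summarize, and the sample values are hypothetical, and it uses only the JDK so it can be tried outside the Spring project.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Sketch: turn a deserialized Alertmanager webhook payload into one line per alert. */
final class WebhookPayloadSketch {

    /** Reads the top-level "alerts" array and formats status, alertname and instance. */
    static List<String> summarize(Map<String, Object> payload) {
        List<String> lines = new ArrayList<>();
        Object alerts = payload.get("alerts");
        if (!(alerts instanceof List<?>)) {
            return lines; // no alerts array: nothing to report
        }
        for (Object item : (List<?>) alerts) {
            if (!(item instanceof Map<?, ?>)) {
                continue; // skip malformed entries
            }
            Map<?, ?> alert = (Map<?, ?>) item;
            Map<?, ?> labels = Map.of(); // fall back to no labels
            if (alert.get("labels") instanceof Map<?, ?>) {
                labels = (Map<?, ?>) alert.get("labels");
            }
            lines.add(String.format("[%s] %s on %s",
                    alert.get("status"), labels.get("alertname"), labels.get("instance")));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical payload, reduced to the fields this sketch reads.
        Map<String, Object> payload = Map.of(
                "status", "firing",
                "alerts", List.of(Map.of(
                        "status", "firing",
                        "labels", Map.of("alertname", "diskFree", "instance", "10.0.2.11:9100"))));
        summarize(payload).forEach(System.out::println); // [firing] diskFree on 10.0.2.11:9100
    }
}
```

Inside the controller, the same method could be called with the entity argument, since Spring has already deserialized the request body into a Map.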
Reference:
https://www.cnblogs.com/roy2220/p/14867024.html