kube-prometheus-stack: custom Alertmanager configuration for pushing alerts to a webhook

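Creating an AlertmanagerConfig resource

Without prometheus-operator, you have to write alertmanager.yaml by hand to route and send the alerts that Alertmanager receives from Prometheus.

With prometheus-operator, things get somewhat simpler. You only need to create AlertmanagerConfig resources: prometheus-operator automatically merges all selected AlertmanagerConfig resources to generate or update alertmanager.yaml, and tells Alertmanager to reload its configuration.

With the default settings of this deployment, prometheus-operator picks up every AlertmanagerConfig in every namespace: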

kubectl get -n kube-prom alertmanagers
kubectl get -n kube-prom alertmanagers/kube-promethues-stack-kube-alertmanager -o yaml
# spec.alertmanagerConfigNamespaceSelector: {} means no filtering by namespace
# spec.alertmanagerConfigSelector: {} means no filtering by label

Create a simple alert routing rule

apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: testwebhook
  namespace: kube-prom
spec:
  route:
    receiver: webhook
    groupBy: ["instance", "job"]
    groupWait: "10s"
    groupInterval: "20s"
    repeatInterval: "30s"
  receivers:
  - name: webhook
    webhookConfigs:
    - url: "http://10.0.2.11:8080/webhook/send"
      sendResolved: true
  inhibitRules:
  - sourceMatch:
    - name: severity
      value: 'critical'
    targetMatch:
    - name: severity
      value: 'warning'
    equal: ['instance']
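
To make the inhibitRules entry above concrete: a warning alert is muted while a critical alert with the same instance label is firing. The following is a minimal sketch of that decision in plain Java. The class and method names are hypothetical, only equality matchers are modelled, and this is a simplified picture of the documented behaviour, not Alertmanager's actual implementation.

```java
import java.util.List;
import java.util.Map;

/** Simplified model of one inhibit rule (equality matchers only; hypothetical helper). */
final class InhibitRuleSketch {
    private final Map<String, String> sourceMatch;
    private final Map<String, String> targetMatch;
    private final List<String> equal;

    InhibitRuleSketch(Map<String, String> sourceMatch,
                      Map<String, String> targetMatch,
                      List<String> equal) {
        this.sourceMatch = sourceMatch;
        this.targetMatch = targetMatch;
        this.equal = equal;
    }

    /** The rule from the AlertmanagerConfig above: critical inhibits warning per instance. */
    static InhibitRuleSketch fromArticle() {
        return new InhibitRuleSketch(
                Map.of("severity", "critical"),
                Map.of("severity", "warning"),
                List.of("instance"));
    }

    /** True when every matcher value equals the alert's label value (missing label = ""). */
    private static boolean matchesAll(Map<String, String> matchers, Map<String, String> labels) {
        return matchers.entrySet().stream()
                .allMatch(m -> m.getValue().equals(labels.getOrDefault(m.getKey(), "")));
    }

    /** True when a firing source alert would mute the target alert under this rule. */
    boolean inhibits(Map<String, String> source, Map<String, String> target) {
        if (!matchesAll(sourceMatch, source) || !matchesAll(targetMatch, target)) {
            return false;
        }
        // Both alerts must agree on every label listed in "equal".
        return equal.stream()
                .allMatch(name -> source.getOrDefault(name, "").equals(target.getOrDefault(name, "")));
    }
}
```

For example, InhibitRuleSketch.fromArticle().inhibits(criticalLabels, warningLabels) is true only when both label maps carry the same instance value.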

References:

https://github.com/prometheus-community/helm-charts/issues/2224
https://kkgithub.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#alertmanagerconfig

Apply the resource, then check the Alertmanager Service (edit it first if it still needs to be exposed, for example as a NodePort):

kubectl apply -f alertmanager-config.yaml
kubectl edit svc kube-promethues-stack-kube-alertmanager -n kube-prom
kubectl get svc kube-promethues-stack-kube-alertmanager -n kube-prom

After the resource is created, open the Alertmanager UI at http://10.0.2.12:32466/#/status and confirm that Config now contains the new settings (this may take a moment).

For the full AlertmanagerConfig resource specification, see: https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/api.md#alertmanagerconfig

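Creating a PrometheusRule resource

As with AlertmanagerConfig, alerting rules can be created through PrometheusRule resources. prometheus-operator automatically merges all selected rules into the configuration that Prometheus loads (as generated rule files).

By default, prometheus-operator picks up PrometheusRule resources in all namespaces that carry the label release=kube-prometheus-stack: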

kubectl get -n kube-prom prometheuses
kubectl get -n kube-prom prometheuses/kube-promethues-stack-kube-prometheus -o yaml
# spec.ruleNamespaceSelector: {} means no filtering by namespace
# spec.ruleSelector:
#   matchLabels:
#     release: kube-prometheus-stack

Create a rule that is easy to trigger (it fires once disk usage has stayed above 80% for 1 minute):

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    # if spec.ruleSelector filters on a release label (see above), add that label here as well
    prometheus: k8s
    role: alert-rules
  name: kube-prom-kube-prom-stack-kube-prome-prometheus.rules
  namespace: kube-prom
spec:
  groups:
  - name: disk
    rules:
    - alert: diskFree
      annotations:
        value: "{{$value}}"
        summary: "Disk usage on instance {{ $labels.instance }} of job {{ $labels.job }} is above 80%"
        description: "Disk usage on {{ $labels.instance }} {{ $labels.mountpoint }} is above 80% (current value: {{ $value }}%), please handle it promptly"
      expr: |
        (1-(node_filesystem_free_bytes{fstype=~"ext4|xfs",mountpoint!="/boot"} / node_filesystem_size_bytes{fstype=~"ext4|xfs",mountpoint!="/boot"}) )*100 > 80
      for: 1m
      labels:
        severity: warning

kubectl apply -f prometheus-rule.yaml
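
The arithmetic behind the expr above is simply used percent = (1 - free / size) * 100, compared against 80. A small worked sketch in Java, with hypothetical names and made-up byte values, shows which readings would satisfy the condition. The for: 1m hold time is not modelled here.

```java
/** Worked arithmetic for the diskFree expression: (1 - free / size) * 100 > 80. */
final class DiskUsageSketch {
    /** Percentage of the filesystem that is in use. */
    static double usedPercent(double freeBytes, double sizeBytes) {
        return (1 - freeBytes / sizeBytes) * 100;
    }

    /** True when the usage crosses the 80% threshold used by the rule. */
    static boolean exceedsThreshold(double freeBytes, double sizeBytes) {
        return usedPercent(freeBytes, sizeBytes) > 80;
    }
}
```

With 100 GiB total and 12.5 GiB free, usage is 87.5% and the condition holds; with 25 GiB free, usage is 75% and it does not.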

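Note: the severity: warning label matches the targetMatch of the inhibitRules in the AlertmanagerConfig created earlier. Why does namespace: kube-prom matter? prometheus-operator forcibly adds a namespace matcher (the AlertmanagerConfig's own namespace) to the matchers it generates from an AlertmanagerConfig, so only alerts labeled namespace="kube-prom" reach this route. Issue discussion: https://github.com/prometheus-operator/prometheus-operator/issues/3737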
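
One way to picture the enforced namespace matcher is as an extra label condition appended to whatever the route already matches on. The sketch below assumes plain equality matching and uses hypothetical class and method names; it illustrates the effect described in the issue and is not the operator's own code.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrates the namespace matcher that is added to an AlertmanagerConfig route. */
final class NamespaceMatcherSketch {
    /** Returns the user's matchers plus namespace=<namespace of the AlertmanagerConfig>. */
    static Map<String, String> withEnforcedNamespace(Map<String, String> userMatchers,
                                                     String configNamespace) {
        Map<String, String> matchers = new HashMap<>(userMatchers);
        matchers.put("namespace", configNamespace);
        return matchers;
    }

    /** True when the alert carries every matcher label with exactly the expected value. */
    static boolean routeMatches(Map<String, String> matchers, Map<String, String> alertLabels) {
        return matchers.entrySet().stream()
                .allMatch(m -> m.getValue().equals(alertLabels.get(m.getKey())));
    }
}
```

An alert labeled namespace="kube-prom" passes the check for a config living in kube-prom, while the same alert without a namespace label, or with another namespace, does not.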

kubectl edit svc kube-promethues-stack-kube-prometheus -n kube-prom
kubectl get svc kube-promethues-stack-kube-prometheus -n kube-prom

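After the resource is created, open the Prometheus UI at http://10.0.2.12:30133/rules and search for diskFree to confirm that the new rule is listed (this may take a moment).

For the full PrometheusRule resource specification, see: https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/api.md#prometheusrule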

Implementing the /webhook/send endpoint

Create a Spring Boot project and add the following dependencies

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.olive</groupId>
  <artifactId>test-promethues</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <dependencies>
    <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-starter-web</artifactId>
      <version>3.2.0</version>
    </dependency>
    <dependency>
      <groupId>com.alibaba.fastjson2</groupId>
      <artifactId>fastjson2</artifactId>
      <version>2.0.49</version>
    </dependency>
  </dependencies>
</project>

Create the controller

package com.olive;

import java.time.LocalDateTime;
import java.util.HashMap;
import java.util.Map;

import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

import com.alibaba.fastjson2.JSON;

@RestController
public class RevcController {

    // Receives the JSON body that Alertmanager POSTs to the webhook URL.
    @PostMapping("/webhook/send")
    public Map<String, String> create(@RequestBody Map<String, Object> entity) {
        // Log the arrival time and the raw payload for inspection.
        System.out.println(LocalDateTime.now());
        System.out.println(JSON.toJSONString(entity));

        // Reply with a small JSON body; Spring returns HTTP 200 by default.
        Map<String, String> result = new HashMap<String, String>();
        result.put("code", "success");
        return result;
    }
}
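
The controller above only prints the raw body. As a possible next step, the Map it receives can be walked to pull out individual alerts. The sketch below assumes the standard Alertmanager webhook payload layout (a top-level alerts list whose items carry status, labels and annotations); the class name, method name and output format are hypothetical, and it uses only the JDK so it can be tried outside Spring.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Turns an Alertmanager webhook payload (already parsed into a Map) into log lines. */
final class WebhookPayloadSketch {
    /** Builds one line per alert: "<status> <alertname> on <instance>: <summary>". */
    static List<String> summarize(Map<String, Object> payload) {
        List<String> lines = new ArrayList<>();
        Object alerts = payload.get("alerts");
        if (!(alerts instanceof List<?>)) {
            return lines; // no alerts array: nothing to report
        }
        for (Object item : (List<?>) alerts) {
            if (!(item instanceof Map<?, ?>)) {
                continue; // skip malformed entries
            }
            Map<?, ?> alert = (Map<?, ?>) item;
            Map<?, ?> labels = asMap(alert.get("labels"));
            Map<?, ?> annotations = asMap(alert.get("annotations"));
            lines.add(text(alert.get("status")) + " " + text(labels.get("alertname"))
                    + " on " + text(labels.get("instance"))
                    + ": " + text(annotations.get("summary")));
        }
        return lines;
    }

    /** Returns the value as a Map, or an empty Map when it is missing or of another type. */
    private static Map<?, ?> asMap(Object value) {
        return value instanceof Map<?, ?> ? (Map<?, ?>) value : Map.of();
    }

    /** Renders a possibly missing value as text, using "-" as a placeholder. */
    private static String text(Object value) {
        return value == null ? "-" : value.toString();
    }
}
```

Inside the controller this could be called as WebhookPayloadSketch.summarize(entity), with each returned line logged or forwarded to a chat tool.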

Create the Spring Boot bootstrap class

package com.olive;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class App {

    public static void main(String[] args) {
        SpringApplication.run(App.class, args);
    }
}

Reference:

https://www.cnblogs.com/roy2220/p/14867024.html
Posted by BUG弄潮儿