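Deploying Kafka Exporter with Prometheus Operator on Kubernetes to Monitor a Kafka Cluster
Before You Begin
Before following the steps below, make sure that Kubernetes, Prometheus, Prometheus Operator, and a Kafka cluster are already deployed. Installing these components is well documented elsewhere, so this document does not cover it.
The overall architecture of the Prometheus monitoring setup is shown in the diagram below. It is not explained in detail here; interested readers can explore it on their own.
1. Deploying the Deployment (workload) and Service
The following YAML can be used as a reference: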
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kafka-exporter
  namespace: prometheus-exporter
  labels:
    app: kafka-exporter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kafka-exporter
  template:
    metadata:
      labels:
        app: kafka-exporter
    spec:
      containers:
        - name: kafka-exporter
          image: danielqsj/kafka-exporter:v1.6.0
          imagePullPolicy: IfNotPresent
          # Address (host:port) of the Kafka cluster to monitor; replace with your own.
          args: ["--kafka.server=kafkaCluster.monitorsoftware:9092"]
          ports:
            - containerPort: 9308
              name: http
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: kafka-exporter   # the ServiceMonitor selects the Service by this label
  name: kafka-exporter
  namespace: prometheus-exporter
spec:
  type: ClusterIP
  ports:
    - name: http          # the ServiceMonitor refers to this port by name
      port: 9308
      protocol: TCP
      targetPort: 9308
  selector:
    app: kafka-exporter
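Notes:
1> For the meaning of the metrics exposed by Kafka Exporter, see the project documentation: https://github.com/danielqsj/kafka_exporter
2> Choose the Kafka Exporter image version that suits your needs. The image repository is at: https://hub.docker.com/r/danielqsj/kafka-exporter/tags
3> A successful deployment looks like this:
(1) Deployment (workload)
(2) Service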
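Once the Deployment and Service are running, a quick way to confirm that the exporter works is to read its /metrics endpoint on port 9308 and look for the metrics used later in this document (kafka_brokers and kafka_consumergroup_lag). The following is a minimal Python sketch of a parser for the Prometheus text format. The sample payload, the group name demo-group, and the topic name demo-topic are made up for illustration and are not real exporter output; the parser is simplified and not a complete implementation of the exposition format.

```python
import re

# One sample line of the Prometheus text format: name{label="value",...} 12.5
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank lines and HELP/TYPE comments
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Illustrative payload only (hypothetical values, not captured from a real cluster).
SAMPLE = """\
# TYPE kafka_brokers gauge
kafka_brokers 3
kafka_consumergroup_lag{consumergroup="demo-group",partition="0",topic="demo-topic"} 1500
"""

samples = parse_metrics(SAMPLE)
broker_counts = [value for name, _, value in samples if name == "kafka_brokers"]
print(broker_counts)
```

In a real check, the text would come from the exporter itself, for example by fetching http://kafka-exporter.prometheus-exporter:9308/metrics from inside the cluster. That URL is an assumption derived from the Service name, namespace, and port in the manifest above.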
2. Creating the ServiceMonitor Configuration File
The YAML configuration file is as follows:
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: kafka-exporter
  name: prometheus-kafka-exporter
  namespace: prometheus-exporter
spec:
  endpoints:
    - honorLabels: true
      interval: 1m
      path: /metrics
      port: http            # name of the Service port defined above
      scheme: http
      params:
        target:
          - 'kafkaCluster.monitorsoftware:9092'
      relabelings:
        # Copy the "target" parameter into the instance label,
        # so that series are labeled with the Kafka cluster address.
        - sourceLabels: [__param_target]
          targetLabel: instance
  namespaceSelector:
    matchNames:
      - prometheus-exporter
  selector:
    matchLabels:
      app: kafka-exporter
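The relabelings entry in this manifest relies on two Prometheus behaviors: every entry under params is exposed on the target as a label named __param_<name>, and a relabeling rule without an explicit action performs a replace. The Python sketch below imitates that single replace step to show why the instance label ends up holding the Kafka address instead of the exporter pod address. The function name relabel_replace and the pod address 10.0.0.12:9308 are hypothetical, and the logic is a simplification of what Prometheus does.

```python
import re


def relabel_replace(labels, source_labels, target_label,
                    regex="(.*)", replacement=r"\1", separator=";"):
    """Apply one simplified 'replace' relabeling step and return new labels."""
    # Prometheus joins the source label values with the separator ...
    joined = separator.join(labels.get(name, "") for name in source_labels)
    # ... and matches the regex against the whole joined string.
    match = re.fullmatch(regex, joined)
    if match is None:
        return dict(labels)  # no match: labels stay unchanged
    result = dict(labels)
    result[target_label] = match.expand(replacement)
    return result


# Labels of the scrape target before relabeling (pod address is made up).
target_labels = {
    "__address__": "10.0.0.12:9308",
    "__param_target": "kafkaCluster.monitorsoftware:9092",
}

relabeled = relabel_replace(target_labels, ["__param_target"], "instance")
print(relabeled["instance"])  # prints kafkaCluster.monitorsoftware:9092
```

Without this rule, instance would default to the scrape address of the exporter pod, which changes whenever the pod is rescheduled.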
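Notes:
1> Prometheus Operator uses ServiceMonitor resources to discover monitoring targets and scrape them. A ServiceMonitor describes how metrics are collected from a set of Services.
- Through a ServiceMonitor, prometheus-operator automatically identifies the Services that carry certain labels and scrapes metrics from them.
- ServiceMonitors themselves are also discovered automatically by prometheus-operator.
2> The Prometheus monitoring flow is as follows:
3> A successful deployment looks like this:
(1) ServiceMonitor deployment
(2) Prometheus after a successful deployment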
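The label-based discovery described in note 1 can be sketched in a few lines of Python. The sketch takes the selector and namespaceSelector of the ServiceMonitor above and checks them against a list of Services. It only handles matchLabels and matchNames (not matchExpressions or any: true), and the second Service, other-exporter, is a hypothetical one added to show a non-match.

```python
def labels_match(match_labels, labels):
    """True if every key/value pair in match_labels is present in labels."""
    return all(labels.get(key) == value for key, value in match_labels.items())


def select_services(service_monitor, services):
    """Return the names of the Services that a ServiceMonitor would select."""
    spec = service_monitor["spec"]
    # With no matchNames given, only the ServiceMonitor's own namespace is searched.
    namespaces = (spec.get("namespaceSelector", {}).get("matchNames")
                  or [service_monitor["metadata"]["namespace"]])
    wanted = spec["selector"].get("matchLabels", {})
    return [
        svc["metadata"]["name"]
        for svc in services
        if svc["metadata"]["namespace"] in namespaces
        and labels_match(wanted, svc["metadata"].get("labels", {}))
    ]


service_monitor = {
    "metadata": {"name": "prometheus-kafka-exporter",
                 "namespace": "prometheus-exporter"},
    "spec": {
        "namespaceSelector": {"matchNames": ["prometheus-exporter"]},
        "selector": {"matchLabels": {"app": "kafka-exporter"}},
    },
}

services = [
    {"metadata": {"name": "kafka-exporter", "namespace": "prometheus-exporter",
                  "labels": {"app": "kafka-exporter"}}},
    # Hypothetical Service in the same namespace with a different label.
    {"metadata": {"name": "other-exporter", "namespace": "prometheus-exporter",
                  "labels": {"app": "other-exporter"}}},
]

selected = select_services(service_monitor, services)
print(selected)  # prints ['kafka-exporter']
```

If the Service does not show up as a target in Prometheus, a label or namespace mismatch of this kind is a common cause and worth checking first.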
3. Configuring Prometheus Alerting Rules
The PrometheusRule configuration is as follows:
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    # These labels must match the ruleSelector of your Prometheus resource.
    prometheus: k8s
    role: alert-rules
  name: kafka-exporter-rules
  namespace: prometheus-exporter
spec:
  groups:
    - name: kafka-exporter
      rules:
        - alert: KafkaConsumersGroupLag
          # Total lag per consumer group; fires when it stays above 1000 for 1 minute.
          expr: sum(kafka_consumergroup_lag) by (consumergroup,namespace,instance) > 1000
          for: 1m
          labels:
            severity: critical
          annotations:
            summary: "Kafka consumers group lag (consumergroup {{ $labels.consumergroup }} in instance {{ $labels.instance }})"
            description: "Kafka consumers group\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        - alert: KafkaBrokerCountDecreased
          # 3 is presumably the expected broker count of this cluster; adjust it to your own.
          expr: kafka_brokers < 3
          for: 3m
          labels:
            severity: critical
          annotations:
            summary: "Some Kafka cluster nodes have stopped, please handle this as soon as possible! (instance {{ $labels.instance }})"
            description: "{{$labels.instance}}: the number of Kafka cluster nodes has decreased"
Notes:
1> The PrometheusRule configuration can be based on existing templates, available at: https://awesome-prometheus-alerts.grep.to/rules#kafka
2> A successful deployment looks like this:
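To make the first alert expression more concrete, the Python sketch below reproduces what sum(kafka_consumergroup_lag) by (consumergroup,namespace,instance) > 1000 computes: per-partition lag values are summed per group of labels, and only the groups above the threshold remain. The sample values and the names demo-group and other-group are made up, and the for: 1m duration is not modeled. This only illustrates the aggregation; it is not how Prometheus evaluates rules internally.

```python
from collections import defaultdict


def sum_by(samples, by):
    """Sum sample values grouped by the given label names, like PromQL sum by (...)."""
    totals = defaultdict(float)
    for labels, value in samples:
        key = tuple(labels.get(name, "") for name in by)
        totals[key] += value
    return dict(totals)


def above_threshold(samples, by, threshold):
    """Keep only the groups whose summed value exceeds the threshold."""
    return {key: total for key, total in sum_by(samples, by).items()
            if total > threshold}


common = {"namespace": "prometheus-exporter",
          "instance": "kafkaCluster.monitorsoftware:9092"}

# Hypothetical per-partition lag samples for two consumer groups.
lag_samples = [
    ({**common, "consumergroup": "demo-group", "partition": "0"}, 700.0),
    ({**common, "consumergroup": "demo-group", "partition": "1"}, 450.0),
    ({**common, "consumergroup": "other-group", "partition": "0"}, 20.0),
]

alerts = above_threshold(lag_samples, ["consumergroup", "namespace", "instance"], 1000)
print(alerts)
```

Here only demo-group would be reported, because its two partitions add up to 1150 while other-group stays at 20. Neither partition of demo-group exceeds 1000 by itself, which is why the rule aggregates before comparing.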
4. Grafana Dashboard
4.1. Grafana dashboards are available at: https://grafana.com/grafana/dashboards
The officially recommended dashboard template ID is 7589.
4.2. The dashboard looks like this: