Notes on Kubernetes metrics monitored with Zabbix 6

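The value of kube.pod.status.phase is not reliable for alerting: in testing, a pod in CrashLoopBackOff still reported running (2). This is consistent with Kubernetes itself, where CrashLoopBackOff is a container waiting reason and the pod phase stays Running. The item prototype "Namespace [{#NAMESPACE}] Pod [{#POD}] Status: Phase", found under the Node discovery rule of the Kubernetes_test nodes by HTTP template, therefore cannot be used.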

The Zabbix templates define only alert (problem) rules and no recovery rules, so the recovery rules have to be set up by hand.

I. Deployment metrics

1.1 Deployment replicas not as expected: alert

min(/Kubernetes_test cluster state by HTTP/kube.deployment.replicas_mismatched[{#NAMESPACE}/{#NAME}],{$KUBE.REPLICA.MISMATCH.EVAL_PERIOD:"deployment:{#NAMESPACE}:{#NAME}"})>0
and last(/Kubernetes_test cluster state by HTTP/kube.deployment.replicas_desired[{#NAMESPACE}/{#NAME}])>=0
and last(/Kubernetes_test cluster state by HTTP/kube.deployment.replicas_available[{#NAMESPACE}/{#NAME}])>=0

Explanation:

1)min(/Kubernetes_test cluster state by HTTP/kube.deployment.replicas_mismatched[{#NAMESPACE}/{#NAME}],{$KUBE.REPLICA.MISMATCH.EVAL_PERIOD:"deployment:{#NAMESPACE}:{#NAME}"})>0

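kube.deployment.replicas_mismatched is the number of deployment replicas that do not match the desired count. {$KUBE.REPLICA.MISMATCH.EVAL_PERIOD} is a macro defined in the template and set to #5, meaning the last 5 collected values. The master item, Kubernetes: Get state metrics, has an update interval of 1m, which overrides the server's default interval, so 5 collection cycles span 5 minutes.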

The detection window can be tuned per object through the context of {$KUBE.REPLICA.MISMATCH.EVAL_PERIOD}. For example, {$KUBE.REPLICA.MISMATCH.EVAL_PERIOD:regex:"deployment:.*:.*"} = #5 sets a 5-minute window for all deployments, and {$KUBE.REPLICA.MISMATCH.EVAL_PERIOD:"deployment:default:nginx"} = #3 sets a 3-minute window for the deployment named nginx in the default namespace.
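The lookup described above can be sketched roughly in Python. This is only an illustration, not Zabbix code: the macro table, the function name resolve_eval_period, and the order assumed here (exact context first, then regex context, then the macro without context) are assumptions made for the example.

```python
import re

# Hypothetical macro table for {$KUBE.REPLICA.MISMATCH.EVAL_PERIOD}.
# Each entry is (context kind, context, value); kind None means "no context".
MACROS = [
    ("exact", "deployment:default:nginx", "#3"),
    ("regex", r"deployment:.*:.*", "#5"),
    (None, None, "#5"),
]


def resolve_eval_period(context, macros=MACROS):
    """Pick a macro value for a context such as 'deployment:default:nginx'.

    Assumed order: exact context, then regex context, then no context.
    """
    for kind, ctx, value in macros:
        if kind == "exact" and ctx == context:
            return value
    for kind, ctx, value in macros:
        if kind == "regex" and re.search(ctx, context):
            return value
    for kind, ctx, value in macros:
        if kind is None:
            return value
    return None
```

With this table, deployment:default:nginx gets #3, every other deployment gets #5 through the regex entry, and anything else falls back to the macro without context.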

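So the first clause means: over the last 5 minutes, the minimum replica mismatch count is greater than 0, i.e. the mismatch was present in every sample of the window.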

2)last(/Kubernetes_test cluster state by HTTP/kube.deployment.replicas_desired[{#NAMESPACE}/{#NAME}])>=0

kube.deployment.replicas_desired is the desired replica count of the deployment; it must be greater than or equal to 0.

3)last(/Kubernetes_test cluster state by HTTP/kube.deployment.replicas_available[{#NAMESPACE}/{#NAME}])>=0

kube.deployment.replicas_available is the available replica count of the deployment; it must be greater than or equal to 0.
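To make the three clauses concrete, here is a small Python sketch that evaluates them against a list of collected values. It only imitates the logic of the expression; the function name mismatch_alert and the choice to return False when there is no data are assumptions of this sketch.

```python
def mismatch_alert(mismatched, desired, available, count=5):
    """Imitate the alert expression for one deployment.

    mismatched: replicas_mismatched values, oldest first.
    desired, available: latest values of the other two items.
    count: the N of "#N", i.e. how many of the latest values to look at.
    """
    window = mismatched[-count:]  # "#5": the last five collected values
    if not window:
        return False  # assumption: no data means no alert
    return min(window) > 0 and desired >= 0 and available >= 0


# With a 1m update interval, five values cover about five minutes.
print(mismatch_alert([1, 1, 2, 1, 1], desired=3, available=2))
print(mismatch_alert([1, 0, 2, 1, 1], desired=3, available=3))
```

A single 0 inside the window is enough to keep the alert from firing, which is why a short flap during a rollout does not trigger it.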

1.2 Deployment replicas not as expected: recovery

max(/Kubernetes_test cluster state by HTTP/kube.deployment.replicas_mismatched[{#NAMESPACE}/{#NAME}],{$KUBE.REPLICA.MISMATCH.EVAL_PERIOD:"deployment:{#NAMESPACE}:{#NAME}"})=0

and last(/Kubernetes_test cluster state by HTTP/kube.deployment.replicas_desired[{#NAMESPACE}/{#NAME}])>=0

and last(/Kubernetes_test cluster state by HTTP/kube.deployment.replicas_available[{#NAMESPACE}/{#NAME}])>=0

Explanation:

1)max(/Kubernetes_test cluster state by HTTP/kube.deployment.replicas_mismatched[{#NAMESPACE}/{#NAME}],{$KUBE.REPLICA.MISMATCH.EVAL_PERIOD:"deployment:{#NAMESPACE}:{#NAME}"})=0

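Over the last 5 minutes, the maximum number of mismatched deployment replicas is 0, i.e. no sample in the window showed a mismatch.

2)kube.deployment.replicas_desired is the desired replica count of the deployment; it must be greater than or equal to 0.

3)kube.deployment.replicas_available is the available replica count of the deployment; it must be greater than or equal to 0.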
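How the alert and recovery expressions work together can be sketched as a tiny state machine in Python. The sketch assumes the usual Zabbix behaviour that the recovery expression only matters while the problem expression is false; the names next_state and replay are made up for the example, and the two >=0 clauses are left out because they hold for any non-negative replica count.

```python
def next_state(current, window):
    """Return the trigger state after evaluating one window of values."""
    problem = min(window) > 0    # alert: mismatch in every sample
    recovery = max(window) == 0  # recovery: no mismatch in any sample
    if problem:
        return "PROBLEM"
    if current == "PROBLEM" and recovery:
        return "OK"
    return current  # neither expression decides: keep the state


def replay(stream, count=5):
    """Feed replicas_mismatched values one by one and record the state."""
    state, states = "OK", []
    for i in range(len(stream)):
        window = stream[max(0, i - count + 1): i + 1]
        state = next_state(state, window)
        states.append(state)
    return states


print(replay([0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]))
```

In this run the problem opens only after five consecutive mismatched samples and closes only after five consecutive clean ones, so a mixed window leaves the state unchanged.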

 

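II. Pod metrics

1. Alert: pod unhealthy

The minimum value of the pod's failed phase over 10 minutes is greater than 0, or the minimum of its pending phase over 10 minutes is greater than 0, or the minimum of its unknown phase over 10 minutes is greater than 0. In short: the pod stayed in a non-normal phase for the whole 10 minutes.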

min(/Kubernetes cluster state by HTTP/kube.pod.phase.failed[{#NAMESPACE}/{#NAME}],10m)>0 or min(/Kubernetes cluster state by HTTP/kube.pod.phase.pending[{#NAMESPACE}/{#NAME}],10m)>0 or min(/Kubernetes cluster state by HTTP/kube.pod.phase.unknown[{#NAMESPACE}/{#NAME}],10m)>0

Recovery

Over 5 minutes, the minimum value of the pod's running phase, or of its succeeded phase, is greater than or equal to 1.

min(/Kubernetes cluster state by HTTP/kube.pod.phase.running[{#NAMESPACE}/{#NAME}],5m)>=1 or min(/Kubernetes cluster state by HTTP/kube.pod.phase.succeeded[{#NAMESPACE}/{#NAME}],5m)>=1
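The time-based windows (10m for the alert, 5m for the recovery) can be imitated in Python as below. The sample layout, a list of (timestamp in seconds, value) pairs, and the helper names window_min, pod_unhealthy and pod_recovered are assumptions of this sketch, not part of the template.

```python
def window_min(samples, now, seconds):
    """Minimum value among samples newer than now - seconds, or None."""
    values = [v for t, v in samples if now - seconds < t <= now]
    return min(values) if values else None


def pod_unhealthy(failed, pending, unknown, now):
    """Alert: some bad phase was set for the whole 10-minute window."""
    for samples in (failed, pending, unknown):
        lowest = window_min(samples, now, 600)
        if lowest is not None and lowest > 0:
            return True
    return False


def pod_recovered(running, succeeded, now):
    """Recovery: running or succeeded held for the whole 5-minute window."""
    for samples in (running, succeeded):
        lowest = window_min(samples, now, 300)
        if lowest is not None and lowest >= 1:
            return True
    return False


# One sample per minute for ten minutes; the pod is Pending throughout.
ticks = range(60, 601, 60)
pending = [(t, 1) for t in ticks]
zeros = [(t, 0) for t in ticks]
print(pod_unhealthy(zeros, pending, zeros, now=600))
```

Because the alert uses min(), a pod that leaves the Pending phase even once inside the window does not raise the alert.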

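2. Alert: pod crash looping

Within 15 minutes, the latest restart count minus the minimum restart count is greater than 1, i.e. the pod restarted at least twice in the window.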

(last(/Kubernetes cluster state by HTTP/kube.pod.containers_restarts[{#NAMESPACE}/{#NAME}])-min(/Kubernetes cluster state by HTTP/kube.pod.containers_restarts[{#NAMESPACE}/{#NAME}],15m))>1

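Recovery

Within 15 minutes, the latest restart count minus the minimum restart count equals 0, i.e. the pod did not restart in the window.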

(last(/Kubernetes cluster state by HTTP/kube.pod.containers_restarts[{#NAMESPACE}/{#NAME}])-min(/Kubernetes cluster state by HTTP/kube.pod.containers_restarts[{#NAMESPACE}/{#NAME}],15m))=0
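The restart arithmetic can be checked with a few lines of Python. The sample layout, (timestamp in seconds, restart counter) pairs with the newest last, and the function names are assumptions of this sketch; it also assumes the latest sample lies inside the 15-minute window.

```python
def restart_delta(samples, now, seconds=900):
    """last() minus min() over the window, as in the expressions above."""
    window = [v for t, v in samples if now - seconds < t <= now]
    return samples[-1][1] - min(window)


def crash_loop_alert(samples, now):
    return restart_delta(samples, now) > 1


def crash_loop_recovered(samples, now):
    return restart_delta(samples, now) == 0


# The counter goes 3 -> 3 -> 4 -> 5 within the window: two restarts.
samples = [(300, 3), (600, 3), (840, 4), (900, 5)]
print(restart_delta(samples, now=900))
```

Note that a difference of exactly 1 satisfies neither expression: a single restart opens no new problem, and an already open problem stays open until the window contains no restart at all.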

III. StatefulSet metrics

Alert: StatefulSet is down

The latest ready replica count divided by the latest current replica count is not 1.

(last(/Kubernetes cluster state by HTTP/kube.statefulset.replicas_ready[{#NAMESPACE}/{#NAME}]) / last(/Kubernetes cluster state by HTTP/kube.statefulset.replicas_current[{#NAMESPACE}/{#NAME}]))<>1

Recovery

The latest ready replica count divided by the latest current replica count is 1.

(last(/Kubernetes cluster state by HTTP/kube.statefulset.replicas_ready[{#NAMESPACE}/{#NAME}]) / last(/Kubernetes cluster state by HTTP/kube.statefulset.replicas_current[{#NAMESPACE}/{#NAME}]))=1
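One case worth keeping in mind with this ratio is a StatefulSet whose current replica count is 0, because the division then cannot be evaluated. The Python sketch below imitates the two expressions and reports that case separately; the function name statefulset_state and the "unknown" label are assumptions of the sketch, chosen on the understanding that Zabbix cannot evaluate an expression that divides by zero.

```python
def statefulset_state(ready, current):
    """Imitate the ready/current ratio check for one StatefulSet."""
    if current == 0:
        # Assumption: neither alert nor recovery can be evaluated here.
        return "unknown"
    ratio = ready / current
    return "ok" if ratio == 1 else "down"


for ready, current in [(3, 3), (2, 3), (0, 0)]:
    print(ready, current, statefulset_state(ready, current))
```

If scaling a StatefulSet to 0 is a normal operation in the cluster, it may be worth guarding the expression for that case rather than relying on the ratio alone.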

posted @ 潇潇暮鱼鱼