A collection of Kubernetes errors
1. etcd is not running
[root@mcwk8s03 ~]# kubectl get nodes
Unable to connect to the server: context deadline exceeded
Once etcd is started, kubectl works again:
[root@mcwk8s03 ~]# kubectl get nodes
NAME       STATUS     ROLES    AGE    VERSION
mcwk8s05   NotReady   <none>   404d   v1.15.12
mcwk8s06   NotReady   <none>   404d   v1.15.12
[root@mcwk8s03 ~]#
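2. A pod keeps restarting. First check for OOM: look at the reason for the last restart (Last State in the kubectl describe pod output)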
Containers:
  fab-browser-api:
    Container ID:   docker://9xx
    Image:          roc.xx79
    Image ID:       docxxd
    Port:           9090/TCP
    Host Port:      0/TCP
    Command:
      java -Xms1G -Xmx2G -XX:MetaspaceSize=64M -XX:MaxMetaspaceSize=128M -Xss256K -XX:+UseConcMarkSweepGC -XX:CMSFullGCsBeforeCompaction=5 -XX:+UseCMSCompactAtFullCollection -XX:CMSInitiatingOccupancyFraction=80 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./logs/ -DserverName=fxx-api -jar /app.jar
    State:          Running
      Started:      Mon, 18 Dec 2023 10:12:11 +0800
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Mon, 18 Dec 2023 04:09:38 +0800
      Finished:     Mon, 18 Dec 2023 10:12:09 +0800
    Ready:          True
    Restart Count:  36
    Limits:
      cpu:     1
      memory:  2000Mi
    Requests:
      cpu:     300m
      memory:  800Mi
    Liveness:   http-get http://:9090/argus/health delay=120s timeout=1s period=20s #success=1 #failure=6
    Readiness:  http-get http://:9090/argus/health delay=120s timeout=1s period=20s #success=1 #failure=6
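One thing worth checking in output like the above is whether the JVM is allowed more memory than the container. Here -Xmx2G is 2048Mi, which is already above the 2000Mi memory limit before metaspace is counted. The following is a minimal Python sketch of that comparison. The helper names are hypothetical, it only understands the K/M/G and Mi/Gi suffixes, and it ignores other JVM memory such as thread stacks and direct buffers, so treat it as a rough lower bound rather than a full accounting.

```python
import re

# JVM size suffixes, expressed in MiB (K, M and G are binary units for the JVM).
_JVM_UNITS_MIB = {"K": 1 / 1024, "M": 1, "G": 1024}


def jvm_size_mib(command: str, flag: str) -> float:
    """Return the size passed to a JVM flag (e.g. -Xmx or -XX:MaxMetaspaceSize=) in MiB."""
    match = re.search(re.escape(flag) + r"(\d+)([KMG])", command, re.IGNORECASE)
    if match is None:
        raise ValueError(f"flag {flag} not found in command")
    return int(match.group(1)) * _JVM_UNITS_MIB[match.group(2).upper()]


def k8s_memory_mib(quantity: str) -> float:
    """Convert a Kubernetes memory quantity with a Mi or Gi suffix to MiB."""
    if quantity.endswith("Mi"):
        return float(quantity[:-2])
    if quantity.endswith("Gi"):
        return float(quantity[:-2]) * 1024
    raise ValueError(f"unsupported quantity: {quantity}")


# Values copied from the describe output above (command shortened to the size flags).
command = "java -Xms1G -Xmx2G -XX:MetaspaceSize=64M -XX:MaxMetaspaceSize=128M -Xss256K -jar /app.jar"
limit = k8s_memory_mib("2000Mi")
heap = jvm_size_mib(command, "-Xmx")
metaspace = jvm_size_mib(command, "-XX:MaxMetaspaceSize=")

print(f"max heap {heap:.0f}Mi + max metaspace {metaspace:.0f}Mi = {heap + metaspace:.0f}Mi, limit {limit:.0f}Mi")
print("heap alone exceeds the limit:", heap > limit)
```

If the check reports that the heap alone exceeds the limit, either lowering -Xmx or raising the container memory limit is the usual direction to investigate.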
3. Approaches to troubleshooting Kubernetes problems
https://zhuanlan.zhihu.com/p/421693641
https://zhuanlan.zhihu.com/p/651299187
https://blog.csdn.net/weixin_45727359/article/details/128024686
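4. Some services start slowly; the default health-check delay has to be lengthened so that probing begins only after the service has started
Symptom: after the application is deployed, the new pod never comes up and restarts repeatedly.
Run kubectl describe pod and check the last termination state; the events also report that the health check did not pass.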
    State:          Running
      Started:      Tue, 09 Jan 2024 10:32:56 +0800
    Last State:     Terminated
      Reason:       Error
      Exit Code:    143
      Started:      Tue, 09 Jan 2024 10:29:03 +0800
      Finished:     Tue, 09 Jan 2024 10:32:55 +0800
    Ready:          True
    Restart Count:  1

  Warning  Unhealthy  6m30s (x22 over 10m)  kubelet, qa-kube003.xx.x.com  Readiness probe failed: Get http://10.96.x.x:9090/argus/health: dial tcp 10.x.x.x:9090: connect: connection refused
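Fix: the container's health check was failing, so the container restarted many times and only occasionally came up. In this case the service needs a little over 3 minutes to start, but the health check began at 200s. Probing started before the service was up, so the probe failed. After the probe delay was extended to 250s and the application was redeployed, the restarts stopped and the pod came up on the first attempt.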
kubectl edit deploy ai-xxl-deploy
Copy the probe configuration, add it to the service's release configuration with the delay changed to 250s, and deploy the application.
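The reasoning above can be sketched as a small calculation: a liveness probe that never succeeds triggers a restart roughly at the initial delay plus (failure threshold - 1) probe periods. The Python sketch below is an approximation only, since the kubelet adds some jitter to probe timing. The function names are hypothetical, and the startup time (240s), period (20s) and failure threshold (2) are assumed values for illustration; only the 200s and 250s delays come from the notes above.

```python
def first_restart_after(initial_delay_s: int, period_s: int, failure_threshold: int) -> int:
    """Approximate seconds after container start at which a never-passing
    liveness probe has failed failure_threshold times in a row."""
    return initial_delay_s + (failure_threshold - 1) * period_s


def survives_startup(startup_s: int, initial_delay_s: int, period_s: int, failure_threshold: int) -> bool:
    """True if the service is expected to be up before the probe gives up on it."""
    return startup_s < first_restart_after(initial_delay_s, period_s, failure_threshold)


# Assumed values for illustration: startup takes 240s, period 20s, failure threshold 2.
STARTUP_S, PERIOD_S, FAILURES = 240, 20, 2

for delay in (200, 250):  # the two delays mentioned in the notes
    deadline = first_restart_after(delay, PERIOD_S, FAILURES)
    ok = survives_startup(STARTUP_S, delay, PERIOD_S, FAILURES)
    print(f"delay={delay}s -> restart deadline ~{deadline}s, survives startup: {ok}")
```

The same headroom can also be gained by raising the failure threshold or the period instead of the delay; newer Kubernetes versions additionally provide a startupProbe for slow-starting containers.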
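5. Troubleshooting a Deployment stuck at 0/1 with no pod created
A service account is configured under deployment.spec.template.spec, but that service account does not exist in the cluster. In that case the pod cannot be created: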
serviceAccount: growth-config
serviceAccountName: growth-config
The pod could not be created at first; after these two lines were deleted, it was created normally.
Events:
  Type     Reason        Age                   From                   Message
  ----     ------        ----                  ----                   -------
  Warning  FailedCreate  5m18s (x22 over 77m)  replicaset-controller  Error creating: pods "growth-config-deploy-c9f6d5784-" is forbidden: error looking up service account prodns/growth-config: serviceaccount "growth-config" not found

[root@vm-qa-kube005.mcw.com mcw]# kubectl get deploy growth-config-deploy -n prodns
NAME                   READY   UP-TO-DATE   AVAILABLE   AGE
growth-config-deploy   0/1     1            0           67s
[root@vm-qa-kube005.mcw.com mcw]# kubectl get deploy growth-config-deploy -n prodns
NAME                   READY   UP-TO-DATE   AVAILABLE   AGE
growth-config-deploy   0/1     1            0           69s
[root@vm-qa-kube005.mcw.com mcw]# kubectl get rs -n prodns|grep grow
growth-config-deploy-5fbbdb7b6b   1   1   0   98s
[root@vm-qa-kube005.mcw.com mcw]#
[root@vm-qa-kube005.mcw.com mcw]# kubectl describe rs growth-config-deploy-5fbbdb7b6b -n prodns
Events:
  Type    Reason            Age    From                   Message
  ----    ------            ----   ----                   -------
  Normal  SuccessfulCreate  2m12s  replicaset-controller  Created pod: growth-config-deploy-5fbbdb7b6b-2d2ws
[root@vm-qa-kube005.mcw.com mcw]# kubectl get pod --all-namespaces|grep growth
prodns   growth-config-deploy-5fbbdb7b6b-2d2ws   2/2   Running   0   2m28s
[root@vm-qa-kube005.mcw.com mcw]#
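This kind of failure can be caught before deploying by comparing the service account named in the pod template with the accounts that actually exist in the namespace. The Python sketch below shows the idea on plain dictionaries. The function name is hypothetical, the manifest is a trimmed-down stand-in for the real Deployment, and the set of existing accounts is an assumed value that would in practice come from kubectl get serviceaccount -n prodns.

```python
from typing import Optional


def missing_service_account(deployment: dict, existing_accounts: set) -> Optional[str]:
    """Return the service account the pod template refers to if it is not in
    existing_accounts, otherwise None. Pods fall back to "default" when unset."""
    pod_spec = deployment["spec"]["template"]["spec"]
    name = pod_spec.get("serviceAccountName") or pod_spec.get("serviceAccount") or "default"
    return None if name in existing_accounts else name


# Trimmed-down stand-in for the Deployment from the notes above.
deployment = {
    "metadata": {"name": "growth-config-deploy", "namespace": "prodns"},
    "spec": {"template": {"spec": {
        "serviceAccount": "growth-config",
        "serviceAccountName": "growth-config",
    }}},
}

# Assumed: only the default account exists in the namespace.
existing_accounts = {"default"}

missing = missing_service_account(deployment, existing_accounts)
if missing:
    print(f"service account {missing!r} not found; pods for this Deployment cannot be created")
```

Deleting the two lines, as done above, makes the pod run under the default service account. If the application actually needs its own account, creating it with kubectl create serviceaccount growth-config -n prodns is the alternative fix.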