
k8s error collection

 


 

1. etcd is not running

 

[root@mcwk8s03 ~]# kubectl get nodes
Unable to connect to the server: context deadline exceeded

 

Once etcd has been started, kubectl works again:

 

[root@mcwk8s03 ~]# kubectl get nodes
NAME       STATUS     ROLES    AGE    VERSION
mcwk8s05   NotReady   <none>   404d   v1.15.12
mcwk8s06   NotReady   <none>   404d   v1.15.12
[root@mcwk8s03 ~]#
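
When kubectl times out like this, a quick reachability check on the etcd client port can help separate "etcd is down" from other causes. The following is only a minimal sketch: it assumes etcd listens on its default client port 2379, and the address `127.0.0.1` is a placeholder, since the real etcd address does not appear in the output above. The helper name `tcp_reachable` is made up for this example.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False


if __name__ == "__main__":
    # Assumption: placeholder address; 2379 is etcd's default client port.
    etcd_host, etcd_port = "127.0.0.1", 2379
    state = "reachable" if tcp_reachable(etcd_host, etcd_port) else "not reachable"
    print(f"etcd at {etcd_host}:{etcd_port} is {state}")
```

An open port only shows that something is listening; it does not prove that etcd is healthy.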

 

2. Pod keeps restarting: first check for OOM and the reason for the last restart

Containers:
  fab-browser-api:
    Container ID:  docker://9xx
    Image:         roc.xx79
    Image ID:      docxxd
    Port:          9090/TCP
    Host Port:     0/TCP
    Command:
      java
      -Xms1G
      -Xmx2G
      -XX:MetaspaceSize=64M
      -XX:MaxMetaspaceSize=128M
      -Xss256K
      -XX:+UseConcMarkSweepGC
      -XX:CMSFullGCsBeforeCompaction=5
      -XX:+UseCMSCompactAtFullCollection
      -XX:CMSInitiatingOccupancyFraction=80
      -XX:+HeapDumpOnOutOfMemoryError
      -XX:HeapDumpPath=./logs/
      -DserverName=fxx-api
      -jar
      /app.jar
    State:          Running
      Started:      Mon, 18 Dec 2023 10:12:11 +0800
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Mon, 18 Dec 2023 04:09:38 +0800
      Finished:     Mon, 18 Dec 2023 10:12:09 +0800
    Ready:          True
    Restart Count:  36
    Limits:
      cpu:     1
      memory:  2000Mi
    Requests:
      cpu:      300m
      memory:   800Mi
    Liveness:   http-get http://:9090/argus/health delay=120s timeout=1s period=20s #success=1 #failure=6
    Readiness:  http-get http://:9090/argus/health delay=120s timeout=1s period=20s #success=1 #failure=6
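
With Reason: OOMKilled, one thing worth checking is whether the JVM flags in Command even fit under the container memory limit. The sketch below adds up -Xmx and -XX:MaxMetaspaceSize and compares the sum with Limits.memory. The helper names are made up for this example. It ignores thread stacks, code cache and direct buffers, so the real footprint can be higher than what it reports.

```python
import re
from typing import Iterable

# MiB per JVM size suffix (JVM flags use binary units).
_MIB_PER_UNIT = {"K": 1 / 1024, "M": 1.0, "G": 1024.0}


def jvm_size_mib(value: str) -> float:
    """Convert a JVM size such as '2G', '128M' or '256K' to MiB."""
    m = re.fullmatch(r"(\d+)([KMG])", value.upper())
    if m is None:
        raise ValueError(f"unsupported JVM size: {value!r}")
    return int(m.group(1)) * _MIB_PER_UNIT[m.group(2)]


def jvm_ceiling_mib(flags: Iterable[str]) -> float:
    """Sum of max heap and max metaspace; other JVM memory is not counted."""
    total = 0.0
    for flag in flags:
        if flag.startswith("-Xmx"):
            total += jvm_size_mib(flag[len("-Xmx"):])
        elif flag.startswith("-XX:MaxMetaspaceSize="):
            total += jvm_size_mib(flag.split("=", 1)[1])
    return total


# Flags and limit copied from the describe output above.
flags = ["-Xms1G", "-Xmx2G", "-XX:MetaspaceSize=64M",
         "-XX:MaxMetaspaceSize=128M", "-Xss256K"]
limit_mib = 2000  # Limits.memory: 2000Mi

ceiling = jvm_ceiling_mib(flags)
print(f"heap + metaspace ceiling: {ceiling:.0f} MiB, limit: {limit_mib} MiB")
if ceiling > limit_mib:
    print("the JVM can outgrow the container limit before reaching its own caps")
```

With these values -Xmx2G alone is 2048 MiB, which is already above the 2000Mi limit. The kernel can therefore kill the container before the JVM ever throws OutOfMemoryError, which is consistent with the OOMKilled state shown above.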

 

3. Approaches to troubleshooting k8s problems

 

https://zhuanlan.zhihu.com/p/421693641

https://zhuanlan.zhihu.com/p/651299187

https://blog.csdn.net/weixin_45727359/article/details/128024686

 

4. Some services start slowly: lengthen the default health-check delay so that probing begins only after the service has started

 

Symptom: after the application is deployed, the new pod never comes up and keeps restarting.

 

Run kubectl describe pod to see the last error. The events also report that the health check did not pass.

    State:          Running
      Started:      Tue, 09 Jan 2024 10:32:56 +0800
    Last State:     Terminated
      Reason:       Error
      Exit Code:    143
      Started:      Tue, 09 Jan 2024 10:29:03 +0800
      Finished:     Tue, 09 Jan 2024 10:32:55 +0800
    Ready:          True
    Restart Count:  1

Warning  Unhealthy  6m30s (x22 over 10m)  kubelet, qa-kube003.xx.x.com  Readiness probe failed:
Get http://10.96.x.x:9090/argus/health: dial tcp 10.x.x.x:9090: connect: connection refused

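Fix: the container's health check was failing, so the container restarted several times and only occasionally came up. In this case the service needs a little over 3 minutes to start, but the health check began at 200s. The service was probed before it had finished starting, so the probe failed. After the probe delay was extended to 250s and the application was redeployed, the pod came up directly with no restarts.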

 

 kubectl edit deploy ai-xxl-deploy

Copy the probe configuration, add it to the service's release settings with the delay changed to 250s, then deploy the application.
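
The timestamps and exit code in the Last State block can be turned into numbers that make the timing problem easier to see. A minimal sketch, with helper names made up for this example. It reads the two timestamps as RFC 2822 dates and treats an exit code above 128 as 128 plus a signal number.

```python
import signal
from email.utils import parsedate_to_datetime


def uptime_seconds(started: str, finished: str) -> float:
    """Seconds between two kubectl-describe timestamps (RFC 2822 style)."""
    delta = parsedate_to_datetime(finished) - parsedate_to_datetime(started)
    return delta.total_seconds()


def exit_signal(exit_code: int) -> str:
    """Name the signal behind an exit code above 128, else return ''."""
    if exit_code > 128:
        return signal.Signals(exit_code - 128).name
    return ""


# Values copied from the Last State block above.
ran_for = uptime_seconds("Tue, 09 Jan 2024 10:29:03 +0800",
                         "Tue, 09 Jan 2024 10:32:55 +0800")
initial_delay = 200  # seconds; the original probe delay described in the text

print(f"container ran {ran_for:.0f}s, stopped by {exit_signal(143)}")
print(f"time between first probe and stop: {ran_for - initial_delay:.0f}s")
```

The container lived 232s, about 32s past the point where probing began, and exit code 143 is 128 + 15 (SIGTERM). That is consistent with the container being stopped for failing its health check rather than crashing on its own.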

 

5. No pod exists and the deployment shows 0/1

 

A service account is configured under deployment.spec.template.spec, but that account does not exist in the cluster, so the pod cannot be created:

serviceAccount: growth-config
serviceAccountName: growth-config

 

 

Pod creation failed at first. After these two lines were deleted, the pod was created successfully.

Events:
  Type     Reason        Age                   From                   Message
  ----     ------        ----                  ----                   -------
  Warning  FailedCreate  5m18s (x22 over 77m)  replicaset-controller  Error creating: pods "growth-config-deploy-c9f6d5784-" is forbidden:
error looking up service account prodns/growth-config: serviceaccount "growth-config" not found

[root@vm-qa-kube005.mcw.com mcw]# kubectl get deploy growth-config-deploy -n prodns
NAME                   READY   UP-TO-DATE   AVAILABLE   AGE
growth-config-deploy   0/1     1            0           67s
[root@vm-qa-kube005.mcw.com mcw]# kubectl get deploy growth-config-deploy -n prodns
NAME                   READY   UP-TO-DATE   AVAILABLE   AGE
growth-config-deploy   0/1     1            0           69s
[root@vm-qa-kube005.mcw.com mcw]# kubectl get rs -n prodns|grep grow
growth-config-deploy-5fbbdb7b6b   1   1   0   98s
[root@vm-qa-kube005.mcw.com mcw]#
[root@vm-qa-kube005.mcw.com mcw]# kubectl describe rs growth-config-deploy-5fbbdb7b6b -n prodns
Events:
  Type    Reason            Age    From                   Message
  ----    ------            ----   ----                   -------
  Normal  SuccessfulCreate  2m12s  replicaset-controller  Created pod: growth-config-deploy-5fbbdb7b6b-2d2ws
[root@vm-qa-kube005.mcw.com mcw]# kubectl get pod --all-namespaces|grep growth
prodns   growth-config-deploy-5fbbdb7b6b-2d2ws   2/2   Running   0   2m28s
[root@vm-qa-kube005.mcw.com mcw]#
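
The FailedCreate message already names the missing account and its namespace. The sketch below pulls both out of the event text and prints the kubectl commands one could run by hand to check for the account or create it. The function name and regular expression are made up for this example. Whether to create the account or drop the field, as was done here, depends on whether the workload actually needs it.

```python
import re
from typing import Optional, Tuple

# Matches the "service account not found" part of a FailedCreate event.
_SA_MISSING = re.compile(
    r'error looking up service account ([^/\s]+)/([^:\s]+): '
    r'serviceaccount "[^"]+" not found'
)


def missing_service_account(message: str) -> Optional[Tuple[str, str]]:
    """Return (namespace, name) if the event reports a missing service account."""
    m = _SA_MISSING.search(message)
    return (m.group(1), m.group(2)) if m else None


# Event message copied from the output above, joined onto one line.
event = ('Error creating: pods "growth-config-deploy-c9f6d5784-" is forbidden: '
         'error looking up service account prodns/growth-config: '
         'serviceaccount "growth-config" not found')

found = missing_service_account(event)
if found:
    namespace, name = found
    # Commands to run manually: check for the account, or create it.
    print(f"kubectl get serviceaccount {name} -n {namespace}")
    print(f"kubectl create serviceaccount {name} -n {namespace}")
```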

 

posted @ 马昌伟
Author's blog: https://www.cnblogs.com/machangwei-8/