Incident record: pods evicted and recreated in production because of insufficient disk space (The node had condition: [DiskPressure])
# A record of a production incident in which "The node had condition: [DiskPressure]" caused pods to be recreated over and over, so monitoring kept firing alerts
# On the k8s management host, the msg pods turned out to have been recreated many times
[root@VM_248_6_centos ~]# kubectl get pod -n cms-v2-prod
NAME                              READY   STATUS    RESTARTS   AGE
...... (omitted)
cms-msg-deploy-6987c5cb8d-6fczf   1/1     Running   0          9d
cms-msg-deploy-6987c5cb8d-867ls   1/1     Running   0          69m      # pod that was just recreated
cms-msg-deploy-6987c5cb8d-btlxd   1/1     Running   0          8h
cms-msg-deploy-6987c5cb8d-k96xk   1/1     Running   0          4h18m
cms-msg-deploy-6987c5cb8d-pjkx2   1/1     Running   0          165m
cms-msg-deploy-6987c5cb8d-r4hdd   1/1     Running   0          55m      # pod that was just recreated
...... (omitted)

# Check the cluster events: every pod recreation involves the servers 10.169.248.131 and 10.169.248.132
[root@VM_248_6_centos ~]# kubectl get event
LAST SEEN   TYPE      REASON             OBJECT                                 MESSAGE
<unknown>   Normal    Scheduled          pod/cms-msg-deploy-6987c5cb8d-4nk44    Successfully assigned cms-v2-prod/cms-msg-deploy-6987c5cb8d-4nk44 to 10.169.248.132
54m         Warning   Evicted            pod/cms-msg-deploy-6987c5cb8d-4nk44    The node had condition: [DiskPressure].
<unknown>   Normal    Scheduled          pod/cms-msg-deploy-6987c5cb8d-85wgh    Successfully assigned cms-v2-prod/cms-msg-deploy-6987c5cb8d-85wgh to 10.169.248.132
54m         Warning   Evicted            pod/cms-msg-deploy-6987c5cb8d-85wgh    The node had condition: [DiskPressure].
<unknown>   Normal    Scheduled          pod/cms-msg-deploy-6987c5cb8d-8929w    Successfully assigned cms-v2-prod/cms-msg-deploy-6987c5cb8d-8929w to 10.169.248.132
54m         Warning   Evicted            pod/cms-msg-deploy-6987c5cb8d-8929w    The node had condition: [DiskPressure].
<unknown>   Normal    Scheduled          pod/cms-msg-deploy-6987c5cb8d-8tvtx    Successfully assigned cms-v2-prod/cms-msg-deploy-6987c5cb8d-8tvtx to 10.169.248.132
54m         Warning   Evicted            pod/cms-msg-deploy-6987c5cb8d-8tvtx    The node had condition: [DiskPressure].
<unknown>   Normal    Scheduled          pod/cms-msg-deploy-6987c5cb8d-92l6g    Successfully assigned cms-v2-prod/cms-msg-deploy-6987c5cb8d-92l6g to 10.169.248.132
54m         Warning   Evicted            pod/cms-msg-deploy-6987c5cb8d-92l6g    The node had condition: [DiskPressure].
54m         Warning   Evicted            pod/cms-msg-deploy-6987c5cb8d-bcqrs    The node was low on resource: ephemeral-storage. Container sidecar-jdk was using 38073452Ki, which exceeds its request of 0.
54m         Normal    Killing            pod/cms-msg-deploy-6987c5cb8d-bcqrs    Stopping container sidecar-jdk
<unknown>   Normal    Scheduled          pod/cms-msg-deploy-6987c5cb8d-lvktl    Successfully assigned cms-v2-prod/cms-msg-deploy-6987c5cb8d-lvktl to 10.169.248.132
54m         Warning   Evicted            pod/cms-msg-deploy-6987c5cb8d-lvktl    The node had condition: [DiskPressure].
<unknown>   Normal    Scheduled          pod/cms-msg-deploy-6987c5cb8d-m664z    Successfully assigned cms-v2-prod/cms-msg-deploy-6987c5cb8d-m664z to 10.169.248.132
54m         Warning   Evicted            pod/cms-msg-deploy-6987c5cb8d-m664z    The node had condition: [DiskPressure].
<unknown>   Normal    Scheduled          pod/cms-msg-deploy-6987c5cb8d-mfdw2    Successfully assigned cms-v2-prod/cms-msg-deploy-6987c5cb8d-mfdw2 to 10.169.248.132
54m         Warning   Evicted            pod/cms-msg-deploy-6987c5cb8d-mfdw2    The node had condition: [DiskPressure].
<unknown>   Normal    Scheduled          pod/cms-msg-deploy-6987c5cb8d-nd9rt    Successfully assigned cms-v2-prod/cms-msg-deploy-6987c5cb8d-nd9rt to 10.169.248.132
54m         Warning   Evicted            pod/cms-msg-deploy-6987c5cb8d-nd9rt    The node had condition: [DiskPressure].
<unknown>   Normal    Scheduled          pod/cms-msg-deploy-6987c5cb8d-ngvhw    Successfully assigned cms-v2-prod/cms-msg-deploy-6987c5cb8d-ngvhw to 10.169.248.132
54m         Warning   Evicted            pod/cms-msg-deploy-6987c5cb8d-ngvhw    The node had condition: [DiskPressure].
<unknown>   Normal    Scheduled          pod/cms-msg-deploy-6987c5cb8d-r4hdd    Successfully assigned cms-v2-prod/cms-msg-deploy-6987c5cb8d-r4hdd to 10.169.248.131
54m         Normal    Pulled             pod/cms-msg-deploy-6987c5cb8d-r4hdd    Container image "ccr.yxyun.yuexiu.com/idc1-yxhq-ump-registry/cms-v2-msg:release-2.0.27.20231010" already present on machine
54m         Normal    Created            pod/cms-msg-deploy-6987c5cb8d-r4hdd    Created container cms-msg-container
54m         Normal    Started            pod/cms-msg-deploy-6987c5cb8d-r4hdd    Started container cms-msg-container
54m         Normal    Pulled             pod/cms-msg-deploy-6987c5cb8d-r4hdd    Container image "ccr.yxyun.yuexiu.com/idc1-yxhq-ump-registry/openjdk:8u232-stretch-yak-dubbo-cmsapm" already present on machine
54m         Normal    Created            pod/cms-msg-deploy-6987c5cb8d-r4hdd    Created container sidecar-jdk
54m         Normal    Started            pod/cms-msg-deploy-6987c5cb8d-r4hdd    Started container sidecar-jdk
<unknown>   Normal    Scheduled          pod/cms-msg-deploy-6987c5cb8d-w8xsq    Successfully assigned cms-v2-prod/cms-msg-deploy-6987c5cb8d-w8xsq to 10.169.248.132
54m         Warning   Evicted            pod/cms-msg-deploy-6987c5cb8d-w8xsq    The node had condition: [DiskPressure].
54m         Normal    SuccessfulCreate   replicaset/cms-msg-deploy-6987c5cb8d   Created pod: cms-msg-deploy-6987c5cb8d-m664z
54m         Normal    SuccessfulCreate   replicaset/cms-msg-deploy-6987c5cb8d   Created pod: cms-msg-deploy-6987c5cb8d-lvktl
54m         Normal    SuccessfulCreate   replicaset/cms-msg-deploy-6987c5cb8d   Created pod: cms-msg-deploy-6987c5cb8d-8tvtx
54m         Normal    SuccessfulCreate   replicaset/cms-msg-deploy-6987c5cb8d   Created pod: cms-msg-deploy-6987c5cb8d-mfdw2
54m         Normal    SuccessfulCreate   replicaset/cms-msg-deploy-6987c5cb8d   Created pod: cms-msg-deploy-6987c5cb8d-ngvhw
54m         Normal    SuccessfulCreate   replicaset/cms-msg-deploy-6987c5cb8d   Created pod: cms-msg-deploy-6987c5cb8d-4nk44
54m         Normal    SuccessfulCreate   replicaset/cms-msg-deploy-6987c5cb8d   Created pod: cms-msg-deploy-6987c5cb8d-w8xsq
54m         Normal    SuccessfulCreate   replicaset/cms-msg-deploy-6987c5cb8d   Created pod: cms-msg-deploy-6987c5cb8d-nd9rt
54m         Normal    SuccessfulCreate   replicaset/cms-msg-deploy-6987c5cb8d   Created pod: cms-msg-deploy-6987c5cb8d-85wgh
54m         Normal    SuccessfulCreate   replicaset/cms-msg-deploy-6987c5cb8d   (combined from similar events): Created pod: cms-msg-deploy-6987c5cb8d-r4hdd

# Checked on servers 10.169.248.131 and 10.169.248.132: the kubelet logs report that disk usage reached 85%, which is the point where free space falls to the threshold at which the kubelet is configured to evict pods
[root@VM_248_131_centos ~]# grep '85%' /var/log/messages
Nov 27 08:22:28 VM_248_131_centos kubelet: I1127 08:22:28.134844 935 image_gc_manager.go:300] [imageGCManager]: Disk usage on image filesystem is at 85% which is over the high threshold (85%). Trying to free 5028521574 bytes down to the low threshold (80%).
[root@VM_248_132_centos ~]# grep '85%' /var/log/messages
Nov 27 09:41:20 VM_248_132_centos kubelet: I1127 09:41:20.609047 940 image_gc_manager.go:300] [imageGCManager]: Disk usage on image filesystem is at 85% which is over the high threshold (85%). Trying to free 4976412262 bytes down to the low threshold (80%).
Nov 27 15:26:21 VM_248_132_centos kubelet: I1127 15:26:21.135458 940 image_gc_manager.go:300] [imageGCManager]: Disk usage on image filesystem is at 85% which is over the high threshold (85%). Trying to free 4357973606 bytes down to the low threshold (80%).

# Check the kubelet's pod eviction policy
[root@VM_248_131_centos ~]# cat /var/lib/kubelet/config.yaml
evictionHard:
  imagefs.available: 15%
  memory.available: 100Mi
  nodefs.available: 10%
  nodefs.inodesFree: 5%
[root@VM_248_132_centos ~]# cat /var/lib/kubelet/config.yaml
evictionHard:
  imagefs.available: 15%    # pods are evicted once free space on the image filesystem drops below 15%
  memory.available: 100Mi
  nodefs.available: 10%
  nodefs.inodesFree: 5%

# Disk usage on 132
[root@VM_248_132_centos ~]# df -h
Filesystem       Size  Used Avail Use% Mounted on
devtmpfs          32G     0   32G   0% /dev
tmpfs             32G   24K   32G   1% /dev/shm
tmpfs             32G  2.8M   32G   1% /run
tmpfs             32G     0   32G   0% /sys/fs/cgroup
/dev/vda1         50G   32G   16G  67% /
/dev/vdb          99G   43G   52G  46% /var/lib/docker
10.169.248.85:/  200G   22G  179G  11% /paas
tmpfs            6.3G     0  6.3G   0% /run/user/0

# Disk usage on 131
[root@VM_248_131_centos ~]# df -h
Filesystem       Size  Used Avail Use% Mounted on
devtmpfs          32G     0   32G   0% /dev
tmpfs             32G   24K   32G   1% /dev/shm
tmpfs             32G  2.6M   32G   1% /run
tmpfs             32G     0   32G   0% /sys/fs/cgroup
/dev/vda1         50G   15G   33G  30% /
/dev/vdb          99G   71G   24G  76% /var/lib/docker
10.169.248.85:/  200G   22G  179G  11% /paas
tmpfs            6.3G     0  6.3G   0% /run/user/0
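
# The relationship between the df readings and the imagefs.available: 15% setting can be checked with a little arithmetic. The snippet below is only a rough sketch: the helper names avail_pct and below_threshold are made up for illustration, and 100 minus df's Use% is an approximation of what the kubelet measures, not the kubelet's own calculation. The two sample lines are the /var/lib/docker rows from the df output above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of kubelet or kubectl.

# Read one line of `df -h` output on stdin and print the approximate
# available percentage, computed as 100 minus the Use% column (field 5).
avail_pct() {
    awk '{ gsub("%", "", $5); print 100 - $5 }'
}

# Exit 0 when the available percentage is strictly below the threshold,
# mirroring the "signal < threshold" form of a hard eviction rule.
below_threshold() {
    local avail=$1 threshold=$2
    [ "$avail" -lt "$threshold" ]
}

# The /var/lib/docker rows from nodes 132 and 131, as captured above.
for line in \
    "/dev/vdb 99G 43G 52G 46% /var/lib/docker" \
    "/dev/vdb 99G 71G 24G 76% /var/lib/docker"; do
    avail=$(printf '%s\n' "$line" | avail_pct)
    if below_threshold "$avail" 15; then
        echo "avail=${avail}% is below imagefs.available=15%"
    else
        echo "avail=${avail}% is above imagefs.available=15%"
    fi
done
```

# Both readings come out above the threshold (54% and 24% available), which matches the fact that df was run after the kubelet had already logged the 85% usage and started freeing space.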
Solution: expand the disk backing the /var/lib/docker partition, and adjust the cleanup script for the logs produced by the msg pods.
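
# The cleanup script itself is not shown in these notes. As a rough idea of what an age-based log cleanup can look like, here is a minimal sketch; the function name clean_old_logs, the *.log pattern and the retention period are all assumptions for illustration, not the script actually used in this environment.

```shell
#!/usr/bin/env bash
# Hypothetical log cleanup helper (name, pattern and retention are assumptions).

# clean_old_logs DIR DAYS
# Delete regular *.log files under DIR whose age, counted in whole days,
# is greater than DAYS. Each deleted path is printed for auditing.
clean_old_logs() {
    local dir=$1 days=$2
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    find "$dir" -type f -name '*.log' -mtime +"$days" -print -delete
}

# Example call (the path is a placeholder, not the real log directory):
#   clean_old_logs /path/to/msg/logs 7
```

# Run from cron on each node, or inside the container, something along these lines keeps old logs from accumulating. Files that are still being written keep a recent mtime, so an age-based rule like this leaves them alone.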