What is a pod's status when the node evicts it under resource pressure?
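What happens to a pod when the host it runs on comes under resource pressure? To find out, we simulated a scenario in which disk usage on the node exceeds 90%.
While the pods are running normally, their status is Running: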
[root@nccztsjb-node-23 ~]# kubectl get pod
NAME                        READY   STATUS    RESTARTS   AGE
nginx-ds-lw4vj              1/1     Running   0          6d19h
nginx-ds-nrf4t              1/1     Running   0          6d19h
nginx-ds-ql4s8              1/1     Running   0          6d19h
nginx-test-56745657-6wj5n   1/1     Running   0          6d19h
nginx-test-56745657-jx2kp   1/1     Running   0          6d19h
nginx-test-56745657-m6hm4   1/1     Running   0          6d19h
nginx-test-56745657-mhjsh   1/1     Running   0          6d19h
nginx-test-56745657-pqhqp   1/1     Running   0          6d19h
[root@nccztsjb-node-23 ~]# kubectl get pod nginx-test-56745657-6wj5n -o yaml | grep phase
phase: Running
That is, the phase is Running.
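Grepping the YAML works, but kubectl can also print the field directly with `kubectl get pod nginx-test-56745657-6wj5n -o jsonpath='{.status.phase}'`. For scripts, a minimal Python sketch that reads the same field from the JSON printed by `kubectl get pod <name> -o json` might look like this. The function name `pod_phase` and the trimmed sample JSON are illustrative assumptions, not captured output.

```python
import json


def pod_phase(pod_json: str) -> str:
    """Return status.phase from the JSON of one pod (kubectl get pod NAME -o json)."""
    pod = json.loads(pod_json)
    # status.phase is one of Pending, Running, Succeeded, Failed, Unknown;
    # fall back to "Unknown" if the status block is missing
    return pod.get("status", {}).get("phase", "Unknown")


if __name__ == "__main__":
    # Trimmed, illustrative version of `kubectl get pod ... -o json` output
    sample = '{"metadata": {"name": "nginx-test-56745657-6wj5n"}, "status": {"phase": "Running"}}'
    print(pod_phase(sample))  # prints: Running
```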
Next, on nccztsjb-node-24 (the node hosting this pod), use fallocate to create a 190G file in /data:
fallocate -l 190G bigfile
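For reference, the same preallocation can be done from Python with `os.posix_fallocate`, which, like `fallocate -l`, reserves real disk blocks instead of creating a sparse file, so `df` usage actually grows. This is a minimal sketch assuming a Linux host; the helper name `make_big_file` is hypothetical, and the demo allocates only 1 MiB in a temporary directory rather than 190G.

```python
import os
import tempfile


def make_big_file(path: str, size_bytes: int) -> None:
    """Preallocate size_bytes of real disk space at path (similar to `fallocate -l`)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        # Allocates blocks on disk, unlike a sparse file created with truncate
        os.posix_fallocate(fd, 0, size_bytes)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "bigfile")
        make_big_file(target, 1024 * 1024)  # 1 MiB for the demo; the article used 190G
        print(os.path.getsize(target))  # prints: 1048576
```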
Disk usage on /data rises to 96%:
[root@nccztsjb-node-24 data]# df -h /data
Filesystem      Size  Used Avail Use% Mounted on
/dev/vdb        200G  191G  9.2G  96% /data
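With 9.2G of 200G free, about 4.6% of the filesystem is still available. Among the kubelet's default hard eviction thresholds on Linux is `nodefs.available<10%`, which matches the "over 90% used" scenario. The sketch below redoes that comparison from the `df -h` line. It assumes /data backs the kubelet's nodefs (for example, the kubelet root directory or container runtime storage lives there) and that the default threshold was not overridden; the helper names are hypothetical.

```python
# df -h prints sizes with binary suffixes (K, M, G, T = powers of 1024)
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text: str) -> float:
    """Convert a df -h size such as '9.2G' to bytes."""
    suffix = text[-1].upper()
    if suffix in UNITS:
        return float(text[:-1]) * UNITS[suffix]
    return float(text)  # no suffix: plain bytes


def available_percent(df_line: str) -> float:
    """Percentage of space still available, from one data line of `df -h`."""
    # Columns: Filesystem Size Used Avail Use% Mounted-on
    fields = df_line.split()
    size, avail = parse_size(fields[1]), parse_size(fields[3])
    return avail / size * 100


def below_threshold(df_line: str, threshold_percent: float = 10.0) -> bool:
    """True if available space is under the assumed nodefs.available threshold."""
    return available_percent(df_line) < threshold_percent


if __name__ == "__main__":
    line = "/dev/vdb 200G 191G 9.2G 96% /data"
    print(round(available_percent(line), 1))  # prints: 4.6
    print(below_threshold(line))  # prints: True
```

The kubelet reads filesystem statistics itself rather than parsing df; this only reproduces the arithmetic behind the eviction decision.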
[root@nccztsjb-node-24 data]#
The pod is evicted. Check its status, or more precisely, its phase:
[root@nccztsjb-node-23 ~]# kubectl get pod nginx-test-56745657-6wj5n -o yaml | grep -i phase
phase: Failed
[root@nccztsjb-node-23 ~]#
The phase has changed to Failed.
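An evicted pod is not deleted: its object stays in the API with phase Failed and reason Evicted, while its ReplicaSet starts a replacement. A minimal Python sketch that picks such pods out of `kubectl get pods -o json` output follows; the function name `evicted_pods` and the trimmed sample are illustrative assumptions.

```python
import json


def evicted_pods(pod_list_json: str) -> list[str]:
    """Names of pods whose status has phase=Failed and reason=Evicted."""
    items = json.loads(pod_list_json).get("items", [])
    return [
        pod["metadata"]["name"]
        for pod in items
        if pod.get("status", {}).get("phase") == "Failed"
        and pod.get("status", {}).get("reason") == "Evicted"
    ]


if __name__ == "__main__":
    # Trimmed, illustrative `kubectl get pods -o json` output
    sample = json.dumps({
        "items": [
            {"metadata": {"name": "nginx-test-56745657-6wj5n"},
             "status": {"phase": "Failed", "reason": "Evicted"}},
            {"metadata": {"name": "nginx-test-56745657-jx2kp"},
             "status": {"phase": "Running"}},
        ]
    })
    print(evicted_pods(sample))  # prints: ['nginx-test-56745657-6wj5n']
```

To clean up, `kubectl delete pods --field-selector=status.phase=Failed` removes every Failed pod in the namespace, not only evicted ones, so it is worth checking the list first.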
Now look at the pod's details:
[root@nccztsjb-node-23 ~]# kubectl describe pod nginx-test-56745657-6wj5n
Name:         nginx-test-56745657-6wj5n
Namespace:    default
Priority:     0
Node:         nccztsjb-node-24/172.20.58.65
Start Time:   Thu, 17 Mar 2022 15:13:17 +0800
Labels:       app=nginx-test
              pod-template-hash=56745657
Annotations:  cni.projectcalico.org/containerID: cbd9967186479712f1e7c27112fc9b9a31e5628d21e2ec7e96c2a4c8a8a956ea
              cni.projectcalico.org/podIP:
              cni.projectcalico.org/podIPs:
Status:       Failed
Reason:       Evicted
Message:      The node was low on resource: ephemeral-storage. Container nginx was using 24Ki, which exceeds its request of 0.
IP:
IPs:          <none>
Controlled By:  ReplicaSet/nginx-test-56745657
Containers:
  nginx:
    Container ID:
    Image:          172.20.58.152/middleware/nginx:1.21.4
    Image ID:
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       ContainerStatusUnknown
      Message:      The container could not be located when the pod was terminated
      Exit Code:    137
      Started:      Mon, 01 Jan 0001 00:00:00 +0000
      Finished:     Mon, 01 Jan 0001 00:00:00 +0000
    Last State:     Terminated
      Reason:       ContainerStatusUnknown
      Message:      The container could not be located when the pod was deleted. The container used to be Running
      Exit Code:    137
      Started:      Mon, 01 Jan 0001 00:00:00 +0000
      Finished:     Mon, 01 Jan 0001 00:00:00 +0000
    Ready:          False
    Restart Count:  1
    Limits:
      cpu:     500m
      memory:  200Mi
    Requests:
      cpu:     500m
      memory:  200Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-cmp26 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  kube-api-access-cmp26:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Guaranteed
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason   Age   From     Message
  ----     ------   ----  ----     -------
  Warning  Evicted  82s   kubelet  The node was low on resource: ephemeral-storage. Container nginx was using 24Ki, which exceeds its request of 0.
  Normal   Killing  82s   kubelet  Stopping container nginx
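The output shows that, because the node was low on ephemeral storage, the kubelet evicted the pod (Status: Failed, Reason: Evicted) and stopped the nginx container. The message also notes that the container was using 24Ki of ephemeral storage against a request of 0, since no ephemeral-storage request was set.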
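The "exceeds its request of 0" detail matters for which pod goes first. According to the Kubernetes node-pressure eviction documentation, the kubelet ranks pods by whether their usage of the starved resource exceeds their request, then by pod priority, then by usage relative to the request. The sketch below mimics that ordering for a single resource under those assumptions; it is a simplification, not the kubelet's code, and the `PodUsage` type, its fields and the pod names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PodUsage:
    name: str
    priority: int  # spec.priority (0 for the pods in this article)
    usage: int     # bytes of the starved resource in use, e.g. ephemeral-storage
    request: int   # bytes requested for that resource (0 if unset)


def eviction_order(pods: list[PodUsage]) -> list[str]:
    """Order pods from first to last eviction candidate (simplified ranking)."""
    def key(p: PodUsage):
        exceeds = p.usage > p.request
        over_request = p.usage - p.request
        # sorted() is ascending, so:
        # 1) pods exceeding their request first (False sorts before True),
        # 2) lower priority first,
        # 3) larger usage above request first (negated)
        return (not exceeds, p.priority, -over_request)
    return [p.name for p in sorted(pods, key=key)]


if __name__ == "__main__":
    ki = 1024
    pods = [
        PodUsage("within-request", priority=0, usage=10 * ki, request=100 * ki),
        PodUsage("nginx-no-request", priority=0, usage=24 * ki, request=0),
        PodUsage("high-priority", priority=1000, usage=50 * ki, request=0),
    ]
    print(eviction_order(pods))
    # prints: ['nginx-no-request', 'high-priority', 'within-request']
```

Note that this pod's QoS class was Guaranteed, yet it was still evicted: the documentation states the kubelet does not use the QoS class itself to decide eviction order, and Guaranteed only describes CPU and memory requests and limits.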
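In short, the disk pressure is detected and reported by the kubelet, and the operation that stops the container is also initiated by the kubelet. So, to answer the question in the title: an evicted pod ends up in phase Failed, with reason Evicted.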
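Besides evicting pods, the kubelet reports the pressure on the Node object: while the threshold is crossed, the node's `DiskPressure` condition is `True` (visible in `kubectl describe node nccztsjb-node-24`), and the control plane adds the `node.kubernetes.io/disk-pressure:NoSchedule` taint so new pods are not scheduled there. A minimal Python sketch that checks the condition in `kubectl get node <name> -o json` output; the function name and the trimmed sample are illustrative assumptions.

```python
import json


def has_disk_pressure(node_json: str) -> bool:
    """True if the node's DiskPressure condition currently reports 'True'."""
    conditions = json.loads(node_json).get("status", {}).get("conditions", [])
    for cond in conditions:
        if cond.get("type") == "DiskPressure":
            # Condition status is a string: "True", "False" or "Unknown"
            return cond.get("status") == "True"
    return False  # condition not reported at all


if __name__ == "__main__":
    # Trimmed, illustrative `kubectl get node ... -o json` output
    sample = json.dumps({"status": {"conditions": [
        {"type": "MemoryPressure", "status": "False"},
        {"type": "DiskPressure", "status": "True"},
        {"type": "Ready", "status": "True"},
    ]}})
    print(has_disk_pressure(sample))  # prints: True
```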