9. Kubernetes Beginner Series: Taints and Tolerations
This installment covers two more concepts related to Pod scheduling: Taints and Tolerations.
Taints
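NodeAffinity is a property defined on a Pod that attracts the Pod to certain nodes. A Taint does the opposite: it is a property of a Node, and it lets the Node repel Pods that do not tolerate it.
The command below sets a taint on a node. The key and value are what a Toleration is matched against:
kubectl taint node [node] key=value:effect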
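effect takes one of the following values:
- NoSchedule
A Pod that does not declare a toleration for this taint will not be scheduled onto the tainted node.
- PreferNoSchedule
The soft version of NoSchedule: the scheduler tries to avoid placing a Pod that does not tolerate the taint on this node, but this is not enforced.
- NoExecute
Defines eviction behavior, for example in response to node failures. Besides keeping new non-tolerating Pods off the node, it affects Pods already running there:
  - Pods without a matching Toleration are evicted immediately.
  - Pods with a matching Toleration stay on the node indefinitely if tolerationSeconds is not set; if it is set, they are evicted after that many seconds.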
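The three effects can be summarized in a small Python model. This is only an illustrative sketch of the rules listed above, not Kubernetes code: the function names `can_schedule` and `is_evicted` are made up for this example, and the Pod is assumed to have no toleration for the taint.

```python
# Illustrative model only: how each taint effect treats a Pod
# that does NOT tolerate the taint.
HARD_EFFECTS = {"NoSchedule", "NoExecute"}


def can_schedule(effect: str) -> bool:
    """True if a new, non-tolerating Pod may still be placed on the node."""
    # PreferNoSchedule is only a soft preference, so placement stays possible.
    return effect not in HARD_EFFECTS


def is_evicted(effect: str) -> bool:
    """True if an already-running, non-tolerating Pod is evicted."""
    return effect == "NoExecute"


for effect in ("PreferNoSchedule", "NoSchedule", "NoExecute"):
    print(f"{effect}: schedulable={can_schedule(effect)}, evicted={is_evicted(effect)}")
```

Only NoExecute touches Pods that are already running; the other two effects matter only at scheduling time.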
To try this out, first list the Pods running on the node:
[root@ylserver10686071 ~]# kubectl get pods -n prod -o wide|grep ylserver10686073
affinity002-59b9b4cfcd-8ph9d   1/1   Running   0   147m    10.233.72.51   ylserver10686073   <none>   <none>
busybox-bbf7c9c98-2nph4        1/1   Running   0   5d5h    10.233.72.44   ylserver10686073   <none>   <none>
stateapp-0                     1/1   Running   0   6d1h    10.233.72.43   ylserver10686073   <none>   <none>
web001-69bd6f8c5f-nvgmj        1/1   Running   0   24h     10.233.72.47   ylserver10686073   <none>   <none>
web002-79c6bc455-lsx6z         1/1   Running   0   3d22h   10.233.72.46   ylserver10686073   <none>   <none>
[root@ylserver10686071 ~]#
Add a taint with effect PreferNoSchedule, then check the Pods again. The Pods that are already running are unaffected:
[root@ylserver10686071 ~]# kubectl taint node ylserver10686073 database=mysql:PreferNoSchedule
node/ylserver10686073 tainted
[root@ylserver10686071 ~]# kubectl get pods -n prod -o wide|grep ylserver10686073
affinity002-59b9b4cfcd-8ph9d   1/1   Running   0   150m    10.233.72.51   ylserver10686073   <none>   <none>
busybox-bbf7c9c98-2nph4        1/1   Running   0   5d5h    10.233.72.44   ylserver10686073   <none>   <none>
stateapp-0                     1/1   Running   0   6d1h    10.233.72.43   ylserver10686073   <none>   <none>
web001-69bd6f8c5f-nvgmj        1/1   Running   0   24h     10.233.72.47   ylserver10686073   <none>   <none>
web002-79c6bc455-lsx6z         1/1   Running   0   3d23h   10.233.72.46   ylserver10686073   <none>   <none>
[root@ylserver10686071 ~]#
Add a taint with effect NoSchedule and see whether the Pods are evicted:
[root@ylserver10686071 ~]# kubectl taint node ylserver10686073 database=mysql:NoSchedule
node/ylserver10686073 tainted
[root@ylserver10686071 ~]# kubectl get pods -n prod -o wide|grep ylserver10686073
affinity002-59b9b4cfcd-8ph9d   1/1   Running   0   152m    10.233.72.51   ylserver10686073   <none>   <none>
busybox-bbf7c9c98-2nph4        1/1   Running   0   5d5h    10.233.72.44   ylserver10686073   <none>   <none>
stateapp-0                     1/1   Running   0   6d1h    10.233.72.43   ylserver10686073   <none>   <none>
web001-69bd6f8c5f-nvgmj        1/1   Running   0   24h     10.233.72.47   ylserver10686073   <none>   <none>
web002-79c6bc455-lsx6z         1/1   Running   0   3d23h   10.233.72.46   ylserver10686073   <none>   <none>
[root@ylserver10686071 ~]#
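With effect NoSchedule, the Pods already on the node keep running and are not evicted; the taint only blocks new Pods from being scheduled there.
Now add a taint with effect NoExecute. The Pods are being evicted: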
[root@ylserver10686071 ~]# kubectl taint node ylserver10686073 database=mysql:NoExecute
node/ylserver10686073 tainted
[root@ylserver10686071 ~]# kubectl get pods -n prod -o wide|grep ylserver10686073
affinity002-59b9b4cfcd-8ph9d   1/1   Terminating   0   154m    10.233.72.51   ylserver10686073   <none>   <none>
busybox-bbf7c9c98-2nph4        1/1   Terminating   0   5d5h    10.233.72.44   ylserver10686073   <none>   <none>
stateapp-0                     1/1   Terminating   0   6d1h    10.233.72.43   ylserver10686073   <none>   <none>
web001-69bd6f8c5f-nvgmj        1/1   Terminating   0   24h     10.233.72.47   ylserver10686073   <none>   <none>
web002-79c6bc455-lsx6z         1/1   Terminating   0   3d23h   10.233.72.46   ylserver10686073   <none>   <none>
[root@ylserver10686071 ~]#
Check the taints on the node:
[root@ylserver10686071 ~]# kubectl describe node ylserver10686073|grep -4 Taints
                    projectcalico.org/IPv4Address: 10.68.60.73/24
                    projectcalico.org/IPv4IPIPTunnelAddr: 10.233.72.0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Mon, 12 Jul 2021 14:02:28 +0800
Taints:             database=mysql:NoExecute
                    database=mysql:NoSchedule
                    database=mysql:PreferNoSchedule
Unschedulable:      false
Lease:
[root@ylserver10686071 ~]#
Remove the taint with key database and effect NoExecute from the node:
[root@ylserver10686071 ~]# kubectl taint node ylserver10686073 database:NoExecute-
node/ylserver10686073 untainted
[root@ylserver10686071 ~]#
Remove every taint with key database from the node:
[root@ylserver10686071 ~]# kubectl taint node ylserver10686073 database-
node/ylserver10686073 untainted
[root@ylserver10686071 ~]#
Checking the node's taints now shows that all the taints created earlier are gone:
[root@ylserver10686071 ~]# kubectl describe node ylserver10686073|grep -2 Taints
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Mon, 12 Jul 2021 14:02:28 +0800
Taints:             <none>
Unschedulable:      false
Lease:
[root@ylserver10686071 ~]#
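The three argument forms used above (`key=value:effect` to add, `key:effect-` to remove one taint, `key-` to remove every taint with that key) can be told apart with a few string splits. The sketch below is a simplified reading of those three forms only, not kubectl's actual parser; `TaintChange` and `parse_taint_arg` are hypothetical names introduced for illustration.

```python
from typing import NamedTuple, Optional


class TaintChange(NamedTuple):
    action: str            # "add" or "remove"
    key: str
    value: Optional[str]   # None when the argument carries no value
    effect: Optional[str]  # None means "every effect" (removals only)


def parse_taint_arg(arg: str) -> TaintChange:
    """Parse the taint argument forms used in this article (simplified)."""
    # A trailing "-" marks a removal.
    action = "remove" if arg.endswith("-") else "add"
    body = arg[:-1] if action == "remove" else arg
    # Split off ":effect" first, then "=value".
    key_value, _, effect = body.partition(":")
    key, eq, value = key_value.partition("=")
    if action == "add" and not effect:
        raise ValueError("adding a taint requires an effect")
    return TaintChange(action, key, value if eq else None, effect or None)


print(parse_taint_arg("database=mysql:NoExecute"))
print(parse_taint_arg("database:NoExecute-"))
print(parse_taint_arg("database-"))
```

The model makes one point explicit: adding a taint always needs an effect, while a removal may leave it out to match all effects for the key.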
The kubectl cordon command marks a node as unschedulable so that no new Pods are placed on it. To verify:
[root@ylserver10686071 ~]# kubectl cordon ylserver10686072
node/ylserver10686072 cordoned
[root@ylserver10686071 ~]# kubectl get nodes
NAME               STATUS                     ROLES    AGE   VERSION
ylserver10686071   Ready                      master   15d   v1.19.10
ylserver10686072   Ready,SchedulingDisabled   master   15d   v1.19.10
ylserver10686073   Ready                      master   15d   v1.19.10
[root@ylserver10686071 ~]#
Inspecting the node's taints shows that cordoning put a taint with effect NoSchedule on it:
[root@ylserver10686071 ~]# kubectl describe node ylserver10686072|grep -2 Taints
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Mon, 12 Jul 2021 14:00:48 +0800
Taints:             node.kubernetes.io/unschedulable:NoSchedule
Unschedulable:      true
Lease:
[root@ylserver10686071 ~]#
Restore normal scheduling on the node:
[root@ylserver10686071 ~]# kubectl uncordon ylserver10686072
node/ylserver10686072 uncordoned
[root@ylserver10686071 ~]# kubectl describe node ylserver10686072|grep -2 Taints
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Mon, 12 Jul 2021 14:00:48 +0800
Taints:             <none>
Unschedulable:      false
Lease:
[root@ylserver10686071 ~]#
Tolerations
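Tolerations are normally used together with Taints. A toleration is a property of a Pod that allows the Pod to be scheduled onto a Node carrying a matching Taint.
Let's see how tolerations work in practice. First, put a taint with effect NoSchedule on a node: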
[root@ylserver10686071 ~]# kubectl taint node ylserver10686073 database=mysql:NoSchedule
node/ylserver10686073 tainted
Create a DaemonSet manifest without any tolerations:
[root@ylserver10686071 ~]# cat toleration001.yml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: toleration001
  namespace: prod
spec:
  selector:
    matchLabels:
      k8s-app: toleration001
  template:
    metadata:
      labels:
        k8s-app: toleration001
    spec:
      containers:
      - name: tomcat
        image: tomcat:8.0
Apply the manifest and check where the Pods land:
[root@ylserver10686071 ~]# kubectl apply -f toleration001.yml
daemonset.apps/toleration001 created
[root@ylserver10686071 ~]# kubectl get pods -n prod -o wide|grep toleration001
toleration001-bhckj   1/1   Running   0   20s   10.233.67.45   ylserver10686072   <none>   <none>
toleration001-ksfhh   1/1   Running   0   20s   10.233.75.78   ylserver10686071   <none>   <none>
[root@ylserver10686071 ~]#
No Pod was created on the tainted node. Now edit the manifest and add a tolerations section:
[root@ylserver10686071 ~]# cat toleration001.yml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: toleration001
  namespace: prod
spec:
  selector:
    matchLabels:
      k8s-app: toleration001
  template:
    metadata:
      labels:
        k8s-app: toleration001
    spec:
      tolerations:
      - key: "database"
        operator: "Equal"
        value: "mysql"
        effect: "NoSchedule"
      containers:
      - name: tomcat
        image: tomcat:8.0
Apply the updated manifest. The tainted node now runs a Pod as well:
[root@ylserver10686071 ~]# kubectl apply -f toleration001.yml
daemonset.apps/toleration001 configured
[root@ylserver10686071 ~]# kubectl get pods -n prod -o wide|grep toleration001
toleration001-csmgk   1/1   Running   0   2m21s   10.233.72.53   ylserver10686073   <none>   <none>
toleration001-s52vh   1/1   Running   0   82s     10.233.75.79   ylserver10686071   <none>   <none>
toleration001-w2h58   1/1   Running   0   110s    10.233.67.46   ylserver10686072   <none>   <none>
[root@ylserver10686071 ~]#
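A toleration has four fields used for matching. key, value, and effect correspond to the taint's values, and operator takes one of two values:
- Equal: the default, used when operator is not specified. The toleration's value must equal the taint's value.
- Exists: no value needs to be specified.
Tolerations also have two special cases:
- An empty key combined with the Exists operator matches every key and value.
- An empty effect matches every effect.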
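The matching rules above fit into one small function. The following Python sketch is a simplified model for illustration, not the scheduler's implementation; the `Taint` and `Toleration` classes and the `tolerates` and `untolerated` helpers are names invented here, and the sample taints are the ones used in this article.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str


@dataclass(frozen=True)
class Toleration:
    key: str = ""
    operator: str = "Equal"  # Equal is the default operator
    value: str = ""
    effect: str = ""         # an empty effect matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if this toleration matches this taint."""
    # An empty effect matches all effects; otherwise they must be identical.
    if tol.effect and tol.effect != taint.effect:
        return False
    if not tol.key:
        # An empty key is only meaningful with Exists; it then matches every key.
        if tol.operator != "Exists":
            return False
    elif tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True
    # Equal: the value has to match as well.
    return tol.operator == "Equal" and tol.value == taint.value


def untolerated(tolerations: List[Toleration], taints: List[Taint]) -> List[Taint]:
    """Taints that none of the given tolerations match."""
    return [t for t in taints if not any(tolerates(tol, t) for tol in tolerations)]


db = Taint("database", "mysql", "NoSchedule")
web = Taint("web", "tomcat", "NoExecute")
exact = Toleration(key="database", operator="Equal", value="mysql", effect="NoSchedule")
wildcard = Toleration(operator="Exists")  # empty key and empty effect

print(untolerated([exact], [db, web]))     # only the web taint is left over
print(untolerated([wildcard], [db, web]))  # nothing is left over
```

In this model the exact toleration covers only the database taint, while the empty-key Exists toleration covers both, which mirrors the two DaemonSet manifests in this section.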
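To verify, put a taint with effect NoExecute on another node. The DaemonSet Pod on that node does not tolerate the new taint, so it is evicted: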
[root@ylserver10686071 ~]# kubectl taint node ylserver10686072 web=tomcat:NoExecute
node/ylserver10686072 tainted
[root@ylserver10686071 ~]# kubectl get pods -n prod -o wide|grep toleration001
toleration001-nptf2   1/1   Running   0   14s   10.233.72.56   ylserver10686073   <none>   <none>
toleration001-zgm2f   1/1   Running   0   35s   10.233.75.89   ylserver10686071   <none>   <none>
[root@ylserver10686071 ~]#
Update the manifest to use the Exists operator with an empty key and an empty effect:
[root@ylserver10686071 ~]# cat toleration001.yml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: toleration001
  namespace: prod
spec:
  selector:
    matchLabels:
      k8s-app: toleration001
  template:
    metadata:
      labels:
        k8s-app: toleration001
    spec:
      tolerations:
      - key: ""
        operator: "Exists"
        effect: ""
      containers:
      - name: tomcat
        image: tomcat:8.0
[root@ylserver10686071 ~]#
Apply the updated manifest and check the Pods. All three nodes now run a Pod:
[root@ylserver10686071 ~]# kubectl apply -f toleration001.yml
daemonset.apps/toleration001 configured
[root@ylserver10686071 ~]# kubectl get pods -n prod -o wide|grep toleration001
toleration001-4dcj9   1/1   Running   0   13s   10.233.75.90   ylserver10686071   <none>   <none>
toleration001-nj6fg   1/1   Running   0   42s   10.233.72.57   ylserver10686073   <none>   <none>
toleration001-tjnsw   1/1   Running   0   64s   10.233.67.49   ylserver10686072   <none>   <none>
[root@ylserver10686071 ~]#
Check the Pod's tolerations. The first entry, a bare op=Exists, is the empty-key toleration from the manifest:
[root@ylserver10686071 ~]# kubectl describe pod toleration001 -n prod|grep -6 Tolerations
  default-token-lx75g:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-lx75g
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     op=Exists
                 node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists
                 node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                 node.kubernetes.io/unreachable:NoExecute op=Exists
                 node.kubernetes.io/unschedulable:NoSchedule op=Exists
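When a node fails, the TaintBasedEvictions feature automatically taints the node and then evicts its Pods. In some scenarios this is undesirable. For example, when a network failure cuts a node off from the master and the node runs many applications with local state, you may want those applications to keep running on the node in the hope that the network recovers quickly, rather than being evicted. The tolerationSeconds field of a toleration addresses this: it specifies how many seconds the Pod stays on the node after the taint is added before it is evicted. It is defined as follows: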
tolerations:
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 6000
For a node that is not ready, set the key to node.kubernetes.io/not-ready. (Older releases used the alpha keys node.alpha.kubernetes.io/unreachable and node.alpha.kubernetes.io/notReady; the v1.19 cluster used here reports the node.kubernetes.io/ keys.)
If a Pod does not specify a toleration for node.kubernetes.io/not-ready, Kubernetes automatically adds one with tolerationSeconds=300.
Likewise, if a Pod does not specify a toleration for node.kubernetes.io/unreachable, Kubernetes automatically adds one with tolerationSeconds=300.
These automatically added tolerations ensure that when a node problem is detected, the Pod keeps running on the node for 5 minutes before it is evicted. Both defaults are added by the DefaultTolerationSeconds admission controller.
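The defaulting behavior described above can be sketched in a few lines of Python. This is a simplified model and not the admission controller's code: tolerations are represented as plain dictionaries, and `add_default_tolerations` and `seconds_until_eviction` are hypothetical helper names. The model assumes a default is skipped when the Pod already has a NoExecute (or empty-effect) toleration for the key, or an empty-key toleration.

```python
from typing import Dict, List, Optional

DEFAULT_SECONDS = 300
DEFAULT_KEYS = ("node.kubernetes.io/not-ready", "node.kubernetes.io/unreachable")


def _covers(toleration: Dict, key: str) -> bool:
    """True if the toleration already applies to NoExecute taints with this key."""
    return (toleration.get("key", "") in (key, "")
            and toleration.get("effect", "") in ("NoExecute", ""))


def add_default_tolerations(tolerations: List[Dict]) -> List[Dict]:
    """Return a copy with the two default NoExecute tolerations added when missing."""
    result = list(tolerations)
    for key in DEFAULT_KEYS:
        if not any(_covers(t, key) for t in tolerations):
            result.append({"key": key, "operator": "Exists",
                           "effect": "NoExecute",
                           "tolerationSeconds": DEFAULT_SECONDS})
    return result


def seconds_until_eviction(toleration: Dict) -> Optional[int]:
    """None means the Pod stays for as long as the taint is tolerated."""
    return toleration.get("tolerationSeconds")


# A Pod that sets its own 6000-second toleration for unreachable nodes
# only receives the default for not-ready.
custom = {"key": "node.kubernetes.io/unreachable", "operator": "Exists",
          "effect": "NoExecute", "tolerationSeconds": 6000}
for t in add_default_tolerations([custom]):
    print(t["key"], seconds_until_eviction(t))
```

A toleration without tolerationSeconds yields None here, which corresponds to a Pod that is never evicted because of that taint.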
To verify, first remove the taint configured earlier from the node, then stop the kubelet process:
[root@ylserver10686073 ~]# kubectl taint node ylserver10686073 database-
node/ylserver10686073 untainted
[root@ylserver10686073 ~]# systemctl stop kubelet
[root@ylserver10686073 ~]# kubectl get nodes
NAME               STATUS     ROLES    AGE   VERSION
ylserver10686071   Ready      master   16d   v1.19.10
ylserver10686072   Ready      master   16d   v1.19.10
ylserver10686073   NotReady   master   16d   v1.19.10
Check the node's taints. Kubernetes has added the unreachable taints automatically:
[root@ylserver10686073 ~]# kubectl describe node ylserver10686073|grep -2 Taints
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Mon, 12 Jul 2021 14:02:28 +0800
Taints:             node.kubernetes.io/unreachable:NoExecute
                    node.kubernetes.io/unreachable:NoSchedule
Unschedulable:      false
[root@ylserver10686073 ~]#
Check the tolerations of a Pod on that node. The two default 300-second tolerations are present:
[root@ylserver10686073 ~]# kubectl describe pod coredns-7677f9bb54-jk85m -n kube-system |grep -3 Tolerations
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  kubernetes.io/os=linux
Tolerations:     node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
[root@ylserver10686073 ~]#
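To summarize:
- NodeAffinity is a Pod property that lets a Pod run on nodes matching certain rules. It is similar to nodeSelector, but its operators make it more flexible.
- A Taint is the opposite of NodeAffinity: it is a Node property that keeps Pods off certain nodes.
- Tolerations are normally used together with Taints. A Taint plus matching Tolerations keeps every other Pod off a node; to also pin the tolerating Pods to that node so that it runs only those Pods, combine them with NodeAffinity.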