9. Kubernetes Getting Started Series: Taints and Tolerations

  This part covers two more concepts related to Pod scheduling: Taints and Tolerations.

  Taints

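  Node affinity (NodeAffinity) is a property defined on a Pod that attracts the Pod to certain nodes. A taint does the opposite: it is a property set on a Node that lets the node repel Pods that do not tolerate it.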

  The command below sets a taint on a node. The key and value identify the taint and are what a toleration is matched against:

kubectl taint node [node] key=value:[effect]

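  effect can take one of the following values:

  • NoSchedule

     If a Pod does not declare a toleration for this taint, the scheduler will not place it on the tainted node. Pods already running on the node are not affected.

  • PreferNoSchedule

     A soft version of NoSchedule. If a Pod does not tolerate this taint, the scheduler tries to avoid placing it on the node, but this is not guaranteed.

  • NoExecute

     Affects scheduling and also defines eviction behavior, for example in response to node failures. Its effect on Pods already running on the node is:

    • Pods without a matching toleration are evicted immediately
    • Pods with a matching toleration stay on the node indefinitely if tolerationSeconds is not set; if it is set, they are evicted after the specified number of seconds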
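
  For reference, taints set with kubectl taint are stored on the node object under spec.taints. The fragment below is a minimal sketch of how one taint per effect would look there. The node name and the database=mysql key/value are borrowed from the experiments that follow, and the fragment is meant for reading (for example next to the output of kubectl get node -o yaml), not for applying as-is.

```yaml
# Sketch of the taint list on a Node object; other Node fields are omitted.
apiVersion: v1
kind: Node
metadata:
  name: ylserver10686073
spec:
  taints:
  # Soft rule: the scheduler tries to avoid this node for non-tolerating Pods.
  - key: database
    value: mysql
    effect: PreferNoSchedule
  # Hard rule for new Pods only: running Pods stay where they are.
  - key: database
    value: mysql
    effect: NoSchedule
  # Hard rule that also evicts running Pods without a matching toleration.
  - key: database
    value: mysql
    effect: NoExecute
```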

  Let's try it out. First, list the Pods running on the node:

[root@ylserver10686071 ~]# kubectl get pods -n prod -o wide|grep ylserver10686073
affinity002-59b9b4cfcd-8ph9d    1/1     Running   0          147m    10.233.72.51   ylserver10686073   <none>           <none>
busybox-bbf7c9c98-2nph4         1/1     Running   0          5d5h    10.233.72.44   ylserver10686073   <none>           <none>
stateapp-0                      1/1     Running   0          6d1h    10.233.72.43   ylserver10686073   <none>           <none>
web001-69bd6f8c5f-nvgmj         1/1     Running   0          24h     10.233.72.47   ylserver10686073   <none>           <none>
web002-79c6bc455-lsx6z          1/1     Running   0          3d22h   10.233.72.46   ylserver10686073   <none>           <none>
[root@ylserver10686071 ~]# 

  Add a taint with effect PreferNoSchedule, then check the Pods again:

[root@ylserver10686071 ~]# kubectl taint node ylserver10686073 database=mysql:PreferNoSchedule
node/ylserver10686073 tainted
[root@ylserver10686071 ~]# kubectl get pods -n prod -o wide|grep ylserver10686073
affinity002-59b9b4cfcd-8ph9d    1/1     Running   0          150m    10.233.72.51   ylserver10686073   <none>           <none>
busybox-bbf7c9c98-2nph4         1/1     Running   0          5d5h    10.233.72.44   ylserver10686073   <none>           <none>
stateapp-0                      1/1     Running   0          6d1h    10.233.72.43   ylserver10686073   <none>           <none>
web001-69bd6f8c5f-nvgmj         1/1     Running   0          24h     10.233.72.47   ylserver10686073   <none>           <none>
web002-79c6bc455-lsx6z          1/1     Running   0          3d23h   10.233.72.46   ylserver10686073   <none>           <none>
[root@ylserver10686071 ~]# 

  Next, add a taint with effect NoSchedule and see whether the Pods are evicted:

[root@ylserver10686071 ~]# kubectl taint node ylserver10686073 database=mysql:NoSchedule
node/ylserver10686073 tainted
[root@ylserver10686071 ~]# kubectl get pods -n prod -o wide|grep ylserver10686073
affinity002-59b9b4cfcd-8ph9d    1/1     Running   0          152m    10.233.72.51   ylserver10686073   <none>           <none>
busybox-bbf7c9c98-2nph4         1/1     Running   0          5d5h    10.233.72.44   ylserver10686073   <none>           <none>
stateapp-0                      1/1     Running   0          6d1h    10.233.72.43   ylserver10686073   <none>           <none>
web001-69bd6f8c5f-nvgmj         1/1     Running   0          24h     10.233.72.47   ylserver10686073   <none>           <none>
web002-79c6bc455-lsx6z          1/1     Running   0          3d23h   10.233.72.46   ylserver10686073   <none>           <none>
[root@ylserver10686071 ~]# 

  As shown, with effect NoSchedule the Pods already on the node keep running and are not evicted.

  

  Now add a taint with effect NoExecute. The Pods are being evicted:

[root@ylserver10686071 ~]# kubectl taint node ylserver10686073 database=mysql:NoExecute
node/ylserver10686073 tainted
[root@ylserver10686071 ~]# kubectl get pods -n prod -o wide|grep ylserver10686073
affinity002-59b9b4cfcd-8ph9d    1/1     Terminating         0          154m    10.233.72.51   ylserver10686073   <none>           <none>
busybox-bbf7c9c98-2nph4         1/1     Terminating         0          5d5h    10.233.72.44   ylserver10686073   <none>           <none>
stateapp-0                      1/1     Terminating         0          6d1h    10.233.72.43   ylserver10686073   <none>           <none>
web001-69bd6f8c5f-nvgmj         1/1     Terminating         0          24h     10.233.72.47   ylserver10686073   <none>           <none>
web002-79c6bc455-lsx6z          1/1     Terminating         0          3d23h   10.233.72.46   ylserver10686073   <none>           <none>
[root@ylserver10686071 ~]# 

  Check the taints on the node:

[root@ylserver10686071 ~]# kubectl describe node ylserver10686073|grep -4 Taints
                    projectcalico.org/IPv4Address: 10.68.60.73/24
                    projectcalico.org/IPv4IPIPTunnelAddr: 10.233.72.0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Mon, 12 Jul 2021 14:02:28 +0800
Taints:             database=mysql:NoExecute
                    database=mysql:NoSchedule
                    database=mysql:PreferNoSchedule
Unschedulable:      false
Lease:
[root@ylserver10686071 ~]# 

  Remove the taint with key database and effect NoExecute from the node:

[root@ylserver10686071 ~]# kubectl taint node ylserver10686073 database:NoExecute-
node/ylserver10686073 untainted
[root@ylserver10686071 ~]# 

  Remove all taints with key database from the node:

[root@ylserver10686071 ~]# kubectl taint node ylserver10686073 database-
node/ylserver10686073 untainted
[root@ylserver10686071 ~]# 

  Checking the node's taints again shows that all the taints created earlier are gone:

[root@ylserver10686071 ~]# kubectl describe node ylserver10686073|grep -2 Taints
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Mon, 12 Jul 2021 14:02:28 +0800
Taints:             <none>
Unschedulable:      false
Lease:
[root@ylserver10686071 ~]# 

  The kubectl cordon command marks a node as unschedulable. Let's verify:

[root@ylserver10686071 ~]# kubectl cordon ylserver10686072
node/ylserver10686072 cordoned
[root@ylserver10686071 ~]# kubectl get nodes
NAME               STATUS                     ROLES    AGE   VERSION
ylserver10686071   Ready                      master   15d   v1.19.10
ylserver10686072   Ready,SchedulingDisabled   master   15d   v1.19.10
ylserver10686073   Ready                      master   15d   v1.19.10
[root@ylserver10686071 ~]# 

  Checking the node's taints shows that a taint with effect NoSchedule has been added to it:

[root@ylserver10686071 ~]# kubectl describe node ylserver10686072|grep -2 Taints
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Mon, 12 Jul 2021 14:00:48 +0800
Taints:             node.kubernetes.io/unschedulable:NoSchedule
Unschedulable:      true
Lease:
[root@ylserver10686071 ~]# 
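
  In terms of the node object, cordoning roughly corresponds to the fragment below: kubectl cordon sets spec.unschedulable, and the control plane then adds the matching taint. This is a sketch reconstructed from the describe output above, not a manifest to apply.

```yaml
# Sketch of the relevant Node fields after `kubectl cordon ylserver10686072`.
apiVersion: v1
kind: Node
metadata:
  name: ylserver10686072
spec:
  unschedulable: true                      # reported as "Unschedulable: true"
  taints:
  - key: node.kubernetes.io/unschedulable  # reported on the "Taints:" line
    effect: NoSchedule
```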

  Restore normal scheduling on the node:

[root@ylserver10686071 ~]# kubectl uncordon ylserver10686072
node/ylserver10686072 uncordoned
[root@ylserver10686071 ~]# kubectl describe node ylserver10686072|grep -2 Taints
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Mon, 12 Jul 2021 14:00:48 +0800
Taints:             <none>
Unschedulable:      false
Lease:
[root@ylserver10686071 ~]# 

  

   Tolerations

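  Tolerations are normally used together with taints. A toleration is a property of a Pod that allows the Pod to be scheduled onto a node carrying a matching taint.

  Let's look at how tolerations are used through an experiment. First, add a taint with effect NoSchedule to a node: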

[root@ylserver10686071 ~]# kubectl taint node ylserver10686073 database=mysql:NoSchedule
node/ylserver10686073 tainted

  Create a DaemonSet manifest without any tolerations:

[root@ylserver10686071 ~]# cat toleration001.yml 
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: toleration001
  namespace: prod
spec:
  selector:
    matchLabels:
      k8s-app: toleration001
  template:
    metadata:
      labels:
        k8s-app: toleration001
    spec:
      containers:
      - name: tomcat
        image: tomcat:8.0

  Apply the manifest and check where the Pods are placed:

[root@ylserver10686071 ~]# kubectl apply -f toleration001.yml 
daemonset.apps/toleration001 created
[root@ylserver10686071 ~]# kubectl get pods -n prod -o wide|grep toleration001
toleration001-bhckj             1/1     Running   0          20s    10.233.67.45   ylserver10686072   <none>           <none>
toleration001-ksfhh             1/1     Running   0          20s    10.233.75.78   ylserver10686071   <none>           <none>
[root@ylserver10686071 ~]# 

  No Pod was created on the tainted node. Edit the manifest and add a tolerations section:

[root@ylserver10686071 ~]# cat  toleration001.yml 
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: toleration001
  namespace: prod
spec:
  selector:
    matchLabels:
      k8s-app: toleration001
  template:
    metadata:
      labels:
        k8s-app: toleration001
    spec:
      tolerations:
      - key: "database"
        operator: "Equal"
        value: "mysql"
        effect: "NoSchedule"
      containers:
      - name: tomcat
        image: tomcat:8.0

  Apply the updated manifest. A Pod is now deployed on the tainted node as well:

[root@ylserver10686071 ~]# kubectl apply -f   toleration001.yml 
daemonset.apps/toleration001 configured
[root@ylserver10686071 ~]# kubectl get pods -n prod -o wide|grep toleration001
toleration001-csmgk             1/1     Running   0          2m21s   10.233.72.53   ylserver10686073   <none>           <none>
toleration001-s52vh             1/1     Running   0          82s     10.233.75.79   ylserver10686071   <none>           <none>
toleration001-w2h58             1/1     Running   0          110s    10.233.67.46   ylserver10686072   <none>           <none>
[root@ylserver10686071 ~]# 

  

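  A toleration has four fields used for matching. key, value and effect correspond to the fields of the taint, and operator can take one of two values:

  • Equal   The default if operator is not specified; key and value must both equal those of the taint
  • Exists  Only the key needs to be present on the taint, so value is not specified

  There are also two special cases:

  • An empty key with operator Exists matches every key and value
  • An empty effect matches every effect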
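
  As an illustration, the fragment below sketches the matching variants described above as entries under a Pod's spec.tolerations. The database and mysql names come from the taint used in this post; the entries are listed together only for comparison, and a real Pod would normally carry just the ones it needs.

```yaml
tolerations:
# Equal (the default operator): key, value and effect must all match the taint.
- key: "database"
  operator: "Equal"
  value: "mysql"
  effect: "NoSchedule"
# Exists with a key: matches any value of that key, for the given effect.
- key: "database"
  operator: "Exists"
  effect: "NoSchedule"
# Effect left out: matches database=mysql taints regardless of their effect.
- key: "database"
  operator: "Equal"
  value: "mysql"
# Key left out with Exists: matches every taint.
- operator: "Exists"
```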

  

  To verify this, add a taint with effect NoExecute to another node. The DaemonSet Pod on that node is evicted:

[root@ylserver10686071 ~]# kubectl taint node ylserver10686072 web=tomcat:NoExecute
node/ylserver10686072 tainted
[root@ylserver10686071 ~]# kubectl get pods -n prod -o wide|grep toleration001
toleration001-nptf2             1/1     Running   0          14s     10.233.72.56   ylserver10686073   <none>           <none>
toleration001-zgm2f             1/1     Running   0          35s     10.233.75.89   ylserver10686071   <none>           <none>
[root@ylserver10686071 ~]# 

  Update the manifest to use the Exists operator with an empty key and an empty effect:

[root@ylserver10686071 ~]# cat  toleration001.yml 
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: toleration001
  namespace: prod
spec:
  selector:
    matchLabels:
      k8s-app: toleration001
  template:
    metadata:
      labels:
        k8s-app: toleration001
    spec:
      tolerations:
      - key: ""
        operator: "Exists"
        effect: ""
      containers:
      - name: tomcat
        image: tomcat:8.0
[root@ylserver10686071 ~]# 

  Apply the updated manifest and check the Pods:

[root@ylserver10686071 ~]# kubectl apply -f toleration001.yml 
daemonset.apps/toleration001 configured
[root@ylserver10686071 ~]# kubectl get pods -n prod -o wide|grep toleration001
toleration001-4dcj9             1/1     Running   0          13s     10.233.75.90   ylserver10686071   <none>           <none>
toleration001-nj6fg             1/1     Running   0          42s     10.233.72.57   ylserver10686073   <none>           <none>
toleration001-tjnsw             1/1     Running   0          64s     10.233.67.49   ylserver10686072   <none>           <none>
[root@ylserver10686071 ~]# 

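  Check the Pods' tolerations. In the output, the bare op=Exists entry comes from the manifest, while the node.kubernetes.io/* tolerations are added automatically for DaemonSet Pods: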

[root@ylserver10686071 ~]# kubectl describe pod toleration001 -n prod|grep -6 Tolerations
  default-token-lx75g:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-lx75g
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     op=Exists
                 node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists
                 node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                 node.kubernetes.io/unreachable:NoExecute op=Exists
                 node.kubernetes.io/unschedulable:NoSchedule op=Exists

  

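  When a node fails, Kubernetes can taint the node automatically (taint-based eviction) and then evict its Pods. In some scenarios this is not desirable. Consider a network failure that cuts a node off from the master: if the node runs many applications with local state, you may want them to keep running there in the expectation that the network recovers quickly, rather than having them evicted. The tolerationSeconds field of a toleration addresses this. It specifies how many seconds the Pod stays on the node after the taint is added before it is evicted. It is defined as follows: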

tolerations:
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 6000

  

  For a node that is not ready, set the key to node.kubernetes.io/not-ready.

  If a Pod does not specify a toleration for node.kubernetes.io/not-ready, Kubernetes automatically adds a toleration for node.kubernetes.io/not-ready with tolerationSeconds=300.

  Likewise, if a Pod does not specify a toleration for node.kubernetes.io/unreachable, Kubernetes automatically adds a toleration for node.kubernetes.io/unreachable with tolerationSeconds=300.

  These automatically added tolerations mean that, once a problem with the node is detected, a Pod stays on the node for another 5 minutes before it is evicted. Both default tolerations are added by the DefaultTolerationSeconds admission controller.
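
  Written out as a Pod spec fragment, the two default tolerations would look roughly like the sketch below. It uses the node.kubernetes.io/not-ready and node.kubernetes.io/unreachable key names that the kubectl describe output in this post reports for a v1.19 cluster; a Pod that needs a longer or shorter grace period could declare the same entries itself with a different tolerationSeconds.

```yaml
tolerations:
# Stay for 300 seconds after the node is tainted as not ready.
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300
# Stay for 300 seconds after the node is tainted as unreachable.
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300
```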

 

  To verify, first remove the taint configured on the node earlier, then stop the kubelet process:

[root@ylserver10686073 ~]# kubectl taint node ylserver10686073 database-
node/ylserver10686073 untainted
[root@ylserver10686073 ~]# systemctl stop kubelet
[root@ylserver10686073 ~]# kubectl get nodes
NAME               STATUS     ROLES    AGE   VERSION
ylserver10686071   Ready      master   16d   v1.19.10
ylserver10686072   Ready      master   16d   v1.19.10
ylserver10686073   NotReady   master   16d   v1.19.10

  Check the node's taints:

[root@ylserver10686073 ~]# kubectl describe node ylserver10686073|grep -2 Taints
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Mon, 12 Jul 2021 14:02:28 +0800
Taints:             node.kubernetes.io/unreachable:NoExecute
                    node.kubernetes.io/unreachable:NoSchedule
Unschedulable:      false
[root@ylserver10686073 ~]# 

  Check the tolerations of a Pod on the node:

[root@ylserver10686073 ~]# kubectl describe pod coredns-7677f9bb54-jk85m  -n kube-system |grep -3  Tolerations
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  kubernetes.io/os=linux
Tolerations:     node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
[root@ylserver10686073 ~]# 

  

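  To summarize:

  • NodeAffinity is a Pod property that lets a Pod run on nodes matching certain rules. It is similar to nodeSelector, but its operators make it more flexible;
  • A taint is the opposite of NodeAffinity: it keeps Pods off certain nodes, and it is a Node property;
  • Tolerations are used together with taints. Combining the two lets you reserve a node for specific Pods.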
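
  One point worth spelling out for the last item: a toleration only allows a Pod onto a tainted node, it does not steer the Pod there. A common way to reserve a node is therefore to pair the toleration with a node selector or node affinity. The manifest below is a minimal sketch of that idea. The Pod name and the dedicated=database label are hypothetical and not part of the original experiments; the taint and image are the ones used earlier in this post.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dedicated-example        # hypothetical name
  namespace: prod
spec:
  # Hypothetical label; it would have to be added to the node first, e.g.
  #   kubectl label node ylserver10686073 dedicated=database
  nodeSelector:
    dedicated: database
  # Allows the Pod onto the node tainted with database=mysql:NoSchedule.
  tolerations:
  - key: "database"
    operator: "Equal"
    value: "mysql"
    effect: "NoSchedule"
  containers:
  - name: tomcat
    image: tomcat:8.0
```

  With this combination the taint keeps other Pods off the node, while the selector keeps this Pod from landing anywhere else.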

 

posted @ 2021-07-27 16:52  梦君子