Taints and Tolerations

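Whether it is a hard or a soft rule, nodeAffinity attracts Pods to the nodes you want. Taints do the opposite: once a node carries a taint, no Pod is scheduled onto it unless the Pod declares a toleration for that taint.

Taints are useful when, for example, you want to reserve the master node for Kubernetes system components, or dedicate a group of nodes with special resources to certain Pods: ordinary Pods are no longer scheduled onto the tainted nodes. A cluster built with kubeadm taints the master node by default, which is why our everyday Pods never end up on the master: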

$ kubectl describe node k8s-master
Name:               k8s-master
Roles:              master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ydzs-master
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/master=
......
Taints:             node-role.kubernetes.io/master:NoSchedule  # kubeadm adds this taint to the master node at install time
Unschedulable:      false
......

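The command above shows the master node's details, including a Taints entry, node-role.kubernetes.io/master:NoSchedule. It means the master node carries a taint whose effect is NoSchedule: Pods are not scheduled onto the tainted node. Besides NoSchedule there are two other effects:
1. PreferNoSchedule: the soft version of NoSchedule; the scheduler tries to avoid placing Pods on the tainted node but does not guarantee it
2. NoExecute: besides keeping new Pods away, once the taint takes effect any Pod already running on the node without a matching toleration is evicted immediately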
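The three effects can be summarized with a small Python model. This is a simplified sketch, not scheduler code: the `Effect` enum and `outcome_without_toleration` function are hypothetical names, and the model only describes a Pod that does not tolerate the taint.

```python
from enum import Enum


class Effect(Enum):
    NO_SCHEDULE = "NoSchedule"
    PREFER_NO_SCHEDULE = "PreferNoSchedule"
    NO_EXECUTE = "NoExecute"


def outcome_without_toleration(effect: Effect, already_running: bool) -> str:
    """Simplified outcome for a Pod that does NOT tolerate a taint with `effect`."""
    if effect is Effect.PREFER_NO_SCHEDULE:
        # Soft rule: new Pods are steered away if possible; running Pods stay.
        return "keeps running" if already_running else "avoided if possible"
    if effect is Effect.NO_SCHEDULE:
        # Hard rule for new Pods only; Pods already on the node keep running.
        return "keeps running" if already_running else "not scheduled"
    # NoExecute: blocks new Pods and evicts Pods already on the node.
    return "evicted" if already_running else "not scheduled"


if __name__ == "__main__":
    for effect in Effect:
        print(effect.value,
              "| new Pod:", outcome_without_toleration(effect, False),
              "| running Pod:", outcome_without_toleration(effect, True))
```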
The commands to add and remove a taint are:

[root@k8s-master01 ~]# kubectl taint node k8s-node02 test=node2:NoSchedule # add a taint
node/k8s-node02 tainted

[root@k8s-master01 ~]# kubectl taint node k8s-node02 test-  # remove every taint with key test
node/k8s-node02 untainted
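A taint argument is written key=value:Effect (or key:Effect when there is no value) to add a taint, and key- or key:Effect- to remove one. The following Python sketch parses those forms; `parse_taint_arg` and `TaintOp` are hypothetical names, and unlike kubectl it does not validate the key syntax.

```python
from typing import NamedTuple, Optional

EFFECTS = {"NoSchedule", "PreferNoSchedule", "NoExecute"}


class TaintOp(NamedTuple):
    action: str            # "add" or "remove"
    key: str
    value: Optional[str]   # None when the argument carries no value
    effect: Optional[str]  # None on removal means "every effect for this key"


def parse_taint_arg(arg: str) -> TaintOp:
    """Parse a taint argument such as 'test=node2:NoSchedule' or 'test-' (simplified)."""
    if arg.endswith("-"):
        # Removal: 'key-' or 'key:Effect-'; any '=value' part is ignored here.
        key_part, _, effect = arg[:-1].partition(":")
        if effect and effect not in EFFECTS:
            raise ValueError(f"unknown effect in {arg!r}")
        return TaintOp("remove", key_part.partition("=")[0], None, effect or None)
    # Addition: 'key=value:Effect' or 'key:Effect'; the effect is mandatory.
    key_value, sep, effect = arg.partition(":")
    if not sep or effect not in EFFECTS:
        raise ValueError(f"adding a taint needs a valid effect: {arg!r}")
    key, _, value = key_value.partition("=")
    return TaintOp("add", key, value or None, effect)


if __name__ == "__main__":
    print(parse_taint_arg("test=node2:NoSchedule"))
    print(parse_taint_arg("test-"))
```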

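The first kubectl taint command above taints k8s-node02 with the NoSchedule effect, which only affects the scheduling of new Pods. If you still want a Pod to be scheduled onto a tainted node, its Pod spec must declare a matching toleration. For example, to let a new Pod be scheduled onto the master node (taint-demo.yaml):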

apiVersion: apps/v1
kind: Deployment
metadata:
  name: taint
  labels:
    app: taint
spec:
  replicas: 3
  selector:
    matchLabels:
      app: taint
  template:
    metadata:
      labels:
        app: taint
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - name: http
          containerPort: 80
      tolerations:
      - key: "node-role.kubernetes.io/master"
        operator: "Exists"
        effect: "NoSchedule"

Because the master node is tainted, a Pod can only be scheduled onto it if it declares a toleration:

tolerations:
- key: "node-role.kubernetes.io/master"
  operator: "Exists"
  effect: "NoSchedule"

Then create the resource and check the result:

[root@k8s-master01 ~]# kubectl apply -f taint-demo.yaml
deployment.apps "taint" created
[root@k8s-master01 ~]# kubectl get pods -o wide
NAME                                      READY     STATUS             RESTARTS   AGE       IP             NODE
......
taint-845d8bb4fb-57mhm                    1/1       Running            0          1m        10.244.4.247   ydzs-node2
taint-845d8bb4fb-bbvmp                    1/1       Running            0          1m        10.244.0.33    ydzs-master
taint-845d8bb4fb-zb78x                    1/1       Running            0          1m        10.244.4.246   ydzs-node2
......

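One Pod replica has been scheduled onto the master node, which is how a toleration is used. Note that a toleration only permits scheduling onto the tainted node; it does not require it, which is why the other two replicas landed on ydzs-node2.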
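The "permits but does not require" behavior can be sketched as the scheduler's filtering step. This is a simplified Python model under stated assumptions: `Taint`, `Toleration`, `tolerates` and `feasible_nodes` are hypothetical names; only NoSchedule and NoExecute taints remove a node from the candidate list (PreferNoSchedule only affects ranking); the node names are taken from the output above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Taint:
    key: str
    effect: str
    value: str = ""      # Kubernetes stores "no value" as an empty string


@dataclass
class Toleration:
    key: str
    operator: str = "Equal"
    value: str = ""
    effect: str = ""     # empty matches any effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Simplified match: same key, compatible effect, and Exists or equal values."""
    if tol.key != taint.key or (tol.effect and tol.effect != taint.effect):
        return False
    return tol.operator == "Exists" or tol.value == taint.value


def feasible_nodes(nodes: Dict[str, List[Taint]], tolerations: List[Toleration]) -> List[str]:
    """Nodes whose NoSchedule/NoExecute taints are all tolerated by the Pod."""
    feasible = []
    for name, taints in nodes.items():
        hard = [t for t in taints if t.effect in ("NoSchedule", "NoExecute")]
        if all(any(tolerates(tol, t) for tol in tolerations) for t in hard):
            feasible.append(name)
    return feasible


nodes = {
    "ydzs-master": [Taint("node-role.kubernetes.io/master", "NoSchedule")],
    "ydzs-node2": [],
}
master_toleration = Toleration("node-role.kubernetes.io/master", "Exists", effect="NoSchedule")

print(feasible_nodes(nodes, []))                   # ['ydzs-node2']
print(feasible_nodes(nodes, [master_toleration]))  # ['ydzs-master', 'ydzs-node2']
```

With the toleration, both nodes stay candidates, so which one each replica lands on is left to the scheduler's scoring.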

A small toleration experiment

1. Check which node each Pod is currently running on
[root@k8s-master01 ~]# kubectl get pod -owide
NAME                           READY   STATUS    RESTARTS   AGE     IP              NODE           NOMINATED NODE   READINESS GATES
busybox                        1/1     Running   268        11d     172.17.125.9    k8s-node01     <none>           <none>
nginx-68db656dd8-mprnc         1/1     Running   0          6h1m    172.17.125.13   k8s-node01     <none>           <none>
nginx-68db656dd8-znwgp         1/1     Running   1          11d     172.18.195.11   k8s-master03   <none>           <none>
pod-affinity-6967f7785-j9dqs   1/1     Running   0          69s     172.27.14.249   k8s-node02     <none>           <none>
pod-affinity-6967f7785-lcstr   1/1     Running   0          69s     172.27.14.250   k8s-node02     <none>           <none>
pod-affinity-6967f7785-v97lc   1/1     Running   0          69s     172.27.14.248   k8s-node02     <none>           <none>
test-busybox                   1/1     Running   0          2m22s   172.27.14.246   k8s-node02     <none>           <none>
web                            2/2     Running   0          4d      172.18.195.14   k8s-master03   <none>           <none>

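2. Now add a NoExecute taint to k8s-node02. Every Pod on that node without a matching toleration, test-busybox included, is evicted right away (Terminating), and the newly created pod-affinity Pods stay Pending: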
[root@k8s-master01 ~]# kubectl taint nodes k8s-node02 test=xtaint:NoExecute
node/k8s-node02 tainted
[root@k8s-master01 ~]# kubectl get pod -owide
NAME                           READY   STATUS        RESTARTS   AGE     IP              NODE           NOMINATED NODE   READINESS GATES
busybox                        1/1     Running       268        11d     172.17.125.9    k8s-node01     <none>           <none>
nginx-68db656dd8-mprnc         1/1     Running       0          6h2m    172.17.125.13   k8s-node01     <none>           <none>
nginx-68db656dd8-znwgp         1/1     Running       1          11d     172.18.195.11   k8s-master03   <none>           <none>
pod-affinity-6967f7785-542rg   0/1     Pending       0          13s     <none>          <none>         <none>           <none>
pod-affinity-6967f7785-8dmkf   0/1     Pending       0          13s     <none>          <none>         <none>           <none>
pod-affinity-6967f7785-lcstr   0/1     Terminating   0          2m29s   <none>          k8s-node02     <none>           <none>
pod-affinity-6967f7785-v97lc   0/1     Terminating   0          2m29s   <none>          k8s-node02     <none>           <none>
pod-affinity-6967f7785-vvpj2   0/1     Pending       0          13s     <none>          <none>         <none>           <none>
test-busybox                   1/1     Terminating   0          3m42s   172.27.14.246   k8s-node02     <none>           <none>
web                            2/2     Running       0          4d      172.18.195.14   k8s-master03   <none>           <none>

3. Create a Pod manifest that includes a toleration
[root@k8s-master01 ~]# cat node-selector-demo.yaml 
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: busybox-pod
  name: test-busybox
spec:
  containers:
  - command:
    - sleep
    - "3600"
    image: busybox
    imagePullPolicy: Always
    name: test-busybox
  nodeSelector:
    com: youdianzhishi
  tolerations:
  - key: "test"
    operator: "Equal"
    value: "xtaint"
    effect: "NoExecute"
    tolerationSeconds: 3600 

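Explanation of the configuration:
tolerations:----------->the list of tolerations
- key: "test" ----------->the taint key to tolerate
  operator: "Equal"----------->the operator, "equal to"
  value: "xtaint"----------->the value the taint key must have
  effect: "NoExecute"----------->the taint effect to tolerate
  tolerationSeconds: 3600----------->tolerate for 3600 seconds. Because the Pod declares this field, a matching NoExecute taint on its node does not evict it immediately as it would an ordinary Pod; the Pod keeps running for 3600 seconds and is evicted afterwards.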
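The timing rule for tolerationSeconds can be written down as a small Python function. It is a simplified sketch: `eviction_time` is a hypothetical name, and it assumes the countdown starts when the NoExecute taint is first observed for the Pod's node. An unset tolerationSeconds means the taint is tolerated forever, and zero or negative values are treated as 0 (evict immediately).

```python
from datetime import datetime, timedelta
from typing import Optional


def eviction_time(taint_seen_at: datetime, toleration_seconds: Optional[int]) -> Optional[datetime]:
    """When a Pod that tolerates a NoExecute taint is evicted (simplified model).

    taint_seen_at: when the NoExecute taint was first observed for this Pod's node
    (an assumption of this sketch). Returns None if the Pod may stay forever.
    """
    if toleration_seconds is None:
        # No tolerationSeconds: the taint is tolerated indefinitely.
        return None
    # Zero or negative values are treated as 0, i.e. evict right away.
    return taint_seen_at + timedelta(seconds=max(toleration_seconds, 0))


seen = datetime(2021, 3, 30, 12, 0, 0)
print(eviction_time(seen, 3600))  # 2021-03-30 13:00:00
print(eviction_time(seen, None))  # None
```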

[root@k8s-master01 ~]# kubectl create -f node-selector-demo.yaml 
pod/test-busybox created

[root@k8s-master01 ~]# kubectl get pod -owide
NAME                           READY   STATUS    RESTARTS   AGE     IP              NODE           NOMINATED NODE   READINESS GATES
busybox                        1/1     Running   268        11d     172.17.125.9    k8s-node01     <none>           <none>
nginx-68db656dd8-mprnc         1/1     Running   0          6h8m    172.17.125.13   k8s-node01     <none>           <none>
nginx-68db656dd8-znwgp         1/1     Running   1          11d     172.18.195.11   k8s-master03   <none>           <none>
pod-affinity-6967f7785-542rg   0/1     Pending   0          5m59s   <none>          <none>         <none>           <none>
pod-affinity-6967f7785-8dmkf   0/1     Pending   0          5m59s   <none>          <none>         <none>           <none>
pod-affinity-6967f7785-vvpj2   0/1     Pending   0          5m59s   <none>          <none>         <none>           <none>
test-busybox                   1/1     Running   0          34s     172.27.14.251   k8s-node02     <none>           <none>
web                            2/2     Running   0          4d      172.18.195.14   k8s-master03   <none>           <none>

Toleration configuration styles

Style 1:
tolerations:
- key: "key"
  operator: "Equal"
  value: "value"
  effect: "NoSchedule"
Style 2:
tolerations:
- key: "key"
  operator: "Exists"
  effect: "NoSchedule"
  
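A toleration matches a taint when they have the same key and the same effect, and either the operator is Exists (in which case the toleration specifies no value) or the operator is Equal and the two values are equal.
Note two special cases:

    If a toleration has an empty key and the operator Exists, it matches every key, value and effect, i.e. it tolerates any taint: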
tolerations:
- operator: "Exists"

    If a toleration has an empty effect, it matches taints with the same key whatever their effect:
tolerations:
- key: "key"
  operator: "Exists"

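An application that keeps a lot of local state may prefer to stay on its node for a while during a network partition, waiting for the network to recover instead of being evicted. Its Pod could use a toleration like this (node.kubernetes.io/unreachable is the key used by current releases; older releases used node.alpha.kubernetes.io/unreachable):
tolerations:
- key: "node.kubernetes.io/unreachable"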
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 6000

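Most of the examples above use the NoSchedule effect. PreferNoSchedule is also allowed: the scheduler then tries to avoid placing the Pod on a node with a taint it does not tolerate, but this is not enforced. The effect can also be NoExecute, as in the experiment.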
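The "tries to avoid" behavior of PreferNoSchedule can be pictured as a ranking: a node loses preference for every PreferNoSchedule taint the Pod does not tolerate. The Python sketch below is a simplified model, not the real scheduler, where this count is only one score among many; `tolerated`, `rank_by_preference`, the nodes node-a and node-b, and the taint dedicated=gpu are all hypothetical.

```python
from typing import Dict, List, Tuple

Taint = Tuple[str, str, str]  # (key, value, effect)


def tolerated(tol: dict, taint: Taint) -> bool:
    """Simplified match on a toleration written as a dict, like the YAML above."""
    key, value, effect = taint
    if tol.get("effect") and tol["effect"] != effect:
        return False
    if tol.get("key") and tol["key"] != key:
        return False
    return tol.get("operator", "Equal") == "Exists" or tol.get("value", "") == value


def rank_by_preference(nodes: Dict[str, List[Taint]], tolerations: List[dict]) -> List[str]:
    """Order nodes by the number of untolerated PreferNoSchedule taints (fewer first)."""
    def penalty(name: str) -> int:
        return sum(
            1
            for taint in nodes[name]
            if taint[2] == "PreferNoSchedule"
            and not any(tolerated(tol, taint) for tol in tolerations)
        )
    # sorted() is stable, so nodes with equal penalties keep their original order.
    return sorted(nodes, key=penalty)


nodes = {"node-a": [("dedicated", "gpu", "PreferNoSchedule")], "node-b": []}
print(rank_by_preference(nodes, []))  # ['node-b', 'node-a']
gpu_tol = {"key": "dedicated", "operator": "Equal", "value": "gpu", "effect": "PreferNoSchedule"}
print(rank_by_preference(nodes, [gpu_tol]))  # ['node-a', 'node-b']
```

node-a is never excluded; without the toleration it is merely ranked last.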
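In a toleration, key, value and effect must match the Taint set on the Node. A few more points:

If operator is Exists, the value field can be omitted
If operator is Equal, the toleration's key and value must equal the taint's key and value
If operator is not specified, it defaults to Equal
There are also two special values:

An empty key combined with Exists matches every key and value, i.e. it tolerates every taint on every node
An empty effect matches every effect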
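The matching rules above can be collected into one Python function. This is a sketch modeled on those rules, with hypothetical names `Taint`, `Toleration` and `tolerates`; it does not perform API validation (for example, the API server only accepts an empty key together with Exists).

```python
from dataclasses import dataclass


@dataclass
class Taint:
    key: str
    effect: str       # NoSchedule, PreferNoSchedule or NoExecute
    value: str = ""


@dataclass
class Toleration:
    key: str = ""       # empty key + Exists tolerates every taint
    operator: str = ""  # empty operator defaults to Equal
    value: str = ""     # may be omitted when operator is Exists
    effect: str = ""    # empty effect matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    # An empty effect matches any effect; otherwise effects must be equal.
    if tol.effect and tol.effect != taint.effect:
        return False
    # An empty key matches any key (API validation only allows it with Exists).
    if tol.key and tol.key != taint.key:
        return False
    operator = tol.operator or "Equal"
    if operator == "Exists":
        return True  # the value is not compared
    if operator == "Equal":
        return tol.value == taint.value
    return False  # unknown operator: no match


taint = Taint("test", "NoExecute", "xtaint")
print(tolerates(Toleration("test", "Equal", "xtaint", "NoExecute"), taint))  # True
print(tolerates(Toleration("test", "Equal", "other", "NoExecute"), taint))   # False
print(tolerates(Toleration(operator="Exists"), taint))                       # True
print(tolerates(Toleration("test", "Exists", effect="NoSchedule"), taint))   # False
```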

Posted on 2021-03-30 13:44 by 等等马上就好
