Kubernetes (21): Advanced Scheduling in K8s - Affinity and Taints
1 The default scheduler's scheduling process:
- Predicates: from all nodes, filter down to the ones that meet the Pod's basic requirements.
- Priorities: score each remaining candidate with priority functions and rank the nodes by score.
- Randomly pick one of the highest-scoring nodes as the node the Pod will run on.
We can influence the predicate and priority phases through our own settings to steer scheduling toward the result we want.
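The three steps above can be sketched as a toy filter-score-pick loop. This is a minimal illustration, not the real scheduler; the node data and the CPU-based predicate/priority are invented for the example:

```python
import random

def schedule(nodes, cpu_request):
    """Toy scheduler: nodes maps node name -> free CPU."""
    # 1. Predicates: drop nodes that cannot fit the Pod at all
    feasible = {n: cpu for n, cpu in nodes.items() if cpu >= cpu_request}
    if not feasible:
        return None  # no feasible node -> the Pod stays Pending
    # 2. Priorities: score every feasible node (here: more free CPU = higher score)
    best = max(feasible.values())
    # 3. Randomly pick one of the highest-scoring nodes
    return random.choice([n for n, s in feasible.items() if s == best])

nodes = {"node-1": 4, "node-2": 8, "node-3": 8}
print(schedule(nodes, 2))   # node-2 or node-3 (both score highest)
print(schedule(nodes, 16))  # None -> Pending
```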
2 Ways to influence scheduling:
- Node selector: nodeSelector; you can even set nodeName to pin the Pod to one specific node.
- Affinity scheduling: nodeAffinity (node affinity), podAffinity (Pod affinity), podAntiAffinity (Pod anti-affinity).
- Taints and tolerations: taint, toleration.
3 Node selector: nodeSelector
If we want a Pod to run on one particular node, we can set Pod.spec.nodeName to that node's name. More flexibly, we can add distinctive labels to a subset of nodes and match those labels in pod.spec.nodeSelector, which greatly narrows the predicate phase's candidate set.
Add labels to a node:
kubectl label nodes NODE_NAME key1=value1 ... keyN=valueN
For example: label node01 with app=frontend and set the Pod's nodeSelector to that label; the Pod can then only run on nodes that carry the label.
If no node has the label, the Pod cannot be scheduled and stays in the Pending state.
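The matching rule is simple: every key/value pair in the selector must be present on the node. A small sketch of that check (the labels are made-up examples):

```python
def matches_node_selector(node_labels, node_selector):
    # Every selector entry must exist on the node with exactly that value
    return all(node_labels.get(k) == v for k, v in node_selector.items())

node_labels = {"kubernetes.io/hostname": "k8s-node-1", "disk": "ssd"}

print(matches_node_selector(node_labels, {"disk": "ssd"}))      # True  -> node is a candidate
print(matches_node_selector(node_labels, {"app": "frontend"}))  # False -> if no node matches, the Pod stays Pending
```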
First, add a label to one of the nodes:
[root@k8s-master ~]# kubectl get nodes --show-labels
NAME         STATUS   ROLES    AGE   VERSION   LABELS
k8s-master   Ready    master   12d   v1.15.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-master,kubernetes.io/os=linux,node-role.kubernetes.io/master=
k8s-node-1   Ready    node     12d   v1.15.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node-1,kubernetes.io/os=linux,node-role.kubernetes.io/node=
k8s-node-2   Ready    node     12d   v1.15.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node-2,kubernetes.io/os=linux,node-role.kubernetes.io/node=
[root@k8s-master ~]# kubectl label nodes k8s-node-1 disk=ssd
node/k8s-node-1 labeled
[root@k8s-master ~]# kubectl get nodes --show-labels | grep ssd
k8s-node-1   Ready    node     12d   v1.15.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disk=ssd,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node-1,kubernetes.io/os=linux,node-role.kubernetes.io/node=
# cat nodeSelector.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  labels:
    app: my-pod
spec:
  containers:
  - name: my-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80
  nodeSelector:
    disk: ssd    # if no node carries the labels given in nodeSelector, the Pod stays Pending (predicate phase fails)
[root@k8s-master schedule]# kubectl create -f nodeSelector.yaml
pod/nginx-pod created
[root@k8s-master schedule]# kubectl get pod
NAME        READY   STATUS    RESTARTS   AGE
nginx-pod   1/1     Running   0          6s
[root@k8s-master schedule]# kubectl describe pod nginx-pod | grep Node
Node:           k8s-node-1/10.6.76.23
Node-Selectors: disk=ssd
4 Node affinity scheduling: nodeAffinity
requiredDuringSchedulingIgnoredDuringExecution: hard affinity; the affinity rule must be satisfied.
preferredDuringSchedulingIgnoredDuringExecution: soft affinity; satisfying the rule is preferred, but not required.
4.1 Hard affinity
matchExpressions: match expressions against node labels. For example, a Pod can specify key zone, operator In (the label value must be one of the listed values), and values foo and bar; the Pod is then scheduled only onto nodes whose zone label is foo or bar.
matchFields: same idea as the above, but it matches built-in node fields rather than node labels, so no label values need to be defined.
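The In operator used below can be sketched like this (Exists is shown too; other operators such as NotIn follow the same pattern, and the labels are made up):

```python
def match_expression(node_labels, key, operator, values=()):
    """Minimal subset of the nodeAffinity matchExpressions operators."""
    if operator == "In":
        return node_labels.get(key) in values   # label value must be in the list
    if operator == "Exists":
        return key in node_labels               # the key alone is enough
    raise ValueError("unsupported operator")

labels = {"zone": "foo", "disk": "ssd"}
print(match_expression(labels, "zone", "In", ("foo", "bbb")))  # True
print(match_expression(labels, "zone", "In", ("foo-no",)))     # False
print(match_expression(labels, "disk", "Exists"))              # True
```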
Run the Pods on nodes whose zone label value is foo or bbb:
[root@k8s-master ~]# kubectl get nodes --show-labels | grep zone
k8s-node-1   Ready   node   46d   v1.15.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disk=ssd,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node-1,kubernetes.io/os=linux,node-role.kubernetes.io/node=,zone=foo
# cat node-affinity-1.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-hello-deployment
  namespace:
  labels:
    app: nginx-hello
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx-hello
  template:
    metadata:
      labels:
        app: nginx-hello
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: zone
                operator: In
                values:
                - foo
                - bbb
      containers:
      - name: nginx-hello
        image: nginx
        ports:
        - containerPort: 80
[root@k8s-master schedule]# kubectl get pods -o wide
NAME                                     READY   STATUS    RESTARTS   AGE     IP             NODE         NOMINATED NODE   READINESS GATES
nginx-hello-deployment-d457bd7bc-fsjjn   1/1     Running   0          2m34s   10.254.1.124   k8s-node-1   <none>           <none>
nginx-hello-deployment-d457bd7bc-ntb8h   1/1     Running   0          2m34s   10.254.1.123   k8s-node-1   <none>           <none>
nginx-pod                                1/1     Running   0          58m     10.254.1.120   k8s-node-1   <none>           <none>
Both Pods landed on node-1, as the label dictates. Now change the values so that no node matches:
# cat node-affinity-1.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-hello-deployment
  namespace:
  labels:
    app: nginx-hello
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx-hello
  template:
    metadata:
      labels:
        app: nginx-hello
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: zone
                operator: In
                values:
                - foo-no
                - bbb-no
      containers:
      - name: nginx-hello
        image: nginx
        ports:
        - containerPort: 80
# Check: no node's zone label matches these values, so the Pods stay Pending
[root@k8s-master schedule]# kubectl get pods -o wide
NAME                                      READY   STATUS    RESTARTS   AGE   IP             NODE         NOMINATED NODE   READINESS GATES
nginx-hello-deployment-6c96b5675f-8jqnx   0/1     Pending   0          43s   <none>         <none>       <none>           <none>
nginx-hello-deployment-6c96b5675f-lbnsw   0/1     Pending   0          43s   <none>         <none>       <none>           <none>
nginx-pod                                 1/1     Running   0          60m   10.254.1.120   k8s-node-1   <none>           <none>
4.2 Soft affinity
nodeAffinity's preferredDuringSchedulingIgnoredDuringExecution (soft affinity: nodes that match more of the preferred conditions are favored, but the Pods are still created and scheduled even if nothing matches)
# cat node-affinity-1.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-hello-deployment
  namespace:
  labels:
    app: nginx-hello
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx-hello
  template:
    metadata:
      labels:
        app: nginx-hello
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - preference:
              matchExpressions:
              - key: zone
                operator: In
                values:
                - foo-no
                - bbb-no
            weight: 60    # weight associated with this preference term, range 1-100
      containers:
      - name: nginx-hello
        image: nginx
        ports:
        - containerPort: 80
[root@k8s-master schedule]# kubectl get pods -o wide
NAME                                     READY   STATUS    RESTARTS   AGE   IP             NODE         NOMINATED NODE   READINESS GATES
nginx-hello-deployment-98654dc57-cvvlb   1/1     Running   0          15s   10.254.1.125   k8s-node-1   <none>           <none>
nginx-hello-deployment-98654dc57-mglbx   1/1     Running   0          20s   10.254.2.90    k8s-node-2   <none>           <none>
nginx-pod                                1/1     Running   0          72m   10.254.1.120   k8s-node-1   <none>           <none>
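Soft affinity feeds the scoring phase: every preferred term a node satisfies adds its weight to that node's score, and the Pod still lands somewhere even when the score is zero everywhere, which is exactly why the two Pods above spread across both nodes. A rough sketch of the weighting (labels and terms are invented):

```python
def preference_score(node_labels, preferred_terms):
    # Sum the weights of every preferred term the node satisfies
    score = 0
    for term in preferred_terms:
        if node_labels.get(term["key"]) in term["values"]:
            score += term["weight"]
    return score

terms = [{"key": "zone", "values": ["foo", "bbb"], "weight": 60}]
print(preference_score({"zone": "foo"}, terms))  # 60 -> preferred
print(preference_score({"zone": "xyz"}, terms))  # 0  -> still schedulable, just not favored
```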
5 Pod affinity: podAffinity
Pod affinity use case: the nodes of a k8s cluster are spread across different zones or server rooms. When service A and service B must be deployed in the same zone or the same server room, affinity scheduling is what we need.
labelSelector: which group of Pods to be affine with.
namespaces: which namespace(s) to search for those Pods in.
topologyKey: which node label key defines a "location".
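Conceptually, two Pods are "co-located" when the nodes they run on carry the same value for the topologyKey label. A toy check (the node labels are invented):

```python
def co_located(node_a_labels, node_b_labels, topology_key):
    # Same topologyKey label value on both nodes -> same "location"
    return (topology_key in node_a_labels
            and node_a_labels.get(topology_key) == node_b_labels.get(topology_key))

node1 = {"kubernetes.io/hostname": "k8s-node-1", "zone": "foo"}
node2 = {"kubernetes.io/hostname": "k8s-node-2", "zone": "foo"}

# With topologyKey=kubernetes.io/hostname the two nodes are different locations,
# but with topologyKey=zone they count as the same one.
print(co_located(node1, node2, "kubernetes.io/hostname"))  # False
print(co_located(node1, node2, "zone"))                    # True
```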
5.1 Affinity by labelSelector
Make the two Pods end up in the same location:
# cat pod-affinity.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.15
        ports:
        - containerPort: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment-pod-affinity
  namespace:
  labels:
    app: nginx-hello
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-hello
  template:
    metadata:
      labels:
        app: nginx-hello
    spec:
      affinity:
        podAffinity:
          #preferredDuringSchedulingIgnoredDuringExecution:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app        # label key, defined on the Pod above
                operator: In    # In: the label value must be in the list
                values:
                - nginx         # value of the app label
            topologyKey: kubernetes.io/hostname   # Pods on nodes with the same kubernetes.io/hostname value count as co-located
            # This Pod should be co-located (affinity) or not co-located (anti-affinity)
            # with Pods matching the labelSelector in the given namespaces: co-located
            # means running on a node whose topologyKey label value matches that of a
            # node where any of the selected Pods is running.
      containers:
      - name: nginx-hello
        image: nginx
        ports:
        - containerPort: 80
[root@k8s-master ~]# kubectl get pod -o wide | grep nginx
nginx-deployment-6f6d9b887f-5mvqs                1/1   Running   0   6s   10.254.2.92   k8s-node-2   <none>   <none>
nginx-deployment-pod-affinity-5566c6d4fd-2tnrq   1/1   Running   0   6s   10.254.2.93   k8s-node-2   <none>   <none>
5.2 podAntiAffinity (anti-affinity)
Keep a Pod off any node that runs a certain other Pod (the opposite of the above).
# cat pod-affinity.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.15
        ports:
        - containerPort: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment-pod-affinity
  namespace:
  labels:
    app: nginx-hello
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-hello
  template:
    metadata:
      labels:
        app: nginx-hello
    spec:
      affinity:
        #podAffinity:
        podAntiAffinity:    # only this line changed
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app        # label key, defined on the Pod above
                operator: In    # In: the label value must be in the list
                values:
                - nginx         # value of the app label
            topologyKey: kubernetes.io/hostname   # Pods on nodes with the same kubernetes.io/hostname value count as co-located
      containers:
      - name: nginx-hello
        image: nginx
        ports:
        - containerPort: 80
[root@k8s-master ~]# kubectl apply -f a.yaml
deployment.extensions/nginx-deployment unchanged
deployment.apps/nginx-deployment-pod-affinity configured
[root@k8s-master ~]# kubectl get pod -o wide | grep nginx
nginx-deployment-6f6d9b887f-5mvqs                1/1   Running             0   68s   10.254.2.92   k8s-node-2   <none>   <none>
nginx-deployment-pod-affinity-5566c6d4fd-2tnrq   1/1   Running             0   68s   10.254.2.93   k8s-node-2   <none>   <none>
nginx-deployment-pod-affinity-86bdf6996b-fdb8f   0/1   ContainerCreating   0   4s    <none>        k8s-node-1   <none>   <none>
[root@k8s-master ~]# kubectl get pod -o wide | grep nginx
nginx-deployment-6f6d9b887f-5mvqs                1/1   Running   0   73s   10.254.2.92   k8s-node-2   <none>   <none>
nginx-deployment-pod-affinity-86bdf6996b-fdb8f   1/1   Running   0   9s    10.254.1.56   k8s-node-1   <none>   <none>
6 Taint-based scheduling
https://www.cnblogs.com/klvchen/p/10025205.html
Taints and tolerations let us mark a node so that, by default, no Pod is scheduled onto it. A Pod that explicitly specifies a matching toleration, however, can still be scheduled onto the tainted node normally.
# Add a taint to a node from the command line:
kubectl taint nodes node1 key=value:NoSchedule
A toleration's operator can be:
Equal: the key must equal the value (the default).
Exists: the key only has to exist; no value needs to be defined.
A taint's effect defines how it repels Pods:
NoSchedule: affects only the scheduling process; Pods already on the node are untouched.
NoExecute: affects both scheduling and Pods already running on the node; Pods that do not tolerate the taint are evicted.
PreferNoSchedule: try not to schedule onto the node, but it is not a hard guarantee.
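The rule the scheduler applies can be sketched as: a Pod may land on a node only if every NoSchedule taint on that node is matched by one of the Pod's tolerations. A simplified version (ignoring NoExecute eviction timing; the taint data mirrors the example below):

```python
def tolerates(toleration, taint):
    # Equal: key, value and effect must all match; Exists: matching the key is enough
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    if toleration["operator"] == "Exists":
        return toleration["key"] == taint["key"]
    return (toleration["key"] == taint["key"]
            and toleration.get("value") == taint["value"])

def schedulable(taints, tolerations):
    # Every NoSchedule taint must be tolerated by at least one toleration
    return all(any(tolerates(tol, t) for tol in tolerations)
               for t in taints if t["effect"] == "NoSchedule")

taint = {"key": "node-type", "value": "production", "effect": "NoSchedule"}
tol = {"key": "node-type", "operator": "Equal", "value": "production", "effect": "NoSchedule"}

print(schedulable([taint], [tol]))  # True  -> Pod can be scheduled
print(schedulable([taint], []))     # False -> Pod stays Pending
```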
# View the taints
[root@k8s-master schedule]# kubectl describe node k8s-master | grep Taints
Taints:             node-role.kubernetes.io/master:PreferNoSchedule
# Taint node-1
# kubectl taint node k8s-node-1 node-type=production:NoSchedule
[root@k8s-master schedule]# kubectl describe node k8s-node-1 | grep Taints
Taints:             node-type=production:NoSchedule
# cat deploy.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.15
        ports:
        - containerPort: 80
[root@k8s-master schedule]# kubectl apply -f deploy.yaml
deployment.extensions/nginx-deployment unchanged
# All Pods are running on node-2
[root@k8s-master schedule]# kubectl get pods -o wide
NAME                                READY   STATUS    RESTARTS   AGE   IP            NODE         NOMINATED NODE   READINESS GATES
nginx-deployment-6f6d9b887f-j5nmz   1/1     Running   0          83s   10.254.2.94   k8s-node-2   <none>           <none>
nginx-deployment-6f6d9b887f-wjfpp   1/1     Running   0          83s   10.254.2.93   k8s-node-2   <none>           <none>
# Taint node-2 as well
[root@k8s-master schedule]# kubectl delete deployments nginx-deployment
deployment.extensions "nginx-deployment" deleted
[root@k8s-master schedule]# kubectl get pods -o wide
No resources found.
[root@k8s-master schedule]# kubectl taint node k8s-node-2 node-type=production:NoSchedule
node/k8s-node-2 tainted
[root@k8s-master schedule]# kubectl describe node k8s-node-2 | grep Taints
Taints:             node-type=production:NoSchedule
[root@k8s-master schedule]# kubectl apply -f deploy.yaml
deployment.extensions/nginx-deployment created
# Now the Pods all end up on the master
[root@k8s-master schedule]# kubectl get pods -o wide
NAME                                READY   STATUS    RESTARTS   AGE   IP            NODE         NOMINATED NODE   READINESS GATES
nginx-deployment-6f6d9b887f-ck6pd   1/1     Running   0          15s   10.254.0.48   k8s-master   <none>           <none>
nginx-deployment-6f6d9b887f-gdwm6   1/1     Running   0          15s   10.254.0.49   k8s-master   <none>           <none>
# Taint the master too
[root@k8s-master schedule]# kubectl taint node k8s-master node-type=production:NoSchedule
node/k8s-master tainted
[root@k8s-master schedule]# kubectl delete deployments nginx-deployment
deployment.extensions "nginx-deployment" deleted
[root@k8s-master schedule]# kubectl apply -f deploy.yaml
deployment.extensions/nginx-deployment created
# No node is left that can run the Pods
[root@k8s-master schedule]# kubectl get pods -o wide
NAME                                READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
nginx-deployment-6f6d9b887f-mld4v   0/1     Pending   0          5s    <none>   <none>   <none>           <none>
nginx-deployment-6f6d9b887f-q4nfj   0/1     Pending   0          5s    <none>   <none>   <none>           <none>
# The taints are not tolerated
[root@k8s-master schedule]# kubectl describe pod nginx-deployment-6f6d9b887f-mld4v | tail -1
  Warning  FailedScheduling  51s (x6 over 3m29s)  default-scheduler  0/3 nodes are available: 3 node(s) had taints that the pod didn't tolerate.
# Define a toleration
# cat deploy.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.15
        ports:
        - containerPort: 80
      tolerations:
      - key: "node-type"        # the taint key defined earlier
        operator: "Equal"       # Equal matches key and value exactly; Exists tolerates whenever the key is present
        value: "production"     # taint value
        effect: "NoSchedule"    # taint effect
[root@k8s-master schedule]# kubectl apply -f deploy.yaml
deployment.extensions/nginx-deployment unchanged
# The two Pods are spread evenly across the two nodes
[root@k8s-master schedule]# kubectl get pods -o wide
NAME                                READY   STATUS    RESTARTS   AGE   IP             NODE         NOMINATED NODE   READINESS GATES
nginx-deployment-565dd6b94d-4cdhz   1/1     Running   0          32s   10.254.1.130   k8s-node-1   <none>           <none>
nginx-deployment-565dd6b94d-fqzm7   1/1     Running   0          32s   10.254.2.95    k8s-node-2   <none>           <none>
# Remove a taint
[root@k8s-master schedule]# kubectl describe nodes k8s-master | grep Taints
Taints:             node-role.kubernetes.io/master:PreferNoSchedule
[root@k8s-master schedule]# kubectl describe nodes k8s-node-1 | grep Taints
Taints:             node-type=production:NoSchedule
[root@k8s-master schedule]# kubectl taint node k8s-node-1 node-type-
node/k8s-node-1 untainted
[root@k8s-master schedule]# kubectl describe nodes k8s-node-1 | grep Taints
Taints:             <none>