k8s - Advanced Scheduling - Part 21
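Two kinds:
- Node selectors: nodeSelector (label the nodes, and the pod uses those labels to pre-select candidate nodes) and nodeName
- Node affinity scheduling: nodeAffinity
1. Node selectors (nodeSelector, nodeName)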
[root@master ~]# kubectl explain pods.spec.nodeSelector
[root@master schedule]# pwd
/root/manifests/schedule
[root@master schedule]# vim pod-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-demo
  namespace: default
  labels:
    app: myapp
    tier: frontend
  annotations:
    mageedu.com/created-by: "cluster admin"
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
  nodeSelector:        # node selector
    disktype: ssd      # this pod runs only on a node labeled disktype=ssd

[root@master schedule]# kubectl apply -f pod-demo.yaml
pod/pod-demo created
[root@master schedule]# kubectl get pods -o wide
NAME       READY   STATUS    RESTARTS   AGE     IP           NODE     NOMINATED NODE   READINESS GATES
pod-demo   1/1     Running   0          8m13s   10.244.1.6   node01   <none>           <none>
[root@master schedule]# kubectl get nodes --show-labels |grep node01
node01   Ready   <none>   76d   v1.13.1   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disktype=ssd,kubernetes.io/hostname=node01
# The new pod is running on node01, because node01 carries the disktype=ssd label.
Next, label node02, change the nodeSelector in the pod manifest to match node02's label, and create the pod again:
[root@master schedule]# kubectl delete -f pod-demo.yaml
[root@master ~]# kubectl label nodes node02 disktype=harddisk
node/node02 labeled
[root@master schedule]# vim pod-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-demo
  namespace: default
  labels:
    app: myapp
    tier: frontend
  annotations:
    mageedu.com/created-by: "cluster admin"
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
  nodeSelector:
    disktype: harddisk
# re-create the pod from the edited manifest with: kubectl apply -f pod-demo.yaml
[root@master schedule]# kubectl get nodes --show-labels |grep node02
node02   Ready   <none>   76d   v1.13.1   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disktype=harddisk,kubernetes.io/hostname=node02
[root@master schedule]# kubectl get pods -o wide
NAME       READY   STATUS    RESTARTS   AGE    IP           NODE     NOMINATED NODE   READINESS GATES
pod-demo   1/1     Running   0          104s   10.244.2.5   node02   <none>           <none>
The pod is now running on node02.
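The nodeSelector rule used in both runs can be sketched in a few lines of Python. This is only a toy model of the matching rule (a node qualifies when it carries every key/value pair in the selector), not the scheduler's actual code. The function names and the trimmed label sets are assumptions made for illustration.

```python
# Toy model of nodeSelector matching; not the real kube-scheduler.

def matches_node_selector(node_labels, node_selector):
    """A node matches when it carries every key/value pair in the selector."""
    return all(node_labels.get(k) == v for k, v in node_selector.items())


def feasible_nodes(nodes, node_selector):
    """Return the names of all nodes that satisfy the selector, sorted."""
    return sorted(
        name for name, labels in nodes.items()
        if matches_node_selector(labels, node_selector)
    )


# Labels trimmed to the ones that matter for this section.
nodes = {
    "node01": {"kubernetes.io/hostname": "node01", "disktype": "ssd"},
    "node02": {"kubernetes.io/hostname": "node02", "disktype": "harddisk"},
}

print(feasible_nodes(nodes, {"disktype": "ssd"}))       # ['node01']
print(feasible_nodes(nodes, {"disktype": "harddisk"}))  # ['node02']
```

If no node satisfies the selector, the list is empty, which corresponds to a pod that stays Pending.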
2. Node affinity scheduling
[root@master scheduler]# kubectl explain pods.spec.affinity
[root@master scheduler]# kubectl explain pods.spec.affinity.nodeAffinity
   preferredDuringSchedulingIgnoredDuringExecution: soft affinity (a preference)
   requiredDuringSchedulingIgnoredDuringExecution: hard affinity (must be satisfied)
[root@master ~]# kubectl explain pods.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms.matchExpressions
# hard affinity
[root@master schedule]# vim pod-nodeaffinity-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-node-affinity-demo
  namespace: default
  labels:
    app: myapp
    tier: frontend
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: zone
            operator: In
            values:
            - foo
            - bar
[root@master schedule]# kubectl apply -f pod-nodeaffinity-demo.yaml
pod/pod-node-affinity-demo created
[root@master schedule]# kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
pod-node-affinity-demo   0/1     Pending   0          76s
# The pod is Pending because no node satisfies the condition: no node is labeled zone=foo or zone=bar.
Now create a pod that uses soft affinity:
# Soft affinity: even if no node satisfies the condition, the scheduler still picks some node to run the pod.
[root@master schedule]# kubectl delete -f pod-nodeaffinity-demo.yaml
pod "pod-node-affinity-demo" deleted
[root@master schedule]# vim pod-nodeaffinity-demo2.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-node-affinity-demo2
  namespace: default
  labels:
    app: myapp
    tier: frontend
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - preference:
          matchExpressions:
          - key: zone
            operator: In
            values:
            - foo
            - bar
        weight: 60
[root@master schedule]# kubectl apply -f pod-nodeaffinity-demo2.yaml
pod/pod-node-affinity-demo2 created
[root@master schedule]# kubectl get pods    # the pod is running
NAME                      READY   STATUS    RESTARTS   AGE
pod-node-affinity-demo2   1/1     Running   0          74s
pod-node-affinity-demo2 is running: it only declares soft affinity, so the scheduler still places it on a node even though none matches the preference.
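The difference between the two outcomes (Pending versus Running) can be illustrated with a small Python sketch. It is a simplified model, not the scheduler's implementation. The required rule acts as a filter: nodeSelectorTerms are ORed, and the matchExpressions inside one term are ANDed. The preferred rule only adds its weight to a node's score. The function names and the trimmed node labels are assumptions, the Gt and Lt operators are left out, and the real scheduler combines this score with others.

```python
# Simplified model of nodeAffinity: "required" filters, "preferred" scores.

def match_expression(labels, expr):
    """Evaluate one matchExpressions entry against a node's labels."""
    key, op = expr["key"], expr["operator"]
    values = expr.get("values", [])
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        return key not in labels or labels[key] not in values
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    raise ValueError(f"unsupported operator: {op}")


def matches_all(labels, expressions):
    """All expressions inside one term must hold (logical AND)."""
    return all(match_expression(labels, e) for e in expressions)


def required_ok(labels, node_selector_terms):
    """Hard affinity: at least one term must match (logical OR)."""
    return any(matches_all(labels, t["matchExpressions"]) for t in node_selector_terms)


def preferred_score(labels, preferred_terms):
    """Soft affinity: sum the weights of the preferences the node satisfies."""
    return sum(
        p["weight"] for p in preferred_terms
        if matches_all(labels, p["preference"]["matchExpressions"])
    )


zone_expr = {"key": "zone", "operator": "In", "values": ["foo", "bar"]}
required = [{"matchExpressions": [zone_expr]}]
preferred = [{"weight": 60, "preference": {"matchExpressions": [zone_expr]}}]

# Neither node has a zone label at this point in the walkthrough.
nodes = {"node01": {"disktype": "ssd"}, "node02": {"disktype": "harddisk"}}

hard_feasible = [n for n, labels in nodes.items() if required_ok(labels, required)]
soft_scores = {n: preferred_score(labels, preferred) for n, labels in nodes.items()}

print(hard_feasible)  # []  -> no feasible node, the pod stays Pending
print(soft_scores)    # {'node01': 0, 'node02': 0}  -> no bonus, but every node stays eligible
```

In this model, a node labeled zone=foo would score 60 under the soft rule and be preferred over the others.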
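3. Pod affinity scheduling
For example, in a data center we can give every machine in one rack the same label, so that pods are scheduled with an affinity for that rack;
or we can label only a few machines in the rack, so that pods are scheduled with an affinity for those machines.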
# view the manifest fields
[root@master ~]# kubectl explain pods.spec.affinity.podAffinity
FIELDS:
   preferredDuringSchedulingIgnoredDuringExecution   <[]Object>   # soft affinity
   requiredDuringSchedulingIgnoredDuringExecution    <[]Object>   # hard affinity
[root@master ~]# kubectl explain pods.spec.affinity.podAffinity.requiredDuringSchedulingIgnoredDuringExecution
FIELDS:
   labelSelector   <Object>              # selects the group of pods to be affine with
   namespaces      <[]string>            # namespaces in which the selected pods live; cross-namespace references are uncommon
   topologyKey     <string> -required-   # node label key that defines what counts as the same location
Hard pod affinity:
[root@master ~]# kubectl get nodes --show-labels
NAME     STATUS   ROLES    AGE   VERSION   LABELS
master   Ready    master   77d   v1.13.1   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=master,node-role.kubernetes.io/master=
node01   Ready    <none>   77d   v1.13.1   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disktype=ssd,kubernetes.io/hostname=node01
node02   Ready    <none>   76d   v1.13.1   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disktype=harddisk,kubernetes.io/hostname=node02
# manifest
[root@master schedule]# vim pod-requieed-affinity-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-first
  namespace: default
  labels:
    app: myapp
    tier: frontend
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-second
  namespace: default
  labels:
    app: backend
    tier: db
spec:
  containers:
  - name: busybox    # the leading "-" marks a list item; a list can also be written with square brackets
    image: busybox:latest
    imagePullPolicy: IfNotPresent
    command: ["sh","-c","sleep 3600"]
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:    # hard affinity
      - labelSelector:
          matchExpressions:
          - {key: app,operator: In,values: ["myapp"]}    # this pod must be placed with a pod labeled app=myapp (the label in pod-first's metadata)
        topologyKey: kubernetes.io/hostname    # nodes with the same value for this label key count as the same location; with kubernetes.io/hostname that means the same node
# create
[root@master schedule]# kubectl apply -f pod-requieed-affinity-demo.yaml
pod/pod-first created
pod/pod-second created
[root@master schedule]# kubectl get pods -o wide
NAME         READY   STATUS    RESTARTS   AGE     IP            NODE     NOMINATED NODE   READINESS GATES
pod-first    1/1     Running   0          3m25s   10.244.2.9    node02   <none>           <none>
pod-second   1/1     Running   0          3m25s   10.244.2.10   node02   <none>           <none>
# Both pods run on the same node: pod-second has a hard affinity for pod-first, so it is scheduled onto whichever node pod-first lands on.
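The role of topologyKey in the run above can be sketched in Python. This is a rough model under stated assumptions, not scheduler code. It collects the topologyKey values of the nodes that already run a matching pod, then keeps the nodes that share one of those values. The function names are hypothetical, and the selector is simplified to matchLabels form, which is equivalent to the `In` expression with a single value used in the manifest.

```python
# Rough model of required podAffinity with a topologyKey.

def selector_matches(pod_labels, match_labels):
    """True when the pod carries every key/value pair of the selector."""
    return all(pod_labels.get(k) == v for k, v in match_labels.items())


def affinity_candidates(nodes, running_pods, match_labels, topology_key):
    """Nodes that share a topology domain with at least one matching pod."""
    domains = {
        nodes[p["node"]][topology_key]
        for p in running_pods
        if selector_matches(p["labels"], match_labels)
        and topology_key in nodes[p["node"]]
    }
    return sorted(
        name for name, labels in nodes.items()
        if labels.get(topology_key) in domains
    )


nodes = {
    "node01": {"kubernetes.io/hostname": "node01"},
    "node02": {"kubernetes.io/hostname": "node02"},
}
# pod-first was placed on node02 in the output above.
running_pods = [{"name": "pod-first", "node": "node02",
                 "labels": {"app": "myapp", "tier": "frontend"}}]

print(affinity_candidates(nodes, running_pods,
                          {"app": "myapp"}, "kubernetes.io/hostname"))  # ['node02']
```

Because every node has a distinct kubernetes.io/hostname value, each node is its own domain, so pod-second can only follow pod-first onto node02.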
4. Pod anti-affinity scheduling
[root@master ~]# kubectl explain pods.spec.affinity.podAntiAffinity.requiredDuringSchedulingIgnoredDuringExecution.labelSelector
FIELDS:
   matchExpressions   <[]Object>
   matchLabels        <map[string]string>
[root@master schedule]# kubectl delete -f pod-requieed-affinity-demo.yaml    # delete the pods created above
# manifest
[root@master schedule]# vim pod-requieed-Anti-affinity-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-first
  namespace: default
  labels:
    app: myapp
    tier: frontend
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-second
  namespace: default
  labels:
    app: backend
    tier: db
spec:
  containers:
  - name: busybox
    image: busybox:latest
    imagePullPolicy: IfNotPresent
    command: ["sh","-c","sleep 3600"]
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - {key: app,operator: In,values: ["myapp"]}
        topologyKey: kubernetes.io/hostname
# create
[root@master schedule]# kubectl apply -f pod-requieed-Anti-affinity-demo.yaml
pod/pod-first created
pod/pod-second created
[root@master schedule]# kubectl get pods -o wide
NAME         READY   STATUS    RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
pod-first    1/1     Running   0          53s   10.244.1.7    node01   <none>           <none>
pod-second   1/1     Running   0          53s   10.244.2.11   node02   <none>           <none>
# pod-first and pod-second are not scheduled onto the same node.
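Next, give both nodes the same zone label and use zone as the topologyKey. Because the policy is podAntiAffinity, pod-first and pod-second cannot both run on nodes that share the same zone value.
The result is that pod-first runs, while pod-second, being anti-affine to it, has no eligible node left and stays in the Pending state.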
# label both nodes with the same label
[root@master ~]# kubectl label nodes node01 zone=foo
node/node01 labeled
[root@master ~]# kubectl label nodes node02 zone=foo
[root@master schedule]# kubectl delete -f pod-requieed-Anti-affinity-demo.yaml    # delete the pods
# manifest
[root@master schedule]# vim pod-requieed-Anti-affinity-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-first
  namespace: default
  labels:
    app: myapp
    tier: frontend
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-second
  namespace: default
  labels:
    app: backend
    tier: db
spec:
  containers:
  - name: busybox
    image: busybox:latest
    imagePullPolicy: IfNotPresent
    command: ["sh","-c","sleep 3600"]
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - {key: app,operator: In,values: ["myapp"]}
        topologyKey: zone    # the topology key is now the zone node label
# create
[root@master schedule]# kubectl apply -f pod-requieed-Anti-affinity-demo.yaml
pod/pod-first created
pod/pod-second created
[root@master schedule]# kubectl get pods -o wide
NAME         READY   STATUS    RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
pod-first    1/1     Running   0          4s    10.244.2.12   node02   <none>           <none>
pod-second   0/1     Pending   0          4s    <none>        <none>   <none>           <none>
# pod-first runs, while pod-second stays Pending: node01 and node02 share zone=foo, so the anti-affinity rule excludes both of them.
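Why the same anti-affinity rule gives two different results depending on topologyKey can be checked with a short Python sketch. It is an illustrative model only. The function names are hypothetical, the selector is reduced to matchLabels form, and the model covers only the two worker nodes, on the assumption that the master is unavailable to these pods.

```python
# Illustrative model of required podAntiAffinity with two topology keys.

def selector_matches(pod_labels, match_labels):
    """True when the pod carries every key/value pair of the selector."""
    return all(pod_labels.get(k) == v for k, v in match_labels.items())


def anti_affinity_candidates(nodes, running_pods, match_labels, topology_key):
    """Nodes whose topology domain holds no pod matching the selector."""
    blocked = {
        nodes[p["node"]][topology_key]
        for p in running_pods
        if selector_matches(p["labels"], match_labels)
        and topology_key in nodes[p["node"]]
    }
    return sorted(
        name for name, labels in nodes.items()
        if labels.get(topology_key) not in blocked
    )


nodes = {
    "node01": {"kubernetes.io/hostname": "node01", "zone": "foo"},
    "node02": {"kubernetes.io/hostname": "node02", "zone": "foo"},
}
# Assume pod-first has already been placed on node02, as in the last output.
running_pods = [{"name": "pod-first", "node": "node02",
                 "labels": {"app": "myapp"}}]

by_hostname = anti_affinity_candidates(
    nodes, running_pods, {"app": "myapp"}, "kubernetes.io/hostname")
by_zone = anti_affinity_candidates(
    nodes, running_pods, {"app": "myapp"}, "zone")

print(by_hostname)  # ['node01']  -> the other node is still allowed
print(by_zone)      # []          -> both nodes share zone=foo, so the pod stays Pending
```

A wider topology domain therefore makes anti-affinity stricter: with topologyKey set to zone, one matching pod blocks every node in that zone.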
5. Taint-based scheduling
Taints let a node decide which pods may run on it. Taints are set on nodes; tolerations are set on pods.
Defining taints:
[root@master ~]# kubectl explain nodes.spec.taints    # taints: the taints defined on a node
FIELDS:
   effect      <string> -required-   # what happens to a pod that does not tolerate the taint; one of:
       # NoSchedule: affects scheduling only. Pods that do not tolerate the taint are not scheduled onto the node; pods already running there are unaffected when the taint is added.
       # NoExecute: affects both scheduling and running pods. Pods that do not tolerate the taint are not scheduled onto the node, and pods already running there are evicted when the taint is added.
       # PreferNoSchedule: the scheduler tries to avoid the node for pods that do not tolerate the taint, but still uses it if nothing else is available; pods already running there are unaffected when the taint is added.
   key         <string> -required-
   timeAdded   <string>
   value       <string>
# view the taints on the nodes
[root@master ~]# kubectl describe node node01 |grep Taints
Taints:             <none>
[root@master ~]# kubectl describe node node02 |grep Taints
Taints:             <none>
# view a pod's tolerations
[root@master ~]# kubectl describe pods kube-apiserver-master -n kube-system |grep Tolerations
Tolerations:       :NoExecute
[root@master ~]# kubectl taint -h | grep -A 1 Usage    # how to taint a node
Usage:
  kubectl taint NODE NAME KEY_1=VAL_1:TAINT_EFFECT_1 ... KEY_N=VAL_N:TAINT_EFFECT_N [options]
Taints and tolerations are both user-defined key/value pairs.
Now taint node01 with node-type=production:NoSchedule:
[root@master ~]# kubectl taint node node01 node-type=production:NoSchedule
node/node01 tainted
# Pod manifest. It defines no tolerations and node01 is tainted, so all pods should end up on node02.
[root@master schedule]# vim deploy-demo.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-deploy
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      release: canary
  template:
    metadata:
      labels:
        app: myapp
        release: canary
    spec:
      containers:
      - name: myapp
        image: ikubernetes/myapp:v2
        ports:
        - name: http
          containerPort: 80
# create
[root@master schedule]# kubectl apply -f deploy-demo.yaml
deployment.apps/myapp-deploy created
# all pods are running on node02
[root@master schedule]# kubectl get pods -o wide
NAME                            READY   STATUS    RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
myapp-deploy-6b56d98b6b-52hth   1/1     Running   0          9s    10.244.2.15   node02   <none>           <none>
myapp-deploy-6b56d98b6b-dr224   1/1     Running   0          9s    10.244.2.14   node02   <none>           <none>
myapp-deploy-6b56d98b6b-z278x   1/1     Running   0          9s    10.244.2.13   node02   <none>           <none>
Defining tolerations:
[root@master ~]# kubectl explain pods.spec.tolerations
FIELDS:
   effect              <string>
   key                 <string>
   operator            <string>    # two values: Exists tolerates any taint with this key, whatever its value; Equal tolerates a taint only when both key and value match exactly
   tolerationSeconds   <integer>   # grace period before the pod is evicted
   value               <string>
[root@master ~]# kubectl taint node node02 node-type=dev:NoExecute    # give node02 a different taint
node/node02 tainted
[root@master schedule]# kubectl delete -f deploy-demo.yaml
# manifest
[root@master schedule]# vim deploy-demo.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-deploy
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      release: canary
  template:
    metadata:
      labels:
        app: myapp
        release: canary
    spec:
      containers:
      - name: myapp
        image: ikubernetes/myapp:v2
        ports:
        - name: http
          containerPort: 80
      tolerations:
      - key: "node-type"
        operator: "Equal"    # key and value must match the taint exactly
        value: "production"
        effect: "NoSchedule"
# create the pods
[root@master schedule]# kubectl apply -f deploy-demo.yaml
deployment.apps/myapp-deploy created
[root@master schedule]# kubectl get pods -o wide
NAME                            READY   STATUS    RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
myapp-deploy-779c578779-5vkbw   1/1     Running   0          12s   10.244.1.12   node01   <none>           <none>
myapp-deploy-779c578779-bh9td   1/1     Running   0          12s   10.244.1.11   node01   <none>           <none>
myapp-deploy-779c578779-dn52p   1/1     Running   0          12s   10.244.1.13   node01   <none>           <none>
# All pods run on node01: they tolerate node01's taint, but not the node-type=dev:NoExecute taint on node02.
Now change operator: "Equal" to operator: "Exists".
Exists means the pod tolerates any taint on a node that has this key, whatever its value.
[root@master schedule]# kubectl delete -f deploy-demo.yaml
[root@master schedule]# vim deploy-demo.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-deploy
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      release: canary
  template:
    metadata:
      labels:
        app: myapp
        release: canary
    spec:
      containers:
      - name: myapp
        image: ikubernetes/myapp:v2
        ports:
        - name: http
          containerPort: 80
      tolerations:
      - key: "node-type"
        operator: "Exists"
        value: ""
        effect: ""    # no effect specified
# create
[root@master schedule]# kubectl apply -f deploy-demo.yaml
deployment.apps/myapp-deploy created
[root@master schedule]# kubectl get pods -o wide
NAME                            READY   STATUS    RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
myapp-deploy-69b95476c8-bfpgj   1/1     Running   0          13s   10.244.2.20   node02   <none>           <none>
myapp-deploy-69b95476c8-fhwbd   1/1     Running   0          13s   10.244.1.17   node01   <none>           <none>
myapp-deploy-69b95476c8-tzzlx   1/1     Running   0          13s   10.244.2.19   node02   <none>           <none>
# Both node01 and node02 now run pods. Leaving effect empty means the toleration matches every effect.
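The toleration experiments come down to one matching rule, sketched here in Python. Treat it as a simplified model rather than the scheduler's code. An empty effect or key in a toleration acts as a wildcard, Exists ignores the value, and Equal compares it. A node counts as schedulable when each of its NoSchedule and NoExecute taints is tolerated. The function names are hypothetical, and the master node and its own taint are left out.

```python
# Simplified model of taint/toleration matching.

def tolerates(toleration, taint):
    """Return True when this toleration covers this taint."""
    effect = toleration.get("effect", "")
    if effect and effect != taint["effect"]:
        return False  # a non-empty effect must match exactly
    key = toleration.get("key", "")
    if key and key != taint["key"]:
        return False  # a non-empty key must match exactly
    if toleration.get("operator", "Equal") == "Exists":
        return True   # Exists: the taint's value is irrelevant
    return toleration.get("value", "") == taint.get("value", "")


def schedulable_nodes(node_taints, tolerations):
    """Nodes on which every NoSchedule/NoExecute taint is tolerated."""
    result = []
    for name, taints in node_taints.items():
        hard = [t for t in taints if t["effect"] in ("NoSchedule", "NoExecute")]
        if all(any(tolerates(tol, t) for tol in tolerations) for t in hard):
            result.append(name)
    return sorted(result)


# Taints as they stand once both kubectl taint commands have been run.
node_taints = {
    "node01": [{"key": "node-type", "value": "production", "effect": "NoSchedule"}],
    "node02": [{"key": "node-type", "value": "dev", "effect": "NoExecute"}],
}

equal_tol = [{"key": "node-type", "operator": "Equal",
              "value": "production", "effect": "NoSchedule"}]
exists_tol = [{"key": "node-type", "operator": "Exists", "value": "", "effect": ""}]

print(schedulable_nodes(node_taints, []))          # []
print(schedulable_nodes(node_taints, equal_tol))   # ['node01']
print(schedulable_nodes(node_taints, exists_tol))  # ['node01', 'node02']
```

The last two results agree with the kubectl output: the Equal toleration admits only node01, and the Exists toleration with an empty effect admits both nodes.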
Finally, remove the taints from the nodes:
# Remove taints: the trailing "-" deletes every taint with the given key, whatever its effect.
[root@master ~]# kubectl taint node node02 node-type-
node/node02 untainted
[root@master ~]# kubectl taint node node01 node-type-
node/node01 untainted