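k8s Pod Scheduling Strategies
By default, the k8s scheduler scores every worker node with priority functions that take each node's configuration and current load into account. When the master hands down a workload, the pod is placed on the node that scores as the best fit.
Beyond that, we as operators have the following four ways to influence pod scheduling:
1. Node selector scheduling
2. Affinity scheduling
3. Taints and tolerations
4. Resource-based scheduling
Differences and hands-on examples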
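I. Node selector scheduling
This is the most direct way to schedule: simple and blunt, so it is common in simple cluster architectures. It is a poor fit for complex resource classification and orchestration.
Explanation: attach a unique label to a worker node, then set a node label selector (nodeSelector) in the pod's yml file to bind the pod to that node. This constrains the scheduler's choice, so the pod starts only on the node you specified.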
1. First, label the node with kubectl
Format: kubectl label nodes <node-name> <label-key>=<label-value>
[root@k8s-master ~]# kubectl label nodes k8s-node01 zone=sh
[root@k8s-master ~]# kubectl get nodes --show-labels
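2. Schedule the pod onto the node with nodeSelector
- The pod definition names labels under nodeSelector; the pod is scheduled only onto nodes that carry those labels.
[root@k8s-master ~]# vim pod-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    env: test
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  nodeSelector:
    disktype: ssd
- In this example the pod is scheduled only onto nodes labeled disktype=ssd. The zone=sh label from step 1 does not match this selector, so label a node first (for example, kubectl label nodes k8s-node01 disktype=ssd); otherwise the pod stays Pending.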
- Verify the node scheduling
[root@k8s-master ~]# kubectl apply -f pod-demo.yaml
[root@k8s-master ~]# kubectl get pods -o wide
[root@k8s-master ~]# kubectl describe pod nginx    ## view events (the pod is named nginx in pod-demo.yaml)
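As a complement, here is a minimal sketch of a pod that selects the zone=sh label applied to k8s-node01 in step 1. The pod name nginx-zone-sh is hypothetical and not part of the original walkthrough. When nodeSelector lists several labels, a node must carry all of them to be eligible.

```yaml
# Sketch only: assumes k8s-node01 was labeled with zone=sh as in step 1.
apiVersion: v1
kind: Pod
metadata:
  name: nginx-zone-sh        # hypothetical name
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  nodeSelector:
    zone: sh                 # must match the node label exactly (key and value)
```

After applying it, kubectl get pods -o wide should list the pod on the node that carries the zone=sh label.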
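II. Affinity scheduling
More complex; it is used in medium and large clusters that group many nodes and manage resources by category. There are hard and soft affinity, and affinity and anti-affinity; each pair consists of opposites.
Hard affinity: matches one or more labels on the node (at least one match must exist); if nothing matches, the pod is not scheduled.
Soft affinity: matches one or more labels on the node (a matching node is preferred; if none matches, the scheduler falls back to the priority functions).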
1. Hard affinity
[root@k8s-master ~]# vim pod-nodeaffinity-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-node-affinity-demo
  labels:
    app: myapp
    tier: frontend
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: zone
            operator: In
            values:
            - foo
            - bar
[root@k8s-master ~]# kubectl apply -f pod-nodeaffinity-demo.yaml
[root@k8s-master ~]# kubectl describe pod pod-node-affinity-demo
# Result:
Warning  FailedScheduling  2s (x8 over 20s)  default-scheduler  0/3 nodes are available: 3 node(s) didn't match node selector.
# Label one of the nodes with zone=foo (add --overwrite if the node already has a zone label, such as zone=sh from section I)
[root@k8s-master ~]# kubectl label node k8s-node01 zone=foo
# The pod now starts normally
[root@k8s-master ~]# kubectl get pods
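The hard-affinity example uses the In operator. Node affinity expressions also accept NotIn, Exists, DoesNotExist, Gt and Lt, and NotIn or DoesNotExist is how a pod is kept away from a group of nodes. The following is a minimal sketch of that inverse rule; the pod name is hypothetical.

```yaml
# Sketch only: keeps the pod off every node whose zone label is foo or bar.
apiVersion: v1
kind: Pod
metadata:
  name: pod-node-antiaffinity-demo   # hypothetical name
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: zone
            operator: NotIn          # nodes without a zone label also qualify
            values:
            - foo
            - bar
```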
2. Soft affinity
[root@k8s-master ~]# vim pod-nodeaffinity-demo-2.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-node-affinity-demo-2
  labels:
    app: myapp
    tier: frontend
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - preference:
          matchExpressions:
          - key: zone
            operator: In
            values:
            - foo
            - bar
        weight: 60
      - preference:
          matchExpressions:
          - key: zone1
            operator: In
            values:
            - foo1
            - bar1
        weight: 10
[root@k8s-master ~]# kubectl apply -f pod-nodeaffinity-demo-2.yaml
3. When both are present (hard and soft affinity)
apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: zone
            operator: In
            values:
            - dev
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
  containers:
  - name: with-node-affinity
    image: nginx
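The examples in this section cover node affinity only. Anti-affinity between pods is expressed with podAntiAffinity, which compares a pod against the labels of pods already running in a topology domain. The following is a minimal sketch, assuming at least two schedulable nodes; the Deployment name myapp-spread is hypothetical.

```yaml
# Sketch only: forces the replicas onto different nodes.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-spread                 # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: myapp           # repel pods that carry this label
            topologyKey: kubernetes.io/hostname   # one such pod per node
      containers:
      - name: myapp
        image: ikubernetes/myapp:v1
```

Because the rule is a hard requirement, a replica stays Pending when every eligible node already runs a pod labeled app=myapp. Using preferredDuringSchedulingIgnoredDuringExecution instead turns it into a soft preference.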
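III. Taints and tolerations
Unlike the previous two approaches, a taint is applied to the node first. Its purpose is to protect the node: the scheduler no longer picks that node as a place to start pods, unless a pod explicitly tolerates the taint.
In our cluster the master carries a taint, so ordinary pods you start never run on the master, which keeps the master working efficiently.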
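For a pod that really must run on the master, a toleration for the master's taint is needed. The following is a minimal sketch, assuming a kubeadm-style cluster: newer releases taint the master with the key node-role.kubernetes.io/control-plane, older ones with node-role.kubernetes.io/master. Check the actual key with kubectl describe node on your master before relying on it. The pod name is hypothetical.

```yaml
# Sketch only: tolerates both the newer and the older master taint key.
apiVersion: v1
kind: Pod
metadata:
  name: tolerate-master-demo         # hypothetical name
spec:
  containers:
  - name: nginx
    image: nginx
  tolerations:
  - key: node-role.kubernetes.io/control-plane   # assumed key on newer clusters
    operator: Exists
    effect: NoSchedule
  - key: node-role.kubernetes.io/master          # assumed key on older clusters
    operator: Exists
    effect: NoSchedule
```

The toleration only makes the master eligible; the scheduler may still place the pod on a worker node.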
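Related parameters
A few parameters used below
operator can be set to:
Equal: the toleration's key and value must both equal the taint's (the default)
Exists: only checks that the key exists; no value needs to be defined
The effect of a taint defines how it repels pods:
NoSchedule: affects scheduling only; pods already running on the node are not affected.
NoExecute: affects scheduling and also existing pods; pods that do not tolerate the taint are evicted.
PreferNoSchedule: the scheduler tries to avoid the node, but may still use it when no other node fits.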
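For the NoExecute effect, a toleration can also set tolerationSeconds, which bounds how long an already running pod may stay on the node once the taint appears. The following is a minimal sketch, assuming a node tainted with node-type=production:NoExecute (this walkthrough itself only uses NoSchedule); the pod name and the 60-second value are illustrative.

```yaml
# Sketch only: the pod tolerates the NoExecute taint for 60 seconds, then is evicted.
apiVersion: v1
kind: Pod
metadata:
  name: noexecute-demo               # hypothetical name
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
  tolerations:
  - key: "node-type"
    operator: "Equal"
    value: "production"
    effect: "NoExecute"
    tolerationSeconds: 60            # omit this field to tolerate the taint indefinitely
```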
View a node's taints
kubectl describe node k8s-node01 | grep Taints
Set a taint
kubectl taint node k8s-node01 node-type=production:NoSchedule
Create a workload to test
[root@k8s-master ~]# vim deploy-demo.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-deploy
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
      release: canary
  template:
    metadata:
      labels:
        app: myapp
        release: canary
    spec:
      containers:
      - name: myapp
        image: ikubernetes/myapp:v1
        ports:
        - name: http
          containerPort: 80
[root@k8s-master ~]# kubectl apply -f deploy-demo.yaml
[root@k8s-master ~]# kubectl get pods -o wide
# Result:
NAME                            READY   STATUS    RESTARTS   AGE   IP           NODE
myapp-deploy-69b47bc96d-cwt79   1/1     Running   0          5s    10.244.2.6   k8s-node02
myapp-deploy-69b47bc96d-qqrwq   1/1     Running   0          5s    10.244.2.5   k8s-node02
So the pods can only start on node2, which has no taint.
Test with pods that tolerate node1's taint (a toleration permits scheduling onto the tainted node but does not force it)
[root@k8s-master ~]# vim deploy-demo.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-deploy
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
      release: canary
  template:
    metadata:
      labels:
        app: myapp
        release: canary
    spec:
      containers:
      - name: myapp
        image: ikubernetes/myapp:v2
        ports:
        - name: http
          containerPort: 80
      tolerations:
      - key: "node-type"
        operator: "Equal"
        value: "production"
        effect: "NoSchedule"
[root@k8s-master ~]# kubectl apply -f deploy-demo.yaml
Test
[root@k8s-master ~]# kubectl get pods -o wide
NAME                            READY   STATUS    RESTARTS   AGE   IP            NODE
myapp-deploy-65cc47f858-tmpnz   1/1     Running   0          10s   10.244.1.10   k8s-node01
myapp-deploy-65cc47f858-xnklh   1/1     Running   0          13s   10.244.1.9    k8s-node01
Effect of other parameter values
- Define a toleration that only checks whether the key node-type exists, with effect NoSchedule
[root@k8s-master ~]# vim deploy-demo.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-deploy
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
      release: canary
  template:
    metadata:
      labels:
        app: myapp
        release: canary
    spec:
      containers:
      - name: myapp
        image: ikubernetes/myapp:v2
        ports:
        - name: http
          containerPort: 80
      tolerations:
      - key: "node-type"
        operator: "Exists"
        value: ""
        effect: "NoSchedule"
[root@k8s-master ~]# kubectl apply -f deploy-demo.yaml
[root@k8s-master ~]# kubectl get pods -o wide
NAME                            READY   STATUS    RESTARTS   AGE   IP            NODE
myapp-deploy-559f559bcc-6jfqq   1/1     Running   0          10s   10.244.1.11   k8s-node01
myapp-deploy-559f559bcc-rlwp2   1/1     Running   0          9s    10.244.1.12   k8s-node01
- Define a toleration that checks whether the key node-type exists, with an empty effect, which matches every effect
[root@k8s-master ~]# vim deploy-demo.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-deploy
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
      release: canary
  template:
    metadata:
      labels:
        app: myapp
        release: canary
    spec:
      containers:
      - name: myapp
        image: ikubernetes/myapp:v2
        ports:
        - name: http
          containerPort: 80
      tolerations:
      - key: "node-type"
        operator: "Exists"
        value: ""
        effect: ""
[root@k8s-master ~]# kubectl apply -f deploy-demo.yaml
# The two pods are spread evenly across the two nodes
[root@k8s-master ~]# kubectl get pods -o wide
NAME                            READY   STATUS    RESTARTS   AGE   IP            NODE
myapp-deploy-5d9c6985f5-hn4k2   1/1     Running   0          2m    10.244.1.13   k8s-node01
myapp-deploy-5d9c6985f5-lkf9q   1/1     Running   0          2m    10.244.2.7    k8s-node02
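IV. Resource-based scheduling
When creating a pod you configure resources to declare the CPU and memory each container needs, mainly requests and limits. The scheduler compares the requests with the resources still unallocated on each node to decide which node the pod is placed on.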
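A minimal sketch of such a resources block follows. The pod name and all quantities are illustrative assumptions, not values from the original setup.

```yaml
# Sketch only: the scheduler places the pod on a node with at least
# 250m CPU and 128Mi memory still unallocated.
apiVersion: v1
kind: Pod
metadata:
  name: resource-demo                # hypothetical name
spec:
  containers:
  - name: nginx
    image: nginx
    resources:
      requests:                      # used by the scheduler for placement
        cpu: 250m
        memory: 128Mi
      limits:                        # enforced at runtime on the node
        cpu: 500m
        memory: 256Mi
```

If no node has enough unallocated CPU or memory to cover the requests, the pod stays Pending and kubectl describe pod reports the shortage in its events.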
Source: https://blog.csdn.net/ht9999i/article/details/108037568