46、K8S Scheduling - Pod Scheduling - Affinity - podAffinity
1、Basics
1.1、What Is Pod Scheduling?
Pod scheduling here refers to affinity between pods, that is, deciding which pods should run together.
For example, the nodes of a k8s cluster may be spread across different zones or server rooms:
When service A and service B need to exchange data efficiently, they should be deployed in the same zone or server room.
When service A needs redundancy, its replicas must be placed in different locations.
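For orientation, below is a minimal sketch of where podAffinity sits inside a pod spec; the name and image are placeholders, and the concrete, runnable examples follow in section 2:

apiVersion: v1
kind: Pod
metadata:
  name: example                # placeholder name
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:   # hard affinity rule
      - labelSelector:
          matchExpressions:
          - {key: env, operator: In, values: ["test"]}  # co-locate with pods labeled env=test
        topologyKey: kubernetes.io/hostname             # "same topology domain" = "same node"
  containers:
  - name: app
    image: nginx               # placeholder image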
1.2、Field Reference
kubectl explain pod.spec.affinity.podAffinity

requiredDuringSchedulingIgnoredDuringExecution  -- hard affinity:
    labelSelector    select which group of pods to be affine with; you need to know how to match them
    namespaces       which namespaces to match pods in
    topologyKey      a node label key that defines the topology domain; required
    Note: these three conditions are combined with a logical AND
preferredDuringSchedulingIgnoredDuringExecution -- soft affinity:
    podAffinityTerm  # the affinity term associated with the weight; required
        labelSelector
        namespaces   # only look for matching pods in the specified namespaces
        topologyKey  # node label key; nodes sharing the same value for this key form one topology domain
    weight           # weight of this term; required
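The full field documentation can also be pulled up directly for the nested paths, for example:

kubectl explain pod.spec.affinity.podAffinity.requiredDuringSchedulingIgnoredDuringExecution
kubectl explain pod.spec.affinity.podAffinity.preferredDuringSchedulingIgnoredDuringExecution.podAffinityTerm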
2、Pod Scheduling - affinity.podAffinity - Practice
2.1、Hard Affinity - requiredDuringSchedulingIgnoredDuringExecution
2.1.1、Requirement
Create two pods on different nodes, then create a third pod that uses pod affinity so that it must be placed together with the pod labeled env=test.
2.1.2、Label the Nodes
kubectl label node node2 env=test
kubectl label node node1 env=dev
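To confirm the labels took effect, the env label can be shown as a column (this verification step is an extra check, not part of the original walkthrough):

kubectl get nodes -L env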
2.1.3、Define and Apply the Manifest [use a node label selector to create one pod on each node]
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: pod-dev
  labels:
    env: dev
spec:
  containers:
  - name: pod-test
    image: 192.168.10.33:80/k8s/pod_test:v0.1
    imagePullPolicy: IfNotPresent
  nodeSelector:
    env: dev
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-test
  labels:
    env: test
spec:
  containers:
  - name: pod-test
    image: 192.168.10.33:80/k8s/pod_test:v0.1
    imagePullPolicy: IfNotPresent
  nodeSelector:
    env: test
EOF
# Scheduling result: the pod with env: test goes to node2, the pod with env: dev goes to node1
2.1.4、Check the Running Pods
master1 ~]# kubectl get pod -o wide --show-labels
NAME       READY   STATUS    RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES   LABELS
pod-dev    1/1     Running   0          38m   10.244.3.51   node1   <none>           <none>            env=dev
pod-test   1/1     Running   0          38m   10.244.4.84   node2   <none>           <none>            env=test
2.1.5、Define and Apply the Manifest [create a pod scheduled with pod hard affinity]
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: pod-affinity
  labels:
    env: test
spec:
  containers:
  - name: pod-test
    image: 192.168.10.33:80/k8s/pod_test:v0.1
    imagePullPolicy: IfNotPresent
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - {key: env, operator: In, values: ["test"]}
        namespaces: ["default"]
        topologyKey: kubernetes.io/hostname
EOF
2.1.6、Explanation
Notes:
1. The pod carries the label env=test so that in the result listing it is obvious which pod it was meant to be co-located with.
2. Pod hard affinity is used: the label selector matches pods with env=test, and the topology key is the host name, so "same topology domain" means "same node".

Where topologyKey comes from:
]# kubectl get nodes -o yaml | grep hostname
    kubernetes.io/hostname: master1
    kubernetes.io/hostname: master2
    kubernetes.io/hostname: master3
    kubernetes.io/hostname: node1
    kubernetes.io/hostname: node2
In effect, specifying kubernetes.io/hostname is the same as keying on the host name itself; one value is picked up automatically from the node labels, the other would have to be maintained by hand.
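If the topology domain should be larger than a single node, topologyKey can point at a broader label. As a hypothetical example (these zone labels are not set anywhere in this cluster), labeling nodes with the well-known zone key and keying on it would make "same zone" the co-location rule instead of "same node":

kubectl label node node1 topology.kubernetes.io/zone=zone-a
kubectl label node node2 topology.kubernetes.io/zone=zone-a
# ...and in the pod spec:
#   topologyKey: topology.kubernetes.io/zone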
2.1.7、Check the Scheduling Result
]# kubectl get pod -o wide --show-labels
NAME           READY   STATUS    RESTARTS   AGE     IP            NODE    NOMINATED NODE   READINESS GATES   LABELS
pod-affinity   1/1     Running   0          8m39s   10.244.4.85   node2   <none>           <none>            env=test
pod-dev        1/1     Running   0          53m     10.244.3.51   node1   <none>           <none>            env=dev
pod-test       1/1     Running   0          53m     10.244.4.84   node2   <none>           <none>            env=test
# As expected, pod-affinity is scheduled to node2, the same node as the existing env=test pod, so hard affinity works.
2.2、Soft Affinity - preferredDuringSchedulingIgnoredDuringExecution
2.2.1、Requirement
Use soft affinity: let the scheduler score candidate nodes by weight and place the new pod on the node running the best-matching pod.
2.2.2、Label the Nodes
kubectl label node node2 env=test
kubectl label node node1 env=dev
2.2.3、Define and Apply the Manifest
# If the pod-affinity pod from 2.1 is still running, delete it first (kubectl delete pod pod-affinity),
# since a pod's affinity cannot be changed in place.
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: pod-affinity
spec:
  containers:
  - name: pod-test
    image: 192.168.10.33:80/k8s/pod_test:v0.1
    imagePullPolicy: IfNotPresent
  affinity:
    podAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 60
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - {key: env, operator: In, values: ["dev"]}
          topologyKey: kubernetes.io/hostname
      - weight: 30
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - {key: env, operator: In, values: ["test"]}
          topologyKey: kubernetes.io/hostname
EOF

Notes:
1. Pod soft affinity: the scheduler scores each candidate node by summing the weights of the terms it satisfies and prefers the highest score. Here node1 (running the env=dev pod) scores 60 and node2 (running the env=test pod) scores 30, so node1 should win.
2. Topology: host name, so each node is its own topology domain.
2.2.4、Check the Scheduling Result
]# kubectl get pod -o wide --show-labels
NAME           READY   STATUS    RESTARTS   AGE    IP            NODE    NOMINATED NODE   READINESS GATES   LABELS
pod-affinity   1/1     Running   0          9s     10.244.3.52   node1   <none>           <none>            <none>
pod-dev        1/1     Running   0          107m   10.244.3.51   node1   <none>           <none>            env=dev
pod-test       1/1     Running   0          107m   10.244.4.84   node2   <none>           <none>            env=test

Analysis: pod-affinity lands on node1, the node running the env=dev pod matched by the highest-weight (60) term, so weighted pod affinity is taking effect.
2.2.5、Delete the env=dev pod, recreate the pod-affinity object, and check whether it now goes to the node of the lower-weight match
# Delete the pods
kubectl delete pod pod-dev
kubectl delete pod pod-affinity

# Recreate the pod by re-running the manifest from 2.2.3, then check:
master1 ~]# kubectl get pod -o wide --show-labels
NAME           READY   STATUS    RESTARTS   AGE    IP            NODE    NOMINATED NODE   READINESS GATES   LABELS
pod-affinity   1/1     Running   0          10s    10.244.4.86   node2   <none>           <none>            <none>
pod-test       1/1     Running   0          114m   10.244.4.84   node2   <none>           <none>            env=test

# With the env=dev pod gone, only the weight-30 term matches, so pod-affinity is now scheduled next to the env=test pod on node2.
3、Redis with Pod Hard and Soft Affinity - Practice
3.1、Redis Hard Affinity - Practice
3.1.1、Requirement
The application pods must write their data to redis, so they must run on the same node as redis to speed up writes.
3.1.2、Define and Apply the Manifest [create the redis pod]
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: 192.168.10.33:80/k8s/redis:latest
        imagePullPolicy: IfNotPresent
EOF
3.1.3、Check the Redis Pod
master1 ~]# kubectl get pod -o wide --show-labels
NAME                     READY   STATUS    RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES   LABELS
redis-56f97b7f57-x82dj   1/1     Running   0          10s   10.244.3.53   node1   <none>           <none>            app=redis,pod-template-hash=56f97b7f57
3.1.4、Define and Apply the Manifest [create test pods that must be scheduled to the node running the matching pod]
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pod-affinity-required
spec:
  replicas: 4
  selector:
    matchLabels:
      app: pod-test
  template:
    metadata:
      labels:
        app: pod-test
    spec:
      containers:
      - name: pod-test
        image: 192.168.10.33:80/k8s/pod_test:v0.1
        imagePullPolicy: IfNotPresent
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - {key: app, operator: In, values: ["redis"]}
            topologyKey: kubernetes.io/hostname
EOF

Notes:
1. Hard affinity: each replica must be scheduled to a node that already runs a pod labeled app=redis.
2. Four replicas are started to check whether all of them land on the same node as the app=redis pod.
3.1.5、Check Whether Scheduling Matches Expectations
master1 ~]# kubectl get pods -o wide --show-labels
NAME                                     READY   STATUS    RESTARTS   AGE     IP            NODE    NOMINATED NODE   READINESS GATES   LABELS
pod-affinity-required-5574987b9f-2rqlz   1/1     Running   0          12s     10.244.3.54   node1   <none>           <none>            app=pod-test,pod-template-hash=5574987b9f
pod-affinity-required-5574987b9f-c8n2j   1/1     Running   0          12s     10.244.3.56   node1   <none>           <none>            app=pod-test,pod-template-hash=5574987b9f
pod-affinity-required-5574987b9f-mnz44   1/1     Running   0          12s     10.244.3.57   node1   <none>           <none>            app=pod-test,pod-template-hash=5574987b9f
pod-affinity-required-5574987b9f-sp2qm   1/1     Running   0          12s     10.244.3.55   node1   <none>           <none>            app=pod-test,pod-template-hash=5574987b9f
redis-56f97b7f57-x82dj                   1/1     Running   0          5m54s   10.244.3.53   node1   <none>           <none>            app=redis,pod-template-hash=56f97b7f57
# All four replicas are scheduled to the same node as redis, so hard affinity works.
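A side note on the limitation of hard affinity (this scale-up is an extra experiment, not part of the original steps): because the required rule pins every replica to nodes that already run an app=redis pod, scaling the deployment beyond what that node can hold leaves the extra replicas Pending instead of spilling onto other nodes. The exact number that fits depends on node capacity:

kubectl scale deployment pod-affinity-required --replicas=8
kubectl get pods -l app=pod-test -o wide     # replicas that do not fit on node1 stay Pending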
3.2、Redis Soft Affinity - Practice
3.2.1、Requirement
The pods must write their data to redis; ideally they run on the same node as redis, but if that is not possible they may be placed on other nodes.
Approach: declare CPU and memory requests up front, then observe whether soft-affinity scheduling falls back to a lower-weight node once the preferred node has no allocatable resources left.
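Before the test it can help to record how much CPU and memory each node has already committed; the same kubectl describe output is used again in 3.2.5 to explain the result:

kubectl describe node node1 | grep -A 8 "Allocated resources"
kubectl describe node node2 | grep -A 8 "Allocated resources"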
3.2.2、Define and Apply the Manifest [create the redis pod]
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-preferred
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      nodeName: node1
      containers:
      - name: redis
        image: 192.168.10.33:80/k8s/redis:latest
        imagePullPolicy: IfNotPresent
        resources:
          requests:
            cpu: 300m
            memory: 512Mi
EOF

Notes:
1. The expected CPU and memory requests are declared up front.
2. nodeName pins the redis pod to node1.
3.2.3、Check the Redis Pod
master1 ~]# kubectl get pods -o wide --show-labels
NAME                               READY   STATUS    RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES   LABELS
redis-preferred-58975865b8-qxx8l   1/1     Running   0          32s   10.244.3.58   node1   <none>           <none>            app=redis,pod-template-hash=58975865b8
3.2.4、Define and Apply the Manifest [create test pods; use resource requests to exercise soft affinity with weighted pod label selection]
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pod-affinity-preferred
spec:
  replicas: 4
  selector:
    matchLabels:
      app: pod-test
  template:
    metadata:
      labels:
        app: pod-test
    spec:
      containers:
      - name: pod-test
        image: 192.168.10.33:80/k8s/pod_test:v0.1
        imagePullPolicy: IfNotPresent
        resources:
          requests:
            cpu: 300m
            memory: 500Mi
      affinity:
        podAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - {key: app, operator: In, values: ["redis"]}
              topologyKey: kubernetes.io/hostname
          - weight: 50
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - {key: app, operator: In, values: ["redis"]}
              topologyKey: env
EOF

Notes:
1. Four replicas are created, each requesting cpu: 300m and memory: 500Mi.
2. Soft affinity: both terms match the pod label app=redis and nodes are scored by weight.
3. Topology: the weight-100 term keys on kubernetes.io/hostname, the weight-50 term keys on the node label env.
3.2.5、Check the Results and Analyze
master1 ~]# kubectl get pods -o wide --show-labels
NAME                                      READY   STATUS    RESTARTS   AGE     IP            NODE     NOMINATED NODE   READINESS GATES   LABELS
pod-affinity-preferred-8447b54dd7-fcs4h   1/1     Running   0          84s     10.244.4.89   node2    <none>           <none>            app=pod-test,pod-template-hash=8447b54dd7
pod-affinity-preferred-8447b54dd7-gsl4x   0/1     Pending   0          84s     <none>        <none>   <none>           <none>            app=pod-test,pod-template-hash=8447b54dd7
pod-affinity-preferred-8447b54dd7-jwk46   1/1     Running   0          84s     10.244.3.60   node1    <none>           <none>            app=pod-test,pod-template-hash=8447b54dd7
pod-affinity-preferred-8447b54dd7-xpzlf   1/1     Running   0          84s     10.244.4.90   node2    <none>           <none>            app=pod-test,pod-template-hash=8447b54dd7
redis-preferred-58975865b8-qxx8l          1/1     Running   0          9m35s   10.244.3.58   node1    <none>           <none>            app=redis,pod-template-hash=58975865b8

Analysis:
1. Only one replica ends up on the node running redis:
master1 ~]# kubectl describe nodes node1
...
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests      Limits
  --------           --------      ------
  cpu                850m (85%)    0 (0%)   # 850m is already requested; there is not enough left for more replicas, so the rest are scheduled to lower-weight nodes
  memory             1012Mi (27%)  0 (0%)
  ephemeral-storage  0 (0%)        0 (0%)
  hugepages-1Gi      0 (0%)        0 (0%)
  hugepages-2Mi      0 (0%)        0 (0%)
...
2. Why does node2 only get two replicas?
master1 ~]# kubectl describe nodes node2
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests      Limits
  --------           --------      ------
  cpu                850m (85%)    0 (0%)   # each replica requests 300m; at 850m nothing more fits, so the last replica stays Pending until some node has enough free resources
  memory             1000Mi (27%)  0 (0%)
  ephemeral-storage  0 (0%)        0 (0%)
  hugepages-1Gi      0 (0%)        0 (0%)
  hugepages-2Mi      0 (0%)        0 (0%)
3. Check why the pod is Pending:
master1 ~]# kubectl describe pod pod-affinity-preferred-8447b54dd7-gsl4x
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  68s   default-scheduler  0/5 nodes are available: 2 Insufficient cpu, 3 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }. preemption: 0/5 nodes are available: 2 No preemption victims found for incoming pod, 3 Preemption is not helpful for scheduling.
# The event confirms the pod is Pending because no schedulable node has enough free CPU.
Unlike the hard-affinity case in 3.1, the soft rule let the scheduler fall back to node2 once node1 ran out of CPU; only the replica that fits nowhere stays Pending.
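As a final check (an extra step beyond the original walkthrough), freeing resources should let the Pending replica get scheduled. For example, deleting the redis-preferred deployment releases its 300m CPU request on node1, after which the scheduler should place the last replica:

kubectl delete deployment redis-preferred
kubectl get pods -l app=pod-test -o wide -w     # watch the Pending replica move to Running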