45、K8S Scheduling Mechanism - Node Scheduling - Affinity - nodeAffinity, nodeName, nodeSelector
1、Configuration Overview
1.1、Introduction
As we know, the default scheduling policy may not meet our needs. We can customize our own scheduling policy according to the actual situation and integrate it into the k8s cluster.
1.2、Property Reference
apiVersion: kubescheduler.config.k8s.io/v1beta1
kind: KubeSchedulerConfiguration
AlgorithmSource:                    # scheduling-algorithm configuration source; deprecated since v1alpha2
  Policy:                           # policy-based algorithm configuration source
    File:                           # scheduling policy defined in a file
      Path <string>                 # location of the scheduling policy file policy.cfg
    ConfigMap:                      # scheduling policy defined in a ConfigMap
      Namespace <string>            # namespace the policy ConfigMap belongs to
      Name <string>                 # name of the ConfigMap resource
  Provider <string>                 # name of the scheduling algorithm provider to use, e.g. DefaultProvider
LeaderElection: {}                  # leader-election settings used when multiple kube-scheduler instances run in parallel
ClientConnection: {}                # connection settings used when talking to the API Server
HealthzBindAddress <string>         # address and port the health-check server listens on
MetricsBindAddress <string>         # address and port the metrics server listens on
DisablePreemption <bool>            # whether to disable preemption; false means preemption is enabled
PercentageOfNodesToScore <int32>    # percentage of nodes to filter as feasible and score
BindTimeoutSeconds <int64>          # timeout for bind operations; must be non-negative
PodInitialBackoffSeconds <int64>    # initial backoff for unschedulable Pods, default 1
PodMaxBackoffSeconds <int64>        # maximum backoff for unschedulable Pods, default 10
Profiles <[]KubeSchedulerProfile>   # list of KubeSchedulerProfile configurations; v1beta1 supports more than one
Extenders <[]Extender>              # list of Extenders to load
2、Custom Scheduling Policy - Practice [For Reference]
Note: KubeSchedulerConfiguration v1beta2 has been deprecated since version 1.25.
2.1、Create the Custom Scheduler Configuration [on all masters]
2.1.1、Prepare the directory
mkdir /etc/kubernetes/scheduler
2.1.2、Define the scheduling policy manifest
cat > /etc/kubernetes/scheduler/kube_scheduler_configuration.yaml <<EOF
apiVersion: kubescheduler.config.k8s.io/v1beta1
kind: KubeSchedulerConfiguration
clientConnection:
  kubeconfig: "/etc/kubernetes/scheduler.conf"
profiles:
- schedulerName: default-scheduler
- schedulerName: demo-scheduler
  plugins:
    filter:
      disabled:
      - name: NodeUnschedulable
    score:
      disabled:
      - name: NodeResourcesBalancedAllocation
        weight: 1
      - name: NodeResourcesLeastAllocated
        weight: 1
      enabled:
      - name: NodeResourcesMostAllocated
        weight: 5
EOF

Configuration notes:
- schedulerName: default-scheduler keeps the default profile untouched, so normal scheduling is not affected.
- NodeResourcesBalancedAllocation: disables balanced resource allocation scoring.
- NodeResourcesLeastAllocated: disables least-allocated (spread) scoring.
- NodeResourcesMostAllocated: enables most-allocated scoring, which prefers packing all resources onto one node.
2.1.3、Modify kube-scheduler.yaml
vi /etc/kubernetes/manifests/kube-scheduler.yaml

spec:
  containers:
  - command:
    - kube-scheduler
    - --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
    - --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
    - --config=/etc/kubernetes/scheduler/kube_scheduler_configuration.yaml
    ...
    volumeMounts:
    - mountPath: /etc/kubernetes/scheduler.conf
      name: kubeconfig
      readOnly: true
    - mountPath: /etc/kubernetes/scheduler
      name: schedconf
      readOnly: true
  volumes:
  - hostPath:
      path: /etc/kubernetes/scheduler.conf
      type: FileOrCreate
    name: kubeconfig
  - hostPath:
      path: /etc/kubernetes/scheduler
      type: DirectoryOrCreate
    name: schedconf
  ...
2.1.4、Confirm that kube-scheduler restarted successfully
master1 ~]# kubectl -n kube-system get pod | grep sche
kube-scheduler-master1    1/1     Running   0    46s
kube-scheduler-master2    1/1     Running   0    32m
kube-scheduler-master3    1/1     Running   0    32m
2.1.5、Define a test resource manifest
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-test
spec:
  replicas: 5
  selector:
    matchLabels:
      app: pod-test
  template:
    metadata:
      labels:
        app: pod-test
    spec:
      schedulerName: demo-scheduler
      containers:
      - name: nginxpod-test
        image: 192.168.10.33:80/k8s/pod_test:v0.1
        imagePullPolicy: IfNotPresent
EOF

Property note: schedulerName selects which scheduler profile is used for these Pods.
2.1.6、Check the result
You will find that all the replicas are scheduled onto the same node, because the demo-scheduler profile scores nodes with NodeResourcesMostAllocated.
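To confirm, list the Pods together with their node assignment. A minimal check, assuming the deployment-test Deployment from 2.1.5 is still running; the NODE column should show the same node for all five replicas:

kubectl get pods -l app=pod-test -o wide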
3、Node Scheduling - nodeName and nodeSelector - Practice
3.1、Basics
3.1.1、Workload scheduling - scenarios
Designated node - deploy the application directly onto a specific node, based on the node's labels.
Node affinity - let the scheduler pick a suitable node based on the workload's configured preferences.
3.1.2、Node scheduling - scenarios
1、Node affinity
   kubectl explain pod.spec.affinity.nodeAffinity
2、Soft node affinity - preferredDuringSchedulingIgnoredDuringExecution
3、Hard node affinity - requiredDuringSchedulingIgnoredDuringExecution
Note: these policies only take effect at scheduling time. Once a Pod has been scheduled successfully, it keeps running even if the node later stops satisfying the affinity rules. Both rule types can also be combined in one Pod spec, as shown in the sketch after this list.
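A minimal sketch for illustration (the Pod name is hypothetical; the image is the one used throughout this article): the hard rule must hold for a node to be considered at all, while the soft rule only adds to the score of nodes that already passed it.

apiVersion: v1
kind: Pod
metadata:
  name: affinity-demo                                   # hypothetical name
spec:
  containers:
  - name: demoapp
    image: 192.168.10.33:80/k8s/pod_test:v0.1
    imagePullPolicy: IfNotPresent
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:   # hard: the node must have an env label
        nodeSelectorTerms:
        - matchExpressions:
          - key: env
            operator: Exists
      preferredDuringSchedulingIgnoredDuringExecution:  # soft: prefer env=test among the nodes that passed
      - weight: 50
        preference:
          matchExpressions:
          - key: env
            operator: In
            values:
            - test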
3.2、Scheduling to a Specific Node - nodeName - Practice
3.2.1、Define and apply the resource manifest
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: pod-nodename
spec:
  nodeName: node1
  containers:
  - name: demoapp
    image: 192.168.10.33:80/k8s/pod_test:v0.1
    imagePullPolicy: IfNotPresent
EOF
3.2.2、Check whether the Pod was scheduled to node1
master1 ~]# kubectl get pods -o wide
NAME           READY   STATUS    RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES
pod-nodename   1/1     Running   0          6s    10.244.3.44   node1   <none>           <none>
3.3、Scheduling by Node Label - nodeSelector - Practice
3.3.1、Label node2
master1 ~]# kubectl label node node2 name=node2
node/node2 labeled
master1 ~]# kubectl get nodes node2 --show-labels
NAME    STATUS   ROLES    AGE     VERSION   LABELS
node2   Ready    <none>   2d16h   v1.25.7   ...,kubernetes.io/os=linux,name=node2
# The label is name=node2.
# For label management, see: https://www.cnblogs.com/ygbh/p/17221425.html#_label2
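For reference, an existing label can be changed or removed with standard kubectl syntax (shown here for node2; adjust the node and key as needed). This is handy for cleaning up between the exercises below:

kubectl label node node2 name=other --overwrite   # change the value of an existing label
kubectl label node node2 name-                    # remove the label by appending a dash to the key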
3.3.2、Define and apply the resource manifest
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: pod-nodeselector
spec:
  containers:
  - name: demoapp
    image: 192.168.10.33:80/k8s/pod_test:v0.1
    imagePullPolicy: IfNotPresent
  nodeSelector:
    name: node2
EOF
3.3.3、Check whether the Pod was scheduled to node2 via the label name=node2
master1 ~]# kubectl get pod -o wide
NAME               READY   STATUS    RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES
pod-nodeselector   1/1     Running   0          51s   10.244.4.76   node2   <none>           <none>
3.3.4、Note: behavior when no node has the required label
1、The Pod stays in the Pending state.
2、As soon as a node gains the matching label, the Pod is scheduled onto that node.
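To see why the Pod is stuck, you can inspect its events and then watch it until a node gains the label. A hedged example; the exact event text depends on the cluster version:

kubectl describe pod pod-nodeselector           # look for a FailedScheduling event in the Events section
kubectl get pod pod-nodeselector -o wide -w     # watch; the Pod is scheduled as soon as a node is labeled name=node2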
4、Node Scheduling - affinity.nodeAffinity - Practice
4.1、requiredDuringSchedulingIgnoredDuringExecution - Hard Affinity
4.1.1、Property reference
kubectl explain pod.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution
requiredDuringSchedulingIgnoredDuringExecution   # hard affinity: the rules must be satisfied before the Pod can be scheduled
  nodeSelectorTerms:       # node selector terms
    matchExpressions       # label match expressions; available operators:
      In                   # the label's value is in the given list
      NotIn                # the label's value is not in the given list
      Gt                   # the label's value is greater than the given value
      Lt                   # the label's value is less than the given value
      Exists               # the label exists
      DoesNotExist         # the label does not exist
    matchFields            # match on node fields; no label value needs to be defined
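The Gt/Lt operators and matchFields are not used in the examples below, so here is a minimal sketch of both, assuming a hypothetical numeric node label gpu-count. Gt and Lt take a single value that is compared as an integer, matchFields matches on metadata.name, and the entries within one nodeSelectorTerm are ANDed together:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: gpu-count          # hypothetical label
          operator: Gt
          values:
          - "1"                   # the node's gpu-count must be greater than 1
        matchFields:
        - key: metadata.name
          operator: In
          values:
          - node2                 # and the node must be node2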
4.1.2、Requirement
The Pod may only be deployed on nodes whose env label value is dev or test.
4.1.3、Define and apply the resource manifest
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: node-required-affinity
spec:
  containers:
  - name: demoapp
    image: 192.168.10.33:80/k8s/pod_test:v0.1
    imagePullPolicy: IfNotPresent
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: env
            operator: In
            values:
            - dev
            - test
EOF
4.1.4、Check whether the Pod is Pending
master1 ~]# kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
node-required-affinity   0/1     Pending   0          2m3s
# No node currently has an env label whose value is dev or test, so the Pod stays Pending until a matching label appears and it can be scheduled.
4.1.5、Label node2 with env=test
]# kubectl label nodes node2 env=test
4.1.6、Check whether the Pod was scheduled to node2
master1 ~]# kubectl get pods -o wide -w
NAME                     READY   STATUS              RESTARTS   AGE     IP            NODE     NOMINATED NODE   READINESS GATES
node-required-affinity   0/1     Pending             0          3m56s   <none>        <none>   <none>           <none>
node-required-affinity   0/1     Pending             0          5m26s   <none>        node2    <none>           <none>
node-required-affinity   0/1     ContainerCreating   0          5m26s   <none>        node2    <none>           <none>
node-required-affinity   0/1     ContainerCreating   0          5m27s   <none>        node2    <none>           <none>
node-required-affinity   1/1     Running             0          5m27s   10.244.4.77   node2    <none>           <none>
# The Pod was successfully scheduled to node2.
4.2、preferredDuringSchedulingIgnoredDuringExecution - Soft Affinity
4.2.1、Property reference
kubectl explain pod.spec.affinity.nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution

preferredDuringSchedulingIgnoredDuringExecution   # soft affinity: satisfied if possible, not mandatory
  preference             # the preferred node selector term
    matchExpressions
    matchFields
  weight                 # weight, range 1-100; for each node that meets all other scheduling requirements,
                         # the scheduler iterates over these terms and adds the weight to the node's score
                         # whenever the node matches the corresponding term

Note: unlike the hard-affinity property, the soft-affinity property is a list of objects, and preference itself is not a list item.
4.2.2、Requirement
Prefer nodes by weight: nodes whose env label value is test or dev are preferred, with test weighted higher than dev.
4.2.3、Define and apply the resource manifest
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: node-preferred-affinity
spec:
  containers:
  - name: demoapp
    image: 192.168.10.33:80/k8s/pod_test:v0.1
    imagePullPolicy: IfNotPresent
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 50
        preference:
          matchExpressions:
          - key: env
            operator: In
            values:
            - test
      - weight: 20
        preference:
          matchExpressions:
          - key: env
            operator: In
            values:
            - dev
EOF
4.2.4、Check the Pod's running status
master1 ~]# kubectl get pod -o wide
NAME                      READY   STATUS    RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES
node-preferred-affinity   1/1     Running   0          18s   10.244.3.45   node1   <none>           <none>

Summary:
1、With soft affinity, a Pod whose preferences match no label does not stay Pending; the scheduler simply picks the best available node.
4.2.5、Label node1 and node2
kubectl label node node1 env=dev
kubectl label node node2 env=test
4.2.6、Delete the old Pod and recreate it
kubectl delete pod node-preferred-affinity
# Recreate the Pod using the manifest from step 4.2.3 above.
4.2.7、Check whether the Pod is scheduled to node2, the node with the highest weight
]# kubectl get pods -o wide
NAME                      READY   STATUS    RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES
node-preferred-affinity   1/1     Running   0          4s    10.244.4.78   node2   <none>           <none>
4.3、Hard Affinity - No Scheduling When CPU or Memory Is Insufficient - Practice
4.3.1、Review
From what we learned earlier about scheduling policy, the default policy performs a filtering step before ranking nodes.
For example, even if a node's labels satisfy our affinity rules, the Pod will still not be scheduled there if the node cannot satisfy its resource requests.
From our earlier practice with predicates (filtering) and priorities (scoring), we know that predicates are mandatory: if a node fails filtering, scheduling on it goes no further.
Scoring merely picks the best node, and only among the nodes that already passed filtering.
4.3.2、Requirement
The goal is to verify: with hard affinity, if the requested CPU or memory cannot be satisfied, will the Pod still be forcibly scheduled onto the best-matching node?
4.3.3、Label the worker nodes
kubectl label node node2 env=test
kubectl label node node1 env=dev
4.3.4、Check the nodes' allocatable resources
# node1's configuration
master1 ~]# kubectl describe nodes node1
...
Allocatable:                       # allocatable resources
  cpu:                1            # only one CPU core can be allocated
  ephemeral-storage:  17019017598
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             3758872Ki
  pods:               110
...
Allocated resources:               # how the resources are currently allocated
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests     Limits
  --------           --------     ------
  cpu                250m (25%)   0 (0%)
  memory             0 (0%)       0 (0%)
  ephemeral-storage  0 (0%)       0 (0%)
  hugepages-1Gi      0 (0%)       0 (0%)
  hugepages-2Mi      0 (0%)       0 (0%)
...
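The allocatable figures can also be read directly, without scrolling through the kubectl describe output. A convenience command, not required for the exercise:

kubectl get node node1 -o jsonpath='{.status.allocatable}{"\n"}'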
4.3.5、Define the resource manifest
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: node-resourcefits-affinity
spec:
  replicas: 2
  selector:
    matchLabels:
      app: podtest
  template:
    metadata:
      labels:
        app: podtest
    spec:
      containers:
      - name: podtest
        image: 192.168.10.33:80/k8s/pod_test:v0.1
        imagePullPolicy: IfNotPresent
        resources:
          requests:
            cpu: 2
            memory: 2Gi
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: env
                operator: Exists
EOF

Notes:
1、The request is cpu: 2, while each node only has 1 allocatable CPU core.
2、The hard affinity rule only requires that a node carries the env label key.
In the next step we observe what happens and summarize.
4.3.6、Check the Pods' status
master1 ~]# kubectl get pods
NAME                                          READY   STATUS    RESTARTS   AGE
node-resourcefits-affinity-59d56f8fdd-4wlvl   0/1     Pending   0          4m32s
node-resourcefits-affinity-59d56f8fdd-9sl9v   0/1     Pending   0          4m32s

master1 ~]# kubectl describe pod node-resourcefits-affinity-59d56f8fdd-4wlvl
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  15s   default-scheduler  0/5 nodes are available: 2 Insufficient cpu, 3 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }. preemption: 0/5 nodes are available: 2 No preemption victims found for incoming pod, 3 Preemption is not helpful for scheduling.

Status analysis:
1、The Pods stay in the Pending state.
2、The Pod events show that scheduling failed: each node only has 1 allocatable CPU core while we requested 2, so there is not enough CPU.
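The same failure can also be spotted without describing each Pod, by filtering cluster events. A convenience command; the output depends on the cluster state:

kubectl get events --field-selector reason=FailedScheduling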
4.3.7、Lower the CPU request and re-apply the Deployment to see whether the Pods can be scheduled
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: node-resourcefits-affinity
spec:
  replicas: 2
  selector:
    matchLabels:
      app: podtest
  template:
    metadata:
      labels:
        app: podtest
    spec:
      containers:
      - name: podtest
        image: 192.168.10.33:80/k8s/pod_test:v0.1
        imagePullPolicy: IfNotPresent
        resources:
          requests:
            cpu: 0.2
            memory: 2Gi
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: env
                operator: Exists
EOF
4.3.8、Verify that the Pods were created and are running
master1 ~]# kubectl get pods -o wide
NAME                                          READY   STATUS    RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES
node-resourcefits-affinity-745d4d4d76-c6c54   1/1     Running   0          35s   10.244.4.83   node2   <none>           <none>
node-resourcefits-affinity-745d4d4d76-t442d   1/1     Running   0          35s   10.244.3.50   node1   <none>           <none>
# The Pods have now been created successfully.
4.3.9、Check the node's resource allocation
master1 ~]# kubectl describe node node1
...
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests     Limits
  --------           --------     ------
  cpu                450m (45%)   0 (0%)
  memory             2Gi (55%)    0 (0%)
  ephemeral-storage  0 (0%)       0 (0%)
  hugepages-1Gi      0 (0%)       0 (0%)
  hugepages-2Mi      0 (0%)       0 (0%)
Events:              <none>
...
# We can see that some CPU and memory have now been allocated.
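Keep in mind that kubectl describe shows reserved requests and limits, not real consumption. If metrics-server is installed in the cluster (an assumption; it is not set up in this article), actual usage can be compared with:

kubectl top node node1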