The Scheduler

Scheduling workflow:

predicate --> priority --> select

 filtering       scoring      binding

 

Scheduling mechanisms:

1. Node affinity scheduling (nodeAffinity)

2. Pod affinity and pod anti-affinity scheduling (podAffinity / podAntiAffinity)

3. Taint and toleration scheduling: taints (on nodes), tolerations (on pods)

 

Source: https://github.com/kubernetes/kubernetes/tree/master/pkg/scheduler/algorithm

Commonly used default predicates:

Each predicate has veto power: a node that fails any enabled predicate is filtered out.

CheckNodeCondition: checks whether the node's reported conditions allow new pods to be scheduled onto it.

GeneralPredicates:

HostName: if the pod pins itself to a host (pod.spec.nodeName), checks that the node's name matches.

PodFitsHostPorts: checks that the host ports requested via pods.spec.containers.ports.hostPort are still free on the node.

MatchNodeSelector: checks the node's labels against pods.spec.nodeSelector.

PodFitsResources: checks whether the node can satisfy the pod's resource requests (compare the allocatable resources shown by kubectl describe nodes node01).

 

NoDiskConflict: checks that the volumes the pod needs do not conflict with volumes already in use on the node. Not enabled by default.

PodToleratesNodeTaints: checks whether the pod's spec.tolerations cover all of the taints on the node.

PodToleratesNodeNoExecuteTaints: checks whether the pod tolerates the node's NoExecute taints. Not enabled by default.

CheckNodeLabelPresence: checks for the presence of specified labels on the node, i.e. scheduling by node label. Not enabled by default.

CheckServiceAffinity: tries to place pods belonging to the same Service onto the same node. Not enabled by default.

 

MaxEBSVolumeCount: AWS Elastic Block Store; caps the number of EBS volumes attached to the node.

MaxGCEPDVolumeCount: the same for Google Cloud persistent disks.

MaxAzureDiskVolumeCount: the same for Microsoft Azure disks.

 

CheckVolumeBinding: checks whether the node can satisfy the pod's PVCs, both already-bound and still-unbound ones.

NoVolumeZoneConflict: checks that the volumes the pod requests are available in the node's zone.

 

CheckNodeMemoryPressure: checks whether the node is under memory pressure.

CheckNodePIDPressure: checks whether the node is under process-ID pressure.

CheckNodeDiskPressure: checks whether the node is under disk pressure.

 

MatchInterPodAffinity: checks the pod's inter-pod affinity and anti-affinity rules against the pods already running on the node.

 

Default priority functions (for scoring nodes)

All enabled priority functions are run against every candidate node; each node's scores are summed, and the node with the highest total wins.

LeastRequested: the higher the proportion of unrequested resources, the higher the score:
priority = (cpu((capacity - sum(requested)) * 10 / capacity) + memory((capacity - sum(requested)) * 10 / capacity)) / 2
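
A quick worked example with hypothetical numbers: on a node with 4 CPU cores and 8Gi of memory where existing pods have already requested 2 cores and 2Gi,

priority = ((4-2)*10/4 + (8-2)*10/8) / 2 = (5 + 7.5) / 2 = 6.25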

balanced_resource_allocation: the closer the node's CPU and memory utilization rates are to each other, the higher the score.

node_prefer_avoid_pods: driven by the node annotation "scheduler.alpha.kubernetes.io/preferAvoidPods"; nodes whose annotation asks to avoid the pod's controller are scored down (the annotation gives the node a say).

taint_toleration: matches the pod's spec.tolerations entries against the node's taints; the more of the node's taints the pod fails to tolerate, the lower the node's score.

selector_spreading: spreads replicas out; nodes already running more pods that match the same Service/controller selector as the current pod score lower.

interpod_affinity: walks the pod's affinity terms; the more terms a node satisfies, the higher its score.

node_affinity: scores nodes against the pod's nodeSelector/node affinity; the more matches, the higher the score.

 

most_requested: the less free capacity a node has left, the higher its score (packs pods tightly). Not enabled by default.

node_label: scores nodes by the presence of specified labels. Not enabled by default.

image_locality: the larger the total size of the pod's images already present on the node, the higher the score. Not enabled by default.

 

Advanced scheduling mechanisms

Node selectors: nodeSelector, nodeName

Node affinity scheduling: nodeAffinity

 

nodeSelector (a hard constraint)

kubectl explain pods.spec.nodeSelector

 

mkdir schedule

cp ../pod-sa-demo.yaml ./

vim pod-demo.yaml

apiVersion: v1

kind: Pod

metadata:

  name: pod-demo

  namespace: default

  labels:

    app: myapp

    tier: frontend

spec:

  containers:

  - name: myapp

    image: ikubernetes/myapp:v1

    ports:

    - name: myapp

      containerPort: 80

  nodeSelector:

    disktype: ssd

 

kubectl apply -f pod-demo.yaml

kubectl label nodes node01 disktype=ssd

kubectl get nodes --show-labels

kubectl get pods    # the newly created pod runs on the node that carries the label
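
A quick way to confirm the placement (node01 being the node labeled above):

kubectl get pods pod-demo -o wide    # the NODE column should show node01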

 

affinity

kubectl explain pods.spec.affinity

kubectl explain pods.spec.affinity.nodeAffinity

preferredDuringSchedulingIgnoredDuringExecution      <[]Object>  soft affinity: satisfied if possible, but the pod still runs somewhere if not

requiredDuringSchedulingIgnoredDuringExecution        <Object>  hard affinity: the pod runs only on a node that satisfies it

 

Example

cp pod-demo.yaml pod-nodeaffinity-demo.yaml

vim pod-nodeaffinity-demo.yaml

 

apiVersion: v1

kind: Pod

metadata:

  name: pod-node-affinity-demo

  namespace: default

  labels:

    app: myapp

    tier: frontend

spec:

  containers:

  - name: myapp

    image: ikubernetes/myapp:v1

    ports:

    - name: myapp

      containerPort: 80

  affinity:                    # affinity settings

    nodeAffinity:              # node affinity

      requiredDuringSchedulingIgnoredDuringExecution:    # hard affinity: must be satisfied

        nodeSelectorTerms:     # groups of node label terms

        - matchExpressions:    # match expressions

          - key: zone

            operator: In

            values:

            - foo              # value 1

            - bar              # value 2

 

kubectl apply -f pod-nodeaffinity-demo.yaml
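
Under this hard affinity the pod stays Pending while no node carries a matching zone label; labeling any node with one of the listed values unblocks it (node01 here is just an example):

kubectl get pods pod-node-affinity-demo           # Pending while nothing matches
kubectl label nodes node01 zone=foo
kubectl get pods pod-node-affinity-demo -o wide   # now scheduled onto node01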

 

vim pod-nodeaffinity-demo-2.yaml

apiVersion: v1

kind: Pod

metadata:

  name: pod-node-affinity-demo-2

  namespace: default

  labels:

    app: myapp

    tier: frontend

spec:

  containers:

  - name: myapp

    image: ikubernetes/myapp:v1

    ports:

    - name: myapp

      containerPort: 80

  affinity:

    nodeAffinity:

      preferredDuringSchedulingIgnoredDuringExecution:   # prefer nodes matching the term below; if none match, the pod still runs somewhere

      - preference:

          matchExpressions:

          - key: zone

            operator: In

            values:

            - foo

            - bar

        weight: 60

 

kubectl apply -f pod-nodeaffinity-demo-2.yaml
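
The soft-affinity pod, by contrast, runs even when nothing matches; a quick check (assuming no node carries a zone label):

kubectl get pods pod-node-affinity-demo-2 -o wide    # Running despite the unmatched preference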

 

podAffinity scheduling (pod affinity)

podAntiAffinity (pod anti-affinity): the node where the first pod lands becomes the reference for placing later pods; the scheduler has to determine which pods belong in the same location and which must be kept apart.

 

kubectl explain pods.spec.affinity.podAffinity    # pod affinity has both hard and soft forms

preferredDuringSchedulingIgnoredDuringExecution      <[]Object>  soft affinity

 

requiredDuringSchedulingIgnoredDuringExecution        <[]Object>  hard affinity

topologyKey     <string>  the label key that defines what counts as one location

labelSelector    <Object>  selects the pod(s) to be affine with

namespaces    <[]string>  namespaces of the pods to match; defaults to the namespace of the pod being created, and affinity rarely references pods across namespaces

 

kubectl explain pods.spec.affinity.podAffinity.requiredDuringSchedulingIgnoredDuringExecution.labelSelector

matchExpressions   <[]Object>  set-based selector

matchLabels    <map[string]string>  equality-based selector
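
The two selector styles can express the same match; a minimal sketch of the equivalence:

labelSelector:
  matchLabels:                # equality-based
    app: myapp

labelSelector:                # the equivalent set-based form
  matchExpressions:
  - {key: app, operator: In, values: ["myapp"]}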

 

Example

cp pod-demo.yaml pod-required-affinity-demo.yaml

vim pod-required-affinity-demo.yaml

apiVersion: v1

kind: Pod

metadata:

  name: pod-first

  namespace: default

  labels:

    app: myapp

    tier: frontend

spec:

  containers:

  - name: myapp

    image: ikubernetes/myapp:v1

---

apiVersion: v1

kind: Pod

metadata:

  name: pod-second

  namespace: default

  labels:

    app: backend

    tier: db

spec:

  containers:

  - name: busybox

    image: busybox:latest

    imagePullPolicy: IfNotPresent

    command: ["sh","-c","sleep 3600"]

  affinity:

    podAffinity:               # pod affinity

      requiredDuringSchedulingIgnoredDuringExecution:    # hard affinity

      - labelSelector:         # selects the pods to be affine with, by label

          matchExpressions:    # pod label match expressions

          - {key: app, operator: In, values: ["myapp"]}  # i.e. pods labeled app=myapp

        topologyKey: kubernetes.io/hostname    # nodes sharing a value for this label key count as one location; with hostname every node is its own location, so this pod lands on the node running the matched pod

 


 

kubectl delete -f pod-required-affinity-demo.yaml

kubectl apply -f pod-required-affinity-demo.yaml
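
To verify that the hard pod affinity took effect (expected result, assuming scheduling succeeded):

kubectl get pods -o wide    # pod-first and pod-second should show the same NODE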

 

podAntiAffinity (pod anti-affinity)

Example

cp pod-required-affinity-demo.yaml pod-required-Antiaffinity-demo.yaml

vim pod-required-Antiaffinity-demo.yaml

 

apiVersion: v1

kind: Pod

metadata:

  name: pod-three

  labels:

    app: myapp

    tier: frontend

spec:

  containers:

  - name: myapp

    image: ikubernetes/myapp:v1

---

apiVersion: v1

kind: Pod

metadata:

  name: pod-four

  labels:

    app: backend

    tier: db

spec:

  containers:

  - name: busybox

    image: busybox:latest

    imagePullPolicy: IfNotPresent

    command: ["sh","-c","sleep 3600"]

  affinity:

    podAntiAffinity:           # pod anti-affinity; the remaining fields are the same as for podAffinity

      requiredDuringSchedulingIgnoredDuringExecution:

      - labelSelector:

          matchExpressions:

          - {key: app, operator: In, values: ["myapp"]}

        topologyKey: kubernetes.io/hostname

 

kubectl delete -f pod-required-affinity-demo.yaml

kubectl apply -f pod-required-Antiaffinity-demo.yaml

Because there is only one schedulable node and the two pods are anti-affine, pod-four can only sit in the Pending state.
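
The outcome is easy to confirm (expected state given the single-node setup described above):

kubectl get pods pod-three pod-four -o wide   # pod-three Running, pod-four Pending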

 

Taint scheduling: taints are node attributes that give the node an active say in which pods it will accept.

kubectl get nodes node01 -o yaml

kubectl explain nodes.spec

 

taints

kubectl explain nodes.spec.taints    # taints

kubectl explain nodes.spec.taints.effect

effect        <string> -required-   what happens to pods that do not tolerate the taint (the repel effect)

NoExecute: affects scheduling and existing pods alike; pods that do not tolerate the taint are evicted from the node.

NoSchedule: affects scheduling only; existing pods are left alone. Pods that cannot tolerate the taint are not scheduled here.

PreferNoSchedule: affects scheduling only; existing pods are left alone. Pods that cannot tolerate the taint are preferably kept away, but may still be scheduled here if nothing else fits.
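
To list every node's taints in one shot, a jsonpath query is one option (a sketch; the raw output is unformatted JSON per node):

kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints}{"\n"}{end}'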

 

Master taints

kubectl describe nodes master

Taints:       node-role.kubernetes.io/master:NoSchedule
              (taint key)                    (effect)

Pods that do not tolerate this taint are never scheduled onto the master.

 

Pod tolerations

kubectl get pods -n kube-system

kubectl describe pods -n kube-system kube-apiserver-master

Tolerations:       :NoExecute

 

kubectl describe pods -n kube-system kube-flannel-ds-amd64-99ccn

Tolerations:     :NoSchedule

                 node.kubernetes.io/disk-pressure:NoSchedule

                 node.kubernetes.io/memory-pressure:NoSchedule

                 node.kubernetes.io/network-unavailable:NoSchedule

                 node.kubernetes.io/not-ready:NoExecute

                 node.kubernetes.io/pid-pressure:NoSchedule

                 node.kubernetes.io/unreachable:NoExecute

                 node.kubernetes.io/unschedulable:NoSchedule       

 

Managing node taints

kubectl taint --help

kubectl taint node node01 node-type=production:NoSchedule   # key=value:effect; pods that cannot tolerate the taint are not scheduled here

kubectl taint node node01 node-type-   # removes every taint whose key is node-type

 

Taints:             node-type=production:NoSchedule
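
To remove only one effect instead of every taint sharing the key, the effect can be named in the delete (standard kubectl taint syntax):

kubectl taint node node01 node-type:NoSchedule-   # removes just the NoSchedule taint with key node-type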

 

vim deploy-demo.yaml

apiVersion: apps/v1

kind: Deployment

metadata:

  name: myapp-deploy

  namespace: default

spec:

  replicas: 2

  selector:

    matchLabels:

      app: myapp

      release: canary

  template:

    metadata:

      labels:

        app: myapp

        release: canary

 

    spec:

      containers:

      - name: myapp

        image: ikubernetes/myapp:v1

        ports:

        - name: http

          containerPort: 80

 

kubectl apply -f deploy-demo.yaml    # the pods carry no toleration for the taint, so they cannot be scheduled and stay Pending
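
A quick check (assuming every schedulable node currently carries a taint these pods do not tolerate):

kubectl get pods    # both myapp-deploy replicas listed as Pending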

 

kubectl taint node node02 node-type=qa:NoExecute   # key=value:effect; pods that cannot tolerate the taint are evicted

 

    spec:

      containers:

      - name: myapp

        image: ikubernetes/myapp:v1

        ports:

        - name: http

          containerPort: 80

      tolerations:             # the pod tolerates the taints below, so it may run on nodes that carry them

      - key: "node-type"       # the taint key on the node

        operator: "Equal"      # Equal: key, value and effect must match the node's taint exactly (Exists: the key merely has to be present)

        value: "production"    # the taint value on the node

        effect: "NoExecute"    # the effect the pod can tolerate

        tolerationSeconds: 60  # grace period before eviction once an intolerable NoExecute taint is in force

 

kubectl apply -f deploy-demo.yaml

 

    spec:

      containers:

      - name: myapp

        image: ikubernetes/myapp:v1

        ports:

        - name: http

          containerPort: 80

      tolerations:             # tolerate the taint below

      - key: "node-type"

        operator: "Exists"     # Exists: matches as long as the key is present, whatever the value

        value: ""

        effect: "NoSchedule"   # the effect the pod can tolerate

 

kubectl apply -f deploy-demo.yaml

 

    spec:

      containers:

      - name: myapp

        image: ikubernetes/myapp:v1

        ports:

        - name: http

          containerPort: 80

      tolerations:             # tolerate the taint below

      - key: "node-type"

        operator: "Exists"     # matches any taint whose key is node-type

        value: ""

        effect: ""             # an empty effect tolerates every effect

kubectl apply -f deploy-demo.yaml
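
As an aside, the loosest toleration omits both key and effect; this pattern (commonly seen on DaemonSet pods) tolerates every taint. A minimal sketch, not part of the demo above:

      tolerations:
      - operator: "Exists"    # no key and no effect: matches all taints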

 

Effect severity: NoExecute > NoSchedule > PreferNoSchedule

NoExecute is the harshest effect: it can evict pods that are already running.
