Affinity scheduling in k8s

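Efficient communication sometimes requires placing certain Pods close to one another, on the same node, rack, zone, or region. An application's Pods and the Pods of the backend service that supplies their data are a typical example. Pods in such a relationship can be said to have affinity for each other.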

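Ideally, the scheduler places the first Pod anywhere. The other Pods that have an affinity or anti-affinity relationship with it are then placed dynamically relative to that first Pod; this is what Pod affinity and anti-affinity scheduling provide. As with node affinity, inter-Pod affinity comes in a required (hard) form and a preferred (soft) form, and the two forms mean the same thing as they do for node affinity.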

Pod affinity

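Pod affinity (podAffinity) answers the question of which Pods a Pod may share a topology domain with. A topology domain is defined by a node label and can be a single host, or a group of hosts such as a cluster or a zone. Pod anti-affinity answers the opposite question: which Pods a Pod must not share a topology domain with. Both deal with relationships between Pods.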

[root@k8s-01 ~]# kubectl explain deploy.spec.template.spec.affinity.podAffinity.requiredDuringSchedulingIgnoredDuringExecution
KIND:     Deployment
VERSION:  apps/v1

RESOURCE: requiredDuringSchedulingIgnoredDuringExecution <[]Object>

DESCRIPTION:
     If the affinity requirements specified by this field are not met at
     scheduling time, the pod will not be scheduled onto the node. If the
     affinity requirements specified by this field cease to be met at some point
     during pod execution (e.g. due to a pod label update), the system may or
     may not try to eventually evict the pod from its node. When there are
     multiple elements, the lists of nodes corresponding to each podAffinityTerm
     are intersected, i.e. all terms must be satisfied.

     Defines a set of pods (namely those matching the labelSelector relative to
     the given namespace(s)) that this pod should be co-located (affinity) or
     not co-located (anti-affinity) with, where co-located is defined as running
     on a node whose value of the label with key <topologyKey> matches that of
     any node on which a pod of the set of pods is running

FIELDS:
   labelSelector        <Object>
     A label query over a set of resources, in this case pods.

   namespaceSelector    <Object>
     A label query over the set of namespaces that the term applies to. The term
     is applied to the union of the namespaces selected by this field and the
     ones listed in the namespaces field. null selector and null or empty
     namespaces list means "this pod's namespace". An empty selector ({})
     matches all namespaces. This field is beta-level and is only honored when
     PodAffinityNamespaceSelector feature is enabled.

   namespaces   <[]string>
     namespaces specifies a static list of namespace names that the term applies
     to. The term is applied to the union of the namespaces listed in this field
     and the ones selected by namespaceSelector. null or empty namespaces list
     and null namespaceSelector means "this pod's namespace"

   topologyKey  <string> -required-
     This pod should be co-located (affinity) or not co-located (anti-affinity)
     with the pods matching the labelSelector in the specified namespaces, where
     co-located is defined as running on a node whose value of the label with
     key topologyKey matches that of any node on which any of the selected pods
     is running. Empty topologyKey is not allowed.

[root@k8s-01 ~]#

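Inter-Pod affinity is defined in the spec.affinity.podAffinity field and anti-affinity in the spec.affinity.podAntiAffinity field. Each supports both a required form (requiredDuringSchedulingIgnoredDuringExecution) and a preferred form (preferredDuringSchedulingIgnoredDuringExecution). Both use the following key fields: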

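  • topologyKey: the topology key, a node label used to partition nodes into topologies. Nodes with the same value for this key belong to the same topology domain. This field is required.
  • labelSelector: a Pod label selector. It specifies which existing Pods this Pod's placement is evaluated against.
  • namespaces <[]string>: the namespaces in which labelSelector is evaluated. It defaults to the namespace of the Pod being scheduled. The explain output above also lists namespaceSelector, which selects namespaces by label; the term applies to the union of both.

The following YAML is used for the test: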

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: pod-affinity
      labels:
        app: pod-affinity
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: pod-affinity
      template:
        metadata:
          labels:
            app: pod-affinity
        spec:
          containers:
          - name: nginx
            image: nginx
            ports:
            - containerPort: 80
              name: nginxweb
          affinity:
            podAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:  # 硬策略
              - labelSelector:
                  matchExpressions:
                  - key: logging
                    operator: In
                    values:
                    - "true"    # quoted: label values must be strings, an unquoted true is a YAML boolean
                topologyKey: kubernetes.io/hostname
    

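Here topologyKey is kubernetes.io/hostname, so each node (identified by its hostname label) forms its own topology domain. The scheduler therefore places the replicas on the node(s) where a Pod labeled logging=true is already running.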
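The hard rule can only be satisfied if at least one Pod labeled logging=true is already running. Otherwise every replica stays Pending, because the Deployment's own label app=pod-affinity does not match the selector. Below is a minimal sketch of such a target Pod. The name logging-demo is hypothetical, and pinning it to k8s-03 through the kubernetes.io/hostname label is an assumption chosen to match the result shown below.

```yaml
# Hypothetical target Pod carrying the logging=true label the affinity rule looks for.
apiVersion: v1
kind: Pod
metadata:
  name: logging-demo               # hypothetical name
  labels:
    logging: "true"                # quoted: label values must be strings
spec:
  nodeSelector:
    kubernetes.io/hostname: k8s-03 # assumption: pin it to k8s-03 to reproduce the placement below
  containers:
  - name: nginx
    image: nginx
```

Once this Pod is Running on k8s-03, every pod-affinity replica has to land on the same hostname domain, which is k8s-03 itself.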

Listing the Pods shows that all of them are on node k8s-03:

    [root@k8s-01 ~]# kubectl get pods -o wide |grep pod-affinity
    pod-affinity-64bc56d789-2bczb             1/1     Running   0               5m25s   10.244.165.213   k8s-03   <none>           <none>
    pod-affinity-64bc56d789-qgtkd             1/1     Running   0               5m25s   10.244.165.211   k8s-03   <none>           <none>
    pod-affinity-64bc56d789-w95dv             1/1     Running   0               5m25s   10.244.165.208   k8s-03   <none>           <none>
    [root@k8s-01 ~]#
    

Now change part of the YAML as follows and set replicas to 10:

              - labelSelector:
                  matchExpressions:
                  - key: app
                    operator: In
                    values: ["nginx-readiness","nginx-test"]
                topologyKey: disk
    

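After applying the YAML, the Pods are spread across two nodes, k8s-02 and k8s-04. With topologyKey: disk, all nodes that share the same value of the disk label form one topology domain. A replica may therefore land on any node in a domain where a Pod labeled app=nginx-readiness or app=nginx-test is already running. In this cluster, that evidently includes k8s-02 and k8s-04.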

    
    [root@k8s-01 ~]# kubectl get pods -o wide |grep pod-affinity
    pod-affinity-94b66f75b-2cxns              1/1     Running   0               107s    10.244.7.86      k8s-04   <none>           <none>
    pod-affinity-94b66f75b-6jfrv              1/1     Running   0               107s    10.244.7.87      k8s-04   <none>           <none>
    pod-affinity-94b66f75b-7bftn              1/1     Running   0               107s    10.244.179.15    k8s-02   <none>           <none>
    pod-affinity-94b66f75b-9tqgm              1/1     Running   0               107s    10.244.7.85      k8s-04   <none>           <none>
    pod-affinity-94b66f75b-dnph9              1/1     Running   0               107s    10.244.7.88      k8s-04   <none>           <none>
    pod-affinity-94b66f75b-fznzb              1/1     Running   0               107s    10.244.179.11    k8s-02   <none>           <none>
    pod-affinity-94b66f75b-q6lv2              1/1     Running   0               107s    10.244.179.13    k8s-02   <none>           <none>
    pod-affinity-94b66f75b-s7jj5              1/1     Running   0               107s    10.244.179.16    k8s-02   <none>           <none>
    pod-affinity-94b66f75b-tn4s4              1/1     Running   0               107s    10.244.179.10    k8s-02   <none>           <none>
    pod-affinity-94b66f75b-xpbnq              1/1     Running   0               107s    10.244.7.89      k8s-04   <none>           <none>
    [root@k8s-01 ~]#
    
    

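This shows that inter-Pod affinity can keep closely related or chatty applications in the same location, which reduces communication latency and the performance cost that comes with it. Note that the rule is only enforced at scheduling time. If labels change at runtime so that a Pod's affinity rule is no longer satisfied, the Pod keeps running on its node and is not rescheduled; this is the IgnoredDuringExecution part of the field name. Also, by default labelSelector only matches Pods in the same namespace as the Pod being scheduled. Other namespaces can be targeted with the namespaces field (or namespaceSelector).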
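A minimal sketch of a cross-namespace term, as a fragment for spec.template.spec.affinity, is shown below. The namespace name logging and the namespace label team: ops are hypothetical. According to the explain output earlier, the term applies to the union of the listed namespaces and the namespaces matched by namespaceSelector, and an empty selector ({}) matches all namespaces. On the cluster shown here, namespaceSelector is beta and only honored when the PodAffinityNamespaceSelector feature is enabled.

```yaml
# Fragment for spec.template.spec.affinity; namespace names and labels are hypothetical.
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: logging
          operator: In
          values:
          - "true"
      # Look for matching Pods in this explicit namespace list...
      namespaces:
      - logging                    # hypothetical namespace
      # ...and also in every namespace whose labels match this selector.
      namespaceSelector:
        matchLabels:
          team: ops                # hypothetical namespace label
      topologyKey: kubernetes.io/hostname
```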

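Pod affinity also supports the soft (preferred) form, configured the same way as for node affinity. The detailed test is not repeated here.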
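The fragment below sketches the soft form, reusing the logging=true selector from the test above. Each entry under preferredDuringSchedulingIgnoredDuringExecution pairs a weight (1 to 100) with a podAffinityTerm. Nodes that satisfy the term score higher, in proportion to the weight, but the Pod is still scheduled elsewhere if no node satisfies it. The weight of 100 is an arbitrary choice.

```yaml
# Fragment for spec.template.spec.affinity: prefer, but do not require, co-location.
affinity:
  podAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100                  # 1-100; higher means a stronger preference
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: logging
            operator: In
            values:
            - "true"
        topologyKey: kubernetes.io/hostname
```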

Pod anti-affinity

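Pod anti-affinity (podAntiAffinity) works the other way round. If a node already runs a certain Pod, the Pod being scheduled should not be placed on that node. Starting from the YAML above, we change podAffinity to podAntiAffinity and point the labelSelector at the Deployment's own label (app=pod-antiaffinity), so that its replicas repel each other.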

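Anti-affinity can achieve an effect similar to DaemonSet + nodeSelector (one Pod per node), but more flexibly. With a DaemonSet, if a node goes down, that copy of the Pod is lost until the node comes back. With anti-affinity, the replacement Pod can be started on any other node that still satisfies the topologyKey constraint. Anti-affinity scheduling is therefore typically used to spread out Pods of the same application. It is also used to place Pods of different security levels in different zones, racks, or nodes.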

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: pod-antiaffinity
      labels:
        app: pod-antiaffinity
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: pod-antiaffinity
      template:
        metadata:
          labels:
            app: pod-antiaffinity
        spec:
          containers:
          - name: nginx
            image: nginx
            ports:
            - containerPort: 80
              name: nginxweb
          affinity:
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:  # 硬策略
              - labelSelector:
                  matchExpressions:
                  - key: app
                    operator: In
                    values:
                    - pod-antiaffinity
                topologyKey: kubernetes.io/hostname
    

Each Pod now runs on a different node:

    [root@k8s-01 ~]# kubectl get pods -o wide |grep pod-antiaffinity
    pod-antiaffinity-86566d4dd5-bpspt         1/1     Running   0               23s     10.244.61.220    k8s-01   <none>           <none>
    pod-antiaffinity-86566d4dd5-ggbgc         1/1     Running   0               23s     10.244.179.2     k8s-02   <none>           <none>
    pod-antiaffinity-86566d4dd5-q5jl4         1/1     Running   0               23s     10.244.7.83      k8s-04   <none>           <none>
    [root@k8s-01 ~]#
    

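If replicas is then increased to 5, one Pod stays Pending. The cluster has only four nodes (k8s-01 to k8s-04), and the hard anti-affinity rule allows at most one replica per node: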

    [root@k8s-01 ~]# kubectl get pods -o wide |grep pod-antiaffinity
    pod-antiaffinity-86566d4dd5-5h9h7         1/1     Running   0               59s     10.244.61.224    k8s-01   <none>           <none>
    pod-antiaffinity-86566d4dd5-fslqk         1/1     Running   0               59s     10.244.179.14    k8s-02   <none>           <none>
    pod-antiaffinity-86566d4dd5-n474x         1/1     Running   0               59s     10.244.165.222   k8s-03   <none>           <none>
    pod-antiaffinity-86566d4dd5-pcbhs         1/1     Running   0               59s     10.244.7.91      k8s-04   <none>           <none>
    pod-antiaffinity-86566d4dd5-vqvhv         0/1     Pending   0               59s     <none>           <none>   <none>           <none>
    [root@k8s-01 ~]#
    
    

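Similarly, Pod anti-affinity supports the soft form. The scheduler tries not to place mutually exclusive Pods in the same location. When the constraint cannot be met, it may still schedule them in violation of the rule instead of leaving the Pod Pending.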
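Below is a minimal sketch of the soft form applied to the pod-antiaffinity Deployment above, replacing its hard rule. The weight of 100 is an arbitrary choice. With this fragment and replicas set to 5 on the four-node cluster, the fifth replica should be scheduled onto a node that already runs one, rather than staying Pending (assuming the node has enough resources).

```yaml
# Fragment for spec.template.spec.affinity: spread replicas across nodes when possible.
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - pod-antiaffinity
        topologyKey: kubernetes.io/hostname
```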

posted @ 2022-11-29 10:25  天宇轩-王