15. Kubernetes: Advanced Scheduling Methods

Scheduling approaches
    Node selectors: nodeSelector, nodeName (a minimal nodeName sketch follows this list)
    Node affinity scheduling: nodeAffinity
    Taints and tolerations: a taint's effect defines how a node repels Pods
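nodeSelector is demonstrated below, but nodeName is not, so here is a minimal sketch for reference, assuming the node name k8s-node1 from this cluster (the Pod name is made up for illustration). nodeName bypasses the scheduler entirely and binds the Pod directly to the named node:

apiVersion: v1
kind: Pod
metadata:
  name: pod-nodename-demo      # hypothetical name, for illustration only
  namespace: default
spec:
  nodeName: k8s-node1          # skip the scheduler and bind directly to this node
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1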
[root@k8s-master pki]# kubectl explain pods.spec.nodeSelector
[root@k8s-master yas]# mkdir schedule
[root@k8s-master yas]# cd schedule/
[root@k8s-master schedule]# cat pod-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-demo
  namespace: default
  labels:
    app: myapp
    tier: frontend
spec:
  containers:
  - image: ikubernetes/myapp:v1
    name: myapp
  nodeSelector:
    disktype: ssd

[root@k8s-master schedule]# kubectl create -f pod-demo.yaml
pod/pod-demo created
[root@k8s-master schedule]# kubectl get pod/pod-demo
NAME       READY   STATUS    RESTARTS   AGE
pod-demo   0/1     Pending   0          16s

[root@k8s-master schedule]# kubectl describe pod/pod-demo
...
Node-Selectors:  disktype=ssd
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  26s (x2 over 26s)  default-scheduler  0/3 nodes are available: 3 node(s) didn't match node selector.

As shown above, no node carries a label matching the selector, so the Pod sits in Pending: when the nodeSelector condition is not satisfied, the Pod fails the predicate (filtering) phase of scheduling. Manually labeling a node with the required label lets it run:
[root@k8s-master schedule]# kubectl get no -l disktype=ssd
No resources found.
[root@k8s-master schedule]# kubectl label node k8s-node1 disktype=ssd
node/k8s-node1 labeled
[root@k8s-master schedule]# kubectl get no -l disktype=ssd
NAME        STATUS   ROLES    AGE     VERSION
k8s-node1   Ready    <none>   6d22h   v1.14.3
[root@k8s-master schedule]# kubectl get pod/pod-demo
NAME       READY   STATUS    RESTARTS   AGE
pod-demo   1/1     Running   0          3m46s
As shown above, once a node matches the label selector, the Pod runs normally.

Configuring node affinity: hard affinity (requiredDuringSchedulingIgnoredDuringExecution) and soft affinity (preferredDuringSchedulingIgnoredDuringExecution)
Hard affinity example:
[root@master schedule]# vim pod-affinity-demo.yaml
[root@k8s-master schedule]# cat pod-affinity-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-affinity-demo
  namespace: default
  labels:
    app: myapp
    tier: frontend
spec:
  containers:
  - image: ikubernetes/myapp:v1
    name: myapp
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: zone
            operator: In
            values:
            - foo
            - bar

[root@k8s-master schedule]# kubectl create -f pod-affinity-demo.yaml

[root@k8s-master schedule]# kubectl get pod/pod-affinity-demo --show-labels
NAME                READY   STATUS    RESTARTS   AGE   LABELS
pod-affinity-demo   0/1     Pending   0          54s   app=myapp,tier=frontend
[root@k8s-master schedule]# kubectl describe pod/pod-affinity-demo    # hard affinity: no node carries a matching zone label, so the Pod stays Pending
Soft affinity
Change the spec to soft affinity: even when no node matches the preference, the scheduler still places the Pod somewhere. The weight field (1-100) controls how heavily the preference counts during scoring.

[root@k8s-master schedule]# cat pod-affinity-demo-2.yaml               
apiVersion: v1
kind: Pod
metadata:
  name: pod-affinity-demo2
  namespace: default
  labels:
    app: myapp
    tier: frontend
spec:
  containers:
  - image: ikubernetes/myapp:v1
    name: myapp
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - preference:
          matchExpressions:
          - key: zone
            operator: In
            values:
            - foo
            - bar
        weight: 60
[root@k8s-master schedule]# kubectl create -f pod-affinity-demo-2.yaml 
pod/pod-affinity-demo2 created
[root@k8s-master schedule]# kubectl get pod/pod-affinity-demo2 --show-labels 
NAME                 READY   STATUS    RESTARTS   AGE   LABELS
pod-affinity-demo2   1/1     Running   0          29s   app=myapp,tier=frontend

As shown above, node affinity scheduling works as expected: under soft (preferred) affinity the Pod runs even though nothing matches the expression.

Pod affinity scheduling
Pods also support hard and soft affinity, but here the affinity is expressed toward other Pods rather than node labels. Pod affinity does not require the Pods to share the same node, only a "nearby" location, and "nearby" is defined by the topologyKey: when the key is the node hostname, every node is its own location; when it is a rack or row label, all nodes in the same rack or row count as one location. So once the first Pod of a group lands somewhere, Pods 2-4 scaled out afterwards are placed in the same topology domain (see the rack sketch below).
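For illustration only (not used in the demo that follows), a sketch of rack-level co-location, assuming the nodes have been given a hypothetical rack label beforehand (e.g. kubectl label node k8s-node1 rack=rack1). With topologyKey: rack, any node in the same rack satisfies the affinity:

apiVersion: v1
kind: Pod
metadata:
  name: pod-rack-affinity-demo    # hypothetical name, for illustration only
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - {key: app, operator: In, values: ["myapp"]}
        topologyKey: rack          # nodes sharing the same rack label value count as one location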
The two-Pod demo below verifies pod affinity:

[root@k8s-master schedule]# cat pod-required-affinity-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-first
  labels:
    app: myapp
    tier: frontend
spec:
  containers:
  - image: ikubernetes/myapp:v1
    name: myapp
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-second
  labels:
    app: db
    tier: db
spec:
  containers:
  - name: busybox
    image: busybox:latest
    imagePullPolicy: IfNotPresent
    command: ["/bin/sh","-c","sleep 3600"]
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - {key: app, operator: In, values: ["myapp"]}
        topologyKey: kubernetes.io/hostname
[root@k8s-master schedule]# kubectl create -f pod-required-affinity-demo.yaml               
pod/pod-first created
pod/pod-second created
[root@k8s-master ~]# kubectl get pods -owide
NAME         READY   STATUS    RESTARTS   AGE   IP            NODE        NOMINATED NODE   READINESS GATES
pod-first    1/1     Running   0          6s    10.244.1.41   k8s-node1   <none>           <none>
pod-second   1/1     Running   0          6s    10.244.1.40   k8s-node1   <none>           <none>
[root@k8s-master ~]# kubectl describe pods pod-second 

Once one Pod lands on a node, the other Pod starts on the same node. Normally the default scoring policy spreads Pods onto the nodes with the most free resources, but with required affinity the two Pods must stay together; if they move, they move together.

Anti-affinity
The two Pods must never share a node; they end up on different nodes, as in the example below.
Adjust the previous example into anti-affinity by changing podAffinity to podAntiAffinity in the YAML (a sketch of the change follows).
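The full modified file is not shown here, so this is a sketch of what the pod-second part of pod-required-anti-affinity-demo.yaml presumably looks like, assuming everything else is copied unchanged from pod-required-affinity-demo.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: pod-second
  labels:
    app: db
    tier: db
spec:
  containers:
  - name: busybox
    image: busybox:latest
    imagePullPolicy: IfNotPresent
    command: ["/bin/sh","-c","sleep 3600"]
  affinity:
    podAntiAffinity:                  # the only change: podAffinity -> podAntiAffinity
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - {key: app, operator: In, values: ["myapp"]}
        topologyKey: kubernetes.io/hostname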
[root@k8s-master schedule]# kubectl create -f pod-required-anti-affinity-demo.yaml
[root@k8s-master ~]# kubectl get pods -owide    # verified: the two Pods now land on different nodes

To go a step further, change topologyKey to zone, label both nodes with the same zone value, and check whether both Pods can still start.

[root@k8s-master schedule]# kubectl label nodes k8s-node1 zone=foo
[root@k8s-master schedule]# kubectl label nodes k8s-node2 zone=foo
[root@k8s-master schedule]# tail -2 pod-required-anti-affinity-demo.yaml
          - {key: app, operator: In, values: ["myapp"]}
        topologyKey: zone
[root@k8s-master schedule]# kubectl delete -f pod-required-anti-affinity-demo.yaml
[root@k8s-master schedule]# kubectl create -f pod-required-anti-affinity-demo.yaml
[root@k8s-master ~]# kubectl get pods -owide
NAME         READY   STATUS    RESTARTS   AGE   IP            NODE        NOMINATED NODE   READINESS GATES
pod-first    1/1     Running   0          11s   10.244.1.44   k8s-node1   <none>           <none>
pod-second   0/1     Pending   0          10s   <none>        <none>      <none>           <none>
[root@k8s-master ~]# kubectl describe pod pod-second

As shown above, because of the required (hard) anti-affinity and because both nodes now sit in the same zone, the second Pod stays Pending, exactly as expected. To let it run, the anti-affinity has to be relaxed to the soft (preferred) form.
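One possible way to relax it, sketched here but not executed in this walkthrough: replace the affinity block of pod-second with a preferredDuringSchedulingIgnoredDuringExecution term, so the scheduler merely tries to avoid placing the Pods in the same zone instead of refusing to schedule:

  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100                       # preference strength, 1-100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - {key: app, operator: In, values: ["myapp"]}
          topologyKey: zone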

Node taints and Pod tolerations

The last topic: taint-based scheduling. Taints are node attributes. The affinity mechanisms above are all driven from the Pod side, with nodes chosen passively; taints give the node the active role, letting it accept or reject Pods.
Labels and annotations can be attached to any object, whereas taints are normally used on nodes. A taint is also a key/value attribute, and it rejects any Pod that cannot tolerate it.

taints: defined on nodes; a taint's effect controls how Pods are repelled and has three types,
    NoSchedule: affects only the scheduling of new Pods; existing Pods on the node are untouched
    NoExecute: affects scheduling and existing Pods; Pods that do not tolerate the taint are evicted
    PreferNoSchedule: a soft NoSchedule; the scheduler tries to avoid the node but may still use it
tolerations: defined on Pods; they list which taints the Pod can tolerate (a minimal sketch follows this list)
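For reference, a minimal toleration sketch (placed under pod.spec.tolerations), modeled on the default not-ready toleration shown in the describe output earlier; tolerationSeconds applies only to the NoExecute effect and bounds how long the Pod may keep running after the taint appears:

  tolerations:
  - key: "node.kubernetes.io/not-ready"
    operator: "Exists"          # tolerate the taint regardless of its value
    effect: "NoExecute"
    tolerationSeconds: 300      # evict the Pod 300s after the taint is applied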

[root@k8s-master ~]# kubectl explain pod.spec.tolerations
[root@k8s-master ~]# kubectl explain nodes.spec.taints
By default the master carries a taint (typically node-role.kubernetes.io/master:NoSchedule). Pods created later are never scheduled onto the master because they define no toleration for that taint; this is exactly how the master keeps ordinary Pods off itself. System Pods such as kube-proxy and flannel do carry the matching tolerations. Check the node taint and the Pod tolerations at the system level:
[root@k8s-master ~]# kubectl get node
[root@k8s-master ~]# kubectl get pod -n kube-system
[root@k8s-master ~]# kubectl describe nodes k8s-master |grep -A 1 Taints
[root@k8s-master ~]# kubectl describe pod kube-proxy-h2j8g -n kube-system |grep -A 10 Tolerations
[root@k8s-master ~]# kubectl describe pod kube-flannel-ds-amd64-864bn -n kube-system |grep -A 10 Tolerations

As shown above, the master node has a taint set and the system Pods carry the matching tolerations.

Managing taints on nodes
Example: mark k8s-node1 as production-only, leaving the other nodes for test workloads.
[root@k8s-master ~]# kubectl taint node k8s-node1 node-type=production:NoSchedule    # add the taint
node/k8s-node1 tainted
[root@k8s-master schedule]# cat deploy-demo.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-deploy
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      release: canary
  template:
    metadata:
      name: myapp
      labels:
        app: myapp
        release: canary
    spec:
      containers:
      - name: myapp
        image: ikubernetes/myapp:v2
        ports:
        - name: http
          containerPort: 80

[root@k8s-master schedule]# kubectl create -f deploy-demo.yaml    # the Pod template defines no toleration
deployment.apps/myapp-deploy created
[root@k8s-master ~]# kubectl get pods -owide |grep myapp-deploy    # all Pods land on k8s-node2: they cannot tolerate the taint on k8s-node1
myapp-deploy-6b59768b8-7lp47   1/1   Running   0   59s   10.244.2.69   k8s-node2   <none>   <none>
myapp-deploy-6b59768b8-9f6rr   1/1   Running   0   59s   10.244.2.67   k8s-node2   <none>   <none>
myapp-deploy-6b59768b8-pcl2d   1/1   Running   0   59s   10.244.2.68   k8s-node2   <none>   <none>

All Pods now run on node2. Next, taint node2 as dev with the NoExecute effect and see whether the existing Pods get evicted.

[root@k8s-master ~]# kubectl taint node k8s-node2 node-type=dev:NoExecute       
node/k8s-node2 tainted
[root@k8s-master ~]# kubectl get pods -owide |grep myapp-deploy          
myapp-deploy-6b59768b8-gg966   0/1     Pending   0          20s     <none>        <none>      <none>           <none>
myapp-deploy-6b59768b8-kmwrd   0/1     Pending   0          20s     <none>        <none>      <none>           <none>
myapp-deploy-6b59768b8-mf2nq   0/1     Pending   0          21s     <none>        <none>      <none>           <none>

kubectl describe pods myapp-deploy-6b59768b8-gg966 shows the reason: no node can take a Pod that does not tolerate its taints.
Warning FailedScheduling 10s (x16 over 28s) default-scheduler 0/3 nodes are available: 3 node(s) had taints that the pod didn't tolerate.
The running Pods were evicted, and their replacements stay in Pending.

Recovering by adding tolerations to the Pods
Adding tolerations lets the Pods get past the taints: add a tolerations block to the Deployment's Pod template and re-apply so new Pods are generated. Two operators are available:
  Equal: tolerate only the taint whose key, value and effect match exactly;
  Exists: tolerate any taint with the given key, regardless of its value.
[root@k8s-master schedule]# grep -A 5 tolerations deploy-demo.yaml    # exact (Equal) toleration of the production taint; the Pods can run on k8s-node1 again
      tolerations:
        - key: "node-type"
          operator: "Equal"
          value: "production"
          effect: "NoSchedule"

[root@k8s-master schedule]# kubectl apply -f deploy-demo.yaml
[root@k8s-master schedule]# kubectl get pods -owide |grep myapp-deploy
myapp-deploy-f9f7f6969-ng6jl   1/1   Running   0   19s   10.244.1.46   k8s-node1   <none>   <none>
myapp-deploy-f9f7f6969-vpxr6   1/1   Running   0   22s   10.244.1.45   k8s-node1   <none>   <none>
myapp-deploy-f9f7f6969-zjsmd   1/1   Running   0   17s   10.244.1.47   k8s-node1   <none>   <none>
Now adjust the toleration so the Pods tolerate every taint whose key is node-type:

[root@k8s-master schedule]# grep -A 5 tolerations deploy-demo.yaml
      tolerations:
        - key: "node-type"
          operator: "Exists"
          value: ""
          effect: ""
[root@k8s-master schedule]# kubectl apply -f deploy-demo.yaml         
deployment.apps/myapp-deploy configured
[root@k8s-master schedule]# kubectl get pods -owide |grep myapp-deploy 
myapp-deploy-5db9b6869b-kp422   1/1     Running   0          24s     10.244.1.48   k8s-node1   <none>           <none>
myapp-deploy-5db9b6869b-vl42v   1/1     Running   0          21s     10.244.2.71   k8s-node2   <none>           <none>
myapp-deploy-5db9b6869b-zz8zc   1/1     Running   0          27s     10.244.2.70   k8s-node2   <none>           <none>

As shown above, once the Pods tolerate all node-type taints, the normal scoring policy spreads them across the nodes again.
That completes the walkthrough of taint-based scheduling.

Removing a taint
Append a minus sign to the taint key, e.g.:
$ kubectl taint node node1 node-type-
[root@k8s-master ~]# kubectl describe nodes k8s-node2 |grep -A 0 Taints
Taints: node-type=dev:NoExecute
[root@k8s-master ~]# kubectl taint node k8s-node2 node-type-
node/k8s-node2 untainted
[root@k8s-master ~]# kubectl describe nodes k8s-node2 |grep -A 0 Taints
Taints: <none>
