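Switching from OpenShift SDN to OVN-Kubernetes
OVN-Kubernetes, a new network plugin, became generally available (GA) in OpenShift 4.6. Compared with the original OpenShift SDN, its implementation differs as follows.
The overall OVN architecture is shown in the figure below.
This article records the main steps for switching a cluster from the traditional OpenShift SDN network to OVN-Kubernetes.
1. First, confirm that the cluster is currently healthy.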
[root@bastion cluster-fe55]# oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-129-219.us-east-2.compute.internal   Ready    master   24m   v1.23.5+9ce5071
ip-10-0-152-235.us-east-2.compute.internal   Ready    worker   18m   v1.23.5+9ce5071
ip-10-0-166-169.us-east-2.compute.internal   Ready    master   24m   v1.23.5+9ce5071
ip-10-0-190-233.us-east-2.compute.internal   Ready    worker   11m   v1.23.5+9ce5071
ip-10-0-193-179.us-east-2.compute.internal   Ready    worker   18m   v1.23.5+9ce5071
ip-10-0-199-160.us-east-2.compute.internal   Ready    master   24m   v1.23.5+9ce5071
[root@bastion cluster-fe55]# oc get co
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.10.11   True        False         False      4m33s
baremetal                                  4.10.11   True        False         False      22m
cloud-controller-manager                   4.10.11   True        False         False      24m
cloud-credential                           4.10.11   True        False         False      25m
cluster-autoscaler                         4.10.11   True        False         False      22m
config-operator                            4.10.11   True        False         False      23m
console                                    4.10.11   True        False         False      2m15s
csi-snapshot-controller                    4.10.11   True        False         False      23m
dns                                        4.10.11   True        False         False      22m
etcd                                       4.10.11   True        False         False      21m
image-registry                             4.10.11   True        False         False      11m
ingress                                    4.10.11   True        False         False      13m
insights                                   4.10.11   True        False         False      10m
kube-apiserver                             4.10.11   True        False         False      7m27s
kube-controller-manager                    4.10.11   True        False         False      20m
kube-scheduler                             4.10.11   True        False         False      20m
kube-storage-version-migrator              4.10.11   True        False         False      23m
machine-api                                4.10.11   True        False         False      19m
machine-approver                           4.10.11   True        False         False      23m
machine-config                             4.10.11   True        False         False      22m
marketplace                                4.10.11   True        False         False      22m
monitoring                                 4.10.11   True        False         False      10m
network                                    4.10.11   True        False         False      24m
node-tuning                                4.10.11   True        False         False      23m
openshift-apiserver                        4.10.11   True        False         False      7m28s
openshift-controller-manager               4.10.11   True        False         False      21m
openshift-samples                          4.10.11   True        False         False      14m
operator-lifecycle-manager                 4.10.11   True        False         False      23m
operator-lifecycle-manager-catalog         4.10.11   True        False         False      23m
operator-lifecycle-manager-packageserver   4.10.11   True        False         False      15m
service-ca                                 4.10.11   True        False         False      23m
storage                                    4.10.11   True        False         False      23m
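Scanning more than thirty operator rows by eye is easy to get wrong. As a minimal sketch, the helper below reads `oc get co --no-headers` output on stdin and prints every operator that is not Available=True, Progressing=False, Degraded=False. The function name `unhealthy_operators` is hypothetical (it is not part of `oc`), and the column positions are an assumption based on the output shown above.

```shell
#!/bin/bash
# Hypothetical helper: list cluster operators that are not fully healthy.
# Assumes the column order of `oc get co --no-headers`:
#   NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE [MESSAGE]
# Rows with an empty VERSION column would shift the fields and be
# reported as unhealthy, which errs on the safe side.
unhealthy_operators() {
  awk '$3 != "True" || $4 != "False" || $5 != "False" { print $1 }'
}

# Usage against a live cluster (not run here):
#   oc get co --no-headers | unhealthy_operators
```

An empty result means every operator reported the healthy triple; any name printed is worth inspecting with `oc describe co <name>` before starting the migration.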
Pods in the openshift-sdn namespace:
[root@bastion cluster-fe55]# oc get pods -n openshift-sdn
NAME                   READY   STATUS    RESTARTS   AGE
sdn-bjnld              2/2     Running   0          20m
sdn-chgzt              2/2     Running   0          19m
sdn-controller-5lw7p   2/2     Running   0          25m
sdn-controller-bsqf9   2/2     Running   0          25m
sdn-controller-lgfcw   2/2     Running   0          25m
sdn-jjskf              2/2     Running   0          25m
sdn-k5ff9              2/2     Running   0          25m
sdn-mtf6h              2/2     Running   0          13m
sdn-vn9lg              2/2     Running   0          25m
Back up the current network configuration:
oc get Network.config.openshift.io cluster -o yaml > cluster-openshift-sdn.yaml
2. Preparation: put the Cluster Network Operator into the migration state.
[root@bastion cluster-fe55]# oc patch Network.operator.openshift.io cluster --type='merge' --patch '{ "spec": { "migration": {"networkType": "OVNKubernetes" } } }'
network.operator.openshift.io/cluster patched
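Check the machine config pools (MCP) and the machineconfig annotations on each node to make sure the update completes. The nodes of the cluster are rebooted during this update.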
[root@bastion cluster-fe55]# oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-716bbe8d388b0c03c7220b92261ce3b7   False     True       False      3              0                   0                     0                      31m
worker   rendered-worker-19e997749a8169203d7633a41779dc70   False     True       False      3              0                   0                     0                      31m
[root@bastion cluster-fe55]# oc describe node | egrep "hostname|machineconfig"
kubernetes.io/hostname=ip-10-0-129-219.us-east-2.compute.internal
machineconfiguration.openshift.io/controlPlaneTopology: HighlyAvailable
machineconfiguration.openshift.io/currentConfig: rendered-master-716bbe8d388b0c03c7220b92261ce3b7
machineconfiguration.openshift.io/desiredConfig: rendered-master-716bbe8d388b0c03c7220b92261ce3b7
machineconfiguration.openshift.io/reason:
machineconfiguration.openshift.io/state: Done
kubernetes.io/hostname=ip-10-0-152-235.us-east-2.compute.internal
machineconfiguration.openshift.io/controlPlaneTopology: HighlyAvailable
machineconfiguration.openshift.io/currentConfig: rendered-worker-19e997749a8169203d7633a41779dc70
machineconfiguration.openshift.io/desiredConfig: rendered-worker-19e997749a8169203d7633a41779dc70
machineconfiguration.openshift.io/reason:
machineconfiguration.openshift.io/state: Done
kubernetes.io/hostname=ip-10-0-166-169.us-east-2.compute.internal
machineconfiguration.openshift.io/controlPlaneTopology: HighlyAvailable
machineconfiguration.openshift.io/currentConfig: rendered-master-716bbe8d388b0c03c7220b92261ce3b7
machineconfiguration.openshift.io/desiredConfig: rendered-master-716bbe8d388b0c03c7220b92261ce3b7
machineconfiguration.openshift.io/reason:
machineconfiguration.openshift.io/state: Done
kubernetes.io/hostname=ip-10-0-190-233.us-east-2.compute.internal
machineconfiguration.openshift.io/controlPlaneTopology: HighlyAvailable
machineconfiguration.openshift.io/currentConfig: rendered-worker-19e997749a8169203d7633a41779dc70
machineconfiguration.openshift.io/desiredConfig: rendered-worker-19e997749a8169203d7633a41779dc70
machineconfiguration.openshift.io/reason:
machineconfiguration.openshift.io/state: Done
kubernetes.io/hostname=ip-10-0-193-179.us-east-2.compute.internal
machineconfiguration.openshift.io/controlPlaneTopology: HighlyAvailable
machineconfiguration.openshift.io/currentConfig: rendered-worker-8a02c6921084c925662477a94e36d6d2
machineconfiguration.openshift.io/desiredConfig: rendered-worker-8a02c6921084c925662477a94e36d6d2
machineconfiguration.openshift.io/reason:
machineconfiguration.openshift.io/state: Done
kubernetes.io/hostname=ip-10-0-199-160.us-east-2.compute.internal
machineconfiguration.openshift.io/controlPlaneTopology: HighlyAvailable
machineconfiguration.openshift.io/currentConfig: rendered-master-716bbe8d388b0c03c7220b92261ce3b7
machineconfiguration.openshift.io/desiredConfig: rendered-master-53edd0e339dd68dc86dbcd4d60b244a6
machineconfiguration.openshift.io/reason:
machineconfiguration.openshift.io/state: Working
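The per-node annotations are long, so it can help to reduce them to just the nodes that are still mid-update. The following is a rough sketch under the assumption that the input has the same shape as the `oc describe node | egrep "hostname|machineconfig"` output above; the function name `nodes_pending_update` is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print nodes whose machine config update is not finished.
# Input: output of `oc describe node | egrep "hostname|machineconfig"`,
# where each node's hostname label precedes its machineconfig annotations.
nodes_pending_update() {
  awk '
    /kubernetes.io\/hostname=/ { sub(/.*hostname=/, ""); host = $0 }
    /currentConfig:/           { current = $NF }
    /desiredConfig:/           { desired = $NF }
    /machineconfiguration.openshift.io\/state:/ {
      # A node is pending if it is not Done or has not reached its desired config.
      if ($NF != "Done" || current != desired) print host
    }
  '
}

# Usage against a live cluster (not run here):
#   oc describe node | egrep "hostname|machineconfig" | nodes_pending_update
```

A node that has not yet been assigned the new rendered config still shows matching currentConfig and desiredConfig with state Done, so an empty result here does not by itself prove the rollout is over; `oc get mcp` remains the authoritative check.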
Wait until every pool reports that it has been updated:
[root@bastion cluster-fe55]# oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-53edd0e339dd68dc86dbcd4d60b244a6   True      False      False      3              3                   3                     0                      43m
worker   rendered-worker-8a02c6921084c925662477a94e36d6d2   True      False      False      3              3                   3                     0                      43m
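Waiting for the pools can be scripted instead of re-running `oc get mcp` by hand. The sketch below assumes the column order shown above (UPDATED, UPDATING, DEGRADED in fields 3 to 5 of `oc get mcp --no-headers`); `all_pools_updated` is a hypothetical name, not an `oc` subcommand.

```shell
#!/bin/bash
# Hypothetical helper: succeed only when every machine config pool is
# UPDATED=True, UPDATING=False, DEGRADED=False.
# Input: `oc get mcp --no-headers` output on stdin.
all_pools_updated() {
  awk '
    $3 != "True" || $4 != "False" || $5 != "False" { bad = 1 }
    # Empty input (for example, the API is unreachable while a master
    # reboots) is treated as "not updated yet".
    END { if (NR == 0 || bad) exit 1 }
  '
}

# Usage against a live cluster (not run here):
#   until oc get mcp --no-headers | all_pools_updated; do sleep 30; done
```

Treating empty input as a failure is a deliberate choice: the API server can be briefly unavailable while control plane nodes reboot, and the loop should keep polling rather than report success.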
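Confirm that the preparation is complete: the rendered machine configs should now start configure-ovs.sh with OVNKubernetes.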
[root@bastion cluster-fe55]# oc get machineconfig rendered-master-53edd0e339dd68dc86dbcd4d60b244a6 -o yaml | grep ExecStart | grep OVNKubernetes
ExecStart=/usr/local/bin/configure-ovs.sh OVNKubernetes
[root@bastion cluster-fe55]# oc get machineconfig rendered-worker-8a02c6921084c925662477a94e36d6d2 -o yaml | grep ExecStart | grep OVNKubernetes
ExecStart=/usr/local/bin/configure-ovs.sh OVNKubernetes
3. Start the actual switch.
[root@bastion cluster-fe55]# oc patch Network.config.openshift.io cluster --type='merge' --patch '{ "spec": { "networkType": "OVNKubernetes" } }'
network.config.openshift.io/cluster patched
The multus daemon set is updated to the new version first. Check its progress with the following command:
[root@bastion cluster-fe55]# oc -n openshift-multus rollout status daemonset/multus
daemon set "multus" successfully rolled out
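4. After the multus daemon set rollout completes, reboot every node in the cluster.
If the cluster is not in a cloud environment, or you can SSH to every node, the reboot can be done with the following script: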
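# The quoted 'EOF' keeps $(...) and $ip unexpanded until the script itself runs.
cat << 'EOF' > ~/reboot-nodes.sh
#!/bin/bash
# Reboot every node over SSH, using each node's InternalIP.
for ip in $(oc get nodes -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}')
do
  echo "reboot node $ip"
  ssh -o StrictHostKeyChecking=no core@$ip sudo shutdown -r -t 3
done
EOF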
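If the cluster runs on AWS and the helper node cannot connect to the cluster nodes directly, reboot each node through a debug pod instead: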
[root@bastion ~]# oc debug node/ip-10-0-199-160.us-east-2.compute.internal
Starting pod/ip-10-0-199-160us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.199.160
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# systemctl reboot
sh-4.4#
Removing debug pod ...
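Repeating the interactive debug session for each of the six nodes is tedious. One possible shortcut, sketched here, is to pass the command to `oc debug` after `--` so that no interactive shell is needed. The helper only prints the commands (a dry run) so they can be reviewed and executed one node at a time; `print_reboot_cmds` is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: turn node names into non-interactive reboot commands.
# Input: one `node/<name>` per line, as produced by `oc get nodes -o name`.
# Output: one `oc debug` command per node; nothing is executed here.
print_reboot_cmds() {
  while read -r node; do
    [ -n "$node" ] || continue   # skip blank lines
    printf 'oc debug %s -- chroot /host systemctl reboot\n' "$node"
  done
}

# Usage against a live cluster (not run here):
#   oc get nodes -o name | print_reboot_cmds
```

Printing rather than executing keeps the operator in control of the reboot order, which matters because rebooting all control plane nodes at once would take the API server down.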
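After the reboots complete, check that all nodes are Ready and that the cluster now reports OVNKubernetes as its network type: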
[root@bastion ~]# oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-129-219.us-east-2.compute.internal   Ready    master   78m   v1.23.5+9ce5071
ip-10-0-152-235.us-east-2.compute.internal   Ready    worker   73m   v1.23.5+9ce5071
ip-10-0-166-169.us-east-2.compute.internal   Ready    master   78m   v1.23.5+9ce5071
ip-10-0-190-233.us-east-2.compute.internal   Ready    worker   66m   v1.23.5+9ce5071
ip-10-0-193-179.us-east-2.compute.internal   Ready    worker   73m   v1.23.5+9ce5071
ip-10-0-199-160.us-east-2.compute.internal   Ready    master   78m   v1.23.5+9ce5071
[root@bastion ~]# oc get network.config/cluster -o jsonpath='{.status.networkType}{"\n"}'
OVNKubernetes
Check all cluster operators (oc get co) again to make sure they are healthy.
Check the status of the OVN-Kubernetes pods:
[root@bastion ~]# oc get pods -n openshift-ovn-kubernetes -o wide
NAME                   READY   STATUS    RESTARTS       AGE   IP             NODE                                         NOMINATED NODE   READINESS GATES
ovnkube-master-2s82h   6/6     Running   15 (19m ago)   38m   10.0.166.169   ip-10-0-166-169.us-east-2.compute.internal   <none>           <none>
ovnkube-master-hm5wj   6/6     Running   9              38m   10.0.129.219   ip-10-0-129-219.us-east-2.compute.internal   <none>           <none>
ovnkube-master-jnll2   6/6     Running   15 (19m ago)   38m   10.0.199.160   ip-10-0-199-160.us-east-2.compute.internal   <none>           <none>
ovnkube-node-bhnsg     5/5     Running   13 (19m ago)   38m   10.0.199.160   ip-10-0-199-160.us-east-2.compute.internal   <none>           <none>
ovnkube-node-lzprb     5/5     Running   8              10m   10.0.129.219   ip-10-0-129-219.us-east-2.compute.internal   <none>           <none>
ovnkube-node-m4jr5     5/5     Running   13             38m   10.0.190.233   ip-10-0-190-233.us-east-2.compute.internal   <none>           <none>
ovnkube-node-m7pww     5/5     Running   13             38m   10.0.193.179   ip-10-0-193-179.us-east-2.compute.internal   <none>           <none>
ovnkube-node-w4prc     5/5     Running   13 (19m ago)   38m   10.0.166.169   ip-10-0-166-169.us-east-2.compute.internal   <none>           <none>
ovnkube-node-zn9fx     5/5     Running   13             38m   10.0.152.235   ip-10-0-152-235.us-east-2.compute.internal   <none>           <none>
The ovnkube-master pods all run on master nodes, while every node runs one ovnkube-node pod.
[root@bastion ~]# oc get nodes | grep master
ip-10-0-129-219.us-east-2.compute.internal   Ready    master   88m   v1.23.5+9ce5071
ip-10-0-166-169.us-east-2.compute.internal   Ready    master   87m   v1.23.5+9ce5071
ip-10-0-199-160.us-east-2.compute.internal   Ready    master   87m   v1.23.5+9ce5071
Containers in the ovnkube-master pods
Containers in the ovnkube-node pods
[root@bastion ~]# oc get ds -n openshift-ovn-kubernetes
NAME             DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                                                 AGE
ovnkube-master   3         3         3       3            3           beta.kubernetes.io/os=linux,node-role.kubernetes.io/master=   56m
ovnkube-node     6         6         6       6            6           beta.kubernetes.io/os=linux                                   56m
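Before clearing the migration state it is worth confirming that both daemon sets are fully rolled out. As an illustrative sketch, the helper below compares the DESIRED count with READY, UP-TO-DATE and AVAILABLE, assuming the column order of `oc get ds --no-headers` shown above. The name `daemonsets_not_ready` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print daemon sets whose READY, UP-TO-DATE or
# AVAILABLE count differs from DESIRED.
# Assumes the column order of `oc get ds --no-headers`:
#   NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE-SELECTOR AGE
daemonsets_not_ready() {
  awk '$4 != $2 || $5 != $2 || $6 != $2 { print $1 }'
}

# Usage against a live cluster (not run here):
#   oc get ds -n openshift-ovn-kubernetes --no-headers | daemonsets_not_ready
```

With the output shown above, ovnkube-master (3 of 3) and ovnkube-node (6 of 6) both pass, so nothing would be printed.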
5. Clear the migration state on the Cluster Network Operator.
[root@bastion ~]# oc patch Network.operator.openshift.io cluster --type='merge' --patch '{ "spec": { "migration": null } }'
network.operator.openshift.io/cluster patched
Cleanup: remove the leftover OpenShift SDN configuration and delete the openshift-sdn namespace.
oc patch Network.operator.openshift.io cluster --type='merge' \
  --patch '{ "spec": { "defaultNetwork": { "openshiftSDNConfig": null } } }'
oc delete namespace openshift-sdn