Kubernetes 1.19 version upgrade
Which components of a Kubernetes cluster need to be upgraded:
- The management (master) node: kube-apiserver, kube-controller-manager, kube-scheduler, etcd, and so on.
- The other master nodes: if the control plane is deployed in high-availability mode, every HA node must be upgraded as well.
- The worker nodes: the container runtime (e.g. docker), kubelet, and kube-proxy on each worker.
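Before touching anything, it helps to record the versions currently in use. A minimal sketch, assuming the binary installation layout used throughout this document (binaries on the PATH):
# Record the currently running versions
kube-apiserver --version
kube-controller-manager --version
kube-scheduler --version
kubelet --version
kube-proxy --version
kubectl version --short
etcdctl version
docker version --format '{{.Server.Version}}'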
Upgrades fall into two categories (patch version and minor version):
- Patch version upgrade, e.g. 1.19.1 can be upgraded directly across patch releases to 1.19.5.
- Minor version upgrade, e.g. 1.19.x to 1.20.x.
Note that you cannot skip a minor version. For example:
- 1.19.x → 1.20.y is allowed
- 1.19.x → 1.21.y is NOT allowed (it skips a minor version)
- 1.21.x → 1.21.y is allowed
So if you need to move up several minor versions, you must upgrade step by step, one minor version at a time.
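A quick way to see where the cluster currently stands before planning the upgrade path (standard kubectl commands, nothing assumed beyond an admin kubeconfig):
kubectl version --short        # client and apiserver versions
kubectl get node               # kubelet version per node (VERSION column)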
1. etcd upgrade
etcd releases: https://github.com/etcd-io/etcd/tree/main/CHANGELOG
Check the cluster status
# Check the version
etcdctl version
etcdctl version: 3.4.15
API version: 3.4
# Check cluster health
etcdctl --cacert=/etc/kubernetes/pki/ca.pem --cert=/etc/kubernetes/pki/etcd.pem --key=/etc/kubernetes/pki/etcd-key.pem --endpoints="https://192.168.80.45:2379,https://192.168.80.46:2379,https://192.168.80.47:2379" endpoint health
https://192.168.80.47:2379 is healthy: successfully committed proposal: took = 23.431273ms
https://192.168.80.46:2379 is healthy: successfully committed proposal: took = 21.958927ms
https://192.168.80.45:2379 is healthy: successfully committed proposal: took = 35.090404ms
# Check the cluster members
etcdctl --cacert=/etc/kubernetes/pki/ca.pem --cert=/etc/kubernetes/pki/etcd.pem --key=/etc/kubernetes/pki/etcd-key.pem --endpoints="https://192.168.80.45:2379,https://192.168.80.46:2379,https://192.168.80.47:2379" endpoint status
https://192.168.80.45:2379, 46bc5ad35e418584, 3.4.15, 4.5 MB, false, false, 123, 498143, 498143,
https://192.168.80.46:2379, b01e7a29099f3eb8, 3.4.15, 4.5 MB, true, false, 123, 498143, 498143,
https://192.168.80.47:2379, 8f347c1327049bc8, 3.4.15, 4.5 MB, false, false, 123, 498143, 498143,
# Back up the leader's data (the snapshot is taken from a single member)
etcdctl --cacert=/etc/kubernetes/pki/ca.pem --cert=/etc/kubernetes/pki/etcd.pem --key=/etc/kubernetes/pki/etcd-key.pem --endpoints="https://192.168.80.45:2379" snapshot save `date +%Y%m%d%H`.db
{"level":"info","ts":1651323491.1928215,"caller":"snapshot/v3_snapshot.go:119","msg":"created temporary db file","path":"2022430.db.part"}
{"level":"info","ts":"2022-04-30T05:58:11.198-0700","caller":"clientv3/maintenance.go:200","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":1651323491.1991322,"caller":"snapshot/v3_snapshot.go:127","msg":"fetching snapshot","endpoint":"https://192.168.80.45:2379"}
{"level":"info","ts":"2022-04-30T05:58:11.262-0700","caller":"clientv3/maintenance.go:208","msg":"completed snapshot read; closing"}
{"level":"info","ts":1651323491.2794814,"caller":"snapshot/v3_snapshot.go:142","msg":"fetched snapshot","endpoint":"https://192.168.80.45:2379","size":"4.5 MB","took":0.086043736}
{"level":"info","ts":1651323491.2810626,"caller":"snapshot/v3_snapshot.go:152","msg":"saved","path":"2022430.db"}
Snapshot saved at 2022043007.db
Reference: https://cloud.tencent.com/developer/article/1852345
Note: use the commands above to identify which etcd members are NOT the cluster leader, and roll through those followers first, one at a time; upgrade the leader last.
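To make the leader easy to spot, the same status command can be written out as a table; the IS LEADER column then shows which member to upgrade last (same certificate paths and endpoints as above):
etcdctl --cacert=/etc/kubernetes/pki/ca.pem --cert=/etc/kubernetes/pki/etcd.pem --key=/etc/kubernetes/pki/etcd-key.pem --endpoints="https://192.168.80.45:2379,https://192.168.80.46:2379,https://192.168.80.47:2379" endpoint status -w table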
# 1. Download the target etcd version
#!/bin/bash
# Desc:
export ETCDCTL_API=3
ETCD_VER=v3.4.18
ETCD_DIR=etcd-download
DOWNLOAD_URL=https://github.com/coreos/etcd/releases/download
# Download
mkdir -p ${ETCD_DIR}
cd ${ETCD_DIR}
pwd
rm -rf etcd-${ETCD_VER}-linux-amd64.tar.gz etcd-${ETCD_VER}-linux-amd64
wget ${DOWNLOAD_URL}/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz
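# Optional: verify the tarball against the published checksums before installing.
# This assumes the etcd release also ships a SHA256SUMS file (recent releases do).
wget ${DOWNLOAD_URL}/${ETCD_VER}/SHA256SUMS
sha256sum --check --ignore-missing SHA256SUMS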
tar -xzvf etcd-${ETCD_VER}-linux-amd64.tar.gz
# Install
cd etcd-${ETCD_VER}-linux-amd64
systemctl stop etcd.service
cp etcd /usr/bin/
cp etcdctl /usr/bin/
systemctl start etcd.service
# 2. On the other members
which etcd etcdctl
/usr/bin/etcd # keep the binaries in the same location on every node
/usr/bin/etcdctl
systemctl stop etcd.service
# 3. scp the etcd package from this machine to the other members
cd /root/etcd-download
scp -r etcd-v3.4.18-linux-amd64/etcd* root@192.168.80.46:/usr/bin
scp -r etcd-v3.4.18-linux-amd64/etcd* root@192.168.80.47:/usr/bin
# 4. Start the service on each upgraded member
systemctl start etcd.service
systemctl status etcd.service
etcd --version
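Once every member has been restarted on the new binary, re-run the earlier checks to confirm that all three endpoints report 3.4.18 and remain healthy:
etcdctl --cacert=/etc/kubernetes/pki/ca.pem --cert=/etc/kubernetes/pki/etcd.pem --key=/etc/kubernetes/pki/etcd-key.pem --endpoints="https://192.168.80.45:2379,https://192.168.80.46:2379,https://192.168.80.47:2379" endpoint status -w table
etcdctl --cacert=/etc/kubernetes/pki/ca.pem --cert=/etc/kubernetes/pki/etcd.pem --key=/etc/kubernetes/pki/etcd-key.pem --endpoints="https://192.168.80.45:2379,https://192.168.80.46:2379,https://192.168.80.47:2379" endpoint health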
2. Master node upgrade
Approach: the control plane runs with replicas; this cluster has master1 and master2, and master1 currently holds the VIP and manages the cluster. We therefore start with master2, then float the VIP over to master2, and finally upgrade master1.
Note: remember to back up the existing binaries first.
Upgrade directory: /root/kubernetes/server/bin
Files to upgrade: ls /usr/bin/kube*
/usr/bin/kube-apiserver
/usr/bin/kube-controller-manager
/usr/bin/kubectl
/usr/bin/kubelet
/usr/bin/kube-proxy
/usr/bin/kube-scheduler
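A minimal backup sketch for those binaries (the backup directory name is just an example):
BACKUP_DIR=/root/kube-bin-backup-$(date +%Y%m%d)
mkdir -p ${BACKUP_DIR}
cp -p /usr/bin/kube* ${BACKUP_DIR}/
ls ${BACKUP_DIR}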
# 1. Cordon the node, download, and extract
kubectl cordon k8s-master2
wget https://dl.k8s.io/v1.20.15/kubernetes-server-linux-amd64.tar.gz
tar xf kubernetes-server-linux-amd64.tar.gz
# 2. Upgrade kube-apiserver
systemctl stop kube-apiserver.service
cd kubernetes/server/bin/
cp -rp kube-apiserver /usr/bin/kube-apiserver
systemctl daemon-reload
systemctl restart kube-apiserver.service
systemctl status kube-apiserver.service
/usr/bin/kube-apiserver --version
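As an extra sanity check after the restart, the apiserver's health and version endpoints can be queried through kubectl (assuming the local kubeconfig points at this apiserver):
kubectl get --raw='/healthz'
kubectl get --raw='/version'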
# 3. Upgrade kube-controller-manager and kube-scheduler
systemctl stop kube-controller-manager kube-scheduler
cp -rp kube-controller-manager kube-scheduler /usr/bin/
systemctl daemon-reload
systemctl restart kube-controller-manager kube-scheduler
systemctl status kube-controller-manager kube-scheduler
/usr/bin/kube-controller-manager --version
/usr/bin/kube-scheduler --version
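Optionally confirm that both components came back and re-acquired their leader-election locks. This assumes the default Lease-based lock; the holderIdentity of each Lease shows the current leader:
kubectl get lease -n kube-system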
# 4. Upgrade kube-proxy
systemctl stop kube-proxy
cp -rp kube-proxy /usr/bin/
systemctl daemon-reload
systemctl restart kube-proxy
systemctl status kube-proxy
/usr/bin/kube-proxy --version
# 5. Upgrade kubectl
cp -rp kubectl /usr/bin/
/usr/bin/kubectl version
# 6. Upgrade kubelet
systemctl stop kubelet.service
cp -rp kubelet /usr/bin/
systemctl daemon-reload
systemctl restart kubelet.service
systemctl status kubelet.service
/usr/bin/kubelet --version
# Bring master2 back into the schedulable pool
kubectl uncordon k8s-master2
node/k8s-master2 uncordoned
kubectl get node
NAME STATUS ROLES AGE VERSION
k8s-master1 Ready master 12d v1.19.11
k8s-master2 Ready master 10d v1.20.15
k8s-node01 Ready node 12d v1.19.11
k8s-node02 Ready node 12d v1.19.11
# Stop the nginx proxy so the VIP 192.168.80.100 floats over to master2
systemctl stop nginx.service
ip addr | grep 192.168.80.100
inet 192.168.80.100/24 scope global secondary ens33
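Optionally confirm the apiserver is still reachable through the VIP after the failover. The port below is an assumption (6443); adjust it to whatever the proxy in front of the apiservers actually listens on:
curl -k https://192.168.80.100:6443/version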
# Cordon master1
kubectl cordon k8s-master1
kubectl get node
The cluster is now managed through master2 and master1 has been cordoned, so master1 can be upgraded next.
# On master2, copy the binary directory over to master1
scp -r kubernetes 192.168.80.45:/root/
# The directory now exists on master1
Note: re-run the upgrade commands above (everything except the download) in the upgrade directory on master1; a condensed sketch follows.
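The repeat on master1 boils down to the following sketch (run on master1, inside /root/kubernetes/server/bin; the same caveat about backing up the old binaries applies):
cd /root/kubernetes/server/bin/
systemctl stop kube-apiserver kube-controller-manager kube-scheduler kube-proxy kubelet
cp -rp kube-apiserver kube-controller-manager kube-scheduler kube-proxy kubelet kubectl /usr/bin/
systemctl daemon-reload
systemctl restart kube-apiserver kube-controller-manager kube-scheduler kube-proxy kubelet
kubectl get node          # master1 should now report the new version
kubectl uncordon k8s-master1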
3. Node upgrade
# 1. Drain the node; it switches to the SchedulingDisabled state
kubectl drain k8s-node01 --delete-local-data --force --ignore-daemonsets
# 2. Upgrade kubelet and kube-proxy: stop them on the node, then scp the new binaries over from the master
systemctl stop kube-proxy.service
systemctl stop kubelet.service
scp kubelet kube-proxy 192.168.80.46:/usr/bin
scp kubelet kube-proxy 192.168.80.47:/usr/bin
systemctl daemon-reload
# 3. Restart the services on the node
systemctl restart kube-proxy.service
systemctl restart kubelet.service
/usr/bin/kube-proxy --version
/usr/bin/kubelet --version
# 4. Uncordon the nodes
kubectl uncordon k8s-node01
kubectl uncordon k8s-node02
kubectl get node
# Note: the remaining nodes follow exactly the same procedure
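The per-node steps can also be wrapped in a loop. A sketch, assuming passwordless root SSH to the nodes by name and that it is run from the master directory containing the new kubelet and kube-proxy binaries:
for node in k8s-node01 k8s-node02; do
  kubectl drain ${node} --delete-local-data --force --ignore-daemonsets
  ssh root@${node} 'systemctl stop kubelet kube-proxy'
  scp kubelet kube-proxy root@${node}:/usr/bin/
  ssh root@${node} 'systemctl daemon-reload && systemctl restart kubelet kube-proxy'
  kubectl uncordon ${node}
done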
4. Calico upgrade
The latest Calico release is currently v3.23, but the highest version usable with Kubernetes 1.20 is v3.21, so this section shows how to upgrade to v3.21.
Drain the node:
kubectl drain k8s-master2 --delete-local-data --force --ignore-daemonsets
Upgrade the node:
# 1. Calico for both networking and network policy
curl https://docs.projectcalico.org/v3.21/manifests/calico.yaml -O # this is the one used here
# 2. Calico for network policy with flannel for networking (canal)
curl https://docs.projectcalico.org/manifests/canal.yaml -O
# 3. Back up the existing ConfigMap and Deployment
kubectl get cm -n kube-system calico-config -oyaml > calico-config.yaml
kubectl get deploy -n kube-system calico-kube-controllers -oyaml > calico-kube-controllers.yaml
# 4. Upgrade on k8s-master2: check the images in the new manifest
cat calico.yaml | grep image
image: docker.io/calico/cni:v3.21.6
image: docker.io/calico/cni:v3.21.6
image: docker.io/calico/pod2daemon-flexvol:v3.21.6
image: docker.io/calico/node:v3.21.6
image: docker.io/calico/kube-controllers:v3.21.6
# 5. Change the update strategy (to guard against a failed rollout)
# Original strategy:
updateStrategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 1
# Change it to:
updateStrategy:
  type: OnDelete
  rollingUpdate:
    maxUnavailable: 1
# Then apply it
kubectl apply -f calico.yaml
# 6. Check whether the new manifest was applied
kubectl describe ds -n kube-system | grep Image:
# Look for the calico images; seeing v3.21.6 means the update was applied successfully
Image: docker.io/calico/cni:v3.21.6
Image: docker.io/calico/cni:v3.21.6
Image: docker.io/calico/pod2daemon-flexvol:v3.21.6
Image: docker.io/calico/node:v3.21.6
# The update strategy has changed as well
kubectl edit ds -n kube-system
updateStrategy:
  rollingUpdate:
    maxUnavailable: 1
  type: OnDelete
# The CIDR value must match "--cluster-cidr=10.244.0.0/16" in kube-controller-manager (file: /etc/kubernetes/kube-controller-manager.conf)
# The default IPv4 pool to create on startup if none exists. Pod IPs will be
# chosen from this range. Changing this value after installation will have
# no effect. This should fall within `--cluster-cidr`.
- name: CALICO_IPV4POOL_CIDR
  value: "10.244.0.0/16"
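A quick cross-check that the manifest's pool matches the controller-manager flag (file path and flag as used in this document):
grep -o 'cluster-cidr=[^" ]*' /etc/kubernetes/kube-controller-manager.conf
grep -A1 'CALICO_IPV4POOL_CIDR' calico.yaml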
# 7. Apply the update
kubectl apply -f calico.yaml
# 8. Check that it was applied
kubectl edit ds -n kube-system calico-node
# Look for the calico image
image: docker.io/calico/cni:v3.21.6
# The update strategy has changed as well
updateStrategy:
  rollingUpdate:
    maxUnavailable: 1
  type: OnDelete
# 9. Start kubelet
systemctl start kubelet
# 10. Bring k8s-master2 back online
kubectl uncordon k8s-master2
# 11. Refresh the calico-node pod (with the OnDelete strategy it is only recreated with the new image after being deleted)
kubectl get pod -n kube-system -owide | grep k8s-master2
kubectl delete pod -n kube-system calico-node-5m2ql
# 12. Check the update status
kubectl get pod -n kube-system -owide | grep k8s-master2
# 13. Once the pod is Running, confirm the version was updated successfully
kubectl edit po calico-node-ltgpk -n kube-system
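Alternatively, the image running in every calico-node pod can be listed in one go (the k8s-app=calico-node label comes from the calico manifest):
kubectl get pod -n kube-system -l k8s-app=calico-node -o jsonpath='{range .items[*]}{.spec.nodeName}{"\t"}{.spec.containers[0].image}{"\n"}{end}'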
# Calico upgrade complete
Note: the Calico upgrade can be rolled in together with the node upgrades, so each node only has to be cordoned and brought back into the cluster once.
5. CoreDNS upgrade
Deployment docs: https://github.com/coredns/deployment/tree/master/kubernetes
# CoreDNS project: https://github.com/coredns/coredns
# Older clusters used kube-dns
# Newer clusters all use CoreDNS
# 1. Check the current version
kubectl get pod -n kube-system coredns-7bb48b4bc5-6gxlr -oyaml | grep image:
image: coredns/coredns:1.6.2
# 2. Back up the existing ConfigMap, Deployment, ClusterRole, and ClusterRoleBinding
kubectl get cm -n kube-system coredns -oyaml > coredns-config.yaml
kubectl get deploy -n kube-system coredns -oyaml > coredns-controllers.yaml
kubectl get clusterrole system:coredns -oyaml > coredns-clusterrole.yaml
kubectl get clusterrolebinding system:coredns -oyaml > coredns-clusterrolebinding.yaml
# 3. Upgrade
git clone https://github.com/coredns/deployment.git
cd deployment/kubernetes/
# Check the version shipped in the manifests
grep image: ./ -nr
image: coredns/coredns:1.9.1
# Edit the coredns.yaml.sed file: remove the `loop` plugin (commonly removed to avoid a crash loop when the node's resolv.conf would create a forwarding loop)
./deploy.sh -s | kubectl apply -f -
kubectl get pods -n kube-system
# 4. Validation: run DNS lookups from a test pod
# Note: nslookup in busybox images newer than 1.28.4 is known to be unreliable, hence busybox:1.28.4 here
kubectl run -it --rm dns-test --image=busybox:1.28.4 /bin/sh
If you don't see a command prompt, try pressing enter.
/ # nslookup kubernetes
Server: 10.254.0.2
Address 1: 10.254.0.2 kube-dns.kube-system.svc.cluster.local
Name: kubernetes
Address 1: 10.254.0.1 kubernetes.default.svc.cluster.local
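Finally, confirm the Deployment is now running the new image and that the rollout has settled:
kubectl -n kube-system get deploy coredns -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'
kubectl -n kube-system rollout status deployment/coredns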