Kubernetes Practical Tips: Upgrading a Cluster's k8s Version

Renewing Certificates

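Installing a Kubernetes cluster with kubeadm is very convenient, but it has one annoying drawback: the certificates it issues are valid for only one year by default, so you have to plan for certificate renewal. The demo cluster in this article runs v1.16.2, and the steps below are not guaranteed to work on other versions. Before doing anything, always back up the certificate directory so that you can roll back if something goes wrong. This article covers two ways to renew the cluster certificates.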

Renewing Certificates Manually

Client certificates generated by kubeadm are valid for only one year by default. We can use the check-expiration command to see when they expire:

$ kubeadm alpha certs check-expiration
CERTIFICATE                EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
admin.conf                 Nov 07, 2020 11:59 UTC   73d             no
apiserver                  Nov 07, 2020 11:59 UTC   73d             no
apiserver-etcd-client      Nov 07, 2020 11:59 UTC   73d             no
apiserver-kubelet-client   Nov 07, 2020 11:59 UTC   73d             no
controller-manager.conf    Nov 07, 2020 11:59 UTC   73d             no
etcd-healthcheck-client    Nov 07, 2020 11:59 UTC   73d             no
etcd-peer                  Nov 07, 2020 11:59 UTC   73d             no
etcd-server                Nov 07, 2020 11:59 UTC   73d             no
front-proxy-client         Nov 07, 2020 11:59 UTC   73d             no
scheduler.conf             Nov 07, 2020 11:59 UTC   73d             no

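This command shows the expiration time and remaining time of the client certificates in the /etc/kubernetes/pki folder, as well as of the client certificates embedded in the KUBECONFIG files used by kubeadm.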
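For monitoring, the table printed by check-expiration can be filtered for certificates that are close to expiry. The following is a minimal sketch and not part of kubeadm: the function name `expiring_certs` and the 90-day default are assumptions made for illustration, and it only understands residual values shaped like `73d` or `9y` (years are roughly converted as 365 days).

```shell
# Hypothetical helper: list certificates whose RESIDUAL TIME is below a
# threshold in days. Reads `kubeadm alpha certs check-expiration` output on stdin.
expiring_certs() {
  awk -v limit="${1:-90}" '
    $1 == "CERTIFICATE" || NF < 3 { next }      # skip the header and blank lines
    {
      residual = $(NF - 1)                       # e.g. 73d, 364d, 9y
      if (residual !~ /^[0-9]+[dy]$/) next       # ignore formats this sketch does not know
      days = residual + 0                        # numeric prefix of the value
      if (residual ~ /y$/) days *= 365           # rough year-to-day conversion
      if (days < limit) print $1
    }'
}

# Usage on a control plane node (requires kubeadm):
#   kubeadm alpha certs check-expiration | expiring_certs 90
```

With the first listing above as input, every certificate would be reported, since all of them have 73 days left.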

kubeadm cannot manage certificates signed by an external CA. If your certificates are externally signed, you have to manage their renewal yourself.

Also note that kubelet.conf is not included in the list above, because kubeadm configures the kubelet to rotate its certificate automatically.

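kubeadm also renews all certificates automatically when the control plane is upgraded, so the best practice for a kubeadm cluster is to upgrade it regularly, which keeps the cluster up to date and reasonably secure. In real production environments, however, we may not upgrade the cluster very often, and in that case the certificates have to be renewed manually.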

Renewing certificates manually is also straightforward: the kubeadm alpha certs renew command does it. It performs the renewal using the CA (or front-proxy-CA) certificate and the keys stored in /etc/kubernetes/pki.

If you run a highly available cluster, this command must be executed on every control plane node.

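Now let's renew the certificates of our cluster. All of the following operations are performed on the master node. First, back up the existing certificates: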

$ mkdir /etc/kubernetes.bak
$ cp -r /etc/kubernetes/pki/ /etc/kubernetes.bak
$ cp /etc/kubernetes/*.conf /etc/kubernetes.bak

Then back up the etcd data directory:

$ cp -r /var/lib/etcd /var/lib/etcd.bak
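The backup steps above can be wrapped in a small function so that they are easy to repeat before each renewal. This is only a sketch: the function name `backup_kube_config` and the dated backup paths in the usage comment are assumptions, not something the original setup requires.

```shell
# Hypothetical helper: copy the pki directory and the *.conf kubeconfig files
# from a kubeadm config directory into a backup directory.
backup_kube_config() {
  local src="${1:-/etc/kubernetes}"
  local dest="${2:-/etc/kubernetes.bak}"
  mkdir -p "$dest" || return 1
  cp -r "$src/pki" "$dest/" || return 1
  cp "$src"/*.conf "$dest/" || return 1
  echo "backed up $src to $dest"
}

# Usage (as root on the master node); a date suffix keeps older backups around:
#   backup_kube_config /etc/kubernetes "/etc/kubernetes.bak-$(date +%Y%m%d)"
#   cp -r /var/lib/etcd "/var/lib/etcd.bak-$(date +%Y%m%d)"
```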

Next, run the command that renews the certificates:

$ kubeadm alpha certs renew all --config=kubeadm.yaml
certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself renewed
certificate for serving the Kubernetes API renewed
certificate the apiserver uses to access etcd renewed
certificate for the API server to connect to kubelet renewed
certificate embedded in the kubeconfig file for the controller manager to use renewed
certificate for liveness probes to healthcheck etcd renewed
certificate for etcd nodes to communicate with each other renewed
certificate for serving etcd renewed
certificate for the front proxy client renewed
certificate embedded in the kubeconfig file for the scheduler manager to use renewed

The command above renews all certificates in one step. Checking the certificates again shows that they now expire one year from now:

$ kubeadm alpha certs check-expiration
CERTIFICATE                EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
admin.conf                 Aug 26, 2021 03:47 UTC   364d            no
apiserver                  Aug 26, 2021 03:47 UTC   364d            no
apiserver-etcd-client      Aug 26, 2021 03:47 UTC   364d            no
apiserver-kubelet-client   Aug 26, 2021 03:47 UTC   364d            no
controller-manager.conf    Aug 26, 2021 03:47 UTC   364d            no
etcd-healthcheck-client    Aug 26, 2021 03:47 UTC   364d            no
etcd-peer                  Aug 26, 2021 03:47 UTC   364d            no
etcd-server                Aug 26, 2021 03:47 UTC   364d            no
front-proxy-client         Aug 26, 2021 03:47 UTC   364d            no
scheduler.conf             Aug 26, 2021 03:47 UTC   364d            no

Then remember to update the kubeconfig files:

$ kubeadm init phase kubeconfig all --config kubeadm.yaml
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/admin.conf"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/kubelet.conf"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/controller-manager.conf"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/scheduler.conf"

Overwrite the original admin config file with the newly generated one:

$ mv $HOME/.kube/config $HOME/.kube/config.old
$ cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$ chown $(id -u):$(id -g) $HOME/.kube/config

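Once that is done, restart the four containers kube-apiserver, kube-controller-manager, kube-scheduler, and etcd. We can then check the validity period of the apiserver certificate to verify that the renewal succeeded: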

$ docker restart `docker ps | grep etcd  | awk '{ print $1 }'`
$ docker restart `docker ps | grep kube-apiserver  | awk '{ print $1 }'`
$ docker restart `docker ps | grep kube-scheduler  | awk '{ print $1 }'`
$ docker restart `docker ps | grep kube-controller  | awk '{ print $1 }'`
$ systemctl restart kubelet
$ echo | openssl s_client -showcerts -connect 127.0.0.1:6443 -servername api 2>/dev/null | openssl x509 -noout -enddate
notAfter=Aug 26 03:47:23 2021 GMT

The certificate is now valid until one year from now, which confirms that the renewal succeeded.
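The four docker restart commands used above can also be written as one loop. The sketch below rests on an assumption: that the containers follow the kubelet's Docker naming scheme `k8s_<container>_<pod>_...`, so that a name filter selects each component container. The function name `restart_control_plane` is made up for this example.

```shell
# Hypothetical helper: restart the four control plane containers by name.
# Assumes the kubelet's Docker naming scheme k8s_<container>_<pod>_..., so the
# filter matches the component container but not its pause (POD) container.
restart_control_plane() {
  local name ids
  for name in etcd kube-apiserver kube-scheduler kube-controller-manager; do
    ids=$(docker ps -q --filter "name=k8s_${name}_")
    if [ -n "$ids" ]; then
      # $ids is left unquoted on purpose: it may hold several container IDs
      docker restart $ids
    else
      echo "no running container found for ${name}" >&2
    fi
  done
}

# Usage on the master node:
#   restart_control_plane && systemctl restart kubelet
```

Unlike the grep-based commands, the filter does not also restart the pause containers; whether that matters depends on your setup.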

Renewing Certificates with the Kubernetes Certificates API (Not Recommended)

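Besides the one-step manual renewal above, certificates can also be renewed manually through the Kubernetes certificates API. In a production environment we may not want to take the risk of frequently upgrading the cluster or renewing certificates, so we would like the issued certificates to be valid for long enough. This is not recommended from a security standpoint, but in some scenarios a sufficiently long validity period is genuinely necessary. Many administrators patch the kubeadm source code to change the validity to 10 years and then recompile it to create the cluster. That achieves the goal but is not recommended, especially because every time you want to upgrade the cluster you have to repeat the process with the new kubeadm version. In fact, Kubernetes provides an API-based way to issue certificates with a sufficiently long validity period.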

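To sign certificates through the built-in API, we first need to set the --experimental-cluster-signing-duration flag of the kube-controller-manager component to 10 years. Our cluster was installed with kubeadm, so we can edit the static Pod YAML file directly: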

$ vi /etc/kubernetes/manifests/kube-controller-manager.yaml
......
spec:
  containers:
  - command:
    - kube-controller-manager
    # Set the certificate validity to 10 years
    - --experimental-cluster-signing-duration=87600h
    - --client-ca-file=/etc/kubernetes/pki/ca.crt
......
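The value 87600h is simply 10 years expressed in hours (10 × 365 × 24, ignoring leap days). A small sketch of that arithmetic, with a hypothetical helper name, in case you want a different duration:

```shell
# Hypothetical helper: convert a number of years into the hour-based duration
# string used by the flag above (365-day years, leap days ignored).
years_to_hours() {
  echo "$(( $1 * 365 * 24 ))h"
}

years_to_hours 10   # prints 87600h
```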

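After the change, kube-controller-manager restarts automatically and the setting takes effect. Next, we use the command below to create certificate signing requests for the Kubernetes certificates API. If you have set up an external signer such as cert-manager, certificate signing requests (CSRs) are approved automatically. Otherwise, you must approve them manually with the kubectl certificate command. The following kubeadm command prints the name of the certificate to approve and then waits for the approval to happen: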

$ kubeadm alpha certs renew all --use-api --config kubeadm.yaml &

The output looks similar to this:

[1] 2890
[certs] Certificate request "kubeadm-cert-kubernetes-admin-pn99f" created

Next, we need to approve the certificate manually:

$ kubectl get csr
NAME                                  AGE   REQUESTOR          CONDITION
kubeadm-cert-kubernetes-admin-pn99f   64s   kubernetes-admin   Pending
# Approve the certificate manually
$ kubectl certificate approve kubeadm-cert-kubernetes-admin-pn99f
certificatesigningrequest.certificates.k8s.io/kubeadm-cert-kubernetes-admin-pn99f approved

Approve every CSR in the Pending state in the same way until all of them have been approved. The final CSR list looks like this:

$ kubectl get csr
NAME                                                AGE     REQUESTOR          CONDITION
kubeadm-cert-front-proxy-client-llhrj               30s     kubernetes-admin   Approved,Issued
kubeadm-cert-kube-apiserver-2s6kf                   2m43s   kubernetes-admin   Approved,Issued
kubeadm-cert-kube-apiserver-etcd-client-t9pkx       2m7s    kubernetes-admin   Approved,Issued
kubeadm-cert-kube-apiserver-kubelet-client-pjbjm    108s    kubernetes-admin   Approved,Issued
kubeadm-cert-kube-etcd-healthcheck-client-8dcn8     64s     kubernetes-admin   Approved,Issued
kubeadm-cert-kubernetes-admin-pn99f                 4m29s   kubernetes-admin   Approved,Issued
kubeadm-cert-system:kube-controller-manager-mr86h   79s     kubernetes-admin   Approved,Issued
kubeadm-cert-system:kube-scheduler-t8lnw            17s     kubernetes-admin   Approved,Issued
kubeadm-cert-ydzs-master-cqh4s                      52s     kubernetes-admin   Approved,Issued
kubeadm-cert-ydzs-master-lvbr5                      41s     kubernetes-admin   Approved,Issued
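Approving each Pending request by hand gets tedious with ten certificates. A minimal sketch of how the step could be scripted is shown here. The function name `pending_kubeadm_csrs` is an assumption, and the filter relies on the `kubeadm-cert-` name prefix seen in the listing above. Judging by the differing ages in that listing, the requests show up one after another, so the loop in the usage comment probably needs to be rerun until nothing is left in Pending.

```shell
# Hypothetical helper: print the names of kubeadm-created CSRs that are still
# Pending. Reads `kubectl get csr` output on stdin.
pending_kubeadm_csrs() {
  awk '$1 ~ /^kubeadm-cert-/ && $NF == "Pending" { print $1 }'
}

# Usage while `kubeadm alpha certs renew all --use-api` is waiting in the
# background (rerun until no Pending request is left):
#   for csr in $(kubectl get csr | pending_kubeadm_csrs); do
#     kubectl certificate approve "$csr"
#   done
```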

Once all approvals are done, check the certificate validity:

$ kubeadm alpha certs check-expiration
CERTIFICATE                EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
admin.conf                 Nov 05, 2029 11:53 UTC   9y              no
apiserver                  Nov 05, 2029 11:54 UTC   9y              no
apiserver-etcd-client      Nov 05, 2029 11:53 UTC   9y              no
apiserver-kubelet-client   Nov 05, 2029 11:54 UTC   9y              no
controller-manager.conf    Nov 05, 2029 11:54 UTC   9y              no
etcd-healthcheck-client    Nov 05, 2029 11:53 UTC   9y              no
etcd-peer                  Nov 05, 2029 11:53 UTC   9y              no
etcd-server                Nov 05, 2029 11:54 UTC   9y              no
front-proxy-client         Nov 05, 2029 11:54 UTC   9y              no
scheduler.conf             Nov 05, 2029 11:53 UTC   9y              no

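We can see that the validity has been extended to a little under 10 years. This is because the CA certificate itself is only valid for 10 years.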

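However, we cannot restart the control plane components just yet. In a kubeadm cluster, etcd by default uses /etc/kubernetes/pki/etcd/ca.crt as its signing CA, whereas the certificates approved above with kubectl certificate approve were issued by the default /etc/kubernetes/pki/ca.crt. So we need to replace the CA certificate used by etcd: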

# First back up the static Pod manifests
$ cp -r /etc/kubernetes/manifests/ /etc/kubernetes/manifests.bak
$ cp /etc/kubernetes/pki/ca.crt /etc/kubernetes/pki/etcd/ca.crt
$ cp /etc/kubernetes/pki/ca.key /etc/kubernetes/pki/etcd/ca.key

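In addition, the requestheader-client-ca-file needs to be replaced. By default it is /etc/kubernetes/pki/front-proxy-ca.crt, and it now has to be replaced with the default CA file as well. Otherwise the aggregated API breaks; for example, with metrics-server installed, kubectl top returns an error: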

$ cp /etc/kubernetes/pki/ca.crt /etc/kubernetes/pki/front-proxy-ca.crt
$ cp /etc/kubernetes/pki/ca.key /etc/kubernetes/pki/front-proxy-ca.key

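Because these are static Pods, the components above restart automatically once the change is made (if a component does not pick up the new files, restart its container as in the previous section). The kubelet in our version has certificate rotation enabled by default, so the kubelet certificate needs no further attention. With that, the certificates have been renewed with a 10-year validity period. Before performing these operations, always back up the certificate directory so that you can roll back if something goes wrong.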

Upgrading the Cluster

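At the time of writing, the latest Kubernetes release is v1.19.0, while our environment runs v1.16.2. The gap is too large to upgrade directly from 1.16.x to 1.19.x, because kubeadm does not support skipping minor versions, so we have to upgrade one minor version at a time. The procedure is essentially the same for every step, so later upgrades are easy once you have done one. Below we first upgrade the cluster to v1.16.14.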
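The one-minor-version-at-a-time rule can be expressed as a quick check before planning an upgrade path. This is an illustrative sketch only: `can_upgrade` is a hypothetical name, it compares just the major and minor numbers, and it does not look at patch levels or at what kubeadm itself would accept.

```shell
# Hypothetical helper: succeed if going from version $1 to version $2 stays
# within the same minor release or moves up by exactly one minor release.
can_upgrade() {
  local from="${1#v}" to="${2#v}"
  local from_major="${from%%.*}" to_major="${to%%.*}"
  local from_rest="${from#*.}" to_rest="${to#*.}"
  local from_minor="${from_rest%%.*}" to_minor="${to_rest%%.*}"
  [ "$from_major" = "$to_major" ] || return 1
  local step=$(( to_minor - from_minor ))
  [ "$step" -eq 0 ] || [ "$step" -eq 1 ]
}

can_upgrade v1.16.2 v1.16.14 && echo "1.16.2 -> 1.16.14: ok"
can_upgrade v1.16.2 v1.19.0 || echo "1.16.2 -> 1.19.0: not in one step"
```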

First, check the current cluster version:

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.2", GitCommit:"c97fe5036ef3df2967d086711e6c0c405941e14b", GitTreeState:"clean", BuildDate:"2019-10-15T19:18:23Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.2", GitCommit:"c97fe5036ef3df2967d086711e6c0c405941e14b", GitTreeState:"clean", BuildDate:"2019-10-15T19:09:08Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}

Next, save the kubeadm config to a file:

$ kubeadm config view > kubeadm-config.yaml
apiServer:
  extraArgs:
    authorization-mode: Node,RBAC
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.aliyuncs.com/k8sxio  # changed to the Aliyun image mirror
kind: ClusterConfiguration
kubernetesVersion: v1.16.14
networking:
  dnsDomain: cluster.local
  podSubnet: 10.244.0.0/16
  serviceSubnet: 10.96.0.0/12
scheduler: {}

Change the imageRepository value above to registry.aliyuncs.com/k8sxio and save the content to the file kubeadm-config.yaml (if your cluster can pull images from gcr.io, you do not need to change it).
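If you would rather not edit the file by hand, the same change can be made with sed. A minimal sketch, assuming the saved file has a single top-level imageRepository line as in the listing above; the function name `set_image_repository` is hypothetical.

```shell
# Hypothetical helper: rewrite the top-level imageRepository line of a saved
# kubeadm ClusterConfiguration file.
set_image_repository() {
  local file="$1" repo="$2"
  sed "s|^imageRepository:.*|imageRepository: ${repo}|" "$file" > "${file}.tmp" &&
    mv "${file}.tmp" "$file"
}

# Usage:
#   set_image_repository kubeadm-config.yaml registry.aliyuncs.com/k8sxio
```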

Then update kubeadm:

$ yum makecache fast && yum install -y kubeadm-1.16.14-0 kubectl-1.16.14-0
$ kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.14", GitCommit:"d2a081c8e14e21e28fe5bdfa38a817ef9c0bb8e3", GitTreeState:"clean", BuildDate:"2020-08-13T12:31:14Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}

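While running, kubeadm upgrade plan fetches version information from dl.k8s.io, which is not reachable from every network without a proxy. So we first update kubeadm to the target version, after which it can show the upgrade information for that version.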

Run the upgrade plan command to check whether the upgrade is possible:

$ kubeadm upgrade plan
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[preflight] Running pre-flight checks.
[upgrade] Making sure the cluster is healthy:
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.16.2
[upgrade/versions] kubeadm version: v1.16.14
I0827 15:46:54.805052   11355 version.go:251] remote version is much newer: v1.19.0; falling back to: stable-1.16
[upgrade/versions] Latest stable version: v1.16.14
[upgrade/versions] Latest version in the v1.16 series: v1.16.14

Components that must be upgraded manually after you have upgraded the control plane with 'kubeadm upgrade apply':
COMPONENT   CURRENT       AVAILABLE
Kubelet     7 x v1.16.2   v1.16.14

Upgrade to the latest version in the v1.16 series:

COMPONENT            CURRENT   AVAILABLE
API Server           v1.16.2   v1.16.14
Controller Manager   v1.16.2   v1.16.14
Scheduler            v1.16.2   v1.16.14
Kube Proxy           v1.16.2   v1.16.14
CoreDNS              1.6.2     1.6.2
Etcd                 3.3.15    3.3.15-0

You can now apply the upgrade by executing the following command:

    kubeadm upgrade apply v1.16.14

_____________________________________________________________________

We can first use the --dry-run flag to preview the upgrade:

$ kubeadm upgrade apply v1.16.14 --config kubeadm-config.yaml --dry-run

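Note that `--config` must point to the configuration file saved above, which contains the cluster information from the previous version along with the modified image repository.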

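After reviewing the upgrade information above and confirming it is correct, we can perform the upgrade. We can pull the required images in advance: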

$ kubeadm config images pull --config kubeadm-config.yaml
[config/images] Pulled registry.aliyuncs.com/k8sxio/kube-apiserver:v1.16.14
[config/images] Pulled registry.aliyuncs.com/k8sxio/kube-controller-manager:v1.16.14
[config/images] Pulled registry.aliyuncs.com/k8sxio/kube-scheduler:v1.16.14
[config/images] Pulled registry.aliyuncs.com/k8sxio/kube-proxy:v1.16.14
[config/images] Pulled registry.aliyuncs.com/k8sxio/pause:3.1
[config/images] Pulled registry.aliyuncs.com/k8sxio/etcd:3.3.15-0
[config/images] Pulled registry.aliyuncs.com/k8sxio/coredns:1.6.2


Then run the actual upgrade command:

$ kubeadm upgrade apply v1.16.14 --config kubeadm-config.yaml
[upgrade/config] Making sure the configuration is correct:
[preflight] Running pre-flight checks.
[upgrade] Making sure the cluster is healthy:
[upgrade/version] You have chosen to change the cluster version to "v1.16.14"
[upgrade/versions] Cluster version: v1.16.2
[upgrade/versions] kubeadm version: v1.16.14
[upgrade/confirm] Are you sure you want to proceed with the upgrade? [y/N]:y
......


After a while, the following output indicates that the cluster was upgraded successfully:

......
[addons]: Migrating CoreDNS Corefile
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

[upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.16.14". Enjoy!

[upgrade/kubelet] Now that your control plane is upgraded, please proceed with upgrading your kubelets if you haven't already done so.


Since kubectl was already updated above, let's use it to check the version information:

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.14", GitCommit:"d2a081c8e14e21e28fe5bdfa38a817ef9c0bb8e3", GitTreeState:"clean", BuildDate:"2020-08-13T12:33:34Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.14", GitCommit:"d2a081c8e14e21e28fe5bdfa38a817ef9c0bb8e3", GitTreeState:"clean", BuildDate:"2020-08-13T12:24:51Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
$ kubectl get nodes
NAME          STATUS   ROLES    AGE    VERSION
ydzs-master   Ready    master   292d   v1.16.2
ydzs-node1    Ready    <none>   292d   v1.16.2
ydzs-node2    Ready    <none>   292d   v1.16.2
ydzs-node3    Ready    <none>   290d   v1.16.2
ydzs-node4    Ready    <none>   290d   v1.16.2
ydzs-node5    Ready    <none>   218d   v1.16.2
ydzs-node6    Ready    <none>   218d   v1.16.2


The node versions have not changed, because the kubelet on the nodes has not been updated yet. We can check the kubelet version directly:

$ kubelet --version
Kubernetes v1.16.2


Now update the kubelet manually:

$ yum install -y kubelet-1.16.14-0

After installation, check the version:

$ kubelet --version
Kubernetes v1.16.14

Then restart the kubelet service:

$ systemctl daemon-reload
$ systemctl restart kubelet
$ kubectl get nodes
NAME          STATUS   ROLES    AGE    VERSION
ydzs-master   Ready    master   292d   v1.16.14
ydzs-node1    Ready    <none>   292d   v1.16.2
ydzs-node2    Ready    <none>   292d   v1.16.2
ydzs-node3    Ready    <none>   290d   v1.16.2
ydzs-node4    Ready    <none>   290d   v1.16.2
ydzs-node5    Ready    <none>   218d   v1.16.2
ydzs-node6    Ready    <none>   218d   v1.16.2


The master node is now at v1.16.14, so we can move on to the worker nodes. It is best to drain each node first and upgrade them one at a time:

$ kubectl drain ydzs-node1 --ignore-daemonsets
node/ydzs-node1 cordoned
error: unable to drain node "ydzs-node1", aborting command...

There are pending nodes to be drained:
ydzs-node1
error: cannot delete Pods with local storage (use --delete-local-data to override): rook-ceph/csi-cephfsplugin-provisioner-56c8b7ddf4-n96kk, rook-ceph/csi-rbdplugin-provisioner-6ff4dd4b94-2bl82
$ kubectl get nodes
NAME          STATUS                     ROLES    AGE    VERSION
ydzs-master   Ready                      master   292d   v1.16.14
ydzs-node1    Ready,SchedulingDisabled   <none>   292d   v1.16.2
ydzs-node2    Ready                      <none>   292d   v1.16.2
ydzs-node3    Ready                      <none>   290d   v1.16.2
ydzs-node4    Ready                      <none>   290d   v1.16.2
ydzs-node5    Ready                      <none>   218d   v1.16.2
ydzs-node6    Ready                      <none>   218d   v1.16.2
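In the output above the node was cordoned, but the drain itself aborted because two Pods use local (emptyDir) storage. The error message suggests the `--delete-local-data` flag for this case. A hedged sketch of a wrapper follows; the name `drain_node` is made up for this example. Data in emptyDir volumes is lost when the Pods are evicted, so only use it when that is acceptable. Newer kubectl releases renamed the flag to `--delete-emptydir-data`.

```shell
# Hypothetical wrapper: drain a node, also evicting Pods that use emptyDir
# volumes. The data in those volumes is lost when the Pods are deleted.
drain_node() {
  kubectl drain "$1" --ignore-daemonsets --delete-local-data
}

# Usage:
#   drain_node ydzs-node1
```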


Then run the upgrade command on the ydzs-node1 node:

$ kubeadm upgrade node
[upgrade] Reading configuration from the cluster...
[upgrade] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[upgrade] Skipping phase. Not a control plane node[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.16" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[upgrade] The configuration for this node was successfully updated!
[upgrade] Now you should go ahead and upgrade the kubelet package using your package manager.

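Update the packages (the official kubeadm procedure upgrades the kubeadm package before running kubeadm upgrade node):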

$ yum install -y kubeadm-1.16.14-0 kubectl-1.16.14-0 kubelet-1.16.14-0

After installation, restart the kubelet:

$ systemctl daemon-reload
$ systemctl restart kubelet


Once the update is done, confirm that the node was upgraded successfully:

$ kubectl get nodes
NAME          STATUS                     ROLES    AGE    VERSION
ydzs-master   Ready                      master   292d   v1.16.14
ydzs-node1    Ready,SchedulingDisabled   <none>   292d   v1.16.14
ydzs-node2    Ready                      <none>   292d   v1.16.2
ydzs-node3    Ready                      <none>   290d   v1.16.2
ydzs-node4    Ready                      <none>   290d   v1.16.2
ydzs-node5    Ready                      <none>   218d   v1.16.2
ydzs-node6    Ready                      <none>   218d   v1.16.2


Then make the node schedulable again:

$ kubectl uncordon ydzs-node1
node/ydzs-node1 uncordoned


Upgrade the remaining nodes in the same way to complete the upgrade:

$ kubectl get nodes
NAME          STATUS   ROLES    AGE    VERSION
ydzs-master   Ready    master   292d   v1.16.14
ydzs-node1    Ready    <none>   292d   v1.16.14
ydzs-node2    Ready    <none>   292d   v1.16.14
ydzs-node3    Ready    <none>   290d   v1.16.14
ydzs-node4    Ready    <none>   290d   v1.16.14
ydzs-node5    Ready    <none>   218d   v1.16.14
ydzs-node6    Ready    <none>   218d   v1.16.14
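While working through the nodes one by one, it helps to see which ones are still on the old version. A small sketch, with the hypothetical name `nodes_needing_upgrade`, that filters the kubectl get nodes table by its last column:

```shell
# Hypothetical helper: print nodes whose VERSION column differs from the target.
# Reads `kubectl get nodes` output on stdin.
nodes_needing_upgrade() {
  awk -v target="$1" '$1 != "NAME" && NF > 0 && $NF != target { print $1 }'
}

# Usage:
#   kubectl get nodes | nodes_needing_upgrade v1.16.14
```

An empty result means every node reports the target version.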

posted @ 2022-05-31 09:40  哈喽哈喽111111