Kubernetes etcd Backup and Restore

All Kubernetes objects are stored in etcd. Backing up the etcd cluster data regularly is important for recovering a Kubernetes cluster in disaster scenarios, such as losing all control plane nodes. The snapshot file contains all of the Kubernetes state and critical information.

Taking a snapshot of etcd at a given point provides a backup of the etcd data: by periodically snapshotting the backend database of the etcd nodes, etcd can later be restored to a known good point in time. For a Kubernetes cluster running on virtual machines, a sudden power loss can corrupt files and leave etcd and the kube-apiserver unable to start, taking the whole cluster down with them, so backing up etcd is essential. The sections below walk through the procedure for a single-master cluster and for a multi-master cluster.
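
As a minimal sketch of how such periodic backups could be scheduled, the snapshot command used later in this post can be wrapped in a small script and run from cron. The backup directory, retention count, and script path below are illustrative assumptions rather than part of the original procedure:

#!/usr/bin/env bash
# Hypothetical etcd-backup.sh: take a timestamped etcd snapshot and keep only the most recent copies.
BACKUP_DIR=/opt/etcd-back    # assumed backup location
KEEP=7                       # assumed number of snapshots to retain
mkdir -p "${BACKUP_DIR}"

# Same snapshot save command and kubeadm certificate paths as in the manual steps below.
ETCDCTL_API=3 etcdctl snapshot save "${BACKUP_DIR}/snap-$(date +%Y%m%d%H%M).db" \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/peer.crt \
  --key=/etc/kubernetes/pki/etcd/peer.key

# Prune older snapshots, keeping only the newest ${KEEP} files.
ls -1t "${BACKUP_DIR}"/snap-*.db | tail -n +$((KEEP + 1)) | xargs -r rm -f

A cron entry such as "0 2 * * * /usr/local/bin/etcd-backup.sh" would then take a snapshot every night at 02:00.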

Single-master cluster

Environment: a kubeadm-installed cluster with one master and three workers

[root@k8s-01 ~]# kubectl get nodes
NAME     STATUS   ROLES                  AGE   VERSION
k8s-01   Ready    control-plane,master   22h   v1.22.3
k8s-02   Ready    <none>                 22h   v1.22.3
k8s-03   Ready    <none>                 22h   v1.22.3
k8s-04   Ready    <none>                 22h   v1.22.3
[root@k8s-01 ~]#
  1. Back up the etcd data
[root@k8s-01 kubernetes]#  ETCDCTL_API=3 etcdctl snapshot save /opt/etcd-back/snap.db --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key
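
Before relying on this file, the snapshot can be verified; a minimal check with the snapshot status subcommand (in etcd v3.5 the same check is also available via etcdutl), using the path from the save command above:

ETCDCTL_API=3 etcdctl --write-out=table snapshot status /opt/etcd-back/snap.db

The table output reports the hash, revision, total key count, and size of the snapshot.
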
  2. Create test Pods
[root@k8s-01 ~]# kubectl get pods
NAME                                      READY   STATUS    RESTARTS        AGE
nfs-client-provisioner-69b76b8dc6-6l8xs   1/1     Running   7 (3h55m ago)   4h43m
nginx-6799fc88d8-5rqg8                    1/1     Running   0               48s
nginx-6799fc88d8-phvkx                    1/1     Running   0               48s
nginx-6799fc88d8-rwjc6                    1/1     Running   0               48s
[root@k8s-01 ~]#
  3. Stop etcd and the kube-apiserver

The kubeadm control-plane phase generates static Pod manifests for the API Server, Controller Manager, and Scheduler, while the etcd phase generates the static Pod manifest for the local etcd store; all of them are saved in the /etc/kubernetes/manifests directory. The kubelet on the host watches this directory for manifest creation, modification, and deletion, and creates, updates, or deletes the corresponding Pods accordingly. The manifests generated by these two phases are therefore what starts the master component Pods, so moving the directory aside makes the kubelet stop etcd and the kube-apiserver, and moving it back restarts them.
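
On a standard kubeadm control plane node this directory holds one manifest per component, which is exactly what the kubelet watches:

ls /etc/kubernetes/manifests/
# typically: etcd.yaml  kube-apiserver.yaml  kube-controller-manager.yaml  kube-scheduler.yaml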

[root@k8s-01 kubernetes]#  mv /etc/kubernetes/manifests/ /etc/kubernetes/manifests-backup/
[root@k8s-01 kubernetes]# kubectl get pods -A
The connection to the server 192.168.1.128:6443 was refused - did you specify the right host or port?
[root@k8s-01 kubernetes]#
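
The kubelet can take a few seconds to tear the static Pods down after the manifests are moved. One way to confirm that the etcd and kube-apiserver containers are really gone, assuming a containerd-based node with crictl installed, is:

crictl ps | grep -E 'etcd|kube-apiserver'
# no output once both static Pods have been stopped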

  4. Move /var/lib/etcd aside
[root@k8s-01 kubernetes]#  mv /var/lib/etcd /var/lib/etcd.bak
[root@k8s-01 kubernetes]#
  5. Restore the etcd data
[root@k8s-01 lib]# ETCDCTL_API=3 etcdctl --endpoints="https://127.0.0.1:2379"  --cert="/etc/kubernetes/pki/etcd/server.crt"  --key="/etc/kubernetes/pki/etcd/server.key"  --cacert="/etc/kubernetes/pki/etcd/ca.crt"   snapshot restore /opt/etcd-back/snap.db  --data-dir=/var/lib/etcd/
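
The restore refuses to write into an existing data directory, which is why /var/lib/etcd was moved aside in the previous step. As a quick sanity check, the freshly restored directory should contain the usual member layout:

ls /var/lib/etcd/member/
# snap  wal
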
  6. Restart etcd and the kube-apiserver, then check the Pods
[root@k8s-01 lib]#  cd /etc/kubernetes/
[root@k8s-01 kubernetes]#  mv manifests-backup manifests
[root@k8s-01 kubernetes]#  kubectl get pods
NAME                                      READY   STATUS    RESTARTS         AGE
nfs-client-provisioner-69b76b8dc6-6l8xs   1/1     Running   12 (2m25s ago)   4h48m
[root@k8s-01 ~]#  kubectl get pods -n kube-system
NAME                                       READY   STATUS    RESTARTS       AGE
calico-kube-controllers-65898446b5-t2mqq   1/1     Running   11 (16h ago)   21h
calico-node-8md6b                          1/1     Running   0              21h
calico-node-9457b                          1/1     Running   0              21h
calico-node-nxs2w                          1/1     Running   0              21h
calico-node-p7d52                          1/1     Running   0              21h
coredns-7f6cbbb7b8-g84gl                   1/1     Running   0              22h
coredns-7f6cbbb7b8-j9q4q                   1/1     Running   0              22h
etcd-k8s-01                                1/1     Running   0              22h
kube-apiserver-k8s-01                      1/1     Running   0              22h
kube-controller-manager-k8s-01             1/1     Running   0              22h
kube-proxy-49b8g                           1/1     Running   0              22h
kube-proxy-8wh5l                           1/1     Running   0              22h
kube-proxy-b6lqq                           1/1     Running   0              22h
kube-proxy-tldpv                           1/1     Running   0              22h
kube-scheduler-k8s-01                      1/1     Running   0              22h
[root@k8s-01 ~]#
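
Beyond checking the Pods, the restored etcd itself can be probed directly. A sketch that reuses the endpoint and certificates from the backup step:

ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/peer.crt \
  --key=/etc/kubernetes/pki/etcd/peer.key \
  endpoint health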

Because the three nginx Pods were created after the snapshot was taken, they no longer exist after the restore.

Multi-master cluster

Environment: a kubeadm-installed cluster with two masters and two workers

[root@k8s-01 ~]# kubectl get nodes
NAME     STATUS   ROLES                  AGE   VERSION
k8s-01   Ready    control-plane,master   16h   v1.22.3
k8s-02   Ready    control-plane,master   16h   v1.22.3
k8s-03   Ready    <none>                 16h   v1.22.3
k8s-04   Ready    <none>                 16h   v1.22.3
[root@k8s-01 etcd-v3.5.4-linux-amd64]# ETCDCTL_API=3 etcdctl --endpoints=https://192.168.1.123:2379,https://192.168.1.124:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key member list
58915ab47aed1957, started, k8s-02, https://192.168.1.124:2380, https://192.168.1.124:2379, false
c48307bcc0ac155e, started, k8s-01, https://192.168.1.123:2380, https://192.168.1.123:2379, false
[root@k8s-01 etcd-v3.5.4-linux-amd64]#
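
Before taking the backups it is worth confirming that both members are healthy, which one is the leader, and that both are at the same revision. A sketch using endpoint status with the same endpoints and certificates as the member list command above:

ETCDCTL_API=3 etcdctl --endpoints=https://192.168.1.123:2379,https://192.168.1.124:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/peer.crt \
  --key=/etc/kubernetes/pki/etcd/peer.key \
  --write-out=table endpoint status
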
  1. Back up etcd on both masters:
[root@k8s-01 ~]# ETCDCTL_API=3 etcdctl --endpoints="https://127.0.0.1:2379"  --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key  snapshot save /snap-$(date +%Y%m%d%H%M).db
[root@k8s-02 ~]# ETCDCTL_API=3 etcdctl --endpoints="https://127.0.0.1:2379"  --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key  snapshot save /snap-$(date +%Y%m%d%H%M).db
  2. Create three test Pods
[root@k8s-01 ~]# kubectl get pods
NAME                      READY   STATUS    RESTARTS   AGE
nginx-6799fc88d8-2x6gw    1/1     Running   0          4m22s
nginx-6799fc88d8-82mjz    1/1     Running   0          4m22s
nginx-6799fc88d8-sbb6n    1/1     Running   0          4m22s
tomcat-7d987c7694-552v2   1/1     Running   0          2m8s
[root@k8s-01 ~]#
  3. Stop kube-apiserver and etcd on both master machines
[root@k8s-01 kubernetes]#  mv /etc/kubernetes/manifests/ /etc/kubernetes/manifests-backup/
[root@k8s-02 kubernetes]#  mv /etc/kubernetes/manifests/ /etc/kubernetes/manifests-backup/
  4. Move /var/lib/etcd aside on both masters
[root@k8s-01 kubernetes]#  mv /var/lib/etcd /var/lib/etcd.bak
[root@k8s-02 kubernetes]#  mv /var/lib/etcd /var/lib/etcd.bak
  5. Restore the etcd data; all members of the etcd cluster are restored from the same snapshot:
[root@k8s-01 /]# ETCDCTL_API=3 etcdctl snapshot restore /snap-202207182330.db     --endpoints=192.168.1.123:2379     --name=k8s-01    --cacert=/etc/kubernetes/pki/etcd/ca.crt  --cert=/etc/kubernetes/pki/etcd/peer.crt   --key=/etc/kubernetes/pki/etcd/peer.key       --initial-advertise-peer-urls=https://192.168.1.123:2380     --initial-cluster-token=etcd-cluster-0     --initial-cluster=k8s-01=https://192.168.1.123:2380,k8s-02=https://192.168.1.124:2380      --data-dir=/var/lib/etcd
[root@k8s-01 /]# scp snap-202207182330.db root@192.168.1.124:/
root@192.168.1.124's password:
snap-202207182330.db                                                                                                  100% 4780KB  45.8MB/s   00:00
[root@k8s-02 /]# ETCDCTL_API=3 etcdctl snapshot restore /snap-202207182330.db     --endpoints=192.168.1.124:2379     --name=k8s-02    --cacert=/etc/kubernetes/pki/etcd/ca.crt  --cert=/etc/kubernetes/pki/etcd/peer.crt   --key=/etc/kubernetes/pki/etcd/peer.key       --initial-advertise-peer-urls=https://192.168.1.124:2380     --initial-cluster-token=etcd-cluster-0     --initial-cluster=k8s-01=https://192.168.1.123:2380,k8s-02=https://192.168.1.124:2380      --data-dir=/var/lib/etcd

  6. Restart etcd and the kube-apiserver on the master nodes, then check the Pods

[root@k8s-01 lib]#  cd /etc/kubernetes/
[root@k8s-01 kubernetes]#  mv manifests-backup manifests
[root@k8s-02 lib]#  cd /etc/kubernetes/
[root@k8s-02 kubernetes]#  mv manifests-backup manifests
[root@k8s-01 lib]# kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
nginx-6799fc88d8-2x6gw   1/1     Running   0          16m
nginx-6799fc88d8-82mjz   1/1     Running   0          16m
nginx-6799fc88d8-sbb6n   1/1     Running   0          16m
[root@k8s-01 ~]# kubectl get pods -n kube-system
NAME                                       READY   STATUS    RESTARTS       AGE
calico-kube-controllers-65898446b5-drjjj   1/1     Running   10 (16h ago)   16h
calico-node-9s7p2                          1/1     Running   0              16h
calico-node-fnbj4                          1/1     Running   0              16h
calico-node-nx6q6                          1/1     Running   0              16h
calico-node-qcffj                          1/1     Running   0              16h
coredns-7f6cbbb7b8-mn9hj                   1/1     Running   0              16h
coredns-7f6cbbb7b8-nrwbf                   1/1     Running   0              16h
etcd-k8s-01                                1/1     Running   1              16h
etcd-k8s-02                                1/1     Running   0              16h
kube-apiserver-k8s-01                      1/1     Running   2 (16h ago)    16h
kube-apiserver-k8s-02                      1/1     Running   0              16h
kube-controller-manager-k8s-01             1/1     Running   2              16h
kube-controller-manager-k8s-02             1/1     Running   0              16h
kube-proxy-d824j                           1/1     Running   0              16h
kube-proxy-k5gw4                           1/1     Running   0              16h
kube-proxy-mxmhp                           1/1     Running   0              16h
kube-proxy-nvpf4                           1/1     Running   0              16h
kube-scheduler-k8s-01                      1/1     Running   1              16h
kube-scheduler-k8s-02                      1/1     Running   0              16h
[root@k8s-01 ~]#
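
After the control plane is back, the restored two-member etcd cluster itself can be verified in the same way it was inspected before the backup. A sketch:

ETCDCTL_API=3 etcdctl --endpoints=https://192.168.1.123:2379,https://192.168.1.124:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/peer.crt \
  --key=/etc/kubernetes/pki/etcd/peer.key \
  endpoint health

Both endpoints should report healthy, and member list should again show both members as started.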
