Kubernetes Cluster Backup and Restore
[root@master ~]# ETCDCTL_API=3 etcdctl snapshot save /backup/k8s1/snapshot.db
Snapshot saved at /backup/k8s1/snapshot.db
[root@master ~]# du -h /backup/k8s1/snapshot.db
1.6M    /backup/k8s1/snapshot.db
3: Copy the SSL files from the kubernetes directory
[root@master ~]# cp /etc/kubernetes/ssl/* /backup/k8s1/
[root@master ~]# ll /backup/k8s1/
total 1628
-rw-r--r--. 1 root root    1675 Dec 10 21:21 admin-key.pem
-rw-r--r--. 1 root root    1391 Dec 10 21:21 admin.pem
-rw-r--r--. 1 root root     997 Dec 10 21:21 aggregator-proxy.csr
-rw-r--r--. 1 root root     219 Dec 10 21:21 aggregator-proxy-csr.json
-rw-------. 1 root root    1675 Dec 10 21:21 aggregator-proxy-key.pem
-rw-r--r--. 1 root root    1383 Dec 10 21:21 aggregator-proxy.pem
-rw-r--r--. 1 root root     294 Dec 10 21:21 ca-config.json
-rw-r--r--. 1 root root    1675 Dec 10 21:21 ca-key.pem
-rw-r--r--. 1 root root    1350 Dec 10 21:21 ca.pem
-rw-r--r--. 1 root root    1082 Dec 10 21:21 kubelet.csr
-rw-r--r--. 1 root root     283 Dec 10 21:21 kubelet-csr.json
-rw-------. 1 root root    1675 Dec 10 21:21 kubelet-key.pem
-rw-r--r--. 1 root root    1452 Dec 10 21:21 kubelet.pem
-rw-r--r--. 1 root root    1273 Dec 10 21:21 kubernetes.csr
-rw-r--r--. 1 root root     488 Dec 10 21:21 kubernetes-csr.json
-rw-------. 1 root root    1679 Dec 10 21:21 kubernetes-key.pem
-rw-r--r--. 1 root root    1639 Dec 10 21:21 kubernetes.pem
-rw-r--r--. 1 root root 1593376 Dec 10 21:32 snapshot.db
4: Simulate a cluster crash by running the clean playbook to tear the cluster down
[root@master ~]# cd /etc/ansible/
[root@master ansible]# ansible-playbook 99.clean.yml
Rebuild the cluster by re-running the installation playbooks:

[root@master ansible]# ansible-playbook 01.prepare.yml
[root@master ansible]# ansible-playbook 02.etcd.yml
[root@master ansible]# ansible-playbook 03.docker.yml
[root@master ansible]# ansible-playbook 04.kube-master.yml
[root@master ansible]# ansible-playbook 05.kube-node.yml
3: Stop the etcd service
[root@master ansible]# ansible etcd -m service -a 'name=etcd state=stopped'
4: Clear the etcd data

[root@master ansible]# ansible etcd -m file -a 'name=/var/lib/etcd/member/ state=absent'
[DEPRECATION WARNING]: The TRANSFORM_INVALID_GROUP_CHARS settings is set to allow bad characters in group names by default, this will change, but still be user
configurable on deprecation. This feature will be removed in version 2.10. Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.
[WARNING]: Invalid characters were found in group names but not replaced, use -vvvv to see details

192.168.1.203 | CHANGED => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python"
    },
    "changed": true,
    "path": "/var/lib/etcd/member/",
    "state": "absent"
}
192.168.1.202 | CHANGED => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python"
    },
    "changed": true,
    "path": "/var/lib/etcd/member/",
    "state": "absent"
}
192.168.1.200 | CHANGED => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python"
    },
    "changed": true,
    "path": "/var/lib/etcd/member/",
    "state": "absent"
}
4: Sync the backed-up etcd data files to every etcd node
[root@master ansible]# for i in 202 203; do rsync -av /backup/k8s1 192.168.1.$i:/backup/; done
sending incremental file list
created directory /backup
k8s1/
k8s1/admin-key.pem
k8s1/admin.pem
k8s1/aggregator-proxy-csr.json
k8s1/aggregator-proxy-key.pem
k8s1/aggregator-proxy.csr
k8s1/aggregator-proxy.pem
k8s1/ca-config.json
k8s1/ca-key.pem
k8s1/ca.pem
k8s1/kubelet-csr.json
k8s1/kubelet-key.pem
k8s1/kubelet.csr
k8s1/kubelet.pem
k8s1/kubernetes-csr.json
k8s1/kubernetes-key.pem
k8s1/kubernetes.csr
k8s1/kubernetes.pem
k8s1/snapshot.db

sent 1,615,207 bytes  received 392 bytes  646,239.60 bytes/sec
total size is 1,613,606  speedup is 1.00
sending incremental file list
created directory /backup
k8s1/
k8s1/admin-key.pem
k8s1/admin.pem
k8s1/aggregator-proxy-csr.json
k8s1/aggregator-proxy-key.pem
k8s1/aggregator-proxy.csr
k8s1/aggregator-proxy.pem
k8s1/ca-config.json
k8s1/ca-key.pem
k8s1/ca.pem
k8s1/kubelet-csr.json
k8s1/kubelet-key.pem
k8s1/kubelet.csr
k8s1/kubelet.pem
k8s1/kubernetes-csr.json
k8s1/kubernetes-key.pem
k8s1/kubernetes.csr
k8s1/kubernetes.pem
k8s1/snapshot.db

sent 1,615,207 bytes  received 392 bytes  1,077,066.00 bytes/sec
total size is 1,613,606  speedup is 1.00
5: Run the data-restore steps below on each etcd node, then restart etcd
## Note: in /etc/systemd/system/etcd.service, find the --initial-cluster etcd1=https://xxxx:2380,etcd2=https://xxxx:2380,etcd3=https://xxxx:2380 value and substitute it for the --initial-cluster parameter in the restore command. Set --name to the current etcd node's name, and finally set --initial-advertise-peer-urls to the current node's IP:2380.
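The substitution described above can be scripted. A minimal sketch follows; the unit-file content written below is a local stand-in for illustration, so on a real node you would read /etc/systemd/system/etcd.service instead:

```shell
# Write a sample unit file locally so the extraction can be demonstrated.
# On an actual etcd node, skip this and point grep at the real unit file.
cat > etcd.service <<'EOF'
ExecStart=/opt/kube/bin/etcd \
  --name=etcd1 \
  --initial-cluster=etcd1=https://192.168.1.200:2380,etcd2=https://192.168.1.202:2380,etcd3=https://192.168.1.203:2380 \
  --initial-advertise-peer-urls=https://192.168.1.200:2380
EOF

# Extract everything after "--initial-cluster=" up to the next space,
# which is exactly the value the restore command needs.
INITIAL_CLUSTER=$(grep -o -- '--initial-cluster=[^ ]*' etcd.service | head -n1 | cut -d= -f2-)
echo "$INITIAL_CLUSTER"
```

This avoids retyping the member list by hand on each node, which is an easy place to introduce a mismatch.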
① On the deploy node
[root@master ansible]# cd /backup/k8s1/
[root@master k8s1]# ETCDCTL_API=3 etcdctl snapshot restore snapshot.db --name etcd1 --initial-cluster etcd1=https://192.168.1.200:2380,etcd2=https://192.168.1.202:2380,etcd3=https://192.168.1.203:2380 --initial-cluster-token etcd-cluster-0 --initial-advertise-peer-urls https://192.168.1.200:2380
2019-12-10 22:26:50.037127 I | mvcc: restore compact to 46505
2019-12-10 22:26:50.052409 I | etcdserver/membership: added member 12229714d8728d0e [https://192.168.1.200:2380] to cluster b8ef796b710cde7d
2019-12-10 22:26:50.052451 I | etcdserver/membership: added member 552fb05951af50c9 [https://192.168.1.203:2380] to cluster b8ef796b710cde7d
2019-12-10 22:26:50.052474 I | etcdserver/membership: added member 8b4f4a6559bf7c2c [https://192.168.1.202:2380] to cluster b8ef796b710cde7d
After running the command above, a <node-name>.etcd directory is created under the current directory.
[root@master k8s1]# tree etcd1.etcd/
etcd1.etcd/
└── member
    ├── snap
    │   ├── 0000000000000001-0000000000000003.snap
    │   └── db
    └── wal
        └── 0000000000000000-0000000000000000.wal
[root@master k8s1]# cp -r etcd1.etcd/member /var/lib/etcd/
[root@master k8s1]# systemctl restart etcd
② On the etcd2 node
[root@node1 ~]# cd /backup/k8s1/
[root@node1 k8s1]# ETCDCTL_API=3 etcdctl snapshot restore snapshot.db --name etcd2 --initial-cluster etcd1=https://192.168.1.200:2380,etcd2=https://192.168.1.202:2380,etcd3=https://192.168.1.203:2380 --initial-cluster-token etcd-cluster-0 --initial-advertise-peer-urls https://192.168.1.202:2380
2019-12-10 22:28:35.175032 I | mvcc: restore compact to 46505
2019-12-10 22:28:35.232386 I | etcdserver/membership: added member 12229714d8728d0e [https://192.168.1.200:2380] to cluster b8ef796b710cde7d
2019-12-10 22:28:35.232507 I | etcdserver/membership: added member 552fb05951af50c9 [https://192.168.1.203:2380] to cluster b8ef796b710cde7d
2019-12-10 22:28:35.232541 I | etcdserver/membership: added member 8b4f4a6559bf7c2c [https://192.168.1.202:2380] to cluster b8ef796b710cde7d
[root@node1 k8s1]# tree etcd2.etcd/
etcd2.etcd/
└── member
    ├── snap
    │   ├── 0000000000000001-0000000000000003.snap
    │   └── db
    └── wal
        └── 0000000000000000-0000000000000000.wal
[root@node1 k8s1]# cp -r etcd2.etcd/member /var/lib/etcd/
[root@node1 k8s1]# systemctl restart etcd
③ On the etcd3 node
[root@node2 ~]# cd /backup/k8s1/
[root@node2 k8s1]# ETCDCTL_API=3 etcdctl snapshot restore snapshot.db --name etcd3 --initial-cluster etcd1=https://192.168.1.200:2380,etcd2=https://192.168.1.202:2380,etcd3=https://192.168.1.203:2380 --initial-cluster-token etcd-cluster-0 --initial-advertise-peer-urls https://192.168.1.203:2380
2019-12-10 22:28:55.943364 I | mvcc: restore compact to 46505
2019-12-10 22:28:55.988674 I | etcdserver/membership: added member 12229714d8728d0e [https://192.168.1.200:2380] to cluster b8ef796b710cde7d
2019-12-10 22:28:55.988726 I | etcdserver/membership: added member 552fb05951af50c9 [https://192.168.1.203:2380] to cluster b8ef796b710cde7d
2019-12-10 22:28:55.988754 I | etcdserver/membership: added member 8b4f4a6559bf7c2c [https://192.168.1.202:2380] to cluster b8ef796b710cde7d
[root@node2 k8s1]# tree etcd3.etcd/
etcd3.etcd/
└── member
    ├── snap
    │   ├── 0000000000000001-0000000000000003.snap
    │   └── db
    └── wal
        └── 0000000000000000-0000000000000000.wal
[root@node2 k8s1]# cp -r etcd3.etcd/member /var/lib/etcd/
[root@node2 k8s1]# systemctl restart etcd
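The three per-node invocations above differ only in --name and the advertised peer URL, so the command can be generated from one helper instead of edited by hand three times. A sketch, with the helper name build_restore_cmd made up for illustration:

```shell
# The full member list and cluster token are identical on every node.
CLUSTER="etcd1=https://192.168.1.200:2380,etcd2=https://192.168.1.202:2380,etcd3=https://192.168.1.203:2380"

# Hypothetical helper: print the restore command for one member.
# $1 = member name (etcd1/etcd2/etcd3), $2 = that member's IP.
build_restore_cmd() {
  echo "ETCDCTL_API=3 etcdctl snapshot restore snapshot.db" \
       "--name $1" \
       "--initial-cluster $CLUSTER" \
       "--initial-cluster-token etcd-cluster-0" \
       "--initial-advertise-peer-urls https://$2:2380"
}

# Print the command for the etcd2 node; run it there (manually or via ssh)
# from /backup/k8s1/, not on the deploy node.
build_restore_cmd etcd2 192.168.1.202
```

Generating the command this way keeps the member list consistent across nodes; only the two per-node values can vary.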
6: On the deploy node, rebuild the cluster network
[root@master ansible]# cd /etc/ansible/
[root@master ansible]# ansible-playbook tools/change_k8s_network.yml
7: Check whether the Pods and Services were restored successfully
[root@master ansible]# kubectl get svc
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
kubernetes   ClusterIP   10.68.0.1       <none>        443/TCP    5d5h
nginx        ClusterIP   10.68.241.175   <none>        80/TCP     5d4h
tomcat       ClusterIP   10.68.235.35    <none>        8080/TCP   76m
[root@master ansible]# kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
nginx-7c45b84548-4998z   1/1     Running   0          5d4h
tomcat-8fc9f5995-9kl5b   1/1     Running   0          77m
III. Automated Backup and Restore
1: One-command backup
[root@master ansible]# ansible-playbook /etc/ansible/23.backup.yml
2: Simulate a failure
[root@master ansible]# ansible-playbook /etc/ansible/99.clean.yml
Edit /etc/ansible/roles/cluster-restore/defaults/main.yml to specify which etcd snapshot backup to restore from; if you leave it unchanged, the most recent backup is used.
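For example, the edit might look like the sketch below; the key name shown is illustrative, so check the keys actually defined in your version's roles/cluster-restore/defaults/main.yml before editing:

```yaml
# roles/cluster-restore/defaults/main.yml (sketch; key name is illustrative)
# Point this at a specific snapshot file under the backup directory instead
# of relying on the default "most recent" behaviour.
db_to_restore: "snapshot_201912102130.db"
```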
3: Run the automated restore
[root@master ansible]# ansible-playbook /etc/ansible/24.restore.yml
[root@master ansible]# ansible-playbook /etc/ansible/tools/change_k8s_network.yml