Kubernetes Deep Dive (3): Kubernetes Backup and Restore
1. Backup and Restore Principles
Backup: take a snapshot of the running etcd cluster and save it to a file on disk.
Restore: load the backup file back into the etcd cluster, then rebuild the whole cluster from that data.
2. Backing Up Cluster Data (Manual)
[root@master1 ~]# mkdir -p /backup/k8s                // create the backup directory
[root@master1 ~]# ETCDCTL_API=3 etcdctl snapshot save /backup/k8s/snapshot.db
Snapshot saved at /backup/k8s/snapshot.db             // back up the etcd data
[root@master1 ~]# du -sh /backup/k8s/snapshot.db
3.5M    /backup/k8s/snapshot.db                       // check the snapshot size
[root@master1 ~]# cp /etc/kubernetes/ssl/ca* /backup/k8s/
[root@master1 ~]# ls /backup/k8s/                     // back up the CA certificates
ca-config.json  ca.csr  ca-csr.json  ca-key.pem  ca.pem  snapshot.db
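Before relying on a snapshot, it is worth sanity-checking it. A minimal check with etcdctl's snapshot status subcommand, which prints the snapshot's hash, revision, key count, and size:

# ETCDCTL_API=3 etcdctl snapshot status /backup/k8s/snapshot.db    // verify the snapshot is readable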
# cp -r /backup/ backup-back                          // keep an extra copy
- If the cluster was created with the kubeasz project, back up the CA certificate files and the ansible hosts file in addition to the etcd data (a one-line sketch follows).
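A minimal sketch of that extra copy, assuming the default /etc/ansible/hosts location used throughout this article:

# cp /etc/ansible/hosts /backup/k8s/                  // keep the ansible inventory next to the CA files and snapshot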
3. Restoring Cluster Data
3.1 Simulate a Cluster Crash
// run on the deploy node:
[root@master1 ~]# ansible-playbook /etc/ansible/99.clean.yml
- Running 99.clean.yml also wipes the files we just backed up under /backup. This is a bug in the script, so always keep an extra copy, ideally off the host, as sketched below.
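A minimal sketch of keeping a copy off the host; the destination 192.168.0.200:/backup-remote is a placeholder for any machine that 99.clean.yml does not touch:

# scp -r /backup/ 192.168.0.200:/backup-remote        // placeholder host; adjust to your environment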
3.2 Restore the CA Certificates
[root@master1 bin]# mkdir -p /etc/kubernetes/ssl
[root@master1 bin]# cp /backup/k8s/ca* /etc/kubernetes/ssl/
3.3 Rebuild the Cluster
[root@master1 bin]# cd /etc/ansible
# ansible-playbook 01.prepare.yml
# ansible-playbook 02.etcd.yml
# ansible-playbook 03.docker.yml
# ansible-playbook 04.kube-master.yml
# ansible-playbook 05.kube-node.yml
// only these five playbooks are needed
3.4 Restore the etcd Data
3.4.1 Stop the etcd Service
[root@master1 ansible]# ansible etcd -m service -a 'name=etcd state=stopped'                 // stop the etcd service
[root@master1 ansible]# ansible etcd -m file -a 'name=/var/lib/etcd/member/ state=absent'    // remove the old data
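Before restoring, it is worth confirming the service really stopped on every node. A sketch using ansible's shell module; note that systemctl is-active exits non-zero for a stopped unit, so ansible reports the task as failed even though "inactive" is exactly what we want here:

# ansible etcd -m shell -a 'systemctl is-active etcd'    // expect "inactive" on all three nodes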
3.4.2 Copy the Backup Files to the Other Two etcd Nodes
[root@master1 k8s]# scp -r /backup/ 192.168.0.111:/backup
ca-config.json                100%  292    141.0KB/s   00:00
ca-csr.json                   100%  243     83.4KB/s   00:00
ca-key.pem                    100% 1675    692.0KB/s   00:00
ca.csr                        100%  997    272.6KB/s   00:00
ca.pem                        100% 1346    616.5KB/s   00:00
snapshot.db                   100% 3904KB   32.9MB/s   00:00
[root@master1 k8s]# scp -r /backup/ 192.168.0.113:/backup
ca-config.json                100%  292    111.3KB/s   00:00
ca-csr.json                   100%  243    137.1KB/s   00:00
ca-key.pem                    100% 1675    820.4KB/s   00:00
ca.csr                        100%  997    325.4KB/s   00:00
ca.pem                        100% 1346    220.9KB/s   00:00
snapshot.db                   100% 3904KB   37.2MB/s   00:00
3.4.3 Log In to All Three etcd Nodes and Restore the Data
// on master1 (etcd1, 192.168.0.110):
[root@master1 ~]# cd /backup/k8s/
[root@master1 k8s]# ETCDCTL_API=3 etcdctl snapshot restore snapshot.db \
    --name etcd1 \
    --initial-cluster etcd1=https://192.168.0.110:2380,etcd2=https://192.168.0.111:2380,etcd3=https://192.168.0.113:2380 \
    --initial-cluster-token etcd-cluster-0 \
    --initial-advertise-peer-urls https://192.168.0.110:2380

// on node1 (etcd2, 192.168.0.111):
[root@node1 ~]# cd /backup/k8s/
[root@node1 k8s]# ETCDCTL_API=3 etcdctl snapshot restore snapshot.db \
    --name etcd2 \
    --initial-cluster etcd1=https://192.168.0.110:2380,etcd2=https://192.168.0.111:2380,etcd3=https://192.168.0.113:2380 \
    --initial-cluster-token etcd-cluster-0 \
    --initial-advertise-peer-urls https://192.168.0.111:2380

// on master2 (etcd3, 192.168.0.113):
[root@master2 ~]# cd /backup/k8s/
[root@master2 k8s]# ETCDCTL_API=3 etcdctl snapshot restore snapshot.db \
    --name etcd3 \
    --initial-cluster etcd1=https://192.168.0.110:2380,etcd2=https://192.168.0.111:2380,etcd3=https://192.168.0.113:2380 \
    --initial-cluster-token etcd-cluster-0 \
    --initial-advertise-peer-urls https://192.168.0.113:2380
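As an aside, etcdctl snapshot restore also accepts a --data-dir flag; pointing it straight at /var/lib/etcd writes the restored data in place and skips the manual copy in step 3.5. A sketch for etcd1 only (depending on the etcdctl version, the target directory must be empty or absent):

# ETCDCTL_API=3 etcdctl snapshot restore snapshot.db \
    --name etcd1 \
    --initial-cluster etcd1=https://192.168.0.110:2380,etcd2=https://192.168.0.111:2380,etcd3=https://192.168.0.113:2380 \
    --initial-cluster-token etcd-cluster-0 \
    --initial-advertise-peer-urls https://192.168.0.110:2380 \
    --data-dir /var/lib/etcd    // restore directly into etcd's data dir instead of ./etcd1.etcd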
3.5 The steps above generate a {{ NODE_NAME }}.etcd directory on each node; copy its member directory into /var/lib/etcd/
[root@master1 k8s]# cp -r etcd1.etcd/member /var/lib/etcd/
[root@master1 k8s]# systemctl restart etcd
[root@node1 k8s]# cp -r etcd2.etcd/member /var/lib/etcd/
[root@node1 k8s]# systemctl restart etcd
[root@master2 k8s]# cp -r etcd3.etcd/member /var/lib/etcd/
[root@master2 k8s]# systemctl restart etcd
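With etcd restarted on all three nodes, cluster health can be checked from any of them. A sketch; the client port 2379 matches the defaults, but the certificate paths are assumptions and may differ in your kubeasz layout:

# ETCDCTL_API=3 etcdctl \
    --endpoints=https://192.168.0.110:2379,https://192.168.0.111:2379,https://192.168.0.113:2379 \
    --cacert=/etc/kubernetes/ssl/ca.pem \
    --cert=/etc/etcd/ssl/etcd.pem \
    --key=/etc/etcd/ssl/etcd-key.pem \
    endpoint health    // each endpoint should report "is healthy"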
3.6 Verify the Restored Data
[root@master1 k8s]# kubectl get svc
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
kubernetes   ClusterIP   10.68.0.1       <none>        443/TCP    1d
mysql        ClusterIP   10.68.189.234   <none>        3306/TCP   1d
nginx        ClusterIP   10.68.235.171   <none>        80/TCP     1d
[root@master1 k8s]# kubectl get svc --all-namespaces
NAMESPACE     NAME                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)         AGE
default       kubernetes             ClusterIP   10.68.0.1       <none>        443/TCP         1d
default       mysql                  ClusterIP   10.68.189.234   <none>        3306/TCP        1d
default       nginx                  ClusterIP   10.68.235.171   <none>        80/TCP          1d
kube-system   kube-dns               ClusterIP   10.68.0.2       <none>        53/UDP,53/TCP   1d
kube-system   kubernetes-dashboard   NodePort    10.68.133.57    <none>        443:39731/TCP   1d
kube-system   metrics-server         ClusterIP   10.68.197.191   <none>        443/TCP         1d
[root@master1 k8s]# kubectl get pod
NAME                     READY   STATUS             RESTARTS   AGE
mysql-665554f76b-czs6k   0/1     ImagePullBackOff   31         1d
nginx-6f858d4d45-ct2qg   1/1     Running            0          1d
3.7 Rebuild the Network on the deploy Node
[root@master1 k8s]# ansible-playbook /etc/ansible/tools/change_k8s_network.yml
4. Automated Backup and Restore
4.1 One-Command Backup
[root@master1 ansible]# ansible-playbook /etc/ansible/23.backup.yml

PLAY [etcd] *********************************************************************

TASK [Gathering Facts] **********************************************************
ok: [192.168.0.110]
ok: [192.168.0.111]
ok: [192.168.0.113]

TASK [cluster-backup : 准备备份目录] **************************************************
ok: [192.168.0.113]
ok: [192.168.0.111]
ok: [192.168.0.110]

TASK [cluster-backup : 执行etcd 数据备份] *********************************************
changed: [192.168.0.111]
changed: [192.168.0.113]
changed: [192.168.0.110]

TASK [cluster-backup : 获取etcd 数据备份] *********************************************
changed: [192.168.0.110]

PLAY [deploy] *******************************************************************

TASK [Creating backup dirs] *****************************************************
changed: [192.168.0.110] => (item=/etc/ansible/roles/cluster-backup/files/ca)
changed: [192.168.0.110] => (item=/etc/ansible/roles/cluster-backup/files/hosts)
changed: [192.168.0.110] => (item=/etc/ansible/roles/cluster-backup/files/snapshot)

TASK [Backing up CA sth] ********************************************************
changed: [192.168.0.110] => (item=ca.pem)
changed: [192.168.0.110] => (item=ca-key.pem)
changed: [192.168.0.110] => (item=ca.csr)
changed: [192.168.0.110] => (item=ca-csr.json)
changed: [192.168.0.110] => (item=ca-config.json)

TASK [Backing up ansible hosts-1] ***********************************************
changed: [192.168.0.110]

TASK [Backing up ansible hosts-2] ***********************************************
changed: [192.168.0.110]

TASK [Backing up etcd snapshot-1] ***********************************************
changed: [192.168.0.110]

TASK [Backing up etcd snapshot-2] ***********************************************
changed: [192.168.0.110]

PLAY RECAP **********************************************************************
192.168.0.110   : ok=10   changed=8   unreachable=0   failed=0
192.168.0.111   : ok=3    changed=1   unreachable=0   failed=0
192.168.0.113   : ok=3    changed=1   unreachable=0   failed=0
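Because 23.backup.yml keeps timestamped copies alongside the latest snapshot (see 4.2), it lends itself to periodic runs from cron on the deploy node. A sketch, not part of kubeasz itself; the schedule and log path are examples:

# crontab -e
0 2 * * * ansible-playbook /etc/ansible/23.backup.yml >> /var/log/k8s-backup.log 2>&1    // daily backup at 02:00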
4.2 View the Backup Data
[root@master1 ansible]# tree /etc/ansible/roles/cluster-backup/files/
/etc/ansible/roles/cluster-backup/files/
├── ca                              // cluster CA backup
│   ├── ca-config.json
│   ├── ca.csr
│   ├── ca-csr.json
│   ├── ca-key.pem
│   └── ca.pem
├── hosts                           // ansible hosts backup
│   ├── hosts
│   └── hosts-201901161120          // most recent backup
├── readme.md
└── snapshot                        // etcd data backup
    ├── snapshot-201901161120.db
    └── snapshot.db                 // most recent backup

3 directories, 10 files
- Check that the backup files exist under /etc/ansible/roles/cluster-backup/files
4.3 Simulate a Failure
# ansible-playbook /etc/ansible/99.clean.yml
4.4 Specify the etcd Snapshot Backup to Restore
Edit the file /etc/ansible/roles/cluster-restore/defaults/main.yml to specify which etcd snapshot backup to restore; if left unchanged, the most recent backup is used.
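A sketch of that edit from the shell; the variable name db_to_restore is an assumption that may differ across kubeasz versions, so check the file before editing:

# grep -n 'db_to_restore' /etc/ansible/roles/cluster-restore/defaults/main.yml
# sed -i 's|^db_to_restore:.*|db_to_restore: "snapshot-201901161120.db"|' /etc/ansible/roles/cluster-restore/defaults/main.yml    // pick a timestamped snapshot from 4.2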
4.5 Run the Restore
# ansible-playbook /etc/ansible/24.restore.yml
# ansible-playbook /etc/ansible/tools/change_k8s_network.yml
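Once both playbooks finish, the same checks as in section 3.6 confirm the cluster is back:

# kubectl get node                     // all nodes should be Ready
# kubectl get svc --all-namespaces     // services restored from the etcd snapshot
# kubectl get pod --all-namespaces     // workloads recreated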