etcd的备份和恢复-基于快照

概述

   etcd是一款开源、高度一致的分布式键值(key-value)存储系统。

   etcd使用Go语言编写的。

   etcd机器之间的通信通过Raft共识算法处理。

   etcd是CoreOS公司于2013年6月发起的开源项目,于2018年12月正式加 入云原生计算基金会(CNCF-Cloud Native Computing Foundation),并由CNCF支持。

 

       官网:https://etcd.io

       github地址:https://github.com/etcd-io/etcd

       官方硬件推荐:https://etcd.io/docs/v3.5/op-guide/hardware

       官方文档:https://etcd.io/docs/v3.5/op-guide/maintenance

 

ETCD具有如下属性:

       完全复制:集群中的每个节点都可以使用完整的存档

       高可用性:Etcd可用于避免硬件的单点故障或网络问题

       一致性:每次读取都会返回跨多主机的最新写入

       简单:包括一个定义良好、面向用户的API(GRPC)

       安全:实现了带有可选的客户端证书身份验证的自动化TLS

       快速:每秒10000次写入的基准速度

       可靠:使用Raft算法实现了存储的合理分布式Etcd的工作原理

 

ETCD 特点

  • 读写性能强:按照官网给出的Benchmark,在2CPU、8G内存、SSD磁盘这样的配置下,单节点的写性能可以达到16K QPS, 而先写后读也能达到12K QPS。这个性能还是相当可观的。
  • 简单:安装配置简单,而且提供了 HTTP API 进行交互,使用也很简单。
  • 支持HTTPS:更安全可靠。
  • 跨平台支持:支持不同系统。
  • 二进制文件小和社区活跃。


ETCD的数据备份与恢复

手动恢复etcd集群

 

  查看k8s pod

  

创建备份目录
root@etcd1:~# mkdir /backup

创建数据备份
root@etcd1:~# etcdctl snapshot save /backup/snapshot.db
{"level":"info","ts":"2022-07-28T13:51:18.588+0800","caller":"snapshot/v3_snapshot.go:65","msg":"created temporary db file","path":"/backup/snapshot.db.part"}
{"level":"info","ts":"2022-07-28T13:51:18.591+0800","logger":"client","caller":"v3/maintenance.go:211","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":"2022-07-28T13:51:18.592+0800","caller":"snapshot/v3_snapshot.go:73","msg":"fetching snapshot","endpoint":"127.0.0.1:2379"}
{"level":"info","ts":"2022-07-28T13:51:18.606+0800","logger":"client","caller":"v3/maintenance.go:219","msg":"completed snapshot read; closing"}
{"level":"info","ts":"2022-07-28T13:51:18.609+0800","caller":"snapshot/v3_snapshot.go:88","msg":"fetched snapshot","endpoint":"127.0.0.1:2379","size":"1.4 MB","took":"now"}
{"level":"info","ts":"2022-07-28T13:51:18.610+0800","caller":"snapshot/v3_snapshot.go:97","msg":"saved","path":"/backup/snapshot.db"}
Snapshot saved at /backup/snapshot.db

 

  

  删除deployment模拟故障

root@deploy:/etc/kubeasz# kubectl delete deployments.apps nginx

 

  节点1恢复

root@etcd1:~# systemctl stop etcd
root@etcd1:~# rm -fr /var/lib/etcd/
root@etcd1:~# etcdctl snapshot restore /backup/snapshot.db \
--name etcd-192.168.100.8 \
--initial-cluster=etcd-192.168.100.10=https://192.168.100.10:2380,etcd-192.168.100.8=https://192.168.100.8:2380,etcd-192.168.100.9=https://192.168.100.9:2380 \
--initial-cluster-token=etcd-cluster-0 \
--initial-advertise-peer-urls=https://192.168.100.8:2380 \
--data-dir=/var/lib/etcd

  节点2恢复

root@etcd2:~# systemctl stop etcd
root@etcd2:~# rm -fr /var/lib/etcd/
root@etcd2:~# etcdctl snapshot restore /root/snapshot.db \
--name etcd-192.168.100.9 \
--initial-cluster=etcd-192.168.100.10=https://192.168.100.10:2380,etcd-192.168.100.8=https://192.168.100.8:2380,etcd-192.168.100.9=https://192.168.100.9:2380 \
--initial-cluster-token=etcd-cluster-0 \
--initial-advertise-peer-urls=https://192.168.100.9:2380 \
--data-dir=/var/lib/etcd

  

  节点3恢复

root@etcd3:~# systemctl stop etcd
root@etcd3:~# rm -fr /var/lib/etcd
root@etcd3:~# etcdctl snapshot restore /root/snapshot.db \
--name etcd-192.168.100.10 \
--initial-cluster=etcd-192.168.100.10=https://192.168.100.10:2380,etcd-192.168.100.8=https://192.168.100.8:2380,etcd-192.168.100.9=https://192.168.100.9:2380 \
--initial-cluster-token=etcd-cluster-0 \
--initial-advertise-peer-urls=https://192.168.100.10:2380 \
--data-dir=/var/lib/etcd

 

  集群节点同时启动etcd

systemctl start etcd

 

  查看etcd集群状态

root@etcd1:~# export NODE_IPS="192.168.100.8 192.168.100.9 192.168.100.10"
root@etcd1:~# for n in ${NODE_IPS};do etcdctl --endpoints=https://${n}:2379 --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/kubernetes/ssl/etcd.pem --key=/etc/kubernetes/ssl/etcd-key.pem --write-out=table endpoint status;done

 

  验证k8s pod是否恢复

 

 

  kubeasz 集群备份与恢复

  查看ezctl帮助信息

  查看k8s pod

  1、备份数据

 

  使用ezctl备份etcd数据

root@deploy:/etc/kubeasz# ./ezctl backup k8s-cluster1

 

 

 

       etcd数据备份到集群目录下的backup文件目录

  

 

  etcd数据恢复机制:以当前时间戳命名最新的备份,并拷贝一份命名为snapshot.db作为最新恢复使用。ansible剧本具体实现

  

       ansible备份剧本:

  ansible恢复剧本:

       /etc/kubeasz/roles/cluster-restore/tasks/main.yaml

  2、恢复数据

       删除deployment模拟故障

root@deploy:/etc/kubeasz# kubectl delete deployments.apps nginx

 

  使用ezctl脚本自动化恢复etcd数据。恢复的数据是使用snapshot.db最新的备份进行恢复

root@deploy:/etc/kubeasz# ./ezctl restore k8s-cluster1

  也可以根据历史备份的数据进行恢复

       将过去的时间备份复制重新命名为snapshot.db作为脚本进行备份

  

root@deploy:/etc/kubeasz/clusters/k8s-cluster1/backup# cp snapshot_202207281742.db snapshot.db

root@deploy:/etc/kubeasz# ./ezctl restore k8s-cluster1

 

posted @ 2022-07-29 09:03  PunchLinux  阅读(1605)  评论(0编辑  收藏  举报