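Velero Cluster Resource Backup and Restore Solution
Abstract
Currently, middleware data backup on the XXX cloud platform is designed at the data layer for the common middleware components MySQL, Redis, ZooKeeper, and Elasticsearch. The primary/standby cluster implementation is likewise built on an etcd-based synchronization scheme.
Data-layer backup: backs up and restores data without disrupting or interfering with the components' running state or snapshot information. It relies on custom scripts and causes no interruption in time or space, so application components keep running continuously.
etcd backup: backing up etcd directly captures all resources in the cluster, so the Kubernetes cluster can be backed up and synchronized as a whole.
Velero backup: besides backing up the whole Kubernetes cluster, Velero can also back up or restore objects selectively by type, namespace, label, PV, and so on.
Keywords: Velero backup; restore; migration; data layer; cluster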
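Introduction to Velero
Velero is a cloud-native disaster recovery and migration tool written in Go. It can safely back up, restore, and migrate Kubernetes cluster resources and persistent volumes.
Velero is Spanish for sailboat, which fits the naming style of the Kubernetes community well.
Velero currently provides the following features:
Backup and restore of Kubernetes cluster data
Copying the resources of the current Kubernetes cluster to other Kubernetes clusters
Replicating a production environment to development and test environments
Velero consists of two parts, a server and a client.
Server: runs inside the Kubernetes cluster
Client: the velero command-line tool that runs locally; kubectl and the cluster kubeconfig must already be configured on the machine
Velero use cases
Disaster recovery: provides the ability to back up and restore a k8s cluster
Migration: provides the ability to copy cluster resources to other clusters (replicating and synchronizing cluster configuration across development, test, and production environments to simplify environment setup)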
MinIO information
Installing Velero
Declare the storage access credentials (credentials-velero):
[default]
aws_access_key_id = admin123
aws_secret_access_key = admin123
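The access key and secret above are written inline for the demo. A minimal sketch of generating the same file from environment variables, so that real keys stay out of shell history and version control, could look like this. The function name write_velero_credentials and the variable names MINIO_ACCESS_KEY and MINIO_SECRET_KEY are assumptions made for illustration and are not part of Velero.

```shell
# Write an AWS-style credentials file for Velero's aws plugin.
# MINIO_ACCESS_KEY and MINIO_SECRET_KEY are hypothetical variable names.
write_velero_credentials() (
    out="${1:-credentials-velero}"
    # Abort (inside this subshell only) when a key is missing.
    : "${MINIO_ACCESS_KEY:?MINIO_ACCESS_KEY must be set}"
    : "${MINIO_SECRET_KEY:?MINIO_SECRET_KEY must be set}"
    umask 077   # a newly created file is readable by its owner only
    cat > "$out" <<EOF
[default]
aws_access_key_id = ${MINIO_ACCESS_KEY}
aws_secret_access_key = ${MINIO_SECRET_KEY}
EOF
)
```

Calling write_velero_credentials with no argument writes credentials-velero in the current directory, which is the file name passed to --secret-file in the install command.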
Deploy with the velero binary's install command:
velero install \
--provider aws \
--bucket velero \
--secret-file credentials-velero \
--use-volume-snapshots=true \
--plugins velero/velero-plugin-for-aws:v1.0.0 \
--backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://192.168.139.139:31082
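Note: the provider plugins maintained by the Velero project cover aws, gcp, and azure. MinIO is S3-compatible, so the aws provider is used here. --bucket is the backup location; the MinIO console had only the single bucket velero, so it is used as the backup location.
Backup and restore
a. Back up a namespace
# List the resources in the namespace to be backed up
kubectl get po -n tigera-operator
# Back up the namespace
velero backup create tigera-operator-backup --include-namespaces tigera-operator --wait # run the backup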
velero backup describe tigera-operator-backup
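After the backup completes, log in to the MinIO console to confirm that the backup data has been written.
b. Verify the backup by restoring
# Delete the namespace
kubectl delete ns tigera-operator
# Restore the namespace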
velero restore create --from-backup tigera-operator-backup
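The verification step above deletes the namespace and restores it from the backup. A minimal sketch of making the check explicit is to save a listing of object names before the deletion and compare it with the state after the restore. The function names list_ns_objects and verify_restore are hypothetical, and the choice of Deployments, Services, and ConfigMaps as the objects to compare is an assumption; adjust it to the workloads in the namespace.

```shell
# Print the names of Deployments, Services and ConfigMaps in a namespace, sorted.
list_ns_objects() {
    kubectl get deployments,services,configmaps -n "$1" -o name | sort
}

# Compare a listing saved before the deletion with the current state.
# Returns 0 when both listings are identical, non-zero otherwise.
verify_restore() (
    ns="$1"
    before_file="$2"
    list_ns_objects "$ns" | diff "$before_file" - >/dev/null
)
```

A possible sequence is: run list_ns_objects tigera-operator > before.txt, delete the namespace, run velero restore create --from-backup tigera-operator-backup --wait so that the command blocks until the restore finishes, and then run verify_restore tigera-operator before.txt.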
c. Scheduled backups
# Back up every day at 01:00
velero create schedule <SCHEDULE NAME> --schedule="0 1 * * *"
# Back up every day at 01:00 and retain each backup for 72 hours
velero create schedule <SCHEDULE NAME> --schedule="0 1 * * *" --ttl 72h
# Back up every 5 hours
velero create schedule <SCHEDULE NAME> --schedule="@every 5h"
# Back up a specific namespace once a day (for example panshi-qtc-dev)
velero create schedule <SCHEDULE NAME> --schedule="@every 24h" --include-namespaces panshi-qtc-dev
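d. Viewing resources
velero get backup # list backups
velero get schedule # list scheduled backups
velero get restore # list existing restores
velero get plugins # list plugins
e. Restoring resources
velero restore create --from-backup all-ns-backup # restore everything in the backup (resources that already exist are not overwritten)
velero restore create --from-backup all-ns-backup --include-namespaces default,nginx-example # restore only the default and nginx-example namespaces
# Velero can restore resources into a namespace different from the one they were backed up from. To do so, use the --namespace-mappings flag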
velero restore create RESTORE_NAME --from-backup BACKUP_NAME --namespace-mappings old-ns-1:new-ns-1,old-ns-2:new-ns-2
# For example, the following restores the resources of the test-velero namespace into test-velero-1
velero restore create restore-for-test --from-backup everyday-1-20210203131802 --namespace-mappings test-velero:test-velero-1
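Volume (storage) backup
Optionally, create a ConfigMap first to customize some settings. It can be skipped if no customization is needed.
Reference: https://velero.io/docs/v1.5/restic/#troubleshooting
When installing Velero, add the --use-restic flag so that restic is used to back up PV data: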
velero install \
--provider aws \
--bucket velero \
--secret-file examples/minio/credentials-velero \
--use-volume-snapshots=true \
--plugins velero/velero-plugin-for-aws:v1.0.0 \
--use-restic \
--snapshot-location-config region=minio \
--backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://192.178.7.5:32027
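a. Backing up an application with storage
# Back up pods that use PVs
velero backup create pvc-backup --snapshot-volumes --include-namespaces test-velero
# Restore pods that use PVs
velero restore create --from-backup pvc-backup --restore-volumes
Backing up PV data through volume snapshots requires support from the cloud provider. References: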
https://blog.csdn.net/easylife206/article/details/102927512
https://blog.51cto.com/kaliarch/2531077?source=drh
About restic: https://github.com/restic/restic
# To back up a Pod that has a PVC with restic, the Pod must first be annotated, in this format:
kubectl -n YOUR_POD_NAMESPACE annotate pod/YOUR_POD_NAME backup.velero.io/backup-volumes=YOUR_VOLUME_NAME_1,YOUR_VOLUME_NAME_2,...
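YOUR_VOLUME_NAME_1 is the name of a volume listed under the Pod's spec.volumes (spec.template.spec.volumes when read from a Deployment).
For example, if the volume is named www, the annotation must be backup-volumes=www
# Add the annotation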
kubectl -n test-velero annotate pod nfs-pvc-7d75fbbcdf-dn7xw backup.velero.io/backup-volumes=www
# Check the result
kubectl get po -n test-velero nfs-pvc-7d75fbbcdf-dn7xw -o jsonpath='{.metadata.annotations}'
# Back up
velero backup create pvc-backup-2 --snapshot-volumes --include-namespaces test-velero
# Restore
velero restore create --from-backup pvc-backup-2 --restore-volumes
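The annotation step above targets a single pod by name, and that name changes whenever the pod is recreated. A minimal sketch of annotating every pod matched by a label selector instead is shown here. The function name annotate_backup_volumes and the label app=nfs-pvc used in the usage line are assumptions for illustration; the annotation key is the one given above.

```shell
# Annotate all pods matching a label selector with the volumes restic should back up.
# Usage: annotate_backup_volumes NAMESPACE SELECTOR VOLUME [VOLUME...]
annotate_backup_volumes() (
    ns="$1"
    selector="$2"
    shift 2
    # Join the remaining arguments with commas: www data -> www,data
    volumes="$(IFS=,; printf '%s' "$*")"
    kubectl -n "$ns" annotate pods -l "$selector" --overwrite \
        "backup.velero.io/backup-volumes=${volumes}"
)
```

For example, annotate_backup_volumes test-velero app=nfs-pvc www annotates the pods carrying that label. Pods created later do not inherit an annotation set this way, so for long-lived workloads it is usually more robust to put the annotation into the pod template of the Deployment.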
How the backup works: https://velero.io/docs/v1.5/restic/#troubleshooting
Code walkthrough
Install flow:
[root@machine-demo velero-v1.9.2-linux-amd64]# velero install -h
Install Velero onto a Kubernetes cluster using the supplied provider information, such as
the provider's name, a bucket name, and a file containing the credentials to access that bucket.
A prefix within the bucket and configuration for the backup store location may also be supplied.
Additionally, volume snapshot information for the same provider may be supplied.
All required CustomResourceDefinitions will be installed to the server, as well as the
Velero Deployment and associated Restic DaemonSet.
The provided secret data will be created in a Secret named 'cloud-credentials'.
All namespaced resources will be placed in the 'velero' namespace by default.
The '--namespace' flag can be used to specify a different namespace to install into.
Use '--wait' to wait for the Velero Deployment to be ready before proceeding.
Use '-o yaml' or '-o json' with '--dry-run' to output all generated resources as text instead of sending the resources to the server.
This is useful as a starting point for more customized installations.
Usage:
velero install [flags]
Examples:
# velero install --provider gcp --plugins velero/velero-plugin-for-gcp:v1.0.0 --bucket mybucket --secret-file ./gcp-service-account.json
# velero install --provider aws --plugins velero/velero-plugin-for-aws:v1.0.0 --bucket backups --secret-file ./aws-iam-creds --backup-location-config region=us-east-2 --snapshot-location-config region=us-east-2
# velero install --provider aws --plugins velero/velero-plugin-for-aws:v1.0.0 --bucket backups --secret-file ./aws-iam-creds --backup-location-config region=us-east-2 --snapshot-location-config region=us-east-2 --use-restic
# velero install --provider gcp --plugins velero/velero-plugin-for-gcp:v1.0.0 --bucket gcp-backups --secret-file ./gcp-creds.json --wait
# velero install --provider aws --plugins velero/velero-plugin-for-aws:v1.0.0 --bucket backups --backup-location-config region=us-west-2 --snapshot-location-config region=us-west-2 --no-secret --pod-annotations iam.amazonaws.com/role=arn:aws:iam::<AWS_ACCOUNT_ID>:role/<VELERO_ROLE_NAME>
# velero install --provider gcp --plugins velero/velero-plugin-for-gcp:v1.0.0 --bucket gcp-backups --secret-file ./gcp-creds.json --velero-pod-cpu-request=1000m --velero-pod-cpu-limit=5000m --velero-pod-mem-request=512Mi --velero-pod-mem-limit=1024Mi
# velero install --provider gcp --plugins velero/velero-plugin-for-gcp:v1.0.0 --bucket gcp-backups --secret-file ./gcp-creds.json --restic-pod-cpu-request=1000m --restic-pod-cpu-limit=5000m --restic-pod-mem-request=512Mi --restic-pod-mem-limit=1024Mi
# velero install --provider azure --plugins velero/velero-plugin-for-microsoft-azure:v1.0.0 --bucket $BLOB_CONTAINER --secret-file ./credentials-velero --backup-location-config resourceGroup=$AZURE_BACKUP_RESOURCE_GROUP,storageAccount=$AZURE_STORAGE_ACCOUNT_ID[,subscriptionId=$AZURE_BACKUP_SUBSCRIPTION_ID] --snapshot-location-config apiTimeout=<YOUR_TIMEOUT>[,resourceGroup=$AZURE_BACKUP_RESOURCE_GROUP,subscriptionId=$AZURE_BACKUP_SUBSCRIPTION_ID]
Flags:
--backup-location-config mapStringString Configuration to use for the backup storage location. Format is key1=value1,key2=value2
--bucket string Name of the object storage bucket where backups should be stored
--cacert string File containing a certificate bundle to use when verifying TLS connections to the object store. Optional.
--crds-only Only generate CustomResourceDefinition resources. Useful for updating CRDs for an existing Velero install.
--default-restic-prune-frequency duration How often 'restic prune' is run for restic repositories by default. Optional.
--default-volumes-to-restic Bool flag to configure Velero server to use restic by default to backup all pod volumes on all backups. Optional.
--dry-run Generate resources, but don't send them to the cluster. Use with -o. Optional.
--garbage-collection-frequency duration How often the garbage collection runs for expired backups.(default 1h)
-h, --help help for install
--image string Image to use for the Velero and restic server pods. Optional. (default "velero/velero:v1.9.2")
--label-columns stringArray A comma-separated list of labels to be displayed as columns
--no-default-backup-location Flag indicating if a default backup location should be created. Must be used as confirmation if --bucket or --provider are not provided. Optional.
--no-secret Flag indicating if a secret should be created. Must be used as confirmation if --secret-file is not provided. Optional.
-o, --output string Output display format. For create commands, display the object but do not send it to the server. Valid formats are 'table', 'json', and 'yaml'. 'table' is not valid for the install command.
--plugins stringArray Plugin container images to install into the Velero Deployment
--pod-annotations mapStringString Annotations to add to the Velero and restic pods. Optional. Format is key1=value1,key2=value2
--pod-labels mapStringString Labels to add to the Velero and restic pods. Optional. Format is key1=value1,key2=value2
--prefix string Prefix under which all Velero data should be stored within the bucket. Optional.
--provider string Provider name for backup and volume storage
--restic-pod-cpu-limit string CPU limit for restic pod. A value of "0" is treated as unbounded. Optional. (default "1000m")
--restic-pod-cpu-request string CPU request for restic pod. A value of "0" is treated as unbounded. Optional. (default "500m")
--restic-pod-mem-limit string Memory limit for restic pod. A value of "0" is treated as unbounded. Optional. (default "1Gi")
--restic-pod-mem-request string Memory request for restic pod. A value of "0" is treated as unbounded. Optional. (default "512Mi")
--restore-only Run the server in restore-only mode. Optional.
--sa-annotations mapStringString Annotations to add to the Velero ServiceAccount. Add iam.gke.io/gcp-service-account=[GSA_NAME]@[PROJECT_NAME].iam.gserviceaccount.com for workload identity. Optional. Format is key1=value1,key2=value2
--secret-file string File containing credentials for backup and volume provider. If not specified, --no-secret must be used for confirmation. Optional.
--show-labels Show labels in the last column
--snapshot-location-config mapStringString Configuration to use for the volume snapshot location. Format is key1=value1,key2=value2
--use-restic Create restic daemonset. Optional.
--use-volume-snapshots Whether or not to create snapshot location automatically. Set to false if you do not plan to create volume snapshots via a storage provider. (default true)
--velero-pod-cpu-limit string CPU limit for Velero pod. A value of "0" is treated as unbounded. Optional. (default "1000m")
--velero-pod-cpu-request string CPU request for Velero pod. A value of "0" is treated as unbounded. Optional. (default "500m")
--velero-pod-mem-limit string Memory limit for Velero pod. A value of "0" is treated as unbounded. Optional. (default "512Mi")
--velero-pod-mem-request string Memory request for Velero pod. A value of "0" is treated as unbounded. Optional. (default "128Mi")
--wait Wait for Velero deployment to be ready. Optional.
Global Flags:
--add_dir_header If true, adds the file directory to the header
--alsologtostderr log to standard error as well as files
--colorized optionalBool Show colored output in TTY. Overrides 'colorized' value from $HOME/.config/velero/config.json if present. Enabled by default
--features stringArray Comma-separated list of features to enable for this Velero process. Combines with values from $HOME/.config/velero/config.json if present
--kubeconfig string Path to the kubeconfig file to use to talk to the Kubernetes apiserver. If unset, try the environment variable KUBECONFIG, as well as in-cluster configuration
--kubecontext string The context to use to talk to the Kubernetes apiserver. If unset defaults to whatever your current-context is (kubectl config current-context)
--log_backtrace_at traceLocation when logging hits line file:N, emit a stack trace (default :0)
--log_dir string If non-empty, write log files in this directory
--log_file string If non-empty, use this log file
--log_file_max_size uint Defines the maximum size a log file can grow to. Unit is megabytes. If the value is 0, the maximum file size is unlimited. (default 1800)
--logtostderr log to standard error instead of files (default true)
-n, --namespace string The namespace in which Velero should operate (default "velero")
--skip_headers If true, avoid header prefixes in the log messages
--skip_log_headers If true, avoid headers when opening log files
--stderrthreshold severity logs at or above this threshold go to stderr (default 2)
-v, --v Level number for the log level verbosity
--vmodule moduleSpec comma-separated list of pattern=N settings for file-filtered logging
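Code location (velero/pkg/cmd/cli/install/install.go):
How Velero runs:
a. Server
All backup and restore resources are reconciled from CRs created through the command line, and status information is likewise read from the CR status.
Call flow: velero (cmd) -> velero (operator) -> result
Running Velero in a container:
Run flow: the velero binary is produced by compilation. Running velero starts the operator's watch loop, so launching the ./velero program in the container is what starts the watch-and-reconcile logic.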
kubectl get deploy -n velero velero -oyaml
b. Client
velero install xxx
velero create xxx
velero debug xxx
velero xxx xxx
-----------------------------------------------------------
Usage:
velero [command]
Available Commands:
backup Work with backups
backup-location Work with backup storage locations
bug Report a Velero bug
client Velero client related commands
completion Generate completion script
create Create velero resources
debug Generate debug bundle
delete Delete velero resources
describe Describe velero resources
get Get velero resources
help Help about any command
install Install Velero
plugin Work with plugins
restic Work with restic
restore Work with restores
schedule Work with schedules
snapshot-location Work with snapshot locations
uninstall Uninstall Velero
version Print the velero version and associated image
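Note: client operations simply create and query the relevant CRs. All reconciliation is done by the controllers, which use a feedback loop to drive the actual state toward the desired state.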
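Because the client only creates CRs, the namespace backup shown earlier can also be expressed as a Backup object applied with kubectl. The following is a minimal sketch under the assumption that the velero.io/v1 Backup CRD from velero install is present and that Velero runs in the default velero namespace. The function name render_backup_manifest is hypothetical, and only a small subset of the Backup spec is shown.

```shell
# Print a Backup CR roughly equivalent to:
#   velero backup create NAME --include-namespaces NAMESPACE --ttl TTL
# Usage: render_backup_manifest NAME NAMESPACE [TTL]
render_backup_manifest() (
    name="$1"
    ns="$2"
    ttl="${3:-720h0m0s}"   # same value as Velero's default backup TTL (30 days)
    cat <<EOF
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: ${name}
  namespace: velero
spec:
  includedNamespaces:
    - ${ns}
  ttl: ${ttl}
EOF
)
```

Piping the output into kubectl apply -f -, for example render_backup_manifest tigera-operator-backup tigera-operator | kubectl apply -f -, hands the object to the backup controller, and velero backup describe can then be used to follow its status.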
Storage configuration mechanism:
Server code entry point: pkg/cmd/velero/velero.go
Server startup: pkg/cmd/server/server.go
package credentials

import (
	"github.com/pkg/errors"
	corev1api "k8s.io/api/core/v1"
	kbclient "sigs.k8s.io/controller-runtime/pkg/client"

	"github.com/vmware-tanzu/velero/pkg/util/kube"
)

// SecretStore defines operations for interacting with credentials
// that are stored in Secret.
type SecretStore interface {
	// Get returns the secret key defined by the given selector
	Get(selector *corev1api.SecretKeySelector) (string, error)
}

type namespacedSecretStore struct {
	client    kbclient.Client
	namespace string
}

// NewNamespacedSecretStore returns a SecretStore which can interact with credentials
// for the given namespace.
func NewNamespacedSecretStore(client kbclient.Client, namespace string) (SecretStore, error) {
	return &namespacedSecretStore{
		client:    client,
		namespace: namespace,
	}, nil
}

// Get returns the secret key defined by the given selector.
func (n *namespacedSecretStore) Get(selector *corev1api.SecretKeySelector) (string, error) {
	creds, err := kube.GetSecretKey(n.client, n.namespace, selector)
	if err != nil {
		return "", errors.Wrap(err, "unable to get key for secret")
	}

	return string(creds), nil
}
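The store above resolves a credential from a Secret name and a key inside that Secret. On the command line the same pair can be attached to an additional backup storage location. This is a minimal sketch that assumes a Velero release whose velero backup-location create command accepts --credential in the form secret-name=key, which should be checked against velero backup-location create -h for the installed version. The function name add_backup_location and all argument values in the usage line are hypothetical.

```shell
# Register an extra S3-compatible backup location that uses its own Secret.
# Usage: add_backup_location NAME BUCKET S3_URL SECRET_NAME SECRET_KEY
add_backup_location() (
    name="$1"
    bucket="$2"
    s3_url="$3"
    secret="$4"
    key="$5"
    velero backup-location create "$name" \
        --provider aws \
        --bucket "$bucket" \
        --config "region=minio,s3ForcePathStyle=true,s3Url=${s3_url}" \
        --credential "${secret}=${key}"
)
```

For example, add_backup_location minio-2 velero-2 http://minio.example:9000 minio-2-creds cloud. The Secret has to exist in the Velero namespace beforehand, for instance created with kubectl create secret generic minio-2-creds -n velero --from-file=cloud=credentials-velero.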
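Command-line mechanism:
# Walking through one command
velero backup create tigera-operator-backup --include-namespaces tigera-operator --wait # run the backup
Backup code entry point (pkg/cmd/cli/backup/backup.go):
Note: this walkthrough of the Velero code structure focuses on a few key mechanisms rather than expanding every step. These mechanisms cover every layer of how Velero runs, and understanding them is enough to support customized development on top of Velero's features.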
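Optimization ideas
Current problems in Velero (areas that can be improved):
Optimization 1: Velero supports multiple storage types, but the account credentials have to be configured when Velero is installed, and changing or re-specifying them later is inflexible. The current model can be kept for ad hoc backup and restore, but it results in loose integration and relatively high complexity of use.
Optimization 2: Velero's cluster backup currently targets mainly the single-cluster case, yet cross-cluster migration is essential. Migrating from one cluster to another requires manual configuration and manual operation. Combined with a distributed multi-cluster management solution, a cluster scheduling mode could be introduced to provide cross-cluster backup scheduling and resource migration.
Optimization 3: ...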
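The manual cross-cluster flow described in optimization 2 can be sketched as a small script. It assumes that Velero is installed in both clusters and that both installations point at the same bucket, and it uses the --kubecontext global flag listed in the help output. The function name migrate_namespace, the backup naming scheme, and the retry limits are assumptions for illustration.

```shell
# Back up a namespace in the source cluster and restore it in the target cluster.
# Usage: migrate_namespace SOURCE_CONTEXT TARGET_CONTEXT NAMESPACE
migrate_namespace() (
    src_ctx="$1"
    dst_ctx="$2"
    ns="$3"
    backup="${ns}-migrate"

    velero --kubecontext "$src_ctx" backup create "$backup" \
        --include-namespaces "$ns" --wait || exit 1

    # The target cluster learns about the backup through Velero's periodic
    # sync from the shared bucket, so poll until it is visible there.
    tries=0
    until velero --kubecontext "$dst_ctx" backup get "$backup" >/dev/null 2>&1; do
        tries=$((tries + 1))
        [ "$tries" -ge 30 ] && exit 1
        sleep 10
    done

    velero --kubecontext "$dst_ctx" restore create \
        --from-backup "$backup" --wait
)
```

For example, migrate_namespace prod-ctx dev-ctx test-velero copies the test-velero namespace from the cluster behind the prod-ctx context to the one behind dev-ctx. A multi-cluster scheduler as proposed above would replace the two hard-coded contexts with a placement decision.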
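Summary
The platform already has etcd-level cluster backup and data-layer backup for middleware, but its ability to migrate applications across clusters is still incomplete. Introducing Velero and customizing it at the cluster level can fill in distributed data extension and cross-cluster resource migration and backup, improving both cloud-native extensibility and data safety.
Appendix:
http://t.zoukankan.com/cheyunhua-p-14431842.html (Velero, a backup tool for k8s)
https://www.cnblogs.com/zphqq/p/13155394.html (Velero backup principles)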