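Velero: a backup and restore solution for cluster resources

Abstract

Middleware data backup on the XXX cloud platform currently targets the common middleware (MySQL, Redis, ZooKeeper, Elasticsearch) and is designed as backup and restore at the data layer. The primary/standby cluster setup is likewise designed as a synchronization scheme built on etcd.

Data-layer backup: backs up and restores data without disturbing the running state or snapshot information of the components. It relies on custom scripts, causes no disruption in time or space, and keeps application components running continuously.

etcd backup: backing up etcd directly captures every resource in the cluster, so a Kubernetes cluster can be backed up and synchronized as a whole.

Velero backup: besides backing up a whole Kubernetes cluster, Velero can back up or restore selectively by Type, Namespace, Label, PV, and other objects.

Keywords: Velero backup; restore; migration; data layer; cluster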

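Introduction to Velero

Velero is a cloud-native disaster recovery and migration tool written in Go. It can safely back up, restore, and migrate Kubernetes cluster resources and persistent volumes.
Velero is Spanish for "sailboat", which fits the naming style of the Kubernetes community well.

Velero currently provides the following features:

Backup and restore of Kubernetes cluster data
Copying the resources of the current Kubernetes cluster to other Kubernetes clusters
Replicating a production environment to development and test environments

Velero consists of two components, a server and a client.

Server: runs inside the Kubernetes cluster
Client: the velero command-line tool, which runs locally and requires kubectl and the cluster kubeconfig to be configured on the machine

Velero use cases

Disaster recovery: the ability to back up and restore a k8s cluster
Migration: the ability to copy cluster resources to other clusters (replicating and synchronizing cluster configuration across development, test, and production, which simplifies environment setup)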

MinIO information

Installing Velero

Declare the credentials for the object store (credentials-velero):

[default]
aws_access_key_id = admin123
aws_secret_access_key = admin123

Install with the velero binary:

velero install \
   --provider aws \
   --bucket velero \
   --secret-file credentials-velero \
   --use-volume-snapshots=true \
   --plugins velero/velero-plugin-for-aws:v1.0.0 \
   --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://192.168.139.139:31082

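Note: the providers used here are limited to aws, gcp, and azure; MinIO is S3-compatible, so it is accessed through the aws provider. --bucket is the backup location. The MinIO console had only one bucket, velero, so that bucket is used as the backup location.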
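The install command above can also be assembled programmatically, which helps when the same installation is repeated across several clusters. The following is a minimal sketch, not Velero code: InstallOptions and Args are hypothetical names introduced for illustration, and only the flags already shown in the command above are used.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// InstallOptions is a hypothetical helper type (not part of Velero) that
// collects the values passed to `velero install` in the command above.
type InstallOptions struct {
	Provider       string
	Bucket         string
	SecretFile     string
	Plugin         string
	BackupLocation map[string]string // rendered as key1=value1,key2=value2
}

// Args renders the options as an argument list for the velero binary.
func (o InstallOptions) Args() []string {
	keys := make([]string, 0, len(o.BackupLocation))
	for k := range o.BackupLocation {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random, so sort for stable output

	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, k+"="+o.BackupLocation[k])
	}

	return []string{
		"install",
		"--provider", o.Provider,
		"--bucket", o.Bucket,
		"--secret-file", o.SecretFile,
		"--plugins", o.Plugin,
		"--backup-location-config", strings.Join(pairs, ","),
	}
}

func main() {
	opts := InstallOptions{
		Provider:   "aws",
		Bucket:     "velero",
		SecretFile: "credentials-velero",
		Plugin:     "velero/velero-plugin-for-aws:v1.0.0",
		BackupLocation: map[string]string{
			"region":           "minio",
			"s3ForcePathStyle": "true",
			"s3Url":            "http://192.168.139.139:31082",
		},
	}
	// Print the command instead of executing it, so it can be reviewed first.
	fmt.Println("velero " + strings.Join(opts.Args(), " "))
}
```

Printing the command rather than running it keeps the sketch free of side effects; the argument slice could be handed to os/exec once the values have been checked.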

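Backup and restore

a. Back up a namespace

# List the resources in the namespace to be backed up
kubectl get po -n tigera-operator
# Back up the namespace
velero backup create tigera-operator-backup --include-namespaces tigera-operator  --wait # run the backup
velero backup describe tigera-operator-backup

After the backup completes, log in to the MinIO console to confirm that the data is there.

b. Verify the backup by restoring

# Delete the namespace
kubectl delete ns tigera-operator
# Restore the namespace
velero restore create --from-backup tigera-operator-backup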

c. Scheduled backups

# Back up every day at 01:00
velero create schedule <SCHEDULE NAME> --schedule="0 1 * * *"
# Back up every day at 01:00 and keep each backup for 72 hours
velero create schedule <SCHEDULE NAME> --schedule="0 1 * * *" --ttl 72h
# Back up every 5 hours
velero create schedule <SCHEDULE NAME> --schedule="@every 5h"
# Back up a specific namespace once every 24 hours (for example panshi-qtc-dev)
velero create schedule <SCHEDULE NAME> --schedule="@every 24h" --include-namespaces panshi-qtc-dev
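
The two schedule forms and the --ttl value above boil down to simple duration arithmetic. The sketch below is an illustration only and is not how Velero parses schedules: everyInterval and expiresAt are hypothetical helpers, and the creation timestamp is an arbitrary example value.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// everyInterval extracts the interval from an "@every <duration>" schedule
// string. It returns false for anything else, such as a five-field cron
// expression like "0 1 * * *".
func everyInterval(schedule string) (time.Duration, bool) {
	const prefix = "@every "
	if !strings.HasPrefix(schedule, prefix) {
		return 0, false
	}
	d, err := time.ParseDuration(strings.TrimSpace(strings.TrimPrefix(schedule, prefix)))
	if err != nil || d <= 0 {
		return 0, false
	}
	return d, true
}

// expiresAt returns the time at which a backup created at `created` with the
// given TTL is past its retention period.
func expiresAt(created time.Time, ttl time.Duration) time.Time {
	return created.Add(ttl)
}

func main() {
	if d, ok := everyInterval("@every 5h"); ok {
		fmt.Println("interval:", d)
	}

	// Example: a backup taken at 01:00 UTC and kept for 72 hours.
	created := time.Date(2021, 2, 3, 1, 0, 0, 0, time.UTC)
	fmt.Println("expires:", expiresAt(created, 72*time.Hour).Format(time.RFC3339))
}
```

A 72h TTL on a daily schedule means roughly three backups exist at any time, which is a quick way to estimate the storage the bucket needs.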

d. Viewing resources

velero  get  backup   # list backups
velero  get  schedule # list scheduled backups
velero  get  restore  # list existing restores
velero  get  plugins  # list plugins

e. Restoring resources

velero restore create --from-backup all-ns-backup  # restore everything in the backup (resources that already exist are not overwritten)
velero restore create --from-backup all-ns-backup --include-namespaces default,nginx-example # restore only the default and nginx-example namespaces

# Velero can restore resources into a namespace different from the one they were backed up from, using the --namespace-mappings flag
velero restore create RESTORE_NAME --from-backup BACKUP_NAME --namespace-mappings old-ns-1:new-ns-1,old-ns-2:new-ns-2
# For example, restore the resources of the test-velero namespace into test-velero-1
velero restore create restore-for-test --from-backup everyday-1-20210203131802 --namespace-mappings test-velero:test-velero-1
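
The old:new format accepted by --namespace-mappings is easy to validate before running a restore. The following sketch shows one way to parse it; parseNamespaceMappings and targetNamespace are hypothetical helpers written for this article, not functions from the Velero code base.

```go
package main

import (
	"fmt"
	"strings"
)

// parseNamespaceMappings parses "old-1:new-1,old-2:new-2" into a map from
// source namespace to target namespace.
func parseNamespaceMappings(s string) (map[string]string, error) {
	out := make(map[string]string)
	for _, pair := range strings.Split(s, ",") {
		parts := strings.Split(pair, ":")
		if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
			return nil, fmt.Errorf("invalid mapping %q, want old:new", pair)
		}
		out[parts[0]] = parts[1]
	}
	return out, nil
}

// targetNamespace returns the namespace a resource would be restored into.
// Namespaces without a mapping keep their original name.
func targetNamespace(mappings map[string]string, ns string) string {
	if target, ok := mappings[ns]; ok {
		return target
	}
	return ns
}

func main() {
	mappings, err := parseNamespaceMappings("test-velero:test-velero-1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(targetNamespace(mappings, "test-velero"))
	fmt.Println(targetNamespace(mappings, "default"))
}
```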

Storage backup

Optionally, create a ConfigMap first to customize some settings; skip it if no customization is needed.
Reference: https://velero.io/docs/v1.5/restic/#troubleshooting
When installing Velero, add the --use-restic flag so that PV data is backed up with restic:

velero install \
   --provider aws \
   --bucket velero \
   --secret-file examples/minio/credentials-velero \
   --use-volume-snapshots=true \
   --plugins velero/velero-plugin-for-aws:v1.0.0 \
   --use-restic \
   --snapshot-location-config region=minio \
   --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://192.178.7.5:32027

a. Back up an application with storage

# Back up pods with PVs
velero backup create pvc-backup  --snapshot-volumes --include-namespaces test-velero
# Restore pods with PVs
velero  restore create --from-backup pvc-backup --restore-volumes

Backing up PV data requires support from the cloud provider. References:
https://blog.csdn.net/easylife206/article/details/102927512
https://blog.51cto.com/kaliarch/2531077?source=drh

About restic: https://github.com/restic/restic

# To back up a pod that has a PVC with restic, the pod must first be annotated. Format:
kubectl -n YOUR_POD_NAMESPACE annotate pod/YOUR_POD_NAME backup.velero.io/backup-volumes=YOUR_VOLUME_NAME_1,YOUR_VOLUME_NAME_2,...

YOUR_VOLUME_NAME_1 is the name of a volume in the pod spec (spec.volumes[].name on the pod, which comes from spec.template.spec.volumes[].name in the owning workload).
For example, if the volume is named www, the annotation must be backup-volumes=www

# Add the annotation
kubectl -n test-velero annotate  pod nfs-pvc-7d75fbbcdf-dn7xw  backup.velero.io/backup-volumes=www
# Check the result
kubectl  get po -n test-velero nfs-pvc-7d75fbbcdf-dn7xw -o jsonpath='{.metadata.annotations}'
# Back up
velero backup create pvc-backup-2  --snapshot-volumes --include-namespaces test-velero
# Restore
velero  restore create --from-backup pvc-backup-2 --restore-volumes
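
When many pods need the backup-volumes annotation, generating the kubectl arguments avoids typos in the comma-separated volume list. This is a minimal sketch under that assumption; annotateArgs is a hypothetical helper, and only the annotation key and command shape shown above are taken from the article.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// backupVolumesKey is the annotation key used in the kubectl command above.
const backupVolumesKey = "backup.velero.io/backup-volumes"

// annotateArgs builds the kubectl arguments that annotate a pod with the
// volumes restic should back up. It is an illustrative helper, not Velero code.
func annotateArgs(namespace, pod string, volumes []string) ([]string, error) {
	if len(volumes) == 0 {
		return nil, errors.New("at least one volume name is required")
	}
	return []string{
		"-n", namespace,
		"annotate", "pod/" + pod,
		backupVolumesKey + "=" + strings.Join(volumes, ","),
	}, nil
}

func main() {
	args, err := annotateArgs("test-velero", "nfs-pvc-7d75fbbcdf-dn7xw", []string{"www"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Print the command for review instead of executing it.
	fmt.Println("kubectl " + strings.Join(args, " "))
}
```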

How the backup works: https://velero.io/docs/v1.5/restic/#troubleshooting

Code walkthrough

Installation flow:

velero install -h

[root@machine-demo velero-v1.9.2-linux-amd64]# velero install -h
Install Velero onto a Kubernetes cluster using the supplied provider information, such as
the provider's name, a bucket name, and a file containing the credentials to access that bucket.
A prefix within the bucket and configuration for the backup store location may also be supplied.
Additionally, volume snapshot information for the same provider may be supplied.

All required CustomResourceDefinitions will be installed to the server, as well as the
Velero Deployment and associated Restic DaemonSet.

The provided secret data will be created in a Secret named 'cloud-credentials'.

All namespaced resources will be placed in the 'velero' namespace by default. 

The '--namespace' flag can be used to specify a different namespace to install into.

Use '--wait' to wait for the Velero Deployment to be ready before proceeding.

Use '-o yaml' or '-o json'  with '--dry-run' to output all generated resources as text instead of sending the resources to the server.
This is useful as a starting point for more customized installations.

Usage:
  velero install [flags]

Examples:
  # velero install --provider gcp --plugins velero/velero-plugin-for-gcp:v1.0.0 --bucket mybucket --secret-file ./gcp-service-account.json

  # velero install --provider aws --plugins velero/velero-plugin-for-aws:v1.0.0 --bucket backups --secret-file ./aws-iam-creds --backup-location-config region=us-east-2 --snapshot-location-config region=us-east-2

  # velero install --provider aws --plugins velero/velero-plugin-for-aws:v1.0.0 --bucket backups --secret-file ./aws-iam-creds --backup-location-config region=us-east-2 --snapshot-location-config region=us-east-2 --use-restic

  # velero install --provider gcp --plugins velero/velero-plugin-for-gcp:v1.0.0 --bucket gcp-backups --secret-file ./gcp-creds.json --wait

  # velero install --provider aws --plugins velero/velero-plugin-for-aws:v1.0.0 --bucket backups --backup-location-config region=us-west-2 --snapshot-location-config region=us-west-2 --no-secret --pod-annotations iam.amazonaws.com/role=arn:aws:iam::<AWS_ACCOUNT_ID>:role/<VELERO_ROLE_NAME>

  # velero install --provider gcp --plugins velero/velero-plugin-for-gcp:v1.0.0 --bucket gcp-backups --secret-file ./gcp-creds.json --velero-pod-cpu-request=1000m --velero-pod-cpu-limit=5000m --velero-pod-mem-request=512Mi --velero-pod-mem-limit=1024Mi

  # velero install --provider gcp --plugins velero/velero-plugin-for-gcp:v1.0.0 --bucket gcp-backups --secret-file ./gcp-creds.json --restic-pod-cpu-request=1000m --restic-pod-cpu-limit=5000m --restic-pod-mem-request=512Mi --restic-pod-mem-limit=1024Mi

  # velero install --provider azure --plugins velero/velero-plugin-for-microsoft-azure:v1.0.0 --bucket $BLOB_CONTAINER --secret-file ./credentials-velero --backup-location-config resourceGroup=$AZURE_BACKUP_RESOURCE_GROUP,storageAccount=$AZURE_STORAGE_ACCOUNT_ID[,subscriptionId=$AZURE_BACKUP_SUBSCRIPTION_ID] --snapshot-location-config apiTimeout=<YOUR_TIMEOUT>[,resourceGroup=$AZURE_BACKUP_RESOURCE_GROUP,subscriptionId=$AZURE_BACKUP_SUBSCRIPTION_ID]

Flags:
      --backup-location-config mapStringString     Configuration to use for the backup storage location. Format is key1=value1,key2=value2
      --bucket string                              Name of the object storage bucket where backups should be stored
      --cacert string                              File containing a certificate bundle to use when verifying TLS connections to the object store. Optional.
      --crds-only                                  Only generate CustomResourceDefinition resources. Useful for updating CRDs for an existing Velero install.
      --default-restic-prune-frequency duration    How often 'restic prune' is run for restic repositories by default. Optional.
      --default-volumes-to-restic                  Bool flag to configure Velero server to use restic by default to backup all pod volumes on all backups. Optional.
      --dry-run                                    Generate resources, but don't send them to the cluster. Use with -o. Optional.
      --garbage-collection-frequency duration      How often the garbage collection runs for expired backups.(default 1h)
  -h, --help                                       help for install
      --image string                               Image to use for the Velero and restic server pods. Optional. (default "velero/velero:v1.9.2")
      --label-columns stringArray                  A comma-separated list of labels to be displayed as columns
      --no-default-backup-location                 Flag indicating if a default backup location should be created. Must be used as confirmation if --bucket or --provider are not provided. Optional.
      --no-secret                                  Flag indicating if a secret should be created. Must be used as confirmation if --secret-file is not provided. Optional.
  -o, --output string                              Output display format. For create commands, display the object but do not send it to the server. Valid formats are 'table', 'json', and 'yaml'. 'table' is not valid for the install command.
      --plugins stringArray                        Plugin container images to install into the Velero Deployment
      --pod-annotations mapStringString            Annotations to add to the Velero and restic pods. Optional. Format is key1=value1,key2=value2
      --pod-labels mapStringString                 Labels to add to the Velero and restic pods. Optional. Format is key1=value1,key2=value2
      --prefix string                              Prefix under which all Velero data should be stored within the bucket. Optional.
      --provider string                            Provider name for backup and volume storage
      --restic-pod-cpu-limit string                CPU limit for restic pod. A value of "0" is treated as unbounded. Optional. (default "1000m")
      --restic-pod-cpu-request string              CPU request for restic pod. A value of "0" is treated as unbounded. Optional. (default "500m")
      --restic-pod-mem-limit string                Memory limit for restic pod. A value of "0" is treated as unbounded. Optional. (default "1Gi")
      --restic-pod-mem-request string              Memory request for restic pod. A value of "0" is treated as unbounded. Optional. (default "512Mi")
      --restore-only                               Run the server in restore-only mode. Optional.
      --sa-annotations mapStringString             Annotations to add to the Velero ServiceAccount. Add iam.gke.io/gcp-service-account=[GSA_NAME]@[PROJECT_NAME].iam.gserviceaccount.com for workload identity. Optional. Format is key1=value1,key2=value2
      --secret-file string                         File containing credentials for backup and volume provider. If not specified, --no-secret must be used for confirmation. Optional.
      --show-labels                                Show labels in the last column
      --snapshot-location-config mapStringString   Configuration to use for the volume snapshot location. Format is key1=value1,key2=value2
      --use-restic                                 Create restic daemonset. Optional.
      --use-volume-snapshots                       Whether or not to create snapshot location automatically. Set to false if you do not plan to create volume snapshots via a storage provider. (default true)
      --velero-pod-cpu-limit string                CPU limit for Velero pod. A value of "0" is treated as unbounded. Optional. (default "1000m")
      --velero-pod-cpu-request string              CPU request for Velero pod. A value of "0" is treated as unbounded. Optional. (default "500m")
      --velero-pod-mem-limit string                Memory limit for Velero pod. A value of "0" is treated as unbounded. Optional. (default "512Mi")
      --velero-pod-mem-request string              Memory request for Velero pod. A value of "0" is treated as unbounded. Optional. (default "128Mi")
      --wait                                       Wait for Velero deployment to be ready. Optional.

Global Flags:
      --add_dir_header                   If true, adds the file directory to the header
      --alsologtostderr                  log to standard error as well as files
      --colorized optionalBool           Show colored output in TTY. Overrides 'colorized' value from $HOME/.config/velero/config.json if present. Enabled by default
      --features stringArray             Comma-separated list of features to enable for this Velero process. Combines with values from $HOME/.config/velero/config.json if present
      --kubeconfig string                Path to the kubeconfig file to use to talk to the Kubernetes apiserver. If unset, try the environment variable KUBECONFIG, as well as in-cluster configuration
      --kubecontext string               The context to use to talk to the Kubernetes apiserver. If unset defaults to whatever your current-context is (kubectl config current-context)
      --log_backtrace_at traceLocation   when logging hits line file:N, emit a stack trace (default :0)
      --log_dir string                   If non-empty, write log files in this directory
      --log_file string                  If non-empty, use this log file
      --log_file_max_size uint           Defines the maximum size a log file can grow to. Unit is megabytes. If the value is 0, the maximum file size is unlimited. (default 1800)
      --logtostderr                      log to standard error instead of files (default true)
  -n, --namespace string                 The namespace in which Velero should operate (default "velero")
      --skip_headers                     If true, avoid header prefixes in the log messages
      --skip_log_headers                 If true, avoid headers when opening log files
      --stderrthreshold severity         logs at or above this threshold go to stderr (default 2)
  -v, --v Level                          number for the log level verbosity
      --vmodule moduleSpec               comma-separated list of pattern=N settings for file-filtered logging

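Code location (velero/pkg/cmd/cli/install/install.go):

How Velero runs:

a. Server

Backup and restore resources are all produced by reconciling CRs that are created from the command line, and status information is likewise read from the CR status.

Call flow: velero (cmd) -> velero (operator) -> result

Running velero in a container:

Flow: the velero binary is produced by compilation. Running it starts the operator's watch loop, so starting ./velero in the container is what starts the watch-and-reconcile logic.

kubectl get deploy velero -n velero -o yaml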
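
The watch-and-reconcile idea can be illustrated with a toy loop. The sketch below is deliberately simplified and is not Velero's controller code: the Backup struct, the three phases, and the reconcile function are assumptions made for illustration, and a real controller reacts to API server events rather than looping in memory.

```go
package main

import "fmt"

// Phase is a simplified lifecycle state for the toy Backup object below.
type Phase string

const (
	PhaseNew        Phase = "New"
	PhaseInProgress Phase = "InProgress"
	PhaseCompleted  Phase = "Completed"
)

// Backup is a toy stand-in for a backup custom resource.
type Backup struct {
	Name  string
	Phase Phase
}

// reconcile performs one step towards the desired state and reports whether
// the object should be reconciled again.
func reconcile(b *Backup) (requeue bool) {
	switch b.Phase {
	case "", PhaseNew:
		// A freshly created object is picked up and work begins.
		b.Phase = PhaseInProgress
		return true
	case PhaseInProgress:
		// The real work (collecting and uploading resources) would happen here.
		b.Phase = PhaseCompleted
		return false
	default:
		// Terminal phases need no further action.
		return false
	}
}

func main() {
	b := &Backup{Name: "tigera-operator-backup"}
	for {
		requeue := reconcile(b)
		fmt.Println(b.Name, b.Phase)
		if !requeue {
			break
		}
	}
}
```

The point of the pattern is that the client only records intent in the object, and the status field is the single place where progress is reported back.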

b. Client

velero install xxx
velero create xxx
velero debug xxx
velero xxx   xxx

-----------------------------------------------------------
Usage:
  velero [command]

Available Commands:
  backup            Work with backups
  backup-location   Work with backup storage locations
  bug               Report a Velero bug
  client            Velero client related commands
  completion        Generate completion script
  create            Create velero resources
  debug             Generate debug bundle
  delete            Delete velero resources
  describe          Describe velero resources
  get               Get velero resources
  help              Help about any command
  install           Install Velero
  plugin            Work with plugins
  restic            Work with restic
  restore           Work with restores
  schedule          Work with schedules
  snapshot-location Work with snapshot locations
  uninstall         Uninstall Velero
  version           Print the velero version and associated image

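Note: client operations simply create and query the relevant CRs. All reconciliation is done by the controllers, which converge on the desired state through a feedback loop.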
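
Since the client only creates CRs, a backup request can be pictured as a plain manifest. The sketch below builds one with the standard library and prints it as JSON; backupManifest is a hypothetical helper, and the manifest shows only a small subset of the Backup spec (included namespaces and TTL), assuming Velero is installed in the default velero namespace.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// backupManifest returns a minimal Backup custom resource as a generic map.
// Only a few fields are set; this is an illustration, not the full schema.
func backupManifest(name, veleroNamespace string, includedNamespaces []string, ttl string) map[string]interface{} {
	return map[string]interface{}{
		"apiVersion": "velero.io/v1",
		"kind":       "Backup",
		"metadata": map[string]interface{}{
			"name":      name,
			"namespace": veleroNamespace, // the namespace Velero itself runs in
		},
		"spec": map[string]interface{}{
			"includedNamespaces": includedNamespaces,
			"ttl":                ttl,
		},
	}
}

func main() {
	m := backupManifest("tigera-operator-backup", "velero", []string{"tigera-operator"}, "72h0m0s")
	out, err := json.MarshalIndent(m, "", "  ")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```

The printed JSON is roughly what the velero backup create command submits on the user's behalf; comparing it with the output of velero backup create with -o json is a reasonable way to check which fields the CLI actually sets.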

Storage configuration mechanism:

Server code entry point: pkg/cmd/velero/velero.go

Server startup: pkg/cmd/server/server.go

package credentials

import (
	"github.com/pkg/errors"
	corev1api "k8s.io/api/core/v1"
	kbclient "sigs.k8s.io/controller-runtime/pkg/client"

	"github.com/vmware-tanzu/velero/pkg/util/kube"
)

// SecretStore defines operations for interacting with credentials
// that are stored in Secret.
type SecretStore interface {
	// Get returns the secret key defined by the given selector
	Get(selector *corev1api.SecretKeySelector) (string, error)
}

type namespacedSecretStore struct {
	client    kbclient.Client
	namespace string
}

// NewNamespacedSecretStore returns a SecretStore which can interact with credentials
// for the given namespace.
func NewNamespacedSecretStore(client kbclient.Client, namespace string) (SecretStore, error) {
	return &namespacedSecretStore{
		client:    client,
		namespace: namespace,
	}, nil
}

// Get returns the secret key defined by the given selector.
func (n *namespacedSecretStore) Get(selector *corev1api.SecretKeySelector) (string, error) {
	creds, err := kube.GetSecretKey(n.client, n.namespace, selector)
	if err != nil {
		return "", errors.Wrap(err, "unable to get key for secret")
	}

	return string(creds), nil
}
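
Because the credentials are read through a small interface, a different backing store can be swapped in, which is useful for tests or for the more flexible credential handling discussed later. The following is a self-contained sketch of that idea and not Velero code: KeySelector is a simplified stand-in for corev1api.SecretKeySelector, Store mirrors the shape of SecretStore, and the in-memory implementation and the "cloud" key name are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// KeySelector is a simplified stand-in for corev1api.SecretKeySelector.
type KeySelector struct {
	Name string // name of the Secret
	Key  string // key inside the Secret
}

// Store mirrors the shape of the SecretStore interface shown above.
type Store interface {
	Get(selector *KeySelector) (string, error)
}

// memoryStore keeps secrets in memory: secret name -> key -> value.
type memoryStore struct {
	secrets map[string]map[string]string
}

// NewMemoryStore returns a Store backed by the given in-memory data.
func NewMemoryStore(secrets map[string]map[string]string) Store {
	return &memoryStore{secrets: secrets}
}

// Get returns the value stored under the selector's secret name and key.
func (m *memoryStore) Get(selector *KeySelector) (string, error) {
	if selector == nil {
		return "", errors.New("selector must not be nil")
	}
	data, ok := m.secrets[selector.Name]
	if !ok {
		return "", fmt.Errorf("secret %q not found", selector.Name)
	}
	value, ok := data[selector.Key]
	if !ok {
		return "", fmt.Errorf("key %q not found in secret %q", selector.Key, selector.Name)
	}
	return value, nil
}

func main() {
	store := NewMemoryStore(map[string]map[string]string{
		"cloud-credentials": {"cloud": "[default]\naws_access_key_id = admin123\n"},
	})
	creds, err := store.Get(&KeySelector{Name: "cloud-credentials", Key: "cloud"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(creds)
}
```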

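Command-line mechanism:

# Walking through a command
velero backup create tigera-operator-backup --include-namespaces tigera-operator  --wait # run the backup

Backup code entry point (pkg/cmd/cli/backup/backup.go):

Note: this walkthrough of Velero's code structure focuses on a few key mechanisms rather than expanding every step. These mechanisms cover every layer of how Velero runs, which is enough to understand Velero and to develop custom features on top of it.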

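Optimization ideas

Current problems in Velero (things that could be improved):

Optimization point 1: Velero supports several storage types, but the account credentials must be configured when Velero is installed, and changing or re-specifying them later is inflexible. The current mode is fine for ad hoc backup and restore, but it integrates poorly and is comparatively complex to use.

Optimization point 2: Velero's cluster backup currently targets the single-cluster case, yet cross-cluster migration is essential. Migrating from one cluster to another requires manual configuration and manual operation. Combining Velero with a distributed multi-cluster management solution and introducing cluster scheduling would enable cross-cluster backup scheduling and resource migration.

Optimization point 3: ...

Summary

The platform already has etcd-level cluster backup and data-layer backup for middleware, but its ability to migrate applications across clusters is still incomplete. Introducing Velero and customizing it at the cluster level can add distributed data extension and cross-cluster resource migration and backup, improving both cloud-native extensibility and data safety.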

Appendix:

http://t.zoukankan.com/cheyunhua-p-14431842.html (Velero, a k8s backup tool)
https://www.cnblogs.com/zphqq/p/13155394.html (How Velero backups work)

posted @ 2023-01-29 19:12  流雨声