Argo Rollouts BlueGreen Basics
Argo Rollouts BlueGreen Update Process
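1. Starting from a fully promoted, steady state, both the activeService and the previewService point to the revision 1 ReplicaSet.
2. A user initiates an update by modifying the Pod template (spec.template.spec).
3. The revision 2 ReplicaSet is created with size 0.
4. The previewService is modified to point to the revision 2 ReplicaSet. The activeService still points to the revision 1 ReplicaSet.
5. The revision 2 ReplicaSet is scaled to spec.replicas, or to previewReplicaCount if it is set.
6. Once the revision 2 ReplicaSet Pods are fully available, prePromotionAnalysis begins.
7. After prePromotionAnalysis succeeds, the blue-green rollout pauses if autoPromotionEnabled is false or autoPromotionSeconds is non-zero.
8. The rollout is resumed either manually by a user or automatically once autoPromotionSeconds has elapsed.
9. If previewReplicaCount was used, the revision 2 ReplicaSet is scaled to spec.replicas.
10. The rollout "promotes" the revision 2 ReplicaSet by updating the activeService to point to it. At this point, no Service points to revision 1.
11. postPromotionAnalysis begins.
12. Once postPromotionAnalysis completes successfully, the update is successful and the revision 2 ReplicaSet is marked as stable. The rollout is now considered fully promoted.
13. After waiting scaleDownDelaySeconds (default 30 seconds), the revision 1 ReplicaSet is scaled down.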
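The sequence above can be sketched as a small state walk. This is an illustrative model only, not controller code: the `Snapshot` type and the `simulate_blue_green` function are hypothetical names introduced here, and the analysis runs and the pause are collapsed into comments.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Snapshot:
    """State of the rollout after one phase (illustrative model only)."""
    phase: str
    active_revision: int   # revision selected by the activeService
    preview_revision: int  # revision selected by the previewService
    rev1_replicas: int
    rev2_replicas: int


def simulate_blue_green(replicas: int,
                        preview_replica_count: Optional[int] = None) -> List[Snapshot]:
    """Walk one blue-green update from revision 1 to revision 2."""
    # Step 5: the preview runs at previewReplicaCount if set, else spec.replicas.
    preview_size = replicas if preview_replica_count is None else preview_replica_count
    return [
        Snapshot("steady state", 1, 1, replicas, 0),
        Snapshot("revision 2 created at size 0", 1, 1, replicas, 0),
        Snapshot("previewService switched", 1, 2, replicas, 0),
        Snapshot("preview scaled up", 1, 2, replicas, preview_size),
        # prePromotionAnalysis and the optional pause happen here.
        Snapshot("scaled to spec.replicas", 1, 2, replicas, replicas),
        Snapshot("promoted", 2, 2, replicas, replicas),
        # postPromotionAnalysis runs, then scaleDownDelaySeconds elapses.
        Snapshot("revision 1 scaled down", 2, 2, 0, replicas),
    ]


if __name__ == "__main__":
    for s in simulate_blue_green(replicas=5, preview_replica_count=1):
        print(f"{s.phase:<30} active=rev{s.active_revision} "
              f"preview=rev{s.preview_revision} "
              f"rev1={s.rev1_replicas} rev2={s.rev2_replicas}")
```

Note that in this model both ReplicaSets run at full size between promotion and the scale-down of revision 1, which is why a blue-green update temporarily needs roughly twice the capacity.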
Argo Rollouts BlueGreen Full Configuration
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: example-rollout-canary
spec:
  # Number of Pod replicas to run.
  # Defaults to 1.
  replicas: 5
  analysis:
    # Number of successful runs to retain.
    # Defaults to 5.
    successfulRunHistoryLimit: 10
    # Number of unsuccessful runs to retain.
    # Stages for unsuccessful: "Error", "Failed", "Inconclusive"
    # Defaults to 5.
    unsuccessfulRunHistoryLimit: 10
  # Label selector used to match the Pods.
  selector:
    matchLabels:
      app: guestbook
  # WorkloadRef holds a reference to a workload that provides the Pod template
  # (e.g. Deployment). If used, then do not use the Rollout template property.
  workloadRef:
    apiVersion: apps/v1
    kind: Deployment
    name: rollout-ref-deployment
    # Whether to scale down the workload (Deployment) after migrating to the Rollout.
    # "never": the Deployment is not scaled down
    # "onsuccess": the Deployment is scaled down after the Rollout becomes healthy
    # "progressively": the Deployment is scaled down as the Rollout is scaled up
    # If the Rollout fails the Deployment will be scaled back up.
    scaleDown: never|onsuccess|progressively
  # Template describes the pods that will be created. Same as deployment.
  # If used, then do not use Rollout workloadRef property.
  template:
    spec:
      containers:
      - name: guestbook
        image: argoproj/rollouts-demo:blue
  # Minimum time a newly created Pod must be ready, with no container crashing,
  # before it is considered available. Defaults to 0 (available as soon as it is ready).
  minReadySeconds: 30
  # Number of ReplicaSet revisions to keep in the update history.
  # Defaults to 10
  revisionHistoryLimit: 3
  # Whether the rollout is paused.
  paused: true
  # Maximum time, in seconds, to wait for the rollout to make progress during an update.
  # Defaults to 600s
  progressDeadlineSeconds: 600
  # Whether to abort the update when progressDeadlineSeconds is exceeded and
  # no analysis or experiment is in use. Defaults to false.
  progressDeadlineAbort: false
  # Time at which to restart the Pods, given as a UTC timestamp.
  restartAt: "2020-03-30T21:19:35Z"
  # Rollback window
  rollbackWindow:
    revisions: 3
  # Update strategy; canary and blueGreen are supported.
  strategy:
    # Blue-green update strategy
    blueGreen:
      # Service that routes to the currently active ReplicaSet; it is switched
      # to the new ReplicaSet at promotion.
      # Required.
      activeService: active-service
      # Analysis to run before promotion. Its result decides whether the Rollout
      # switches traffic or aborts.
      prePromotionAnalysis:
        templates:
        - templateName: success-rate
        args:
        - name: service-name
          value: guestbook-svc.default.svc.cluster.local
      # Analysis to run after promotion. If it fails or errors, the Rollout aborts
      # and traffic is switched back to the previous stable ReplicaSet.
      postPromotionAnalysis:
        templates:
        - templateName: success-rate
        args:
        - name: service-name
          value: guestbook-svc.default.svc.cluster.local
      # Service that routes to the new (preview) ReplicaSet before promotion.
      previewService: preview-service
      # Number of Pods the preview ReplicaSet runs before promotion.
      # If unset, the preview runs at 100% of spec.replicas.
      previewReplicaCount: 1
      # Whether promotion happens automatically. Defaults to true.
      autoPromotionEnabled: false
      # Promote automatically after this many seconds.
      # Ignored when autoPromotionEnabled is false.
      autoPromotionSeconds: 30
      # Delay before the previous ReplicaSet is scaled down. Defaults to 30s.
      scaleDownDelaySeconds: 30
      # Limits the number of old ReplicaSets that can keep running before
      # being scaled down. Defaults to nil
      scaleDownDelayRevisionLimit: 2
      # Delay, in seconds, before the preview ReplicaSet is scaled down when
      # the update is aborted. Defaults to 30s.
      abortScaleDownDelaySeconds: 30
      # Anti-affinity between the desired ReplicaSet and the previous ReplicaSet.
      antiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution: {}
        preferredDuringSchedulingIgnoredDuringExecution:
          weight: 1 # Between 1 - 100
      # Metadata added to the currently active Pods.
      activeMetadata:
        labels:
          role: active
      # Metadata added to the preview Pods during the update.
      previewMetadata:
        labels:
          role: preview
      # Maximum number or percentage of Pods that may be unavailable during the update.
      # It cannot be 0 if maxSurge is 0 (relevant only where maxSurge is also configured).
      maxUnavailable: 0
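The configuration above sets autoPromotionEnabled: false together with autoPromotionSeconds: 30, so it helps to spell out how the two fields combine. The sketch below is a minimal model, assuming (per the Argo Rollouts field documentation as I understand it) that autoPromotionSeconds is ignored whenever autoPromotionEnabled is false. `promotion_mode` is a hypothetical helper, not part of any Argo Rollouts API.

```python
def promotion_mode(auto_promotion_enabled: bool = True,
                   auto_promotion_seconds: int = 0) -> str:
    """Describe how a blue-green rollout leaves the pre-promotion pause."""
    if not auto_promotion_enabled:
        # Assumption: autoPromotionSeconds has no effect in this case;
        # the rollout stays paused until a user promotes it.
        return "manual"
    if auto_promotion_seconds > 0:
        # Paused, then promoted automatically once the delay has elapsed.
        return f"auto after {auto_promotion_seconds}s"
    # No pause: promoted as soon as the preview is ready and analysis passes.
    return "immediate"


if __name__ == "__main__":
    print(promotion_mode(False, 30))  # the combination used in the full configuration
    print(promotion_mode(True, 30))
    print(promotion_mode())           # field defaults
```

Under this reading, the full configuration waits for a manual promotion, and its autoPromotionSeconds value only takes effect if autoPromotionEnabled is switched to true or removed.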
Argo Rollouts BlueGreen Example
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
  namespace: argo-demo
spec:
  args:
  - name: service-name
  metrics:
  - name: success-rate
    successCondition: result[0] >= 0.95
    interval: 20s
    count: 5
    failureLimit: 5
    provider:
      prometheus:
        address: http://prometheus.istio-system.svc.wgs.local:9090
        query: |
          sum(irate(
            istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}",response_code!~"5.*"}[1m]
          )) /
          sum(irate(
            istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}"}[1m]
          ))
---
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollout-nginx-bluegreen-with-analysis
  namespace: argo-demo
spec:
  replicas: 3
  revisionHistoryLimit: 5
  selector:
    matchLabels:
      app: rollout-nginx-bluegreen
  template:
    metadata:
      labels:
        app: rollout-nginx-bluegreen
    spec:
      containers:
      - name: nginx
        image: nginx:1.24-alpine
        ports:
        - containerPort: 80
  strategy:
    blueGreen:
      activeService: nginx
      previewService: nginx-preview
      prePromotionAnalysis:
        templates:
        - templateName: success-rate
        args:
        - name: service-name
          value: nginx-preview.argo-demo.svc.wgs.local
      postPromotionAnalysis:
        templates:
        - templateName: success-rate
        args:
        - name: service-name
          value: nginx.argo-demo.svc.wgs.local
      autoPromotionEnabled: true
---
kind: Service
apiVersion: v1
metadata:
  name: nginx
  namespace: argo-demo
spec:
  selector:
    app: rollout-nginx-bluegreen
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
---
kind: Service
apiVersion: v1
metadata:
  name: nginx-preview
  namespace: argo-demo
spec:
  selector:
    app: rollout-nginx-bluegreen
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
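To make the success-rate metric above more concrete, here is a rough sketch of how a series of measurements might be assessed against successCondition, count, and failureLimit. It rests on an assumption about the controller's rule: a metric fails once the number of failed measurements exceeds failureLimit, and succeeds once count measurements have been taken without that happening. `assess_metric` is a hypothetical function written for illustration, and the sample values are made up.

```python
from typing import Iterable


def assess_metric(values: Iterable[float],
                  threshold: float = 0.95,
                  count: int = 5,
                  failure_limit: int = 0) -> str:
    """Assess measurements under the assumed rule described above."""
    failed = 0
    taken = 0
    for value in values:
        taken += 1
        if value < threshold:        # successCondition: result[0] >= threshold
            failed += 1
        if failed > failure_limit:   # assumed rule: fail once the limit is exceeded
            return "Failed"
        if taken >= count:           # all required measurements were taken
            return "Successful"
    return "Running"                 # fewer than `count` measurements so far


if __name__ == "__main__":
    healthy = [0.99, 0.98, 1.0, 0.97, 0.99]
    broken = [0.50, 0.40, 0.60, 0.55, 0.45]
    print(assess_metric(healthy, failure_limit=5))
    print(assess_metric(broken, failure_limit=5))
    print(assess_metric(broken, failure_limit=0))
```

If the assumed rule holds, count: 5 combined with failureLimit: 5 can never produce a failure, because at most five measurements can fail and five does not exceed the limit. A lower failureLimit may therefore be worth considering if the analysis is meant to block a bad release.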
References
https://argoproj.github.io/argo-rollouts/features/bluegreen/