[Original] Kubernetes StatefulSet (Stateful Workloads): Principles and Practice
I. What Is a Stateful Workload (StatefulSet)?
A StatefulSet manages stateful applications. The Pods it creates carry persistent identifiers that are retained even when a Pod is rescheduled to a different node in the cluster or is destroyed and restarted, and Pod instances are deployed and deleted in order. Its characteristics are:
1. Pod consistency: PodName, HostName, and the start/stop order of the Pods stay consistent throughout their lifetime.
2. Stable storage: volumeClaimTemplates creates a PVC and PV for each Pod. Even if a Pod is deleted or the StatefulSet is scaled in, the volume is not removed; when the Pod restarts or the StatefulSet scales out again, the previous volume is automatically remounted, so each Pod keeps stable storage.
3. Stable network identity: combined with a headless Service, a StatefulSet gives each Pod it creates a DNS record of the form (podname).(headless service name).(namespace).svc.cluster.local, so Pod instances can reach one another by domain name.
4. Stable ordering: Pods are ordered. Deployment and scale-out proceed strictly in the defined order (from 0 to N-1; all preceding Pods must be Running and Ready before the next one starts), while deletion and scale-in run from N-1 down to 0. The naming pattern is illustrated just below.
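For example, with the StatefulSet named myapp-statefulset and the headless Service named myapp-headless-service used later in this article, the stable names look like this (illustrative):

# Pod names follow <statefulset-name>-<ordinal>:
myapp-statefulset-0
myapp-statefulset-1
# and each Pod's DNS record follows
# <podname>.<headless-service-name>.<namespace>.svc.cluster.local:
myapp-statefulset-0.myapp-headless-service.default.svc.cluster.local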
II. When to Use a StatefulSet
Consider a StatefulSet whenever your application needs the characteristics above. In practice it is most often used in distributed applications, for example multiple MySQL instances with fixed relationships between them (master/slave, active/standby) that need persistent data, a defined startup order, and mutual access between instances.
III. Creating and Using a StatefulSet
The officially recommended order is: create the PV -> create the PVC -> create the headless Service -> create the StatefulSet. You may wonder why the PV, PVC, and headless Service are needed at all.
1. Why do we need a PV and PVC?
Creating a PV and PVC and mounting them into the Pod's containers is what persists the data. This article uses statically created PVCs; a more convenient approach is to provision volumes dynamically through a StorageClass, which spares the cluster administrator from pre-creating PVs. That approach is covered in detail in a separate article.
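For reference, here is a minimal sketch of what one of the static hostPath PVs used below might look like. The capacity 0.1Gi corresponds to the 107374182400m shown by kubectl later; the host path /data/pod/volume3 is an assumption (the demo only reveals /data/pod/volume7, for a different PV):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-statefulset-03
spec:
  capacity:
    storage: 0.1Gi                        # normalizes to the 107374182400m seen later
  accessModes:
    - ReadWriteOnce                       # RWO, matching the PVC template below
  persistentVolumeReclaimPolicy: Recycle  # the reclaim policy shown by kubectl get pv
  hostPath:
    path: /data/pod/volume3               # assumed host directory; adjust for your cluster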
2. Why do we need a headless Service?
Another of my articles, "K8S Service: Principles and Practice", describes the creation and characteristics of headless Services in detail. Resolving a headless Service's name returns the IPs of all its backend Pods, and every Pod created by the StatefulSet controller gets a DNS record of the form (podname).(headless service name).(namespace).svc.cluster.local. This shows why the headless Service must be created first: it exposes all the backend Pods, and its name becomes part of each Pod's domain name. Let's test whether the StatefulSet can even be created without the Service:
[root@k8s-master zhanglei]# cat statesfulset-test.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: myapp-statefulset
spec:
  # serviceName: myapp-headless-service
  replicas: 2
  selector:
    matchLabels:
      app: myapp-pod
  template:
    metadata:
      labels:
        app: myapp-pod
    spec:
      containers:
      - name: myapp
        image: ikubernetes/myapp:v1
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: myappdata-pvc
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: myappdata-pvc
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 0.05Gi
[root@k8s-master zhanglei]# kubectl create -f sts-testservice.yaml
error: error validating "sts-testservice.yaml": error validating data: ValidationError(StatefulSet.spec): missing required field "serviceName" in io.k8s.api.apps.v1.StatefulSetSpec; if you choose to ignore these errors, turn validation off with --validate=false
With serviceName commented out in the YAML, the create command fails: the required field serviceName is missing. So the StatefulSet cannot be created without this field.
My earlier article "K8S PV and PVC: Principles and Practice" walked through creating PVs and PVCs, and "K8S Service: Principles and Practice" covered creating the headless Service, so I won't repeat those steps. Here are the PV, PVC, and headless Service that have already been created:
[root@k8s-master zhanglei]# kubectl get pv |grep pv-statefulset
pv-statefulset-03   107374182400m   RWO   Recycle   Bound   default/myappdata-pvc-myapp-statefulset-0   15d
pv-statefulset-04   107374182400m   RWO   Recycle   Bound   default/myappdata-pvc-myapp-statefulset-1   14d
[root@k8s-master zhanglei]# kubectl get pvc |grep myappdata
myappdata-pvc-myapp-statefulset-0   Bound   pv-statefulset-03   107374182400m   RWO   14d
myappdata-pvc-myapp-statefulset-1   Bound   pv-statefulset-04   107374182400m   RWO   14d
[root@k8s-master zhanglei]# cat headless-svc-stu.yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp-headless-service
  labels:
    app: statefulset
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    app: myapp-pod
Create the StatefulSet:
[root@k8s-master zhanglei]# cat statesfulset-test.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: myapp-statefulset
spec:
  serviceName: myapp-headless-service  # reference the headless Service created above
  replicas: 2                          # desired replica count
  selector:
    matchLabels:
      app: myapp-pod
  template:
    metadata:
      labels:
        app: myapp-pod
    spec:
      containers:
      - name: myapp
        image: ikubernetes/myapp:v1
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: myappdata-pvc
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:                # persistent storage claim template
  - metadata:
      name: myappdata-pvc
    spec:
      accessModes: [ "ReadWriteOnce" ] # declared access mode
      resources:
        requests:
          storage: 0.05Gi              # declared capacity
[root@k8s-master zhanglei]# kubectl get sts myapp-statefulset -o wide
NAME                READY   AGE   CONTAINERS   IMAGES
myapp-statefulset   2/2     14d   myapp        ikubernetes/myapp:v1
[root@k8s-master zhanglei]# kubectl describe sts myapp-statefulset
Name:               myapp-statefulset
Namespace:          default
CreationTimestamp:  Sat, 23 May 2020 18:25:02 +0800
Selector:           app=myapp-pod
Labels:             <none>
Annotations:        <none>
Replicas:           2 desired | 2 total
Update Strategy:    RollingUpdate
  Partition:        0
Pods Status:        2 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:  app=myapp-pod
  Containers:
   myapp:
    Image:        ikubernetes/myapp:v1
    Port:         80/TCP
    Host Port:    0/TCP
    Environment:  <none>
    Mounts:       /usr/share/nginx/html from myappdata-pvc (rw)
  Volumes:        <none>
Volume Claims:
  Name:          myappdata-pvc
  StorageClass:
  Labels:        <none>
  Annotations:   <none>
  Capacity:      53687091200m
  Access Modes:  [ReadWriteOnce]
Events:          <none>
Check the Pods' status; as shown below, both are Running:
[root@k8s-master zhanglei]# kubectl get pod -o wide | grep myapp-statefulset
myapp-statefulset-0   1/1   Running   0   5d21h   10.122.235.239   k8s-master   <none>   <none>
myapp-statefulset-1   1/1   Running   0   112m    10.122.235.253   k8s-master   <none>   <none>
Verify the domain names. As mentioned above, a StatefulSet combined with a headless Service configures a DNS record for every Pod it creates. First, resolve the headless Service's name:
[root@k8s-master zhanglei]# dig -t A myapp-headless-service.default.svc.cluster.local. @10.10.0.10

; <<>> DiG 9.11.4-P2-RedHat-9.11.4-26.P2.el8 <<>> -t A myapp-headless-service.default.svc.cluster.local. @10.10.0.10
;; global options: +cmd
;; Got answer:
;; WARNING: .local is reserved for Multicast DNS
;; You are currently testing what happens when an mDNS query is leaked to DNS
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 22543
;; flags: qr aa rd; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: 8e8a5971efec82f4 (echoed)
;; QUESTION SECTION:
;myapp-headless-service.default.svc.cluster.local. IN A

;; ANSWER SECTION:    # all Pods created by the stateful workload are returned
myapp-headless-service.default.svc.cluster.local. 30 IN A 10.122.235.253
myapp-headless-service.default.svc.cluster.local. 30 IN A 10.122.235.239

;; Query time: 13 msec
;; SERVER: 10.10.0.10#53(10.10.0.10)
;; WHEN: Sun Jun 07 18:05:46 CST 2020
;; MSG SIZE  rcvd: 217
Resolving the headless Service's name returns the list of all Pods. Next, resolve a single Pod:
[root@k8s-master zhanglei]# dig -t A myapp-statefulset-0.myapp-headless-service.default.svc.cluster.local. @10.10.0.10   # resolve Pod 0's domain name

; <<>> DiG 9.11.4-P2-RedHat-9.11.4-26.P2.el8 <<>> -t A myapp-statefulset-0.myapp-headless-service.default.svc.cluster.local. @10.10.0.10
;; global options: +cmd
;; Got answer:
;; WARNING: .local is reserved for Multicast DNS
;; You are currently testing what happens when an mDNS query is leaked to DNS
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 46972
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: d930083e06cfaca9 (echoed)
;; QUESTION SECTION:
;myapp-statefulset-0.myapp-headless-service.default.svc.cluster.local. IN A

;; ANSWER SECTION:
myapp-statefulset-0.myapp-headless-service.default.svc.cluster.local. 30 IN A 10.122.235.239   # its IP address is returned

;; Query time: 19 msec
;; SERVER: 10.10.0.10#53(10.10.0.10)
;; WHEN: Sun Jun 07 18:09:18 CST 2020
;; MSG SIZE  rcvd: 193
Resolving myapp-statefulset-1 the same way returns that Pod's IP. The two Pod instances can therefore reach each other by domain name, which suits scenarios such as a database's master and slave Pod instances accessing each other.
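To see this from inside the cluster, exec into one Pod and fetch the other's homepage by its domain name (a quick sketch; it assumes the ikubernetes/myapp image ships BusyBox wget, which its Alpine base normally does):

[root@k8s-master zhanglei]# kubectl exec -it myapp-statefulset-1 -- wget -qO- http://myapp-statefulset-0.myapp-headless-service.default.svc.cluster.local

If the DNS record and the Pod are healthy, this prints the page served by myapp-statefulset-0.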
Verify the stability of the Pod identity and storage:
[root@k8s-master zhanglei]# kubectl describe pod myapp-statefulset-0 | grep ClaimName
    ClaimName:  myappdata-pvc-myapp-statefulset-0
[root@k8s-master zhanglei]# kubectl delete pod myapp-statefulset-0
pod "myapp-statefulset-0" deleted
[root@k8s-master zhanglei]# kubectl get pod
NAME                  READY   STATUS    RESTARTS   AGE
myapp-statefulset-0   1/1     Running   0          14s
myapp-statefulset-1   1/1     Running   0          129m
After the deletion, the replacement Pod comes back with the same name as the deleted one and binds the same PVC. The Pod name stays consistent, and because the original PVC is reused, no data is lost: persistence holds. A quick self-check is sketched below.
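A sketch of that self-check (persist.txt is an arbitrary file name chosen for illustration):

[root@k8s-master zhanglei]# kubectl exec myapp-statefulset-0 -- sh -c 'echo hello > /usr/share/nginx/html/persist.txt'
[root@k8s-master zhanglei]# kubectl delete pod myapp-statefulset-0
# wait for the controller to recreate myapp-statefulset-0, then:
[root@k8s-master zhanglei]# kubectl exec myapp-statefulset-0 -- cat /usr/share/nginx/html/persist.txt

The last command should print hello back from the recreated Pod, because it remounted the same volume.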
Verify the scaling order:
There are currently two Pods. First scale in to one. As shown below, the Pod that gets stopped is myapp-statefulset-1, confirming that deletion starts at ordinal N-1 and proceeds down to 0.
[root@k8s-master zhanglei]# kubectl get sts
NAME                READY   AGE
myapp-statefulset   2/2     14d
[root@k8s-master zhanglei]# kubectl scale sts myapp-statefulset --replicas=1
statefulset.apps/myapp-statefulset scaled
[root@k8s-master zhanglei]# kubectl get pod |grep myapp
myapp-statefulset-0   1/1   Running   0   5m43s
[root@k8s-master zhanglei]# kubectl get pvc|grep myappdata-pvc-myapp-statefulset
myappdata-pvc-myapp-statefulset-0   Bound   pv-statefulset-03   107374182400m   RWO   15d
myappdata-pvc-myapp-statefulset-1   Bound   pv-statefulset-04   107374182400m   RWO   15d
Although the Pods were scaled in, the PVC that was mounted on the myapp-statefulset-1 Pod is not deleted and its historical data is preserved. Now scale back out to three Pods.
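The scale-out uses the same kubectl scale command, now with a higher replica count:

[root@k8s-master zhanglei]# kubectl scale sts myapp-statefulset --replicas=3
statefulset.apps/myapp-statefulset scaled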
[root@k8s-master zhanglei]# kubectl get pod |grep myapp-statefulset
myapp-statefulset-0   1/1   Running   0   13m
myapp-statefulset-1   1/1   Running   0   52s
myapp-statefulset-2   0/1   Pending   0   49s
The Pods are created in the order 0, 1, 2. myapp-statefulset-2 is still Pending; it will only be created once myapp-statefulset-1 reaches the Running state.
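This strict ordering is the default behavior (podManagementPolicy: OrderedReady, a standard StatefulSet spec field not used in the YAML above). If an application does not need ordered startup, the spec also accepts Parallel, which starts and stops all Pods at once; a sketch of the field in context:

spec:
  serviceName: myapp-headless-service
  podManagementPolicy: Parallel   # default is OrderedReady; Parallel skips the one-by-one ordering
  replicas: 3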
Verify volume sharing:
[root@k8s-master zhanglei]# kubectl describe pv pv-statefulset-testservice
Name:            pv-statefulset-testservice
Labels:          release=stable
Annotations:     pv.kubernetes.io/bound-by-controller: yes
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:
Status:          Bound
Claim:           default/myappdata-pvc-myapp-statefulset-2
Reclaim Policy:  Recycle
Access Modes:    RWO
VolumeMode:      Filesystem
Capacity:        107374182400m
Node Affinity:   <none>
Message:
Source:
    Type:          HostPath (bare host directory volume)
    Path:          /data/pod/volume7    # directory on the host
    HostPathType:
Events:
  Type    Reason          Age   From                         Message
  ----    ------          ----  ----                         -------
  Normal  RecyclerPod     11m   persistentvolume-controller  Recycler pod: Successfully assigned default/recycler-for-pv-statefulset-testservice to k8s-master
  Normal  RecyclerPod     11m   persistentvolume-controller  Recycler pod: Pulling image "busybox:1.27"
  Normal  RecyclerPod     11m   persistentvolume-controller  Recycler pod: Successfully pulled image "busybox:1.27"
  Normal  RecyclerPod     11m   persistentvolume-controller  Recycler pod: Created container pv-recycler
  Normal  RecyclerPod     11m   persistentvolume-controller  Recycler pod: Started container pv-recycler
  Normal  VolumeRecycled  11m   persistentvolume-controller  Volume recycled
Log in to the container and create a file sts.txt under the shared directory /usr/share/nginx/html (visible under Mounts in the Pod's describe output):
[root@k8s-master zhanglei]# kubectl exec -it myapp-statefulset-2 -- sh
/ # ls
bin    dev    etc    home   lib    media  mnt    proc   root   run    sbin   srv    sys    tmp    usr    var
/ # cd /usr/share/nginx/html
/usr/share/nginx/html # touch sts.txt
Back on the host, verify that the file has been synced to /data/pod/volume7. As shown below, it has. Writes made on the host side of this directory likewise appear in the container's mapped directory; a sketch of that reverse check follows the listing.
[root@k8s-master volume7]# ls
sts.txt
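The reverse check works the same way (a sketch; host-test.txt is an arbitrary name):

[root@k8s-master volume7]# touch host-test.txt
[root@k8s-master volume7]# kubectl exec myapp-statefulset-2 -- ls /usr/share/nginx/html

The ls should list both host-test.txt and sts.txt, since the hostPath volume is a direct mapping of this host directory.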
IV. Summary
A StatefulSet is a natural fit for workloads such as database instances that need persistent data, ordered startup, and mutual access between instances. When creating one, keep the order in mind: create the PV -> create the PVC -> create the headless Service -> create the StatefulSet.
About the author: a product manager working on cloud computing, containers, Docker, and Kubernetes, learning some technology to design better products.