K8S 健康检查


1. 健康检查(Probe)的定义

  k8s 在 Docker 技术的基础上,为应用提供容器跨多个服务器主机的容器部署和管理、服务发现、负载均衡和动态伸缩等一系列完整功能,可方便地进行大规模容器集群管理。云上应用程序在运行过程中,由于一些不确定因素(例如网络瞬间不可达、配置错误、程序内部错误等),经常导致出现一些异常状况。为此 k8s 提供了一套完善的容器健康检查的探测机制。健康检查又称为探针(Probe),是由 kubelet 对容器执行的定期诊断。

2. 探针的种类

2.1 存活检查(livenessprobe,存活探针)

判断容器是否正在运行。如果探测失败,则 kubelet 会杀死容器,并且容器将根据 restartPolicy 来设置 Pod 状态,如果容器不提供存活探针,则默认状态为 Success。

2.2 就绪检查(readinessprobe,就绪探针,业务探针)

判断容器是否准备好接受请求。如果探测失败,端点控制器将从与 Pod 匹配的所有 service endpoints 中剔除删除该 Pod 的 IP 地址,这样失败的 Pod 就无法提供服务了。初始延迟之前的就绪状态默认为 Failure。如果容器不提供就绪探针,则默认状态为 Success。

2.3 启动检查(startupprobe,启动探针,1.17 版本新增)

判断容器内的应用程序是否已启动,主要针对于不能确定具体启动时间的应用。如果匹配了 startupProbe 探测,则在 startupProbe 状态为 Success 之前,其他所有探针都处于无效状态,直到它成功后其他探针才起作用。如果 startupProbe 失败,kubelet 将杀死容器,容器将根据 restartPolicy 来重启。如果容器没有配置 startupProbe,则默认状态为 Success。

如果以上三种规则同时定义。在 readinessProbe 检测成功之前,Pod 的运行状态是不会变成 ready 状态的。

3. Probe 支持的三种检测方法

3.1 exec

在容器内执行指定的 shell 命令,如果命令返回 0,说明容器运行状态正常;如果命名返回非 0 值,说明容器运行状态异常。

3.2 tcpSocket

使用 TCP Socket 连接容器中的指定端口,如果能够建立连接,kubelet 会认为容器处于健康状态;如果无法建立连接,kubelet 认为容器处于异常状态。

3.3 httpGet

使用 HTTP GET 请求指定的 URI,如果返回了成功状态码(2xx 或 3xx),kubelet 会认为容器处于健康状态;如果返回了失败的状态码(除 2xx 和 3xx 外的状态码),则 kubelet 会认为容器处于异常状态。

每次探测都将获得以下三种结果之一:
● 成功:容器通过了诊断
● 失败:容器未通过诊断
● 未知:诊断失败,因此不会采取任何行动

4. 可选参数

行为属性名称 默认值 最小值 备注
initialDelaySeconds 0 秒 0 秒 探测延迟时长,容器启动后多久开始进行第一次探测工作。
timeoutSeconds 1 秒 1 秒 探测的超时时长。
periodSeconds 10 秒 1 秒 探测频度,频率过高会对 pod 带来较大的额外开销,频率过低则无法及时反映容器产生的错误。
failureThreshold 3 1 处于成功状态时,探测连续失败几次可被认为失败。
successThreshold 1 1 处于失败状态时,探测连续成功几次,被认为成功。

5. 探测示例

5.1 exec

官方示例

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  containers:
  - name: liveness
    image: k8s.gcr.io/busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5

在这个配置文件中,可以看到 Pod 中只有一个容器。 periodSeconds 字段指定了 kubelet 应该每 5 秒执行一次存活探测。 initialDelaySeconds 字段告诉 kubelet 在执行第一次探测前应该等待 5 秒。 kubelet 在容器内执行命令 cat /tmp/healthy 来进行探测。 如果命令执行成功并且返回值为 0,kubelet 就会认为这个容器是健康存活的。 如果这个命令返回非 0 值,kubelet 会杀死这个容器并重新启动它。

编写 yaml 资源配置清单

[root@master ~]#vim exec.yaml
[root@master ~]#cat exec.yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-exec
  namespace: default
spec:
  containers:
  - name: liveness-exec-container
    image: busybox
    imagePullPolicy: IfNotPresent
    command: ["/bin/sh","-c","touch /tmp/live; sleep 30; rm -rf /tmp/live; sleep 600"]
    livenessProbe:
      exec:
        command: ["test","-e","/tmp/live"]
      initialDelaySeconds: 1
      periodSeconds: 3

在这个配置文件中,可以看到 Pod 只有一个容器。
容器中的 command 字段表示创建一个 /tmp/live 文件后休眠 30 秒,休眠结束后删除该文件,并休眠 10 分钟。
仅使用 livenessProbe 存活探针,并使用 exec 检查方式,对 /tmp/live 文件进行存活检测。
initialDelaySeconds 字段表示 kubelet 在执行第一次探测前应该等待 1 秒。
periodSeconds 字段表示 kubelet 每隔 3 秒执行一次存活探测。

创建资源

[root@master ~]#kubectl create -f exec.yaml
pod/liveness-exec created

跟踪查看 pod 状态

[root@master ~]#kubectl get pod -o wide -w
NAME            READY   STATUS    RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
liveness-exec   1/1     Running   0          36s   10.244.1.15   node01   <none>           <none>
liveness-exec   1/1     Running   1 (1s ago)   74s   10.244.1.15   node01   <none>           <none>
liveness-exec   1/1     Running   2 (0s ago)   2m22s   10.244.1.15   node01   <none>           <none>
liveness-exec   1/1     Running   3 (1s ago)   3m32s   10.244.1.15   node01   <none>           <none>
liveness-exec   1/1     Running   4 (0s ago)   4m40s   10.244.1.15   node01   <none>           <none>
liveness-exec   1/1     Running   5 (0s ago)   5m49s   10.244.1.15   node01   <none>           <none>
liveness-exec   0/1     CrashLoopBackOff   5 (0s ago)   6m58s   10.244.1.15   node01   <none>           <none>
liveness-exec   1/1     Running            6 (82s ago)   8m20s   10.244.1.15   node01   <none>           <none>
liveness-exec   0/1     CrashLoopBackOff   6 (0s ago)    9m28s   10.244.1.15   node01   <none>           <none>
liveness-exec   1/1     Running            7 (2m41s ago)   12m     10.244.1.15   node01   <none>           <none>
liveness-exec   0/1     CrashLoopBackOff   7 (1s ago)      13m     10.244.1.15   node01   <none>           <none>
......

查看 pod 事件

[root@master ~]#kubectl describe pod liveness-exec
Name:         liveness-exec
Namespace:    default
......
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  16m                   default-scheduler  Successfully assigned default/liveness-exec to node01
  Normal   Pulling    16m                   kubelet            Pulling image "busybox"
  Normal   Pulled     16m                   kubelet            Successfully pulled image "busybox" in 2.316123738s
  Normal   Killing    13m (x3 over 15m)     kubelet            Container liveness-exec-container failed liveness probe, will be restarted
  Normal   Created    12m (x4 over 16m)     kubelet            Created container liveness-exec-container
  Normal   Started    12m (x4 over 16m)     kubelet            Started container liveness-exec-container
  Normal   Pulled     12m (x3 over 14m)     kubelet            Container image "busybox" already present on machine
  Warning  Unhealthy  11m (x13 over 15m)    kubelet            Liveness probe failed:
  Warning  BackOff    66s (x30 over 9m14s)  kubelet            Back-off restarting failed container

每次健康检查失败后 kubelet 启动 killing 程序并拉取镜像创建新的容器

5.2 httpGet 方式

官方示例

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-http
spec:
  containers:
  - name: liveness
    image: k8s.gcr.io/liveness
    args:
    - /server
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
        httpHeaders:
        - name: Custom-Header
          value: Awesome
      initialDelaySeconds: 3
      periodSeconds: 3

在这个配置文件中,可以看到 Pod 只有一个容器。initialDealySeconds 字段告诉 kubelet 在执行第一次探测前应该等待 3 秒。preiodSeconds 字段指定了 kubelet 每隔 3 秒执行一次存活探测。kubelet 会向容器内运行的服务(服务会监听 8080 端口)发送一个认为容器是健康存活的。如果处理程序返回失败代码,则 kubelet 会杀死这个容器并且重新启动它。
任何大于或等于 200 并且小于 400 的返回代码标示成功,其他返回代码都标示失败。

编写 yaml 资源配置清单

[root@master ~]#cat httpget.yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-httpget
  namespace: default
spec:
  containers:
  - name: liveness-httpget-container
    image: nginx
    imagePullPolicy: IfNotPresent
    ports:
    - name: nginx
      containerPort: 80
    livenessProbe:
      httpGet:
        port: nginx
        path: /index.html
      initialDelaySeconds: 1
      periodSeconds: 3
      timeoutSeconds: 10

创建资源

[root@master ~]#kubectl create -f httpget.yaml
pod/liveness-httpget created
[root@master ~]#kubectl get pod
NAME               READY   STATUS    RESTARTS   AGE
liveness-httpget   1/1     Running   0          59s

删除 Pod 的 index.html 文件

[root@master ~]#kubectl exec -it liveness-httpget -- rm -rf /usr/share/nginx/html/index.html

查看 pod 状态

[root@master ~]#kubectl get pod -w
NAME               READY   STATUS    RESTARTS      AGE
liveness-httpget   1/1     Running   1 (18s ago)   2m6s
......

查看容器事件

[root@master ~]#kubectl describe pod liveness-httpget
......
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  4m22s                  default-scheduler  Successfully assigned default/liveness-httpget to node02
  Normal   Pulling    4m21s                  kubelet            Pulling image "nginx"
  Normal   Pulled     3m25s                  kubelet            Successfully pulled image "nginx" in 55.685103435s
  Normal   Created    2m34s (x2 over 3m25s)  kubelet            Created container liveness-httpget-container
  Warning  Unhealthy  2m34s (x3 over 2m40s)  kubelet            Liveness probe failed: HTTP probe failed with statuscode: 404
  Normal   Killing    2m34s                  kubelet            Container liveness-httpget-container failed liveness probe, will be restarted
  Normal   Pulled     2m34s                  kubelet            Container image "nginx" already present on machine
  Normal   Started    2m33s (x2 over 3m25s)  kubelet            Started container liveness-httpget-container

重启原因是 HTTP 探测得到的状态返回码是 404,HTTP probe failed with statuscode: 404。
重启完成后,不会再次重启,因为重新拉取的镜像中包含了 index.html 文件。

5.3 tcpSocket 方式

官方示例

apiVersion: v1
kind: Pod
metadata:
  name: goproxy
  labels:
    app: goproxy
spec:
  containers:
  - name: goproxy
    image: k8s.gcr.io/goproxy:0.1
    ports:
    - containerPort: 8080
    readinessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
    livenessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 20

这个例子同时使用 readinessProbe 和 livenessProbe 探测。kubelet 会在容器启动 5 秒后发送第一个 readiness 探测。这会尝试连接 goproxy 容器的 8080 端口。如果探测成功,kubelet 将继续每隔 10 秒运行一次检测。除了 readinessProbe 探测,这个配置包括了一个 livenessProbe 探测。kubelet 会在容器启动 15 秒后进行第一次 livenessProbe 探测。就像 readinessProbe 探测一样,会尝试连接 goproxy 容器的 8080 端口。如果 livenessProbe 探测失败,这个容器会被重新启动。

编写 yaml 资源配置清单

[root@master ~]#vim tcpsocket.yaml
[root@master ~]#cat tcpsocket.yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-tcpsocket
spec:
  containers:
  - name: liveness-tcpsocket-container
    image: nginx
    livenessProbe:
      initialDelaySeconds: 5
      timeoutSeconds: 1
      tcpSocket:
        port: 8080
      periodSeconds: 3

创建资源

[root@master ~]#kubectl apply -f tcpsocket.yaml
pod/liveness-tcpsocket created

跟踪查看 pod 状态

[root@master ~]#kubectl get pod -w
NAME                 READY   STATUS    RESTARTS   AGE
liveness-tcpsocket   1/1     Running   0          31s
liveness-tcpsocket   1/1     Running   1 (16s ago)   43s
liveness-tcpsocket   1/1     Running   2 (17s ago)   71s
liveness-tcpsocket   1/1     Running   3 (2s ago)    83s
liveness-tcpsocket   0/1     CrashLoopBackOff   3 (1s ago)    94s
liveness-tcpsocket   1/1     Running            4 (28s ago)   2m1s
......

查看 pod 事件

[root@master ~]#kubectl describe pod liveness-tcpsocket
......
Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  2m44s                default-scheduler  Successfully assigned default/liveness-tcpsocket to node02
  Normal   Pulled     2m28s                kubelet            Successfully pulled image "nginx" in 15.610378425s
  Normal   Pulled     2m1s                 kubelet            Successfully pulled image "nginx" in 15.598030812s
  Normal   Created    94s (x3 over 2m28s)  kubelet            Created container liveness-tcpsocket-container
  Normal   Started    94s (x3 over 2m28s)  kubelet            Started container liveness-tcpsocket-container
  Normal   Pulled     94s                  kubelet            Successfully pulled image "nginx" in 15.553201391s
  Warning  Unhealthy  83s (x9 over 2m23s)  kubelet            Liveness probe failed: dial tcp 10.244.2.18:8080: connect: connection refused
  Normal   Killing    83s (x3 over 2m17s)  kubelet            Container liveness-tcpsocket-container failed liveness probe, will be restarted
  Normal   Pulling    83s (x4 over 2m43s)  kubelet            Pulling image "nginx"

重启原因是 nginx 使用的默认端口为 80,8080 端口的健康检查被拒绝访问

删除 pod

[root@master ~]#kubectl delete -f tcpsocket.yaml
pod "liveness-tcpsocket" deleted

修改 tcpSocket 端口

[root@master ~]#vim tcpsocket.yaml
[root@master ~]#cat tcpsocket.yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-tcpsocket
spec:
  containers:
  - name: liveness-tcpsocket-container
    image: nginx
    livenessProbe:
      initialDelaySeconds: 5
      timeoutSeconds: 1
      tcpSocket:
        port: 80        # 修改为 80 端口
      periodSeconds: 3

创建资源并查看

[root@master ~]#kubectl apply -f tcpsocket.yaml
pod/liveness-tcpsocket created
[root@master ~]#kubectl get pod -o wide -w
NAME                 READY   STATUS              RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
liveness-tcpsocket   0/1     ContainerCreating   0          5s    <none>   node02   <none>           <none>
liveness-tcpsocket   1/1     Running             0          17s   10.244.2.19   node02   <none>           <none>
......

查看 pod 事件

[root@master ~]#kubectl describe pod liveness-tcpsocket
......
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  85s   default-scheduler  Successfully assigned default/liveness-tcpsocket to node02
  Normal  Pulling    85s   kubelet            Pulling image "nginx"
  Normal  Pulled     69s   kubelet            Successfully pulled image "nginx" in 15.532244594s
  Normal  Created    69s   kubelet            Created container liveness-tcpsocket-container
  Normal  Started    69s   kubelet            Started container liveness-tcpsocket-container

启动正常

5.4 readinessProbe 就绪探针 1

编写 yaml 资源配置清单

[root@master ~]#vim readiness-httpget.yaml
[root@master ~]#cat readiness-httpget.yaml
apiVersion: v1
kind: Pod
metadata:
  name: readiness-httpget
  namespace: default
spec:
  containers:
  - name: readiness-httpget-container
    image: nginx
    imagePullPolicy: IfNotPresent
    ports:
    - name: http
      containerPort: 80
    readinessProbe:
      httpGet:
        port: 80
        path: /index1.html #注意,这里设置个错误地址
      initialDelaySeconds: 1
      periodSeconds: 3
    livenessProbe:
      httpGet:
        port: http
        path: /index.html
      initialDelaySeconds: 1
      periodSeconds: 3
      timeoutSeconds: 10

创建资源查看 pod 状态

[root@master ~]#kubectl apply -f readiness-httpget.yaml
pod/readiness-httpget created
[root@master ~]#kubectl get pod
NAME                READY   STATUS    RESTARTS   AGE
readiness-httpget   0/1     Running   0          8s

STATUS 为 Running,但无法进入 READY 状态

查看 pod 事件

[root@master ~]#kubectl describe pod readiness-httpget
......
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  43s                default-scheduler  Successfully assigned default/readiness-httpget to node02
  Normal   Pulled     42s                kubelet            Container image "nginx" already present on machine
  Normal   Created    42s                kubelet            Created container readiness-httpget-container
  Normal   Started    42s                kubelet            Started container readiness-httpget-container
  Warning  Unhealthy  1s (x17 over 41s)  kubelet            Readiness probe failed: HTTP probe failed with statuscode: 404

异常原因为 readinessProbe 检测的状态返回值为 404,kubelet 阻止 pod 进入 READY 状态

查看日志

[root@master ~]#kubectl logs readiness-httpget
......
2022/07/03 06:03:18 [error] 33#33: *65 open() "/usr/share/nginx/html/index1.html" failed (2: No such file or directory), client: 10.244.2.1, server: localhost, request: "GET /index1.html HTTP/1.1", host: "10.244.2.20:80"
2022/07/03 06:03:21 [error] 33#33: *67 open() "/usr/share/nginx/html/index1.html" failed (2: No such file or directory), client: 10.244.2.1, server: localhost, request: "GET /index1.html HTTP/1.1", host: "10.244.2.20:80"
10.244.2.1 - - [03/Jul/2022:06:03:21 +0000] "GET /index1.html HTTP/1.1" 404 153 "-" "kube-probe/1.22" "-"
10.244.2.1 - - [03/Jul/2022:06:03:21 +0000] "GET /index.html HTTP/1.1" 200 615 "-" "kube-probe/1.22" "-"

为容器创建 index1.html

[root@master ~]#kubectl exec -it readiness-httpget -- touch /usr/share/nginx/html/index1.html
[root@master ~]#kubectl get pod		# 恢复正常
NAME                READY   STATUS    RESTARTS   AGE
readiness-httpget   1/1     Running   0          2m52s

5.5 readinessProbe 就绪探针 2

编写 yaml 资源配置清单

[root@master ~]#vim readiness-multi-nginx.yaml
[root@master ~]#cat readiness-multi-nginx.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx1
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
    ports:
    - name: http
      containerPort: 80
    readinessProbe:
      httpGet:
        port: http
        path: /index.html
      initialDelaySeconds: 5
      periodSeconds: 5
      timeoutSeconds: 10
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx2
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
    ports:
    - name: http
      containerPort: 80
    readinessProbe:
      httpGet:
        port: http
        path: /index.html
      initialDelaySeconds: 5
      periodSeconds: 5
      timeoutSeconds: 10
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx3
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
    ports:
    - name: http
      containerPort: 80
    readinessProbe:
      httpGet:
        port: http
        path: /index.html
      initialDelaySeconds: 5
      periodSeconds: 5
      timeoutSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-svc
spec:
# service 通过 selector 绑定到 nginx 集群中
  selector:
    app: nginx
  type: ClusterIP
  ports:
  - name: http
    port: 80
    targetPort: 80

创建资源

[root@master ~]#kubectl apply -f readiness-multi-nginx.yaml
pod/nginx1 created
pod/nginx2 created
pod/nginx3 created
service/nginx-svc created
......
[root@master ~]#kubectl get pod,svc -o wide
NAME                    READY   STATUS    RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
pod/nginx1              1/1     Running   0          88s   10.244.1.16   node01   <none>           <none>
pod/nginx2              1/1     Running   0          88s   10.244.1.17   node01   <none>           <none>
pod/nginx3              1/1     Running   0          88s   10.244.2.21   node02   <none>           <none>
pod/readiness-httpget   1/1     Running   0          50m   10.244.2.20   node02   <none>           <none>

NAME                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE     SELECTOR
service/kubernetes   ClusterIP   10.96.0.1       <none>        443/TCP   6d19h   <none>
service/nginx-svc    ClusterIP   10.102.12.150   <none>        80/TCP    88s     app=nginx

删除 nginx1 中的 index.html

[root@master ~]#kubectl exec -it nginx1 -- rm -rf /usr/share/nginx/html/index.html
[root@master ~]#kubectl get pod -o wide -w	# nginx1 的 READY 状态变为 0/1
NAME                READY   STATUS    RESTARTS   AGE    IP            NODE     NOMINATED NODE   READINESS GATES
nginx1              1/1     Running   0          3m5s   10.244.1.16   node01   <none>           <none>
nginx2              1/1     Running   0          3m5s   10.244.1.17   node01   <none>           <none>
nginx3              1/1     Running   0          3m5s   10.244.2.21   node02   <none>           <none>
readiness-httpget   1/1     Running   0          52m    10.244.2.20   node02   <none>           <none>
nginx1              0/1     Running   0          3m10s   10.244.1.16   node01   <none>           <none>
......

查看 pod 事件

[root@master ~]#kubectl describe pod nginx1
......
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  4m6s               default-scheduler  Successfully assigned default/nginx1 to node01
  Normal   Pulling    4m5s               kubelet            Pulling image "nginx"
  Normal   Pulled     3m19s              kubelet            Successfully pulled image "nginx" in 46.172728026s
  Normal   Created    3m19s              kubelet            Created container nginx
  Normal   Started    3m18s              kubelet            Started container nginx
  Warning  Unhealthy  1s (x15 over 66s)  kubelet            Readiness probe failed: HTTP probe failed with statuscode: 404

由于 httpGet 检测到的状态返回码为 404,所以 readinessProbe 失败,kubelet 将其设定为 noready 状态。

查看 service 详情

[root@master ~]#kubectl describe svc nginx-svc	# nginx1 被剔除出了 service 的终端列表
Name:              nginx-svc
Namespace:         default
Labels:            <none>
Annotations:       <none>
Selector:          app=nginx
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                10.102.12.150
IPs:               10.102.12.150
Port:              http  80/TCP
TargetPort:        80/TCP
Endpoints:         10.244.1.17:80,10.244.2.21:80
Session Affinity:  None
Events:            <none>

查看终端

[root@master ~]#kubectl get endpoints
NAME         ENDPOINTS                       AGE
kubernetes   192.168.10.20:6443              6d19h
nginx-svc    10.244.1.17:80,10.244.2.21:80   6m43s

6. 启动、退出动作(postStart,preStop)

编写 yaml 资源配置清单

[root@master]#vim post.yaml
[root@master]#cat post.yaml
apiVersion: v1
kind: Pod
metadata:
  name: lifecycle-test
spec:
  containers:
  - name: lifecycle-test-container
    image: nginx
    lifecycle:
      postStart:
        exec:
          command: ["/bin/sh","-c","echo Hello from the postStart handler >> /var/log/nginx/message"]
      preStop:
        exec:
          command: ["/bin/sh","-c","echo Hello from the postStop handler >> /var/log/nginx/message"]
    volumeMounts:
    - name: message-log
      mountPath: /var/log/nginx/
      readOnly: false
  initContainers:
  - name: init-nginx
    image: nginx
    command: ["/bin/sh","-c","echo 'Hello initContainers' >> /var/log/nginx/message"]
    volumeMounts: 
    - name: message-log
      mountPath: /var/log/nginx/
      readOnly: false
  volumes:
  - name: message-log
    hostPath:
      path: /data/volumes/nginx/log/
      type: DirectoryOrCreate

创建资源

[root@master ~]#kubectl apply -f post.yaml 
pod/lifecycle-test created

跟踪查看 pod 状态

[root@master]#kubectl get pod -o wide -w
NAME             READY   STATUS     RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
lifecycle-test   0/1     Init:0/1   0          5s    <none>   node01   <none>           <none>
lifecycle-test   0/1     PodInitializing   0          17s   10.244.1.73   node01   <none>           <none>
lifecycle-test   1/1     Running           0          19s   10.244.1.73   node01   <none>           <none>

查看 pod 事件

[root@master]#kubectl describe po lifecycle-test
......
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  46s   default-scheduler  Successfully assigned default/lifecycle-test to node01
  Normal  Pulling    45s   kubelet, node01    Pulling image "nginx"
  Normal  Pulled     30s   kubelet, node01    Successfully pulled image "nginx"
  Normal  Created    30s   kubelet, node01    Created container init-nginx
  Normal  Started    30s   kubelet, node01    Started container init-nginx
  Normal  Pulling    29s   kubelet, node01    Pulling image "nginx"
  Normal  Pulled     27s   kubelet, node01    Successfully pulled image "nginx"
  Normal  Created    27s   kubelet, node01    Created container lifecycle-test-container
  Normal  Started    27s   kubelet, node01    Started container lifecycle-test-container

查看容器日志

[root@master]#kubectl exec -it lifecycle-test -- cat /var/log/nginx/message
Hello initContainers
Hello from the postStart handler

init 容器先执行,然后当一个主容器启动后,kubernetes 将立即发送 postStart 事件。

闭容器后查看节点挂载文件

[root@master]#kubectl delete -f post.yaml
pod "lifecycle-test" deleted

[root@node01 ~]#cat /data/volumes/nginx/log/message 
Hello initContainers
Hello from the postStart handler
Hello from the postStop handler

由上可知,当在容器被终结之前,kubernetes 将发送一个 preStop 事件。

重新创建资源,查看容器日志

[root@master]#kubectl apply -f post.yaml
pod/lifesycle-test created
[root@master]#kubectl exec -it lifecycle-test -- cat /var/log/nginx/message
Hello initContainers
Hello from the postStart handler
Hello from the postStop handler
Hello initContainers
Hello from the postStart handler

7. 总结

7.1 探针

探针分为 3 种

  1. livenessProbe(存活探针)∶判断容器是否正常运行,如果失败则杀掉容器(不是 pod),再根据重启策略是否重启容器
  2. readinessProbe(就绪探针)∶判断容器是否能够进入 ready 状态,探针失败则进入 noready 状态,并从 service 的 endpoints 中剔除此容器
  3. startupProbe∶判断容器内的应用是否启动成功,在 success 状态前,其它探针都处于无效状态

7.2 检查方式

检查方式分为 3 种

  1. exec∶使用 command 字段设置命令,在容器中执行此命令,如果命令返回状态码为 0,则认为探测成功
  2. httpget∶通过访问指定端口和 url 路径执行 http get 访问。如果返回的 http 状态码为大于等于 200 且小于 400 则认为成功
  3. tcpsocket∶通过 tcp 连接 pod(IP)和指定端口,如果端口无误且 tcp 连接成功,则认为探测成功

7.3 常用的探针可选参数

行为属性名称 默认值 最小值 备注
initialDelaySeconds 0秒 0秒 探测延迟时长,容器启动后多久开始进行第一次探测工作。
timeoutSeconds 1秒 1秒 探测的超时时长。
periodSeconds 10秒 1秒 探测频度,频率过高会对pod带来较大的额外开销,频率过低则无法及时反映容器产生的错误。
failureThreshold 3 1 处于成功状态时,探测连续失败几次可被认为失败。
successThreshold 1 1 处于失败状态时,探测连续成功几次,被认为成功。

参考:

官方文档


posted @ 2022-07-03 15:37  公博义  阅读(1995)  评论(0编辑  收藏  举报