Advanced Pods: Pod Lifecycle, Health Probes, and startupProbe (6)

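I. Pod Container Probes and Hooks

1.1 Container hooks: postStart and preStop

postStart: runs immediately after the container is created and is used for tasks such as deploying resources or preparing the environment. The hook runs asynchronously with the container's entrypoint, so it is not guaranteed to run before the entrypoint does.

preStop: runs before the container is terminated and is used for tasks such as shutting the application down gracefully or notifying other systems.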

[root@master node]# kubectl explain pods.spec.containers.lifecycle
FIELDS:
   postStart    <Object>
   preStop    <Object>

[root@master node]# kubectl explain pods.spec.containers.lifecycle.postStart
FIELDS:
   exec    <Object>  # run a command inside the container
        One and only one of the following should be specified. Exec specifies the action to take.

   httpGet    <Object>  # send an HTTP request
        HTTPGet specifies the http request to perform.

   tcpSocket    <Object>  # open a TCP connection
        TCPSocket specifies an action involving a TCP port. TCP hooks not yet
        supported

Example: using postStart and preStop
......
containers:
- image: sample:v2
  name: war
  lifecycle:
    postStart:
      exec:
        command:  # copy the file into the /app directory
        - "cp"
        - "/sample.war"
        - "/app"
    preStop:
      httpGet:
        host: monitor.com
        path: /waring
        port: 8080
        scheme: HTTP
......
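The example above defines a Pod containing a Java web application container with postStart and preStop hooks. After the container is created, /sample.war is copied into the /app directory.
Before the container is terminated, an HTTP request is sent to http://monitor.com:8080/waring to alert the monitoring system.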
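
For a manifest that can be applied as-is, here is a minimal sketch of the same two hooks using only exec handlers. The Pod name, the message file path, and the stock nginx image are assumptions made for illustration; none of them come from the example above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: lifecycle-demo              # hypothetical name
spec:
  containers:
  - name: lifecycle-demo-container  # hypothetical name
    image: nginx                    # assumption: the stock nginx image
    lifecycle:
      postStart:
        exec:
          # runs right after the container is created, concurrently with the entrypoint
          command: ["/bin/sh", "-c", "echo Hello from the postStart handler > /usr/share/message"]
      preStop:
        exec:
          # asks nginx to finish in-flight requests before the TERM signal is sent
          command: ["/bin/sh", "-c", "nginx -s quit"]
```

Once the Pod is running, reading /usr/share/message with kubectl exec is a quick way to confirm that the postStart handler ran.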

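1.2 Deleting resource objects gracefully

When a user asks to delete a resource object that owns Pods (an RC, a Deployment, and so on), K8S gives the application a chance to shut down gracefully, that is, to finish the requests it is currently handling before it exits. There are two notification mechanisms:
1) Default: K8S tells the node to stop the container (the equivalent of docker stop). The runtime first sends SIGTERM to the process with PID 1 in the container and then waits for the application to exit. If the wait reaches the configured timeout, or the default timeout of 30s,
it sends SIGKILL to kill the process forcibly.

2) Pod lifecycle hooks: a preStop hook runs before the termination signal is sent. By default, every delete operation has a graceful exit period of 30 seconds. kubectl delete supports the --grace-period=<seconds> option, which lets the user override the default.
A value of 0 (which must be combined with --force) deletes the Pod from the API immediately. On the node, a Pod marked for immediate termination is still given a very short grace period before it is killed forcibly.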

Example:

spec:
  containers:
  - name: nginx-demo
    image: centos:nginx
    lifecycle:
      preStop:
        exec:
          # nginx -s quit terminates gracefully, whereas SIGTERM triggers a quick exit
          command: ["/usr/local/nginx/sbin/nginx","-s","quit"]
    ports:
    - name: http
      containerPort: 80
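
The 30-second default mentioned above can also be changed per Pod with the terminationGracePeriodSeconds field. The following is only a sketch: the Pod name, the 60-second value, the stock nginx image, and the 5-second sleep are assumptions chosen for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: graceful-nginx               # hypothetical name
spec:
  # total time allowed for preStop plus SIGTERM handling before SIGKILL is sent
  terminationGracePeriodSeconds: 60  # assumption: 60s instead of the default 30s
  containers:
  - name: nginx-demo
    image: nginx                     # assumption: the stock nginx image
    lifecycle:
      preStop:
        exec:
          # graceful quit, then a short pause so in-flight requests can drain
          command: ["/bin/sh", "-c", "nginx -s quit; sleep 5"]
    ports:
    - name: http
      containerPort: 80
```

The time spent in the preStop hook counts against the grace period, so the value should cover both the hook and the application's own shutdown time.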

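1.3 Liveness probes (livenessProbe) and readiness probes (readinessProbe)

livenessProbe: liveness probing

    Many applications, after running for a long time, eventually reach a broken state that only a restart can fix. When the application process exits, K8S notices and restarts the container. Sometimes, however, an application becomes temporarily unable to serve requests for some reason (a backend service failure, for example)
while its process keeps running. K8S then cannot isolate the faulty Pod, callers may still reach it, and the service becomes unstable. K8S provides livenessProbe to check whether a container is running properly and to take remedial action when it is not.

readinessProbe: readiness probing

    In a resource object without a readinessProbe, a Pod is assumed to be able to serve requests as soon as its containers have started, and it is added to the matching Service right away. Some applications, however,
need a long loading phase after they start before they can serve requests. If traffic reaches them during that phase, the results are not what users expect. A Tomcat application is a typical case: Tomcat having started does not mean the application can serve requests; the Spring container still has to initialize,
database connections have to be established, and so on.

Both LivenessProbe and ReadinessProbe currently support the following three probe methods:
1. ExecAction: runs the given command inside the container. The probe succeeds if the command exits with code 0.
2. TCPSocketAction: performs a TCP check against the container's IP address and port. The container is considered healthy if the TCP connection can be established.
3. HTTPGetAction: sends an HTTP GET request to the container's IP address, port, and path. The container is considered healthy if the response status code is greater than or equal to 200 and less than 400.

A probe returns one of the following results:
1. Success: the check passed.
2. Failure: the check did not pass.
3. Unknown: the check could not be carried out properly.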

[root@master node]# kubectl explain pods.spec.containers.livenessProbe
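Probe-related fields:
A probe has a number of optional fields that give finer control over the behaviour of liveness and readiness probes:
 initialDelaySeconds: how long to wait after the container starts before the first probe is run, in seconds. Defaults to 0.
 periodSeconds: interval between probes, in seconds. Defaults to 10.
 timeoutSeconds: how long to wait for a response after the probe request is sent, in seconds. Defaults to 1.
 successThreshold: number of consecutive successes required, after a failure, for the probe to count as successful. Defaults to 1, minimum 1; it must be 1 for liveness probes.
 failureThreshold: number of consecutive failures after which the probe counts as failed. For a liveness probe the container is then restarted; for a readiness probe the Pod is marked not ready. Defaults to 3, minimum 1.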
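
To make the defaults listed above concrete, here is a sketch of a liveness probe with every timing field written out at its default value. The endpoint (/actuator/health on port 8081) is borrowed from the Spring Boot examples later in this article and is only a placeholder here.

```yaml
livenessProbe:
  httpGet:
    path: /actuator/health   # placeholder endpoint
    port: 8081
  initialDelaySeconds: 0     # default: probe as soon as the container has started
  periodSeconds: 10          # default: one probe every 10s
  timeoutSeconds: 1          # default: a probe with no response within 1s counts as failed
  successThreshold: 1        # default; must be 1 for liveness probes
  failureThreshold: 3        # default: 3 consecutive failures trigger a restart
```

With these values, a container that stops responding is restarted after roughly failureThreshold x periodSeconds = 30s.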
 
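Differences between the two probes:
ReadinessProbe and livenessProbe can use the same probe methods; they differ only in what happens to the Pod on failure:
readinessProbe: when the check fails, the Pod's IP:Port is removed from the corresponding Endpoints list.
livenessProbe: when the check fails, the container is killed, and what happens next is decided by the Pod's restart policy.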

1.3.1 Pod probe examples

LivenessProbe examples
(1) Health check with exec

[root@master tanzhen]# cat liveness-exec.yaml 
apiVersion: v1 
kind: Pod 
metadata: 
  name: liveness-exec 
  labels: 
    app: liveness 
spec: 
  containers: 
  - name: liveness 
    image: busybox 
    args: # create the file that the probe checks
    - /bin/sh 
    - -c 
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600 
    livenessProbe: 
      initialDelaySeconds: 10 # delay before the first probe
      periodSeconds: 5 # interval between probes
      exec: 
        command: 
        - cat 
        - /tmp/healthy
[root@master tanzhen]# kubectl apply -f liveness-exec.yaml 
pod/liveness-exec created
# View the creation events
[root@master tanzhen]# kubectl describe pods liveness-exec
[root@master tanzhen]# kubectl get pods
NAME                                       READY   STATUS    RESTARTS   AGE
liveness-exec                              1/1     Running   3          5m47s


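Command run when the container starts:
 /bin/sh -c "touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600"
After it starts, the container first creates the file /tmp/healthy, sleeps for 30 seconds, and then deletes the file. The liveness probe is configured to run a shell command,
cat /tmp/healthy, which prints the file's contents. If the command succeeds, the probe is considered successful; otherwise it is considered failed. During the first 30 seconds the file exists, so cat /tmp/healthy succeeds.
After 30 seconds the file has been deleted and the command fails. Once the failure threshold is reached, Kubernetes kills the container and decides whether to restart it according to the Pod's restart policy, which is why the RESTARTS count above keeps growing.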

(2) Health check over HTTP

# Liveness probe
[root@master tanzhen]# cat liveness-http.yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-http
  labels:
    test: liveness
spec:
  containers:
  - name: liveness
    image: mydlqclub/springboot-helloworld:0.0.1
    livenessProbe:
      initialDelaySeconds: 20 # delay before the first probe
      periodSeconds: 5 # interval between probes
      timeoutSeconds: 10 # probe timeout
      httpGet:
        scheme: HTTP
        port: 8081
        path: /actuator/health
        # host defaults to the Pod IP, so it is omitted here

URL requested by the probe: http://<pod ip>:8081/actuator/health

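The container started in the Pod above is a Spring Boot application that includes the Actuator component, which exposes a health-check endpoint at /actuator/health. The liveness probe uses HTTPGet to send a request
to /actuator/health on port 8081 and decides liveness from the response:

Any status code greater than or equal to 200 and less than 400 indicates success.
Any other code indicates failure.

If the probe fails, the container is killed and restarted.

The httpGet probe has the following optional fields:
scheme: protocol used to connect to the host; defaults to HTTP.
host: host name to connect to; defaults to the Pod IP. To send a specific Host header, set it in httpHeaders instead.
port: number or name of the container port to access.
path: path to request on the HTTP server.
httpHeaders: custom headers to set in the request; HTTP allows repeated headers.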
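
As a sketch of the optional fields listed above, the probe below sets a custom request header and leaves host unset so that the Pod IP is used. The header name and value are hypothetical; the path and port are the ones used in this article's Spring Boot example.

```yaml
livenessProbe:
  httpGet:
    scheme: HTTP
    path: /actuator/health
    port: 8081
    httpHeaders:
    - name: X-Probe-Source   # hypothetical header name
      value: kubelet         # hypothetical value
  initialDelaySeconds: 20
  periodSeconds: 5
  timeoutSeconds: 10
```

A Host header can be set the same way, as one more entry under httpHeaders, when the application routes requests by virtual host.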

[root@master tanzhen]# kubectl get pods
NAME            READY   STATUS              RESTARTS   AGE
liveness-http   0/1     ContainerCreating   0          8s

[root@master tanzhen]# kubectl describe pods liveness-http

Events:
  Type    Reason     Age        From               Message
  ----    ------     ----       ----               -------
  Normal  Scheduled  60s        default-scheduler  Successfully assigned kube-system/liveness-http to node3
  Normal  Pulling    <invalid>  kubelet            Pulling image "mydlqclub/springboot-helloworld:0.0.1" # the image is being pulled

Events:
  Type    Reason     Age        From               Message
  ----    ------     ----       ----               -------
  Normal  Scheduled  3m40s      default-scheduler  Successfully assigned kube-system/liveness-http to node3
  Normal  Pulling    <invalid>  kubelet            Pulling image "mydlqclub/springboot-helloworld:0.0.1"
  Normal  Pulled     <invalid>  kubelet            Successfully pulled image "mydlqclub/springboot-helloworld:0.0.1" in 1m43.811208938s
  Normal  Created    <invalid>  kubelet            Created container liveness
  Normal  Started    <invalid>  kubelet            Started container liveness

liveness-http   1/1     Running             0          4m42s

[root@master tanzhen]# curl http://10.244.135.17:8081/actuator/health
{"status":"UP"}

Readiness probe

apiVersion: v1 
kind: Pod 
metadata: 
  name: liveness-http 
  labels: 
    test: liveness 
spec: 
 containers: 
 - name: liveness 
   image: mydlqclub/springboot-helloworld:0.0.1 
   livenessProbe: 
     initialDelaySeconds: 20 # delay before the first probe
     periodSeconds: 5 # interval between probes
     timeoutSeconds: 10 # probe timeout
     httpGet: 
       scheme: HTTP 
       port: 8081 
       path: /actuator/health
   readinessProbe:
     initialDelaySeconds: 20
     periodSeconds: 5
     timeoutSeconds: 10
     httpGet:
       scheme: HTTP
       port: 8081
       path: /actuator/health

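If the probe port is changed to 8082, a port the application does not listen on, the Events shown by kubectl describe report the probes failing. The failed readiness probe keeps the Pod at READY 0/1, and the failed liveness probe makes the RESTARTS count in kubectl get pods go up.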

(3) Health check over TCP

[root@master tanzhen]# cat liveness-tcp.yaml 
apiVersion: v1 
kind: Pod 
metadata: 
 name: liveness-tcp 
 labels: 
  app: liveness 
spec: 
 containers: 
 - name: liveness 
   image: nginx 
   livenessProbe: 
     initialDelaySeconds: 15 
     periodSeconds: 20 
     tcpSocket: 
       port: 80

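The TCP check works much like the HTTP check. Once the delay set by initialDelaySeconds has elapsed after the container starts, the kubelet sends the first livenessProbe by trying to connect to port 80 of the container. If the connection keeps failing, the container is killed and restarted.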
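
Because the probe's port field accepts a port name as well as a number, a TCP check can also refer to a named container port. The sketch below, with a hypothetical Pod name and illustrative timings, uses the same named port for a readiness probe and a liveness probe.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tcp-probes            # hypothetical name
  labels:
    app: liveness
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - name: http              # the probes below refer to this name
      containerPort: 80
    readinessProbe:
      tcpSocket:
        port: http
      initialDelaySeconds: 5  # illustrative value
      periodSeconds: 10
    livenessProbe:
      tcpSocket:
        port: http
      initialDelaySeconds: 15
      periodSeconds: 20
```

Referring to the port by name means the probes keep working if containerPort is changed later.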

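Using the readinessProbe readiness probe

A Pod's ReadinessProbe is configured the same way as a LivenessProbe and supports the same three probe methods. The difference is that one checks whether the application is alive, while the other decides whether the Pod should receive traffic.
The example below uses a Spring Boot project and sets a ReadinessProbe on the /actuator/health endpoint on port 8081. If the probe succeeds, the application inside has started and the Pod is opened up to traffic;
otherwise the application has not started successfully and the Pod receives no traffic until the readiness probe succeeds.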

[root@master tanzhen]# cat readness-exec.yaml 
apiVersion: v1 
kind: Service 
metadata: 
  name: springboot 
  labels: 
    app: springboot 
spec: 
  type: NodePort
  ports: 
  - name: server
    port: 8080
    targetPort: 8080
    nodePort: 31180
  - name: management 
    port: 8081 
    targetPort: 8081 
    nodePort: 31181 
  selector: 
    app: springboot 
--- 
apiVersion: v1 
kind: Pod 
metadata: 
  name: springboot 
  labels: 
    app: springboot 
spec: 
  containers: 
  - name: springboot 
    image: mydlqclub/springboot-helloworld:0.0.1 
    ports: 
    - name: server 
      containerPort: 8080 
    - name: management 
      containerPort: 8081 
    readinessProbe: 
      initialDelaySeconds: 20 
      periodSeconds: 5 
      timeoutSeconds: 10 
      httpGet: 
        scheme: HTTP 
        port: 8081 
        path: /actuator/health

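(3) Using ReadinessProbe and LivenessProbe together

Applications generally need both probes together, with the initial delay and the probe interval tuned to how the application actually behaves. This combination is common in production.
Example: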

apiVersion: v1 
kind: Service 
metadata: 
  name: springboot 
  labels: 
    app: springboot 
spec: 
  type: NodePort
  ports: 
  - name: server
    port: 8080
    targetPort: 8080
    nodePort: 31180
  - name: management 
    port: 8081 
    targetPort: 8081 
    nodePort: 31181 
  selector: 
    app: springboot 
--- 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: springboot
  labels:
    app: springboot
spec:
  replicas: 1
  selector:
    matchLabels:
      app: springboot
  template:
    metadata:
      name: springboot
      labels:
        app: springboot
    spec:
      containers:
      - name: readiness
        image: mydlqclub/springboot-helloworld:0.0.1
        ports:
        - name: server
          containerPort: 8080
        - name: management
          containerPort: 8081
        readinessProbe:
          initialDelaySeconds: 20
          periodSeconds: 5
          timeoutSeconds: 10
          httpGet:
            scheme: HTTP
            port: 8081
            path: /actuator/health
        livenessProbe:
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          httpGet:
            scheme: HTTP
            port: 8081
            path: /actuator/health

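II. The Kubernetes startup probe: startupProbe

2.1 The three Kubernetes probes

livenessProbe: checks whether the container is running. If the liveness probe fails, the kubelet kills the container, and the container's restart policy decides whether it is restarted. If a container does not provide a liveness probe, the default state is Success.

readinessProbe: checks whether the application in the container is healthy and ready to serve requests. If the readiness probe fails, the Pod's IP address is removed from the endpoints of all Services that match the Pod. Before the initial delay has elapsed, the readiness state defaults to Failure.
If a container does not provide a readiness probe, the default state is Success.

startupProbe: checks whether the application in the container has started. If a startup probe is provided, all other probes are disabled until it succeeds. If the startup probe fails, the kubelet kills the container, and the container is restarted according to its restart policy.
If a container does not provide a startup probe, the default state is Success.

Each of these probes is optional, and a probe that is not configured is treated as passing. When they are configured, startupProbe runs first; readinessProbe and livenessProbe start only after it has succeeded, and from then on they run independently of each other.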

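2.2 Introducing startupProbe: why use it?

In k8s, Pods are managed by controllers. When a Pod is updated, a new Pod is created and the old one is deleted. If the old Pod is deleted before the containers in the new Pod have finished initializing, requests arriving through the Service or
Ingress reach a Pod that cannot serve them yet. This is why k8s provides probes: the liveness probe livenessProbe, the readiness probe readinessProbe, and the startup probe startupProbe covered in this section.

startupProbe was added as an alpha feature in k8s v1.16. The official description of its purpose is: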
 
Indicates whether the application within the Container is started. All other probes are 
disabled if a startup probe is provided, until it succeeds. If the startup probe fails, the 
kubelet kills the Container, and the Container is subjected to its restart policy. If a 
Container does not provide a startup probe, the default state is Success 
 
Note: do not confuse startupProbe with readinessProbe.
readinessProbe only starts running after startupProbe has succeeded.

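2.3 When should startupProbe be used?

Normally we configure a livenessProbe in the Pod template to check whether the container is running properly. If it is not, the container is restarted according to restartPolicy (which defaults to Always).

livenessProbe:
  httpGet:
    path: /test
    port: 80
  failureThreshold: 1
  initialDelaySeconds: 10
  periodSeconds: 10

This configuration starts probing 10s after the container starts and then probes every 10s. With failureThreshold set to 1, a single failed probe is enough to have the container killed and restarted according to restartPolicy.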

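Some services are special, though. Suppose service A starts slowly and needs 60s. With the probe above, the container ends up in a restart loop: probing begins after 10s, the service is not up yet, the probe fails, and the container is restarted. One idea is to raise initialDelaySeconds to 60s. But we cannot guarantee that the service always starts within 60s; a new version might need 70s or more, and the delay becomes hard to manage. Another idea is to raise the failure count, for example:

livenessProbe:
  httpGet:
    path: /test
    port: 80
  failureThreshold: 5
  initialDelaySeconds: 60
  periodSeconds: 10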

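This does solve the startup problem, but what happens when the service goes down later? With failureThreshold=1 the failure is detected after 10s. With failureThreshold=5 it takes 5*10s=50s, which is unacceptable when everyone expects failures to be detected, located, and handled quickly.

Combining startupProbe with livenessProbe solves most of this problem.

livenessProbe:
  httpGet:
    path: /test
    port: 80
  failureThreshold: 1
  initialDelaySeconds: 10
  periodSeconds: 10
startupProbe:
  httpGet:
    path: /test
    port: 80
  failureThreshold: 10
  initialDelaySeconds: 10
  periodSeconds: 10

With this configuration, livenessProbe takes over only after startupProbe has succeeded. The startupProbe allows failureThreshold * periodSeconds = 10*10s = 100s (counted after its 10s initial delay), so any startup that completes within that window is fine, and once the application is up, a failure is still detected within about 10s.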

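Even so, this cannot pin down an exact startup time; it only gives a rough range. In my view, too many factors affect how long a service takes to start: the application itself, external factors such as host performance, and so on. We can only aim for as much efficiency and stability as possible; we cannot guarantee 100% stability. Even large companies such as Alibaba advertise only five or six nines of availability, and when something does go wrong, you may be the one who falls outside those nines. So we have to make sure our own monitoring is effective, alerts are timely, responses are fast, and handling is efficient.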
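
Putting the pieces of this section together, the following is a sketch of a complete Pod that uses all three probes for a slow-starting service. It reuses the Spring Boot image and health endpoint from earlier in this article; the Pod name and every timing value are assumptions that would need tuning for a real application.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: springboot-slow-start   # hypothetical name
  labels:
    app: springboot
spec:
  containers:
  - name: springboot
    image: mydlqclub/springboot-helloworld:0.0.1
    ports:
    - name: management
      containerPort: 8081
    startupProbe:               # the other two probes are held back until this succeeds
      httpGet:
        path: /actuator/health
        port: management
      periodSeconds: 10
      failureThreshold: 10      # startup budget: 10 x 10s = 100s
    readinessProbe:             # controls whether the Pod receives Service traffic
      httpGet:
        path: /actuator/health
        port: management
      periodSeconds: 5
      failureThreshold: 3
    livenessProbe:              # restarts the container after a single failed check
      httpGet:
        path: /actuator/health
        port: management
      periodSeconds: 10
      failureThreshold: 1
```

The design choice is to put the generous failure budget on startupProbe only, so the liveness probe can stay strict without killing the container while it is still starting.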

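2.4 The start order of LivenessProbe and ReadinessProbe in k8s

A failed LivenessProbe causes the container to be restarted; a failed ReadinessProbe only stops traffic from being sent to the Pod.

Our initial understanding was that LivenessProbe starts checking only after ReadinessProbe has succeeded, but this is not the case.

The kubelet uses liveness probes to know when to restart a container. For example, a liveness probe can catch a deadlock, where the application is running but cannot make progress. Restarting the container in such a state helps keep the application available despite the fault.

The kubelet uses readiness probes to know when a container is ready to start accepting traffic. A Pod is considered ready only when all of its containers are ready. One use of this signal is to control which Pods serve as backends for a Service.
While a Pod is not ready, it is removed from the Service's load balancer.

The kubelet uses startup probes (startupProbe) to know when the application in a container has started. If such a probe is configured, liveness and readiness checks do not begin until it succeeds, so those probes cannot interfere with application startup.
This makes it possible to use liveness checks on slow-starting containers without having them killed before they are up and running.

The actual start order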
https://github.com/kubernetes/kubernetes/issues/60647

https://github.com/kubernetes/kubernetes/issues/27114 
 
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readinessstartup-probes/#define-readiness-pro

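Liveness probes do not wait for readiness probes to succeed before they run.
According to the official documentation above, liveness and readiness probes run concurrently and independently of each other. To delay a liveness probe, use initialDelaySeconds or a startupProbe.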

posted on 2022-07-26 17:48 杨梅冲