17. K8S Pod probe mechanisms: readinessProbe and livenessProbe


1. Fundamentals

1.1 Introduction

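 From our study of Docker we know that a container environment built on image packaging behaves like a "black box": by default we cannot see what kind of runtime environment is inside. To monitor the running state of a containerized application in real time, the container engine offers the inspect command so that we can actively collect status data, but this approach is too cumbersome.
 In practice we need a way to obtain a container's runtime state promptly, so every container orchestration environment should support one scenario: the container proactively exposes its runtime data through a data-exposure interface. The most common form is an API endpoint that serves a large number of metrics.
For pods in k8s, common interfaces of this kind include:
 process health: process health-check interface
 metrics: monitoring metrics interface
 readiness: container readiness-state interface
 liveness: container liveness-state interface
 tracing: instrumentation (probe) interface for end-to-end tracing
 logs: container log interface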

1.2 The three probes

1.2.1 LivenessProbe

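livenessProbe: the liveness probe determines whether a container is healthy. It probes according to its configuration (a process, a port, whether a command succeeds, and so on) to decide whether the container is working normally.
If the probe fails (you can configure how many consecutive failures count as unhealthy), the container is considered unhealthy: the kubelet kills the container and then handles it according to the restartPolicy set in the Pod.
If no liveness probe is configured, the container counts as passing (Success) as soon as it starts, i.e. the probe result is always Success, and the pod status is Running.
Reference: kubectl explain pod.spec.containers.livenessProbe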
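The consecutive-failure rule above can be sketched in Python. This is a simplified model, not kubelet code: ProbeTracker and count_restarts are hypothetical names, each list element stands for one probe result, and the sketch assumes a restartPolicy (Always or OnFailure) under which the killed container is restarted with fresh counters.

```python
from dataclasses import dataclass


@dataclass
class ProbeTracker:
    """Hypothetical model: turns a stream of probe results into a healthy/unhealthy state."""
    success_threshold: int = 1   # successThreshold (must be 1 for liveness probes)
    failure_threshold: int = 3   # failureThreshold (default 3)
    healthy: bool = True         # liveness counts as Success until a probe says otherwise
    successes: int = 0           # consecutive successes so far
    failures: int = 0            # consecutive failures so far

    def record(self, ok: bool) -> bool:
        """Record one probe result and return the resulting state."""
        if ok:
            self.successes += 1
            self.failures = 0
            if self.successes >= self.success_threshold:
                self.healthy = True
        else:
            self.failures += 1
            self.successes = 0
            if self.failures >= self.failure_threshold:
                self.healthy = False
        return self.healthy


def count_restarts(results, failure_threshold=3):
    """Hypothetical helper: how many times the container would be killed for these results."""
    tracker = ProbeTracker(failure_threshold=failure_threshold)
    restarts = 0
    for ok in results:
        if not tracker.record(ok):
            restarts += 1
            # Assumption: the restarted container starts with fresh counters.
            tracker = ProbeTracker(failure_threshold=failure_threshold)
    return restarts


print(count_restarts([True, False, False, True, False, False, False]))  # 1
```

The two failures in the middle are reset by the success that follows them, so only the final run of three failures triggers a kill.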

1.2.2 ReadinessProbe

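readinessProbe: the readiness probe determines whether the program in the container is ready to serve. Only once the program (service) is running normally, i.e. has finished starting and is ready, does the container begin receiving network traffic.
After the container starts, it is probed according to the readinessProbe configuration. When a probe succeeds, the result is Success, the pod's READY condition becomes true and the READY column changes from 0/1 to 1/1. If probes fail, it stays at 0/1 and the condition remains false. If no readiness probe is configured, the container counts as Success as soon as it starts.
The relationship between the pod, the Service associated with it and its Endpoints is also driven by the pod's Ready state. If the Ready state becomes false while the pod is running, the system automatically removes the pod from the Endpoints list of the associated Service, so kube-proxy will not forward any requests arriving at the Service to this pod. This mechanism prevents traffic from being sent to unavailable pods. When the pod becomes Ready again, it is added back to the Endpoints list, and kube-proxy can once again route traffic to it through its load-balancing mechanism.
Reference: kubectl explain pod.spec.containers.readinessProbe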
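A minimal sketch of the Ready logic, assuming a readinessProbe is configured (so the container starts as not ready). readiness_timeline and service_endpoints are hypothetical helpers; in a real cluster the endpoints are managed by the control plane, not by code like this.

```python
def readiness_timeline(results, success_threshold=1, failure_threshold=3):
    """Hypothetical helper: the pod's Ready value after each probe result.

    With a readinessProbe configured, the container starts as not ready.
    """
    ready = False
    successes = failures = 0
    timeline = []
    for ok in results:
        if ok:
            successes, failures = successes + 1, 0
            if successes >= success_threshold:
                ready = True
        else:
            successes, failures = 0, failures + 1
            if failures >= failure_threshold:
                ready = False
        timeline.append(ready)
    return timeline


def service_endpoints(pod_ready):
    """Hypothetical helper: pod IPs a Service would keep as endpoints (Ready pods only)."""
    return sorted(ip for ip, ready in pod_ready.items() if ready)


print(readiness_timeline([False, True, True, False, False, False, True]))
# [False, True, True, True, True, False, True]
print(service_endpoints({"10.244.3.81": True, "10.244.3.82": False}))
# ['10.244.3.81']
```

Note that a failing readiness probe only flips Ready; nothing in this model restarts the container.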

1.2.3 StartupProbe

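Kubernetes added the startupProbe in version 1.16. It addresses cases where, for complex (for example slow-starting) programs, readinessProbe and livenessProbe cannot reliably tell whether the program has started or is alive, so startupProbe was introduced to serve the other two probes.
How startupProbe differs from the other two:
 If all three probes are configured, startupProbe runs first and the other two are disabled until the container satisfies the startupProbe conditions; after that the other two start. If the startupProbe conditions are not met, the container is restarted according to the rules (restartPolicy).
 The other two probes keep probing according to their configuration from container start until the container terminates, whereas startupProbe stops probing once it has succeeded a single time.
Reference: kubectl explain pod.spec.containers.startupProbe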
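The gating behaviour of startupProbe can be modelled roughly as follows. startup_phase and max_startup_seconds are hypothetical helpers; the sketch assumes successThreshold is 1 (required for startup probes), so a single success ends the startup phase, and it ignores initialDelaySeconds and timeouts.

```python
def startup_phase(results, failure_threshold=3):
    """Hypothetical model of a startupProbe: returns (outcome, probes_used)."""
    failures = 0
    for count, ok in enumerate(results, start=1):
        if ok:
            # One success is enough; liveness/readiness probes take over from here.
            return ("started", count)
        failures += 1
        if failures >= failure_threshold:
            # The container is killed and handled according to restartPolicy.
            return ("restart", count)
    return ("pending", len(results))


def max_startup_seconds(failure_threshold, period_seconds):
    """Hypothetical helper: rough upper bound on the time the app gets to start."""
    return failure_threshold * period_seconds


print(startup_phase([False, False, True]))  # ('started', 3)
print(max_startup_seconds(30, 10))          # 300
```

This is why a slow-starting application is usually given a large failureThreshold on the startupProbe instead of a long initialDelaySeconds on the livenessProbe: the liveness check stays fast once startup has completed.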

1.3 Probe handler types

1.3.1 ExecAction

Runs a command directly inside the container; the probe succeeds if the command exits with status 0.
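A local Python analogue of an exec probe, for illustration only: exec_probe is a hypothetical helper, and the command runs on the local machine rather than inside a container. In a Pod the command would be something like ["test", "-e", "/tmp/healthy"]; here the Python interpreter is used so the example runs anywhere.

```python
import subprocess
import sys


def exec_probe(command, timeout_seconds=1.0):
    """Hypothetical helper: success means exit status 0 within the timeout."""
    try:
        result = subprocess.run(command, capture_output=True, timeout=timeout_seconds)
    except (subprocess.TimeoutExpired, OSError):
        # A command that hangs past timeoutSeconds or cannot be started counts as a failure.
        return False
    return result.returncode == 0


print(exec_probe([sys.executable, "-c", "import sys; sys.exit(0)"], timeout_seconds=5))  # True
print(exec_probe([sys.executable, "-c", "import sys; sys.exit(1)"], timeout_seconds=5))  # False
```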

1.3.2 TCPSocketAction

Attempts a TCP connection to the specified port; the probe succeeds if the connection can be opened.
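A rough local equivalent of a tcpSocket probe (tcp_probe is a hypothetical name). The check only verifies that a TCP connection can be established; nothing is sent over it, so a process that accepts connections but is otherwise broken still passes.

```python
import socket


def tcp_probe(host, port, timeout_seconds=1.0):
    """Hypothetical helper: success means a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Connection refused, unreachable host or timeout.
        return False


# Usage: a listening socket on an ephemeral local port stands in for the container port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen()
port = server.getsockname()[1]
print(tcp_probe("127.0.0.1", port))  # True
server.close()
print(tcp_probe("127.0.0.1", port))  # False
```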

1.3.3 HTTPGetAction

Sends an HTTP GET request to the specified path; a response status code in the 2xx or 3xx range (200 to 399) means success.
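A sketch of the HTTP GET success rule, using http.client so that redirects are not followed and the raw status code is checked. http_get_probe and is_success_status are hypothetical names; the real kubelet additionally handles redirects, custom headers and HTTPS. The demo application later in this article reports failure with status 506 or 507, which falls outside the success range.

```python
import http.client


def is_success_status(status: int) -> bool:
    """Hypothetical helper: HTTPGetAction treats 200 <= status < 400 as success."""
    return 200 <= status < 400


def http_get_probe(host, port, path="/", timeout_seconds=1.0):
    """Hypothetical helper: send one GET and apply the status-code rule."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout_seconds)
    try:
        conn.request("GET", path)
        return is_success_status(conn.getresponse().status)
    except (OSError, http.client.HTTPException):
        # Connection errors and timeouts count as failures.
        return False
    finally:
        conn.close()


print(is_success_status(200), is_success_status(404), is_success_status(506))  # True False False
```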

1.3.4 Summary

Note: each of the three probes (liveness, readiness and startup) supports all three handler types.

1.4 Related fields

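spec:
  containers:
  - name: …
    image: …
    livenessProbe:
      exec <Object>                  # command (exec) probe
      httpGet <Object>               # HTTP GET probe
      tcpSocket <Object>             # TCP socket probe
      initialDelaySeconds <integer>  # delay before the first probe is sent, default 0
      periodSeconds <integer>        # interval between probes, default 10
      timeoutSeconds <integer>       # probe timeout, default 1
      successThreshold <integer>     # consecutive successes needed to count as healthy, default 1 (must be 1 for liveness and startup probes)
      failureThreshold <integer>     # consecutive failures needed to count as unhealthy, default 3
Note:
 Only livenessProbe is listed here; readinessProbe has the same fields as livenessProbe.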
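As a worked example of how these fields combine, the sketch below estimates when a probe is first declared failed if the application is broken from the start. The n-th probe runs at roughly initialDelaySeconds + (n - 1) * periodSeconds, so failure is declared at the failureThreshold-th probe. seconds_until_failure is a hypothetical helper, and the estimate ignores timeoutSeconds and scheduling jitter.

```python
def seconds_until_failure(initial_delay_seconds=0, period_seconds=10, failure_threshold=3):
    """Hypothetical helper: approximate seconds after container start until the probe is
    declared failed, assuming every probe fails (timeouts and jitter ignored)."""
    # Probe n (1-based) runs at initial_delay + (n - 1) * period.
    return initial_delay_seconds + (failure_threshold - 1) * period_seconds


print(seconds_until_failure())         # 20, with the defaults
print(seconds_until_failure(5, 5, 3))  # 15
```

For the tcpSocket failure example in 5.3.1 (initialDelaySeconds 5, periodSeconds 5, default failureThreshold 3) this gives 15 seconds, roughly in line with the Killing event recorded there about 15 seconds after the pod was scheduled.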

2. A simple introductory probe example

2.1 exec

2.1.1 Liveness probe YAML

cat >pod-health-cmd.yml<<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: liveness-exec-pod
  namespace: default
spec:
  containers:
  - name: liveness-exec-container
    image: busybox
    imagePullPolicy: IfNotPresent
    command: ["/bin/sh","-c","touch /tmp/healthy; sleep 3; rm -rf /tmp/healthy;sleep 3600"]
    livenessProbe:
      exec:
        command: ["test", "-e","/tmp/healthy"]
      initialDelaySeconds: 1
      periodSeconds: 3
EOF


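# What it does: after the container starts, the command creates /tmp/healthy, waits 3 seconds and deletes it; the liveness probe (first run after 1 second, then every 3 seconds) checks whether the file exists, so it starts failing once the file is gone.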
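The probe timeline for this example can be worked out step by step. The sketch assumes the shell command and the probe clock start at the same moment and that the probe passes while /tmp/healthy exists; first_kill_time is a hypothetical helper.

```python
def first_kill_time(file_removed_at=3, initial_delay=1, period=3,
                    failure_threshold=3, max_time=3600):
    """Hypothetical helper: time of the first kill, or None if none happens within max_time."""
    failures = 0
    t = initial_delay
    while t <= max_time:
        ok = t < file_removed_at          # assumption: probe passes while the file exists
        failures = 0 if ok else failures + 1
        if failures >= failure_threshold:
            return t                      # failure_threshold-th consecutive failure
        t += period
    return None


print(first_kill_time())  # 10
```

With these settings the probe at 1 second passes, the probes at 4, 7 and 10 seconds fail, and the container is killed about 10 seconds after it starts. The observed restart interval in the output (about 42 seconds) is longer; a plausible explanation is the default 30-second termination grace period, because the shell running as PID 1 does not exit on SIGTERM, so the kubelet waits before sending SIGKILL.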

2.1.2 Result

[root@master1 deplay]# kubectl apply -f pod-health-cmd.yml && kubectl get pod liveness-exec-pod -w
pod/liveness-exec-pod created
NAME                READY   STATUS              RESTARTS   AGE
liveness-exec-pod   0/1     ContainerCreating   0          0s
liveness-exec-pod   1/1     Running             0          1s
liveness-exec-pod   1/1     Running             1 (0s ago)   43s
liveness-exec-pod   1/1     Running             2 (0s ago)   85s
liveness-exec-pod   1/1     Running             3 (0s ago)   2m7s
liveness-exec-pod   1/1     Running             4 (0s ago)   2m49s
liveness-exec-pod   1/1     Running             5 (0s ago)   3m31s # after the 5th restart, further restarts are delayed by back-off and the status shows CrashLoopBackOff
liveness-exec-pod   0/1     CrashLoopBackOff    5 (0s ago)   4m13s

4. Restart policy

# Help for the field
kubectl explain pod.spec.restartPolicy

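Always: restart the container whenever it terminates (the default policy)
OnFailure: restart the container only when it exits abnormally (non-zero exit code)
Never: never restart the container when it terminates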
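How restartPolicy decides whether a terminated container is restarted can be expressed as a small function (should_restart is a hypothetical name). The sketch also models the liveness case: in current kubelet behaviour a container killed for a failed liveness probe is restarted unless the policy is Never, which is why the OnFailure example in 5.1 keeps restarting.

```python
def should_restart(policy: str, exit_code: int, killed_by_liveness: bool = False) -> bool:
    """Hypothetical model of restartPolicy for a container that has terminated."""
    if policy not in ("Always", "OnFailure", "Never"):
        raise ValueError(f"unknown restartPolicy: {policy}")
    if killed_by_liveness:
        # Assumption based on kubelet behaviour: restarted unless the policy is Never.
        return policy != "Never"
    if policy == "Always":
        return True
    if policy == "OnFailure":
        return exit_code != 0
    return False  # Never


print(should_restart("OnFailure", 0))                           # False
print(should_restart("OnFailure", 0, killed_by_liveness=True))  # True
```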

5. Liveness probe in practice

cat >/usr/local/bin/demo.py<<'EOF' 
#!/usr/bin/python3
#
from flask import Flask, request, abort, Response, jsonify as flask_jsonify, make_response
import argparse
import sys, os, getopt, socket, json, time

app = Flask(__name__)

@app.route('/')
def index():
    return ('kubernetes pod-test v0.1!! ClientIP: {}, ServerName: {}, '
          'ServerIP: {}!\n'.format(request.remote_addr, socket.gethostname(),
                                  socket.gethostbyname(socket.gethostname())))

@app.route('/hostname')
def hostname():
    return ('ServerName: {}\n'.format(socket.gethostname()))

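# Current state of each health endpoint. A GET returns 200 while the state is 'OK'
# and 506 (/livez) or 507 (/readyz) otherwise, which makes an httpGet probe fail.
# Change the state at runtime with a form POST, e.g.
#   curl -XPOST -d 'readyz=FAIL' http://<pod-ip>/readyz
health_status = {'livez': 'OK', 'readyz': 'OK'}
# Number of GET requests served per endpoint (used to delay only the first one)
probe_count = {'livez': 0, 'readyz': 0}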

@app.route('/livez', methods=['GET','POST'])
def livez():
    if request.method == 'POST':
        status = request.form['livez']
        health_status['livez'] = status
        return ''

    else:
        # Delay only the first GET by 5 seconds to simulate a slow start
        if probe_count['livez'] == 0:
            time.sleep(5)
        probe_count['livez'] += 1
        if health_status['livez'] == 'OK':
            return make_response((health_status['livez']), 200)
        else:
            return make_response((health_status['livez']), 506)

@app.route('/readyz', methods=['GET','POST'])
def readyz():
    if request.method == 'POST':
        status = request.form['readyz']
        health_status['readyz'] = status
        return ''

    else:
        # Delay only the first GET by 15 seconds to simulate a slow start
        if probe_count['readyz'] == 0:
            time.sleep(15)
        probe_count['readyz'] += 1
        if health_status['readyz'] == 'OK':
            return make_response((health_status['readyz']), 200)
        else:
            return make_response((health_status['readyz']), 507)

@app.route('/configs')
def configs():
    return ('DEPLOYENV: {}\nRELEASE: {}\n'.format(os.environ.get('DEPLOYENV'), os.environ.get('RELEASE')))

@app.route("/user-agent")
def view_user_agent():
    # user_agent=request.headers.get('User-Agent')
    return('User-Agent: {}\n'.format(request.headers.get('user-agent')))

def main(argv):
    port = 80
    host = '0.0.0.0'
    debug = False

    if os.environ.get('PORT') is not None:
        port = os.environ.get('PORT')

    if os.environ.get('HOST') is not None:
        host = os.environ.get('HOST')

    try:
        opts, args = getopt.getopt(argv,"vh:p:",["verbose","host=","port="])
    except getopt.GetoptError:
        print('server.py -p <portnumber>')
        sys.exit(2)
    for opt, arg in opts:
        if opt in ("-p", "--port"):
            port = arg
        elif opt in ("-h", "--host"):
            host = arg
        elif opt in ("-v", "--verbose"):
            debug = True

    app.run(host=str(host), port=int(port), debug=bool(debug))


if __name__ == "__main__":
    main(sys.argv[1:])
EOF
The Python test application used for the probe examples below.

5.1 exec

5.1.1 Liveness probe YAML

cat > pod-liveness-exec.yml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: liveness-exec-demo
  namespace: default
spec:
  restartPolicy: OnFailure
  containers: 
  - name: demo
    image: 192.168.10.33:80/k8s/pod_test:v0.1
    imagePullPolicy: IfNotPresent
    livenessProbe:
      exec:
        command: ["test", "-e","/tmp/healthy"]
      initialDelaySeconds: 5
      timeoutSeconds: 1
      periodSeconds: 5
EOF

# The probe checks whether /tmp/healthy exists; the application never creates this file, so every probe fails

5.1.2 Result

]# kubectl apply -f pod-liveness-exec.yml && kubectl get pods -w
pod/liveness-exec-demo created
NAME                 READY   STATUS              RESTARTS   AGE
liveness-exec-demo   0/1     ContainerCreating   0          0s
liveness-exec-demo   1/1     Running             0          1s
liveness-exec-demo   1/1     Running             1 (1s ago)   46s
liveness-exec-demo   1/1     Running             2 (1s ago)   91s
liveness-exec-demo   1/1     Running             3 (1s ago)   2m16s
liveness-exec-demo   1/1     Running             4 (1s ago)   3m1s
liveness-exec-demo   1/1     Running             5 (0s ago)   3m45s

# As shown here, when the liveness probe fails, the container is restarted (RESTARTS keeps increasing).

5.2 http

5.2.1 Simulated failure example

cat >liveness-httpget-fail.yml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: liveness-httpget-pod
spec:
  containers:
  - name: liveness-httpget-container
    image: 192.168.10.33:80/k8s/pod_test:v0.1
    ports:
    - name: http
      containerPort: 80
    livenessProbe:
      httpGet:
        port: http
        path: /index.html
      initialDelaySeconds: 1
      periodSeconds: 3
EOF


-----------------------

# /index.html does not exist in the application, so the probe fails and the container is restarted periodically
]# kubectl apply -f liveness-httpget-fail.yml && kubectl get pods -w
pod/liveness-httpget-pod created
NAME                   READY   STATUS              RESTARTS   AGE
liveness-httpget-pod   0/1     ContainerCreating   0          0s
liveness-httpget-pod   1/1     Running             0          1s
liveness-httpget-pod   1/1     Running             1 (1s ago)   40s

-----------------------
# Events showing the failure
]# kubectl describe pod liveness-httpget-pod
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  22s                default-scheduler  Successfully assigned default/liveness-httpget-pod to node1
  Normal   Pulled     22s                kubelet            Container image "192.168.10.33:80/k8s/pod_test:v0.1" already present on machine
  Normal   Created    22s                kubelet            Created container liveness-httpget-container
  Normal   Started    22s                kubelet            Started container liveness-httpget-container
  Warning  Unhealthy  14s (x3 over 20s)  kubelet            Liveness probe failed: HTTP probe failed with statuscode: 404
  Normal   Killing    14s                kubelet            Container liveness-httpget-container failed liveness probe, will be restarted

5.2.2 Simulated success example

cat >liveness-httpget-suc.yml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: liveness-httpget-demo
  namespace: default
spec:
  containers:
  - name: demo
    image: 192.168.10.33:80/k8s/pod_test:v0.1
    imagePullPolicy: IfNotPresent
    livenessProbe:
      httpGet:
        path: '/hostname'
        port: 80
        scheme: HTTP
      initialDelaySeconds: 5
      periodSeconds: 3
EOF


---------------------------
]# kubectl apply -f liveness-httpget-suc.yml && kubectl get pods -w
pod/liveness-httpget-demo created
NAME                    READY   STATUS              RESTARTS   AGE
liveness-httpget-demo   0/1     ContainerCreating   0          0s
liveness-httpget-demo   1/1     Running             0          1s


# The probe keeps succeeding (one GET /hostname every 3 seconds)
]# kubectl logs --tail 5  liveness-httpget-demo 
10.244.3.1 - - [19/Mar/2023 09:49:25] "GET /hostname HTTP/1.1" 200 -
10.244.3.1 - - [19/Mar/2023 09:49:28] "GET /hostname HTTP/1.1" 200 -
10.244.3.1 - - [19/Mar/2023 09:49:31] "GET /hostname HTTP/1.1" 200 -
10.244.3.1 - - [19/Mar/2023 09:49:34] "GET /hostname HTTP/1.1" 200 -
10.244.3.1 - - [19/Mar/2023 09:49:37] "GET /hostname HTTP/1.1" 200 -

5.3 tcpsocket

5.3.1 Simulated failure example

cat >liveness-tcpsocket-fail.yml<<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: liveness-tcpsocket-demo
  namespace: default
spec:
  containers:
  - name: demo
    image: 192.168.10.33:80/k8s/pod_test:v0.1
    imagePullPolicy: IfNotPresent
    ports:
    - name: http
      containerPort: 801
    livenessProbe:
      tcpSocket:
        port: http
      periodSeconds: 5
      initialDelaySeconds: 5
EOF

-------------------

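# The container starts, then the liveness probe connects to port 801 (nothing listens there); after the failures the container is restarted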
]# kubectl apply -f liveness-tcpsocket-fail.yml && kubectl get pods -w -o wide
pod/liveness-tcpsocket-demo created
NAME                      READY   STATUS              RESTARTS   AGE   IP       NODE    NOMINATED NODE   READINESS GATES
liveness-tcpsocket-demo   0/1     ContainerCreating   0          0s    <none>   node1   <none>           <none>
liveness-tcpsocket-demo   1/1     Running             0          0s    10.244.3.86   node1   <none>           <none>
liveness-tcpsocket-demo   1/1     Running             1 (0s ago)   45s   10.244.3.86   node1   <none>           <none>

-------------------

]# kubectl describe pod liveness-tcpsocket-demo 
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  44s                default-scheduler  Successfully assigned default/liveness-tcpsocket-demo to node1
  Normal   Pulled     45s                kubelet            Container image "192.168.10.33:80/k8s/pod_test:v0.1" already present on machine
  Normal   Created    45s                kubelet            Created container demo
  Normal   Started    45s                kubelet            Started container demo
  Warning  Unhealthy  30s (x3 over 40s)  kubelet            Liveness probe failed: dial tcp 10.244.3.86:801: connect: connection refused
  Normal   Killing    30s                kubelet            Container demo failed liveness probe, will be restarted

5.3.2 Simulated success example

cat >liveness-tcpsocket-suc.yml<<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: liveness-tcpsocket-demo
  namespace: default
spec:
  containers:
  - name: demo
    image: 192.168.10.33:80/k8s/pod_test:v0.1
    imagePullPolicy: IfNotPresent
    ports:
    - name: http
      containerPort: 80
    livenessProbe:
      tcpSocket:
        port: http
      periodSeconds: 5
      initialDelaySeconds: 5
EOF

-------------------

]# kubectl apply -f liveness-tcpsocket-suc.yml && kubectl get pods -w -o wide
pod/liveness-tcpsocket-demo created
NAME                      READY   STATUS              RESTARTS   AGE   IP       NODE    NOMINATED NODE   READINESS GATES
liveness-tcpsocket-demo   0/1     ContainerCreating   0          0s    <none>   node1   <none>           <none>
liveness-tcpsocket-demo   1/1     Running             0          1s    10.244.3.84   node1   <none>           <none>

6. Readiness probe in practice

6.1 http

6.1.1 Simulated failure example

cat >readiness-httpget-fail.yml<<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: readiness-httpget-pod
spec:
  containers:
  - name: readiness-httpget-container
    image: busybox
    ports:
    - name: http
      containerPort: 80
    readinessProbe:
      httpGet:
        port: http
        path: /index.html
      initialDelaySeconds: 1
      periodSeconds: 3
EOF

--------------------------

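# Note: this Pod fails for a reason unrelated to the readiness probe. busybox runs no long-lived process (its default command, sh, exits immediately without a TTY), so the container runs to Completed and is restarted over and over (restartPolicy defaults to Always), ending in CrashLoopBackOff. A failing readiness probe by itself never restarts a container; it only keeps READY at 0/1, and the events below contain no "Readiness probe failed" entry.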
]# kubectl apply -f readiness-httpget-fail.yml && kubectl get pods -w
pod/readiness-httpget-pod created
NAME                    READY   STATUS              RESTARTS   AGE
readiness-httpget-pod   0/1     ContainerCreating   0          0s
readiness-httpget-pod   0/1     Completed           0          3s
readiness-httpget-pod   0/1     Completed           1 (3s ago)   6s
readiness-httpget-pod   0/1     CrashLoopBackOff    1 (2s ago)   7s

--------------------------

Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  97s                default-scheduler  Successfully assigned default/readiness-httpget-pod to node1
  Normal   Pulled     95s                kubelet            Successfully pulled image "busybox" in 2.408374442s (2.408383198s including waiting)
  Normal   Pulled     93s                kubelet            Successfully pulled image "busybox" in 2.399310727s (2.399315236s including waiting)
  Normal   Pulled     75s                kubelet            Successfully pulled image "busybox" in 2.446511445s (2.446515052s including waiting)
  Normal   Pulling    49s (x4 over 98s)  kubelet            Pulling image "busybox"
  Normal   Created    47s (x4 over 95s)  kubelet            Created container readiness-httpget-container
  Normal   Started    47s (x4 over 95s)  kubelet            Started container readiness-httpget-container
  Normal   Pulled     47s                kubelet            Successfully pulled image "busybox" in 2.409887055s (2.40989074s including waiting)
  Warning  BackOff    29s (x9 over 92s)  kubelet            Back-off restarting failed container

6.1.2 Simulated success example

cat > readiness-httpget-suc.yml<<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: readiness-httpget-demo
  namespace: default
spec:
  containers:
  - name: demo
    image: 192.168.10.33:80/k8s/pod_test:v0.1
    imagePullPolicy: IfNotPresent
    readinessProbe:
      httpGet:
        path: '/readyz'
        port: 80
        scheme: HTTP
      initialDelaySeconds: 15
      timeoutSeconds: 2
      periodSeconds: 5
      failureThreshold: 3
  restartPolicy: Always
EOF

-----------------------------

]# kubectl apply -f readiness-httpget-suc.yml && kubectl get pods -w -o wide
pod/readiness-httpget-demo created
NAME                     READY   STATUS              RESTARTS   AGE   IP       NODE    NOMINATED NODE   READINESS GATES
readiness-httpget-demo   0/1     ContainerCreating   0          0s    <none>   node1   <none>           <none>
readiness-httpget-demo   0/1     Running             0          1s    10.244.3.81   node1   <none>           <none>
readiness-httpget-demo   1/1     Running             0          31s   10.244.3.81   node1   <none>           <none>

-----------------------------

# Simulate a failure: after 3 consecutive failed probes (failureThreshold), READY drops back to 0/1
]# curl -XPOST -d 'readyz=FAIL'  http://10.244.3.81:80/readyz

# Simulate recovery: the next successful probe sets READY to 1/1 again
]# curl -XPOST -d 'readyz=OK'  http://10.244.3.81:80/readyz
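The same two requests can be sent from Python with only the standard library. build_toggle_request and toggle are hypothetical helpers, and the pod IP 10.244.3.81 is taken from the output above and will differ in another cluster. urllib sends the data form-encoded, which is what the demo's request.form lookup expects.

```python
import urllib.parse
import urllib.request


def build_toggle_request(base_url, endpoint, status):
    """Hypothetical helper, equivalent of: curl -XPOST -d '<endpoint>=<status>' <base_url>/<endpoint>"""
    data = urllib.parse.urlencode({endpoint: status}).encode()
    return urllib.request.Request(f"{base_url}/{endpoint}", data=data, method="POST")


def toggle(base_url, endpoint, status, timeout=2.0):
    """Hypothetical helper: send the request and return the HTTP status code."""
    req = build_toggle_request(base_url, endpoint, status)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


# Usage against the pod above (requires network access to the pod IP):
# toggle("http://10.244.3.81:80", "readyz", "FAIL")
# toggle("http://10.244.3.81:80", "readyz", "OK")
```

With periodSeconds 5 and failureThreshold 3, READY should drop to 0/1 at most about 15 seconds after the FAIL request, and return to 1/1 after the first successful probe following the OK request.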

6.2 tcpsocket

6.2.1 Simulated failure example

# Just change the port to one that nothing listens on

cat > readiness-tcpsocket.yml<<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: readiness-tcpsocket-pod
spec:
  containers:
  - name: readiness-tcpsocket-pod
    image: 192.168.10.33:80/k8s/my_nginx:v1
    readinessProbe:
      tcpSocket:
        port: 801
      initialDelaySeconds: 5
      periodSeconds: 10
    livenessProbe:
      tcpSocket:
        port: 801
      initialDelaySeconds: 15
      periodSeconds: 20
EOF

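# Note: the readiness and liveness probes run independently and in parallel; neither waits for the other. Here the readiness probe starts after 5 seconds and the liveness probe after 15 seconds.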

-----------

# Watch the pod: READY stays at 0/1, and the liveness failures trigger a restart
]# kubectl apply -f readiness-tcpsocket.yml && kubectl get pods -w -o wide
pod/readiness-tcpsocket-pod created
NAME                      READY   STATUS              RESTARTS   AGE   IP       NODE    NOMINATED NODE   READINESS GATES
readiness-tcpsocket-pod   0/1     ContainerCreating   0          0s    <none>   node1   <none>           <none>
readiness-tcpsocket-pod   0/1     Running             0          1s    10.244.3.83   node1   <none>           <none>
readiness-tcpsocket-pod   0/1     Running             1 (1s ago)   61s   10.244.3.83   node1   <none>           <none>

-----------

# Both probes fail; the liveness failures make the kubelet restart the container repeatedly
]# kubectl describe pod readiness-tcpsocket-pod
Events:
  Type     Reason     Age               From               Message
  ----     ------     ----              ----               -------
  Normal   Scheduled  60s               default-scheduler  Successfully assigned default/readiness-tcpsocket-pod to node1
  Normal   Pulled     1s (x2 over 60s)  kubelet            Container image "192.168.10.33:80/k8s/my_nginx:v1" already present on machine
  Normal   Created    1s (x2 over 60s)  kubelet            Created container readiness-tcpsocket-pod
  Warning  Unhealthy  1s (x7 over 51s)  kubelet            Readiness probe failed: dial tcp 10.244.3.83:801: connect: connection refused
  Warning  Unhealthy  1s (x3 over 41s)  kubelet            Liveness probe failed: dial tcp 10.244.3.83:801: connect: connection refused
  Normal   Killing    1s                kubelet            Container readiness-tcpsocket-pod failed liveness probe, will be restarted
  Normal   Started    0s (x2 over 60s)  kubelet            Started container readiness-tcpsocket-pod
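Because the two probes run on independent schedules, the restart time in this example can be estimated from the liveness settings alone. The sketch below uses hypothetical helpers probe_times and liveness_kill_time, assumes every probe fails, and ignores timeouts and jitter; it lists the readiness probe times up to the first liveness kill.

```python
def probe_times(initial_delay, period, until):
    """Hypothetical helper: seconds after start at which a probe runs, up to `until`."""
    return list(range(initial_delay, until + 1, period))


def liveness_kill_time(initial_delay, period, failure_threshold):
    """Hypothetical helper: time of the failure_threshold-th consecutive failure."""
    return initial_delay + (failure_threshold - 1) * period


kill_at = liveness_kill_time(initial_delay=15, period=20, failure_threshold=3)
readiness = probe_times(initial_delay=5, period=10, until=kill_at)
print(kill_at)    # 55
print(readiness)  # [5, 15, 25, 35, 45, 55]
```

The estimate of about 55 seconds is close to the restart shown at 61s in the watch output; the readiness failures during that time only keep READY at 0/1 and never restart the container themselves.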

6.2.2 Simulated success example

cat > readiness-tcpsocket.yml<<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: readiness-tcpsocket-pod
spec:
  containers:
  - name: readiness-tcpsocket-pod
    image: 192.168.10.33:80/k8s/my_nginx:v1
    readinessProbe:
      tcpSocket:
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 10
    livenessProbe:
      tcpSocket:
        port: 80
      initialDelaySeconds: 15
      periodSeconds: 20
EOF

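# Note: the readiness and liveness probes run independently and in parallel; neither waits for the other. Here the readiness probe starts after 5 seconds and the liveness probe after 15 seconds.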

-----------

# READY changes from 0/1 to 1/1 once the readiness probe succeeds
]# kubectl apply -f readiness-tcpsocket.yml && kubectl get pods -w -o wide
pod/readiness-tcpsocket-pod created
NAME                      READY   STATUS              RESTARTS   AGE   IP       NODE    NOMINATED NODE   READINESS GATES
readiness-tcpsocket-pod   0/1     ContainerCreating   0          1s    <none>   node1   <none>           <none>
readiness-tcpsocket-pod   0/1     Running             0          1s    10.244.3.82   node1   <none>           <none>
readiness-tcpsocket-pod   1/1     Running             0          11s   10.244.3.82   node1   <none>           <none>
posted @ 2023-03-19 18:36  小粉优化大师