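Reading Notes on "Kubernetes进阶实战": Pod Object Lifecycle (Probe Checks)
I. Liveness Probing (exec Probe)
An exec probe has only one available field, "command", which specifies the command to run. The manifest liveness-exec.yaml below defines an example.
1. Manifest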
    [root@master chapter4]# cat liveness-exec.yaml
    apiVersion: v1
    kind: Pod
    metadata:
      labels:
        test: liveness-exec
      name: liveness-exec
    spec:
      containers:
      - name: liveness-demo
        image: busybox
        args:
        - /bin/sh
        - -c
        - touch /tmp/healthy; sleep 60; rm -rf /tmp/healthy; sleep 600
        livenessProbe:
          exec:
            command:
            - test
            - -e
            - /tmp/healthy
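The manifest above defines a pod object that starts a container from the busybox image running the command "touch /tmp/healthy; sleep 60; rm -rf /tmp/healthy; sleep 600".
This command creates the file /tmp/healthy when the container starts and deletes it 60 seconds later. The liveness probe runs "test -e /tmp/healthy" to check whether the file exists; if it does, the command returns status code 0, which means the check passed.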
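As a rough local illustration of what the probe command reports (no cluster needed), the sketch below runs the same "test -e" check against a temporary file before and after deleting it. The temporary path is a stand-in for /tmp/healthy and is an assumption of this example, not part of the manifest.

```shell
# Local sketch only: mimics the probe's file check outside of any container.
workdir=$(mktemp -d)
probe_file="$workdir/healthy"    # stand-in for /tmp/healthy (assumed path)

touch "$probe_file"
with_file=0
test -e "$probe_file" || with_file=$?       # file exists: status stays 0

rm -rf "$probe_file"
without_file=0
test -e "$probe_file" || without_file=$?    # file gone: status becomes non-zero

echo "with file: $with_file, after removal: $without_file"
rm -rf "$workdir"
```

A non-zero status from the probe command is what the kubelet counts as a failed liveness check.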
2. Run
First, run the following command to create the pod object liveness-exec:
    [root@master chapter4]# kubectl apply -f liveness-exec.yaml
    pod/liveness-exec created
    [root@master chapter4]# kubectl get pods liveness-exec
    NAME            READY   STATUS    RESTARTS   AGE
    liveness-exec   1/1     Running   0          42s
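3. Verify the Result
Within the first 60 seconds, "kubectl describe pod liveness-exec" shows no liveness probe errors. After 60 seconds, running it again shows that the liveness probe has failed, and after a longer interval the output also shows that the container has been restarted.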
    [root@master chapter4]# kubectl describe pod liveness-exec
    Name: liveness-exec
      PodScheduled True
    ......
    Events:
      Type    Reason   Age                    From            Message
      ----    ------   ----                   ----            -------
      Normal  Killing  3m12s (x3 over 7m32s)  kubelet, node2  Container liveness-demo failed liveness probe, will be restarted
      Normal  Pulling  2m41s (x4 over 9m16s)  kubelet, node2  Pulling image "busybox"
      Normal  Pulled   2m26s (x4 over 8m58s)  kubelet, node2  Successfully pulled image "busybox"
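In addition, the container status fields in the output (shown after "Conditions" in the excerpt below) clearly record the container's health-check status and state changes. The container is currently "Running", but its previous state was "Terminated" with exit code 137, which means the process was terminated by an external signal. The value 137 is the sum of two parts, 128 + signum, where signum is the number of the signal that terminated the process. Here signum is 9, which is SIGKILL, so the process was forcibly killed.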
    [root@master chapter4]# kubectl describe pod liveness-exec
    Name: liveness-exec
    ......
    Conditions:
      Type              Status
      Initialized       True
      Ready             False
      ContainersReady   False
      PodScheduled      True
        State:          Running
          Reason:       CrashLoopBackOff
        Last State:     Terminated
          Reason:       Error
          Exit Code:    137
          Started:      Tue, 09 Jun 2020 11:35:32 +0800
          Finished:     Tue, 09 Jun 2020 11:37:26 +0800
        Ready:          False
        Restart Count:  26
        Liveness:       exec [test -e /tmp/healthy] delay=0s timeout=1s period=10s #success=1 #failure=3
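The 128 + signum rule can be reproduced in any POSIX shell without a cluster. The sketch below is only an illustration of the arithmetic: it starts a child shell that sends SIGKILL to itself, then recovers the signal number from the exit status the parent sees.

```shell
# Start a child shell that kills itself with SIGKILL, then inspect its exit status.
status=0
sh -c 'kill -KILL $$' || status=$?

signum=$((status - 128))          # subtract the 128 offset to recover the signal number
echo "exit status: $status"
echo "signal number: $signum ($(kill -l "$signum"))"
```

The same decoding applies to the "Exit Code: 137" reported for the container: 137 - 128 = 9, the number of SIGKILL.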
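Once the restart completes, the container is back in a normal running state until the file is deleted again, the liveness probe fails, and the container restarts once more. The output below shows 4 restarts in just over 9 minutes.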
    [root@master chapter4]# kubectl get pods liveness-exec
    NAME            READY   STATUS    RESTARTS   AGE
    liveness-exec   1/1     Running   4          9m14s
    [root@master chapter4]# kubectl describe pod liveness-exec
    Name: liveness-exec
    ......
    Events:
      Type     Reason     Age                    From               Message
      ----     ------     ----                   ----               -------
      Normal   Scheduled  <unknown>              default-scheduler  Successfully assigned default/liveness-exec to node2
      Normal   Created    4m36s (x3 over 8m58s)  kubelet, node2     Created container liveness-demo
      Normal   Started    4m35s (x3 over 8m57s)  kubelet, node2     Started container liveness-demo
      Warning  Unhealthy  3m12s (x9 over 7m52s)  kubelet, node2     Liveness probe failed:
      Normal   Killing    3m12s (x3 over 7m32s)  kubelet, node2     Container liveness-demo failed liveness probe, will be restarted
      Normal   Pulling    2m41s (x4 over 9m16s)  kubelet, node2     Pulling image "busybox"
      Normal   Pulled     2m26s (x4 over 8m58s)  kubelet, node2     Successfully pulled image "busybox"
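To estimate when the first restart is triggered with the settings reported for this pod (delay=0s, period=10s, #failure=3) and a file that disappears 60 seconds after start, the probe schedule can be stepped through by hand. The function below is a simplified model written for these notes, not kubelet code: it assumes probes fire at exact intervals and ignores jitter, probe timeouts, and the time needed to stop the container. The function name and argument order are hypothetical.

```shell
# Simplified model: probes fire at fixed intervals, with no jitter or timeouts.
# restart_time DELAY PERIOD FAILURE_THRESHOLD UNHEALTHY_FROM
# Prints the second at which the consecutive-failure threshold is reached.
restart_time() {
  t=$1; period=$2; threshold=$3; unhealthy_from=$4
  failures=0
  while :; do
    if [ "$t" -ge "$unhealthy_from" ]; then
      failures=$((failures + 1))   # probe runs after the file was removed
    else
      failures=0                   # probe succeeds, consecutive count resets
    fi
    if [ "$failures" -ge "$threshold" ]; then
      break
    fi
    t=$((t + period))
  done
  echo "$t"
}

# Values from the describe output above; the file is removed at 60s.
restart_time 0 10 3 60
```

Under these assumptions the third consecutive failure lands at 80 seconds (probes at 60, 70, and 80 seconds all fail), which is when the kubelet would decide to restart the container. Real timings will differ somewhat.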
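Note that the command specified for exec runs inside the container and consumes the container's resource quota. For that reason, and for the efficiency of the probe itself, the probe command should be simple and lightweight.
II. Liveness Probing (HTTP Probe)
1. Official Reference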
    [root@master chapter4]# kubectl explain pod.spec.containers.livenessProbe.httpGet
    KIND:     Pod
    VERSION:  v1

    RESOURCE: httpGet <Object>

    DESCRIPTION:
         HTTPGet specifies the http request to perform.

         HTTPGetAction describes an action based on HTTP Get requests.

    FIELDS:
       host <string>    # Host to send the request to; defaults to the pod IP. Can also be set with "Host:" in httpHeaders
         Host name to connect to, defaults to the pod IP. You probably want to set
         "Host" in httpHeaders instead.

       httpHeaders <[]Object>    # Custom request headers
         Custom headers to set in the request. HTTP allows repeated headers.

       path <string>    # Path of the HTTP resource to request, i.e. the URL path
         Path to access on the HTTP server.

       port <string> -required-    # Port to request; required field
         Name or number of the port to access on the container. Number must be in
         the range 1 to 65535. Name must be an IANA_SVC_NAME.

       scheme <string>    # Protocol used for the connection; only HTTP or HTTPS, defaults to HTTP
         Scheme to use for connecting to the host. Defaults to HTTP.
2. Manifest
The manifest below uses a postStart hook to create healthz, a page dedicated to the httpGet check:
    [root@master chapter4]# cat liveness-http.yaml
    apiVersion: v1
    kind: Pod
    metadata:
      labels:
        test: liveness
      name: liveness-http
    spec:
      containers:
      - name: liveness-demo
        image: nginx:1.12-alpine
        ports:
        - name: http
          containerPort: 80
        lifecycle:
          postStart:
            exec:
              command:
              - /bin/sh
              - -c
              - 'echo Healthy > /usr/share/nginx/html/healthz'
        livenessProbe:
          httpGet:
            path: /healthz
            port: http
3. Create and Run
First, create the pod object:
    [root@master chapter4]# kubectl apply -f liveness-http.yaml
    pod/liveness-http created
4. Verify the Result
Then check its health-check information. While the health check passes, the container keeps running normally.
    [root@master chapter4]# kubectl describe pod liveness-http
    Name: liveness-http
    ......
    Events:
      Type    Reason     Age        From               Message
      ----    ------     ----       ----               -------
      Normal  Scheduled  <unknown>  default-scheduler  Successfully assigned default/liveness-http to node2
      Normal  Pulling    55s        kubelet, node2     Pulling image "nginx:1.12-alpine"
      Normal  Pulled     21s        kubelet, node2     Successfully pulled image "nginx:1.12-alpine"
      Normal  Created    21s        kubelet, node2     Created container liveness-demo
      Normal  Started    21s        kubelet, node2     Started container liveness-demo
Next, use "kubectl exec" to delete the test page healthz that the postStart hook created:
    [root@master chapter4]# kubectl exec liveness-http rm /usr/share/nginx/html/healthz
    kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl kubectl exec [POD] -- [COMMAND] instead.
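The deprecation warning in that output asks for a "--" separator between the pod name and the command. A minimal sketch of the same deletion in that form follows; the helper name "remove_healthz" is hypothetical and only wraps the command so that the pod name can be passed as an argument.

```shell
# Hypothetical helper (not from the original notes) using the non-deprecated syntax:
# everything after "--" is passed to the container as the command to run.
remove_healthz() {
  kubectl exec "$1" -- rm /usr/share/nginx/html/healthz
}

# Usage against a running cluster:
#   remove_healthz liveness-http
```

Separating the command with "--" also keeps kubectl from interpreting flags that are meant for the command inside the container.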
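Check the pod again with "kubectl get pods liveness-http" and "kubectl describe pod liveness-http". The events in the output show that the probe failed and that the container was killed and then re-created.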
    [root@master chapter4]# kubectl get pods liveness-http
    NAME            READY   STATUS    RESTARTS   AGE
    liveness-http   1/1     Running   2          5m11s
    [root@master chapter4]# kubectl describe pod liveness-http
    Name: liveness-http
    ......
    Events:
      Type     Reason     Age                    From               Message
      ----     ------     ----                   ----               -------
      Normal   Scheduled  <unknown>              default-scheduler  Successfully assigned default/liveness-http to node2
      Normal   Pulling    5m58s                  kubelet, node2     Pulling image "nginx:1.12-alpine"
      Normal   Pulled     5m24s                  kubelet, node2     Successfully pulled image "nginx:1.12-alpine"
      Warning  Unhealthy  2m12s (x6 over 3m2s)   kubelet, node2     Liveness probe failed: HTTP probe failed with statuscode: 404
      Normal   Killing    2m12s (x2 over 2m42s)  kubelet, node2     Container liveness-demo failed liveness probe, will be restarted
      Normal   Created    2m11s (x3 over 5m24s)  kubelet, node2     Created container liveness-demo
      Normal   Started    2m11s (x3 over 5m24s)  kubelet, node2     Started container liveness-demo
      Normal   Pulled     2m11s (x2 over 2m41s)  kubelet, node2     Container image "nginx:1.12-alpine" already present on machine
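The 404 in these events counts as a failure because, for httpGet probes, the kubelet treats a response status of at least 200 and below 400 as success and anything else as failure. The sketch below only restates that rule in shell; the function name "http_probe_ok" is hypothetical and the status codes in the loop are sample inputs.

```shell
# Returns 0 (success) when the given HTTP status would pass an httpGet probe.
http_probe_ok() {
  [ "$1" -ge 200 ] && [ "$1" -lt 400 ]
}

for status in 200 302 404 503; do
  if http_probe_ok "$status"; then
    echo "$status -> success"
  else
    echo "$status -> failure"
  fi
done
```

This is why deleting healthz, which turns the 200 response into a 404, is enough to make the liveness check fail.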
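In general, HTTP probes should target a dedicated URL path, for example /healthz.
The web resource behind this URL path should, in a lightweight way, check every key component of the application internally, so that a passing check means the application can provide its full service to clients.
Note that this kind of check is only effective for the current tier of a layered architecture. Restarting the container cannot fix failures caused by its backend services (such as a database or cache), so the container may be restarted again and again until the backend service recovers. The other two probe types have the same problem.
III. Liveness Probing (TCP Probe)
1. Official Reference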
    [root@master chapter4]# kubectl explain pod.spec.containers.livenessProbe.tcpSocket
    KIND:     Pod
    VERSION:  v1

    RESOURCE: tcpSocket <Object>

    DESCRIPTION:
         TCPSocket specifies an action involving a TCP port. TCP hooks not yet
         supported

         TCPSocketAction describes an action based on opening a socket

    FIELDS:
       host <string>    # Target IP address to connect to; defaults to the pod IP
         Optional: Host name to connect to, defaults to the pod IP.

       port <string> -required-    # Target port to connect to; required field
         Number or name of the port to access on the container. Number must be in
         the range 1 to 65535. Name must be an IANA_SVC_NAME.
2. Example Template
    cat nginx_pod_tcpSocket.yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: tcpsocket    # object names must be lowercase, so "tcpSocket" would be rejected
    spec:
      containers:
      - name: nginx
        image: 10.0.0.11:5000/nginx:1.13
        ports:
        - containerPort: 80
        livenessProbe:
          tcpSocket:
            port: 80
          initialDelaySeconds: 3
          periodSeconds: 3
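IV. Liveness Probe Behavior Settings
1. Viewing the Details of a Pod with a Liveness Probe
When "kubectl describe" is used to view a pod configured with a liveness probe, the output for the relevant container includes a line like the following: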
    [root@master chapter4]# kubectl describe pod liveness-exec
    Name: liveness-exec
    ......
        Ready:          False
        Restart Count:  10
        Liveness:       exec [test -e /tmp/healthy] delay=0s timeout=1s period=10s #success=1 #failure=3
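This line shows the probe type together with its additional settings, delay, timeout, period, #success, and #failure, and the value of each.
When these fields are not set explicitly they take their default values, as in the output above. They can be set through the following fields of "pod.spec.containers.livenessProbe":
2. Official Reference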
kubectl explain pod.spec.containers.livenessProbe
    [root@master chapter4]# kubectl explain pod.spec.containers.livenessProbe
    KIND:     Pod
    VERSION:  v1

    RESOURCE: livenessProbe <Object>

    DESCRIPTION:
         Periodic probe of container liveness. Container will be restarted if the
         probe fails. Cannot be updated. More info:
         https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes

         Probe describes a health check to be performed against a container to
         determine whether it is alive or ready to receive traffic.

    FIELDS:
       exec <Object>
         One and only one of the following should be specified. Exec specifies the
         action to take.

       failureThreshold <integer>    # While in the success state, the number of consecutive failures needed before the check counts as failed; shown as #failure; default 3, minimum 1
         Minimum consecutive failures for the probe to be considered failed after
         having succeeded. Defaults to 3. Minimum value is 1.

       httpGet <Object>
         HTTPGet specifies the http request to perform.

       initialDelaySeconds <integer>    # Delay before the first probe after the container starts; shown as delay; default 0s, i.e. probing begins as soon as the container starts
         Number of seconds after the container has started before liveness probes
         are initiated. More info:
         https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes

       periodSeconds <integer>    # Probe frequency; shown as period; default 10s, minimum 1s. Probing too often adds noticeable overhead to the pod; probing too rarely delays the reaction to errors
         How often (in seconds) to perform the probe. Default to 10 seconds. Minimum
         value is 1.

       successThreshold <integer>    # While in the failure state, the number of consecutive successes needed before the check counts as passed; shown as #success; default 1, minimum 1
         Minimum consecutive successes for the probe to be considered successful
         after having failed. Defaults to 1. Must be 1 for liveness and startup.
         Minimum value is 1.

       tcpSocket <Object>
         TCPSocket specifies an action involving a TCP port. TCP hooks not yet
         supported

       timeoutSeconds <integer>    # Probe timeout; shown as timeout; default 1s, minimum 1s
         Number of seconds after which the probe times out. Defaults to 1 second.
         Minimum value is 1. More info:
         https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes
After the pod is re-created from a manifest modified to set these fields and tested again, the detailed output shows that the custom values have taken effect:
    [root@master chapter4]# kubectl describe pod liveness-exec
    Name: liveness-exec
    ......
        Ready:          False
        Restart Count:  10
        Liveness:       exec [test -e /tmp/healthy] delay=5s timeout=2s period=5s #success=1 #failure=3
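The modified manifest itself is not reproduced in these notes. A version consistent with the reported values (delay=5s, timeout=2s, period=5s) might look like the sketch below, which is an assumed reconstruction: the probe fields are set to match that output, the rest is copied from liveness-exec.yaml, and the file name "liveness-exec-tuned.yaml" is hypothetical.

```shell
# Assumed reconstruction: writes a manifest whose probe settings match the
# describe output (delay=5s timeout=2s period=5s). The file name is hypothetical.
cat > liveness-exec-tuned.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness-exec
  name: liveness-exec
spec:
  containers:
  - name: liveness-demo
    image: busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 60; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command: ["test", "-e", "/tmp/healthy"]
      initialDelaySeconds: 5   # shown as delay=5s
      timeoutSeconds: 2        # shown as timeout=2s
      periodSeconds: 5         # shown as period=5s
EOF
```

The file could then be applied with "kubectl apply -f liveness-exec-tuned.yaml" after deleting the existing pod, since probe settings on a running pod cannot be updated in place. failureThreshold and successThreshold are left unset so that they keep their defaults of 3 and 1.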
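V. Readiness Probing
1. What Readiness Probes Are For
A readiness probe is a periodic operation that determines whether a container is ready, that is, whether it has finished initializing and can serve client requests. When the probe returns "success", it signals that the container is "ready".
When the probe fails, a readiness probe does not kill or restart the container to restore its health. Instead, it reports that the container is not yet ready and triggers operations that depend on its readiness (for example, removing the pod from a Service object) so that client requests are not routed to this pod.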
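The effect on traffic routing can be pictured with a toy filter. The sketch below is not a Kubernetes API; it simply models a Service that forwards requests only to pods currently marked ready. The pod names, the "name:state" input format, and the function name are all hypothetical.

```shell
# Toy model, not a Kubernetes API: each argument is "<pod-name>:<ready|notready>".
select_endpoints() {
  for entry in "$@"; do
    name=${entry%%:*}     # text before the colon
    state=${entry##*:}    # text after the colon
    # Only pods whose readiness probe currently passes receive traffic.
    if [ "$state" = "ready" ]; then
      echo "$name"
    fi
  done
}

select_endpoints web-1:ready web-2:notready web-3:ready
```

In this model web-2 keeps running but is skipped, which mirrors how a pod that fails its readiness probe stays alive while receiving no client requests.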
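2. Why It Matters
When Pod B, which Pod A depends on, becomes unavailable because of a network failure or a similar cause, the service on Pod A should switch to a not-ready state so that it does not return incomplete responses to clients.
Replacing the field name livenessProbe with readinessProbe in the container definition yields a readiness probe configuration. In the simple example manifest below (readiness-exec), probing starts 5 seconds after the container starts and uses "test -e /tmp/ready" to check readiness; the container counts as ready when the command succeeds, and the probe period is 5 seconds:
3. Manifest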
    [root@master chapter4]# cat readiness-exec.yaml
    apiVersion: v1
    kind: Pod
    metadata:
      labels:
        test: readiness-exec
      name: readiness-exec
    spec:
      containers:
      - name: readiness-demo
        image: busybox
        args: ["/bin/sh", "-c", "while true; do rm -f /tmp/ready; sleep 30; touch /tmp/ready; sleep 300; done"]
        readinessProbe:
          exec:
            command: ["test", "-e", "/tmp/ready"]
          initialDelaySeconds: 5
          periodSeconds: 5
4. Create and Run
First, use "kubectl create" to create the resource defined in the manifest on the cluster:
    [root@master chapter4]# kubectl create -f readiness-exec.yaml
    pod/readiness-exec created
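5. Verify the Result
Next, run "kubectl get -w" to watch the resource for changes. The output below shows that although the pod is in the Running state, it only becomes "ready" once the readiness probe command succeeds.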
    [root@master chapter4]# kubectl get pods -l test=readiness-exec -w
    NAME             READY   STATUS    RESTARTS   AGE
    readiness-exec   0/1     Running   0          22s
    readiness-exec   1/1     Running   0          50s
The pod's detailed information also contains lines like the following, showing that it is now ready:
    [root@master chapter4]# kubectl describe pod readiness-exec
    Name: readiness-exec
    .......
        Ready:          True
        Restart Count:  0
        Readiness:      exec [test -e /tmp/ready] delay=5s timeout=1s period=5s #success=1 #failure=3
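Important:
A pod without a readiness probe is considered ready as soon as it enters the "Running" state. When the container needs time to initialize, it cannot respond to client requests properly until the application is actually ready.
For this reason, in production the containers in critical pods should always define a readiness probe. For how to define the probe, see Section 4.6.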