Architecture Homework 7 - 20230819
1. Custom backup and restore of Kubernetes cluster etcd data with Velero and MinIO
Velero
Velero is a cloud-native disaster-recovery and migration tool open-sourced by VMware; it can safely back up, restore, and migrate Kubernetes cluster resource data. https://velero.io/
Velero works by backing up Kubernetes data to object storage for durability and high availability (the default backup retention is 720 hours), so that it can be downloaded and restored when needed.
Differences between Velero and etcd snapshot backups:
- An etcd snapshot is a full, cluster-wide backup: even restoring a single resource object requires a full restore, which affects Pods serving traffic in other namespaces.
- Velero can back up selectively, e.g. a single namespace or individual resource objects, and a restore can bring back only that namespace or object without affecting Pods in other namespaces.
- Velero supports object storage such as Ceph or OSS, whereas an etcd snapshot is a local file.
- Velero supports schedules for periodic backups (etcd snapshots can also be scheduled with a CronJob); see the sketch after this list.
- Velero supports creating and restoring AWS EBS snapshots.
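A minimal sketch of a scheduled backup (the schedule name, namespace selection, and cron expression below are illustrative, not taken from the environment built later):
# Back up the myserver namespace every day at 01:00 and keep each backup for 72 hours
velero schedule create myserver-daily --schedule="0 1 * * *" --include-namespaces myserver --ttl 72h0m0s --namespace velero
velero schedule get --namespace velero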
Backup workflow
- The Velero client calls the Kubernetes API server to create a Backup object.
- The Backup controller picks up the backup task from the API server via the watch mechanism.
- The Backup controller starts the backup and fetches the data to be backed up through the API server.
- The Backup controller uploads the collected data to the configured object-storage backend.
1.1 Deploy MinIO
1. Create the MinIO container
root@k8s-ha2-deploy-239:~# mkdir -p /data/minio
root@k8s-ha2-deploy-239:~# docker pull registry.cn-hangzhou.aliyuncs.com/wuhaolam/baseimage:minio-RELEASE.2022-04-12T06-55-35Z
# Create the MinIO container; if no user/password is specified, the default is minioadmin/minioadmin
# 9000 is the data (S3 API) port; 9999 is the console port
root@k8s-ha2-deploy-239:~# docker run --name minio \
> -p 9000:9000 \
> -p 9999:9999 \
> -d --restart=always \
> -e "MINIO_ROOT_USER=admin" \
> -e "MINIO_ROOT_PASSWORD=12345678" \
> -v /data/minio/data:/data \
> registry.cn-hangzhou.aliyuncs.com/wuhaolam/baseimage:minio-RELEASE.2022-04-12T06-55-35Z server /data \
> --console-address '0.0.0.0:9999'
2. Log in to the MinIO console
3. Create a bucket in MinIO
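A command-line alternative for creating the velerodata bucket used later, assuming the MinIO client (mc) is installed on the deploy host:
root@k8s-ha2-deploy-239:~# mc alias set myminio http://10.243.20.239:9000 admin 12345678
root@k8s-ha2-deploy-239:~# mc mb myminio/velerodata
root@k8s-ha2-deploy-239:~# mc ls myminio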
1.2 Deploy Velero
Velero
GitHub: https://github.com/vmware-tanzu/velero
The host that runs Velero needs access to the Kubernetes API server.
1.2.1 Download the release on the master1 node
root@k8s-master1-230:/usr/local/src# wget https://github.com/vmware-tanzu/velero/releases/download/v1.11.1/velero-v1.11.1-linux-amd64.tar.gz
root@k8s-master1-230:/usr/local/src# tar xvf velero-v1.11.1-linux-amd64.tar.gz
root@k8s-master1-230:/usr/local/src# cp velero-v1.11.1-linux-amd64/velero /usr/local/bin/
root@k8s-master1-230:/usr/local/src# velero --help
1.2.2 Set up the Velero environment
root@k8s-master1-230:~# mkdir -p /data/velero
# Credentials file used to access MinIO
root@k8s-master1-230:/data/velero# cat velero-auth.txt
[default]
aws_access_key_id = admin
aws_secret_access_key = 12345678
# Install Velero
root@k8s-master1-230:/data/velero# velero --kubeconfig /root/.kube/config install \
> --provider aws \
> --plugins registry.cn-hangzhou.aliyuncs.com/zhangshijie/velero-plugin-for-aws:v1.7.1 \
> --bucket velerodata \
> --secret-file ./velero-auth.txt \
> --use-volume-snapshots=false \
> --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://10.243.20.239:9000
# Verify the deployment
root@k8s-master1-230:/data/velero# kubectl get pod -n velero
NAME READY STATUS RESTARTS AGE
velero-696978b6c9-nbkz5 1/1 Running 0 2m39s
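Optionally confirm that Velero can reach the MinIO bucket before creating any backups; the Phase column of the default backup storage location should report Available:
root@k8s-master1-230:/data/velero# velero backup-location get -n velero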
1.2.3 Verify backup and restore
1.2.3.1 Back up the default and velero namespaces
1. Create the backup
root@k8s-master1-230:/data/velero# DATE=$(date +%F-%H%M%S)
# --include-cluster-resources  back up cluster-scoped resources as well
# --include-namespaces         the namespaces to back up, comma-separated
# --namespace                  the namespace Velero itself runs in (where the Backup object is created)
root@k8s-master1-230:/data/velero# velero backup create default-namespace-${DATE} --include-cluster-resources=true --include-namespaces default,velero --kubeconfig=/root/.kube/config --namespace velero
Backup request "default-namespace-2024-02-08-100852" submitted successfully.
Run `velero backup describe default-namespace-2024-02-08-100852` or `velero backup logs default-namespace-2024-02-08-100852` for more details.
# Inspect the backup
root@k8s-master1-230:/data/velero# velero backup describe default-namespace-2024-02-08-100852 -n velero
Name: default-namespace-2024-02-08-100852
Namespace: velero
Labels: velero.io/storage-location=default
Annotations: velero.io/source-cluster-k8s-gitversion=v1.27.2
velero.io/source-cluster-k8s-major-version=1
velero.io/source-cluster-k8s-minor-version=27
Phase: Completed
Namespaces:
Included: default, velero
Excluded: <none>
Resources:
Included: *
Excluded: <none>
Cluster-scoped: included
Label selector: <none>
Storage Location: default
Velero-Native Snapshot PVs: auto
TTL: 720h0m0s
CSISnapshotTimeout: 10m0s
ItemOperationTimeout: 1h0m0s
Hooks: <none>
Backup Format Version: 1.1.0
Started: 2024-02-08 10:09:13 +0800 CST
Completed: 2024-02-08 10:09:19 +0800 CST
Expiration: 2024-03-09 10:09:13 +0800 CST
Total items to be backed up: 289
Items backed up: 289
Velero-Native Snapshots: <none included>
2. Delete a Pod and verify the restore
# Delete the Pod
root@k8s-master1-230:/data/velero# kubectl get pod
NAME READY STATUS RESTARTS AGE
net-test1 1/1 Running 71 (63m ago) 30d
test-centos 1/1 Running 2 (12d ago) 23d
root@k8s-master1-230:/data/velero# kubectl delete pod test-centos
pod "test-centos" deleted
# Restore the Pod
root@k8s-master1-230:/data/velero# velero restore create --from-backup default-namespace-2024-02-08-100852 --wait --kubeconfig=/root/.kube/config --namespace velero
Restore request "default-namespace-2024-02-08-100852-20240208102408" submitted successfully.
Waiting for restore to complete. You may safely press ctrl-c to stop waiting - your restore will continue in the background.
..........................
Restore completed with status: Completed. You may check for more information using the commands `velero restore describe default-namespace-2024-02-08-100852-20240208102408` and `velero restore logs default-namespace-2024-02-08-100852-20240208102408`.
# Only the deleted Pod is recreated; Pods that are still running are not recreated
root@k8s-master1-230:/data/velero# kubectl get pod
NAME READY STATUS RESTARTS AGE
net-test1 1/1 Running 71 (69m ago) 30d
test-centos 1/1 Running 0 53s
root@k8s-master1-230:/data/velero# kubectl exec -it test-centos /bin/bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
[root@test-centos /]# ping -c 2 www.baidu.com
PING www.a.shifen.com (183.2.172.185) 56(84) bytes of data.
64 bytes from 183.2.172.185 (183.2.172.185): icmp_seq=1 ttl=48 time=25.4 ms
64 bytes from 183.2.172.185 (183.2.172.185): icmp_seq=2 ttl=48 time=25.4 ms
--- www.a.shifen.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 25.354/25.382/25.411/0.161 ms
1.2.3.2 Back up specific resource objects
1. Back up specific Pods or resources in a given namespace
root@k8s-master1-230:/data/velero# kubectl get pod -n myserver
NAME READY STATUS RESTARTS AGE
myserver-myapp-frontend-deployment-b6fb5767f-wfdx6 1/1 Running 0 6d3h
myserver-nginx-deployment-6f48cf5b-79dft 1/1 Running 6 (13d ago) 23d
myserver-nginx-deployment-6f48cf5b-8xcsf 1/1 Running 7 (13d ago) 23d
myserver-tomcat-deployment-b67dddc64-6tm4z 1/1 Running 2 (13d ago) 23d
myserver-tomcat-deployment-b67dddc64-v2jff 1/1 Running 2 (13d ago) 23d
root@k8s-master1-230:/data/velero# kubectl get pods
NAME READY STATUS RESTARTS AGE
net-test1 1/1 Running 71 (4h58m ago) 30d
test-centos 1/1 Running 0 3h49m
# Back up a Pod in the myserver namespace and a Pod in the default namespace
root@k8s-master1-230:/data/velero# DATE=$(date +%F-%H%M%S)
root@k8s-master1-230:/data/velero# velero backup create pod-myserver-${DATE} --include-cluster-resources=true --ordered-resources 'pods=myserver/myserver-myapp-frontend-deployment-b6fb5767f-wfdx6,default/test-centos' --namespace velero
Backup request "pod-myserver-2024-02-08-141153" submitted successfully.
Run `velero backup describe pod-myserver-2024-02-08-141153` or `velero backup logs pod-myserver-2024-02-08-141153` for more details.
2. Delete the Pods and verify the restore
root@k8s-master1-230:/data/velero# kubectl delete -n myserver pod myserver-myapp-frontend-deployment-b6fb5767f-wfdx6
pod "myserver-myapp-frontend-deployment-b6fb5767f-wfdx6" deleted
root@k8s-master1-230:/data/velero# kubectl delete pod test-centos
pod "test-centos" deleted
root@k8s-master1-230:/data/velero# kubectl get pod
NAME READY STATUS RESTARTS AGE
net-test1 1/1 Running 71 (5h3m ago) 30d
root@k8s-master1-230:/data/velero# velero restore create --from-backup pod-myserver-2024-02-08-141153 --wait --kubeconfig=/root/.kube/config --namespace velero
Restore request "pod-myserver-2024-02-08-141153-20240208142057" submitted successfully.
Waiting for restore to complete. You may safely press ctrl-c to stop waiting - your restore will continue in the background.
..........................................
Restore completed with status: Completed. You may check for more information using the commands `velero restore describe pod-myserver-2024-02-08-141153-20240208142057` and `velero restore logs pod-myserver-2024-02-08-141153-20240208142057`.
root@k8s-master1-230:/data/velero# kubectl get pod
NAME READY STATUS RESTARTS AGE
net-test1 1/1 Running 71 (5h6m ago) 30d
test-centos 1/1 Running 0 44s
1.2.3.3 Back up all namespaces in a loop
# List the current namespaces
root@k8s-master1-230:/data/velero# kubectl get ns
NAME STATUS AGE
default Active 35d
kube-node-lease Active 35d
kube-public Active 35d
kube-system Active 35d
kuboard Active 29d
myserver Active 23d
velero Active 23h
root@k8s-master1-230:/data/velero# cat namespace-backup.sh
#!/bin/bash
NS_NAME=`kubectl get namespaces | awk '{if (NR>1){print $1}}'`
DATE=$(date +%F-%H%M%S)
cd /data/velero
for i in ${NS_NAME};do
velero backup create ${i}-namespace-${DATE} \
--include-cluster-resources=true \
--include-namespaces ${i} \
--kubeconfig=/root/.kube/config \
--namespace velero
done
root@k8s-master1-230:/data/velero# bash namespace-backup.sh
Backup request "default-namespace-2024-02-08-143617" submitted successfully.
Run `velero backup describe default-namespace-2024-02-08-143617` or `velero backup logs default-namespace-2024-02-08-143617` for more details.
Backup request "kube-node-lease-namespace-2024-02-08-143617" submitted successfully.
Run `velero backup describe kube-node-lease-namespace-2024-02-08-143617` or `velero backup logs kube-node-lease-namespace-2024-02-08-143617` for more details.
Backup request "kube-public-namespace-2024-02-08-143617" submitted successfully.
Run `velero backup describe kube-public-namespace-2024-02-08-143617` or `velero backup logs kube-public-namespace-2024-02-08-143617` for more details.
Backup request "kube-system-namespace-2024-02-08-143617" submitted successfully.
Run `velero backup describe kube-system-namespace-2024-02-08-143617` or `velero backup logs kube-system-namespace-2024-02-08-143617` for more details.
Backup request "kuboard-namespace-2024-02-08-143617" submitted successfully.
Run `velero backup describe kuboard-namespace-2024-02-08-143617` or `velero backup logs kuboard-namespace-2024-02-08-143617` for more details.
Backup request "myserver-namespace-2024-02-08-143617" submitted successfully.
Run `velero backup describe myserver-namespace-2024-02-08-143617` or `velero backup logs myserver-namespace-2024-02-08-143617` for more details.
Backup request "velero-namespace-2024-02-08-143617" submitted successfully.
Run `velero backup describe velero-namespace-2024-02-08-143617` or `velero backup logs velero-namespace-2024-02-08-143617` for more details.
Verify the backups in MinIO
2. Building container images with nerdctl + BuildKit + containerd
2.1 Install containerd
See this article: https://wuhaolam.top/archives/229c053b.html
# nerdctl shell completion
root@k8s-master1-120:~# vim /etc/profile
## add this line
source <(nerdctl completion bash)
root@k8s-master1-120:~# source /etc/profile
# Log in to the image registry
root@k8s-master1-120:/data/dockerfile/nginx# nerdctl login --username=wuhaolam registry.cn-hangzhou.aliyuncs.com
Enter Password:
WARN[0003] skipping verifying HTTPS certs for "registry.cn-hangzhou.aliyuncs.com"
WARNING: Your password will be stored unencrypted in /root/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store
Login Succeeded
2.2 Deploy BuildKit
https://github.com/moby/buildkit
Components
buildkitd (server side): currently supports runc and containerd as the build backend; runc is the default and can be switched to containerd.
buildctl (client side): parses the Dockerfile and sends build requests to the buildkitd server (see the sketch below).
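A minimal sketch of driving buildkitd directly with buildctl (the sections below use nerdctl instead, which calls BuildKit for you; the image name here is a placeholder):
# Build the Dockerfile in the current directory and keep the result in the local image store (push=false)
buildctl build \
  --frontend dockerfile.v0 \
  --local context=. \
  --local dockerfile=. \
  --output type=image,name=registry.example.com/test/nginx:v1,push=false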
Install buildkitd
# Download: https://github.com/moby/buildkit/releases/download/v0.12.1/buildkit-v0.12.1.linux-amd64.tar.gz
root@k8s-master1-120:~# ls
buildkit-v0.12.1.linux-amd64.tar.gz
root@k8s-master1-120:~# tar xf buildkit-v0.12.1.linux-amd64.tar.gz -C /usr/local/src/
root@k8s-master1-120:~# mv /usr/local/src/bin/* /usr/local/bin/
root@k8s-master1-120:~# buildctl --help
# Prepare the buildkit socket unit and the buildkitd service unit
root@k8s-master1-120:~# cat /lib/systemd/system/buildkit.socket
[Unit]
Description=BuildKit
Documentation=https://github.com/moby/buildkit
[Socket]
ListenStream=%t/buildkit/buildkitd.sock
[Install]
WantedBy=sockets.target
root@k8s-master1-120:~# cat /lib/systemd/system/buildkitd.service
[Unit]
Description=BuildKit
Requires=buildkit.socket
After=buildkit.socket
Documentation=https://github.com/moby/buildkit
[Service]
ExecStart=/usr/local/bin/buildkitd --oci-worker=false --containerd-worker=true
[Install]
WantedBy=multi-user.target
root@k8s-master1-120:~# systemctl daemon-reload
root@k8s-master1-120:~# systemctl enable buildkitd
root@k8s-master1-120:~# systemctl restart buildkitd
root@k8s-master1-120:~# systemctl status buildkitd
2.3 Build the image
root@k8s-master1-120:~# mkdir -p /data/dockerfile/nginx
root@k8s-master1-120:~# cd /data/dockerfile/nginx
Prepare the required files
root@k8s-master1-120:/data/dockerfile/nginx# ls
Dockerfile  image-build.sh  index.html  nginx-1.22.0.tar.gz  nginx.conf  sources.list
Run the build
root@k8s-master1-120:/data/dockerfile/nginx# bash image-build.sh
root@k8s-master1-120:/data/dockerfile/nginx# nerdctl images | grep ubuntu
registry.cn-hangzhou.aliyuncs.com/wuhaolam/myserver ubuntu-nginx-20240214 1662392df732 7 minutes ago linux/amd64 411.4 MiB 156.1 MiB
Ubuntu mirror sources (CN)
root@k8s-master1-120:/data/dockerfile/nginx# cat sources.list
# 默认注释了源码镜像以提高 apt update 速度,如有需要可自行取消注释
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic main restricted universe multiverse
# deb-src http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic main restricted universe multiverse
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic-updates main restricted universe multiverse
# deb-src http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic-updates main restricted universe multiverse
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic-backports main restricted universe multiverse
# deb-src http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic-backports main restricted universe multiverse
deb http://security.ubuntu.com/ubuntu/ bionic-security main restricted universe multiverse
# deb-src http://security.ubuntu.com/ubuntu/ bionic-security main restricted universe multiverse
Dockerfile
FROM registry.cn-hangzhou.aliyuncs.com/wuhaolam/baseimage:ubuntu18.04
LABEL author="harry wu"
ADD sources.list /etc/apt/sources.list
RUN apt update && apt -y install iproute2 ntpdate tcpdump nfs-kernel-server nfs-common lrzsz tree openssl libssl-dev libpcre3 libpcre3-dev zlib1g-dev gcc openssh-server iotop unzip zip make
ADD nginx-1.22.0.tar.gz /usr/local/src/
RUN mkdir -p /apps/nginx && useradd -r -s /sbin/nologin nginx && cd /usr/local/src/nginx-1.22.0 && ./configure --prefix=/apps/nginx/ --user=nginx --group=nginx && make && make install && ln -s /apps/nginx/sbin/nginx /usr/sbin/nginx && chown -R nginx:nginx /apps/nginx/ && mkdir -p /apps/nginx/run && mkdir /apps/nginx/html/myapp
ADD nginx.conf /apps/nginx/conf/
ADD index.html /apps/nginx/html/
EXPOSE 80 8008
CMD ["nginx","-g","daemon off;"]
nginx web page
root@k8s-master1-120:/data/dockerfile/nginx# cat index.html
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>我的静态网页</title>
</head>
<body>
<h1>欢迎来到我的网页!</h1>
<p>这是一个示例网页,用于展示标题、文字和图片的基本结构。</p>
<img src="https://cdn.jsdelivr.net/gh/wuhaolam/picgo_demo/img/QQ%E6%88%AA%E5%9B%BE20231129163610.png" alt="light belfry">
</body>
</html>
nginx configuration file
root@k8s-master1-120:/data/dockerfile/nginx# cat nginx.conf
user nginx nginx;
worker_processes 2;
error_log logs/error.log;
#error_log logs/error.log notice;
#error_log logs/error.log info;
pid /apps/nginx/run/nginx.pid;
events {
worker_connections 1024;
}
http {
include mime.types;
default_type application/octet-stream;
client_max_body_size 30G;
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
access_log logs/access.log main;
sendfile on;
#tcp_nopush on;
#keepalive_timeout 0;
keepalive_timeout 65;
#gzip on;
server {
listen 80;
#listen 443 ssl;
#server_name harbor.wuhaolam.top;
#ssl_certificate /mnt/sdb/data/nginx/certs/harbor.wuhaolam.top.pem;
#ssl_certificate_key /mnt/sdb/data/nginx/certs/harbor.wuhaolam.top.key;
#charset koi8-r;
#access_log logs/host.access.log main;
location / {
root html;
index index.php index.html;
}
#location /myapp {
# root /apps/nginx/html;
# index index.html;
#}
error_page 500 502 503 504 /50x.html;
location = /50x.html {
root html;
}
location ~ \.php$ {
root html;
fastcgi_pass 127.0.0.1:9000;
fastcgi_index index.php;
fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
include fastcgi_params;
}
}
include /apps/nginx/conf/conf.d/*.conf;
}
Image build script
root@k8s-master1-120:/data/dockerfile/nginx# cat image-build.sh
#!/bin/bash
/usr/local/bin/nerdctl build -t registry.cn-hangzhou.aliyuncs.com/wuhaolam/myserver:ubuntu-nginx-20240214 .
/usr/local/bin/nerdctl push registry.cn-hangzhou.aliyuncs.com/wuhaolam/myserver:ubuntu-nginx-20240214
2.4 Test the built image
root@k8s-master1-120:/data/dockerfile/nginx# nerdctl run -d --restart=always -p 8008:80 --name=ubuntu-nginx registry.cn-hangzhou.aliyuncs.com/wuhaolam/myserver:ubuntu-nginx-20240214
3ae39cafae8b7e4cb38736a505cbbed32921b5dcdeb5ec2e1a8495b00119d4af
root@k8s-master1-120:/data/dockerfile/nginx# nerdctl ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
3ae39cafae8b registry.cn-hangzhou.aliyuncs.com/wuhaolam/myserver:ubuntu-nginx-20240214 "nginx -g daemon off;" 7 seconds ago Up 0.0.0.0:8008->80/tcp ubuntu-nginx
3. Pod lifecycle, common Pod states, and the Pod scheduling flow
3.1 Pod lifecycle
1. When kubelet starts creating a Pod on a node, it first creates the pause container.
2. Once the pause container has been created, any configured init containers are run one by one to initialize the environment.
3. Each init container exits after it finishes and never runs again; the main containers are then created. If a postStart hook is configured it runs right after the container starts, and once it completes successfully the container continues to come up.
4. Kubernetes v1.16 introduced startupProbe; the other probes only run after the startup probe has succeeded.
5. With readinessProbe and livenessProbe configured, a successful readiness check adds the Pod to its Service endpoints, while the liveness probe decides whether the container is restarted when its check fails.
6. When the Pod is to be deleted, the preStop hook runs first; only then is the Pod removed from the node.
A compact sketch of how these pieces fit into one Pod spec follows.
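A compact sketch (image names and commands are illustrative) of where init containers, the postStart/preStop hooks, and the three probes sit in a single Pod spec; complete, runnable variants appear in section 5:
apiVersion: v1
kind: Pod
metadata:
  name: lifecycle-demo
spec:
  initContainers:              # run to completion, in order, before the main container starts
  - name: init-env
    image: busybox:1.28
    command: ['sh', '-c', 'echo prepare environment']
  containers:
  - name: app
    image: nginx:1.20.2
    lifecycle:
      postStart:               # runs right after the container starts
        exec:
          command: ['sh', '-c', 'echo started']
      preStop:                 # runs before the container is stopped
        exec:
          command: ['sh', '-c', 'sleep 5']
    startupProbe:              # the other probes wait until this succeeds
      httpGet: {path: /, port: 80}
    readinessProbe:            # controls Service endpoint membership
      httpGet: {path: /, port: 80}
    livenessProbe:             # controls container restarts
      httpGet: {path: /, port: 80}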
3.2 Common Pod states
1. Unschedulable        # the Pod cannot be scheduled; kube-scheduler found no suitable node
2. PodScheduled         # scheduling is in progress; kube-scheduler has not yet assigned the Pod to a node, and once a suitable node is selected it updates etcd and binds the Pod to that node
3. Pending              # the Pod is being created but not all of its containers have been created yet
4. Failed               # a container in the Pod failed to start, so the Pod is not working properly
5. Unknown              # the Pod's current state cannot be obtained, usually because communication with the Pod's node failed
6. Initialized          # all init containers of the Pod have completed
7. ImagePullBackOff     # the Pod's node failed to pull the image
8. Running              # the Pod's containers have been created and started
9. Ready                # the Pod's containers are ready to serve traffic
10. Error               # an error occurred while starting the Pod
11. NodeLost            # the node hosting the Pod is lost
12. Waiting             # the Pod is waiting to start
13. Terminating         # the Pod is being terminated
14. CrashLoopBackOff    # the Pod keeps crashing and kubelet keeps restarting it
15. InvalidImageName    # the node cannot resolve the image name, so the image cannot be pulled
16. ImageInspectError   # the image cannot be inspected, usually because it is incomplete
17. ErrImageNeverPull   # the pull policy forbids pulling the image, e.g. the registry is private and inaccessible
18. RegistryUnavailable # the image registry is unreachable (network issue or Harbor down)
19. ErrImagePull        # image pull error, e.g. the download timed out or was terminated
20. CreateContainerError # the container could not be created
21. RunContainerError   # the container failed to run, e.g. no PID 1 daemon was started inside the container
22. ContainersNotInitialized # the Pod's init phase has not finished
23. ContainersNotReady  # the Pod's containers are not ready
24. ContainerCreating   # the Pod is being created
25. PodInitializing     # the Pod is initializing
26. DockerDaemonNotReady # the Docker service on the node is not running
27. NetworkPluginNotReady # the network plugin is not ready
3.3 Pod scheduling flow
When a Pod creation is submitted from the command line, the API server writes the event to etcd. kube-scheduler watches the API server, picks up the Pod-creation event, performs scheduling, and on success reports the binding back to the API server, which records it in etcd. The kubelet on the selected node watches the API server, sees the Pod assigned to it, and creates the Pod on that node. kube-proxy watches the API server for the port/endpoint mappings and programs the kernel with iptables or IPVS rules.
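A quick way to observe these steps on a throwaway Pod (the Pod name and prompt are illustrative):
root@k8s-master1-230:~# kubectl run sched-test --image=nginx:1.20.2
# The event stream shows Scheduled (kube-scheduler) followed by Pulling/Pulled/Created/Started (kubelet)
root@k8s-master1-230:~# kubectl get events --sort-by=.metadata.creationTimestamp | grep sched-test
root@k8s-master1-230:~# kubectl delete pod sched-test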
4. The pause container and init containers: functions and use cases
4.1 The pause container
The pause container, also called the Infra container, is the base container of a Pod. Its image is only a few hundred KB, it is configured in kubelet, and its main job is to provide the shared network for the containers of one Pod.
Once the Infra container is created it initializes the network namespace, and the other containers then join it and share its network. So for two containers A and B in the same Pod:
(1) A and B can talk to each other directly over localhost.
(2) A and B see the same network interfaces, IP, and listening ports.
(3) The Pod has a single IP address, the one of the Pod's network namespace (initialized and owned by the Infra container).
(4) Every Pod in a Kubernetes cluster has its own IP address, shared internally by all containers of that Pod.
(5) When the Pod is deleted, the Infra container is removed as well and the IP address is reclaimed.
Namespaces shared through the pause container:
(1) NET namespace: the containers of a Pod share one network namespace, i.e. the same IP and port space.
(2) IPC namespace: the containers of a Pod can communicate via System V IPC or POSIX message queues.
(3) UTS namespace: the containers of a Pod share one hostname.
The MNT, PID, and User namespaces are not shared.
Map the host-side interface to the interface inside the container
root@k8s-ha2-deploy-239:~# kubectl get pod -n myserver -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
myserver-myapp-frontend-deployment-b6fb5767f-jrnkk 1/1 Running 0 8d 10.200.19.59 10.243.20.240 <none> <none>
# Inside the container, either command shows the peer interface index
root@k8s-ha2-deploy-239:~# kubectl exec -it myserver-myapp-frontend-deployment-b6fb5767f-jrnkk -n myserver sh
/ # cat /sys/class/net/eth0/iflink
23
/ # apk add ethtool
/ # ethtool -S eth0
NIC statistics:
peer_ifindex: 23
rx_queue_0_xdp_packets: 0
rx_queue_0_xdp_bytes: 0
rx_queue_0_drops: 0
rx_queue_0_xdp_redirect: 0
rx_queue_0_xdp_drops: 0
rx_queue_0_xdp_tx: 0
rx_queue_0_xdp_tx_errors: 0
tx_queue_0_xdp_xmit: 0
tx_queue_0_xdp_xmit_errors: 0
# Check on the host
root@k8s-node1-240:~# ip link show
...
23: cali2ff8f0c7462@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-7ba484ac-791d-a2f2-9a6f-11c8194f5559
...
root@k8s-node1-240:~# nsenter --net=/run/netns/cni-7ba484ac-791d-a2f2-9a6f-11c8194f5559 ifconfig eth0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1480
inet 10.200.19.59 netmask 255.255.255.255 broadcast 0.0.0.0
inet6 fe80::a0a1:50ff:fea4:cd9a prefixlen 64 scopeid 0x20<link>
ether a2:a1:50:a4:cd:9a txqueuelen 1000 (Ethernet)
RX packets 1520 bytes 2446840 (2.4 MB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 1651 bytes 118111 (118.1 KB)
TX errors 0 dropped 1 overruns 0 carrier 0 collisions 0
Case demo
# Prepare the files for a dynamic/static split
root@k8s-node1-240:~/pause-test-case# cat nginx.conf
error_log stderr;
events { worker_connections 1024; }
http {
access_log /dev/stdout;
server {
listen 80 default_server;
server_name www.mysite.com;
location / {
index index.html index.php;
root /usr/share/nginx/html;
}
location ~ \.php$ {
root /usr/share/nginx/html;
fastcgi_pass 127.0.0.1:9000;
fastcgi_index index.php;
fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
include fastcgi_params;
}
}
}
root@k8s-node1-240:~/pause-test-case# cat index.html
<h1> pause web page </h1>
root@k8s-node1-240:~/pause-test-case# cat index.php
<?php
phpinfo();
?>
# Run the pause container
root@k8s-node1-240:~/pause-test-case# nerdctl pull registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.8
root@k8s-node1-240:~/pause-test-case# nerdctl run -d -p 80:80 --name=pause-container-test registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.8
397f64044fdca69555c7b2d108399e763cf840ade49a80e11ac6dea9e0ad88f9
# Run the nginx container
root@k8s-node1-240:~/pause-test-case# nerdctl run -d --name nginx-container-test \
> -v `pwd`/nginx.conf:/etc/nginx/nginx.conf \
> -v `pwd`/html:/usr/share/nginx/html \
> --net=container:pause-container-test \
> registry.cn-hangzhou.aliyuncs.com/wuhaolam/myserver:nginx_1.22.0
3e5e49a459d1a3169fa1070064e0a677de0c1a3e75aae45498f7bd83bc43df53
# Run the php container
root@k8s-node1-240:~/pause-test-case# nerdctl run -d --name=php-container-test --net=container:pause-container-test -v `pwd`/html:/usr/share/nginx/html registry.cn-hangzhou.aliyuncs.com/wuhaolam/baseimage:php-5.6.40-fpm
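A hypothetical check from the same node (both requests go to port 80 published by the pause container): nginx serves the static page itself and passes .php requests to php-fpm on 127.0.0.1:9000 inside the shared network namespace.
root@k8s-node1-240:~/pause-test-case# curl http://127.0.0.1/index.html
root@k8s-node1-240:~/pause-test-case# curl http://127.0.0.1/index.php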
4.2 Init containers
What init containers are for:
(1) Prepare the runtime environment for the business containers in advance, e.g. generate the configuration files they need and put them in place, check data permissions and integrity, software versions, and other prerequisites.
(2) Prepare the business data before the business container runs, e.g. download it from OSS or copy it from elsewhere.
(3) Check that the services the Pod depends on are reachable.
Characteristics of init containers:
(1) A Pod can have several business containers and several init containers at the same time, but each init container and business container runs in its own isolated environment.
(2) Init containers start before the business containers.
(3) The business containers only run after the init containers have completed successfully.
(4) If a Pod has multiple init containers, they run one after another from top to bottom and all must succeed before the business containers run.
(5) Init containers do not support probes -- they exit after initialization and never run again.
Case demo
root@k8s-ha2-deploy-239:~/case-example# cat init-container.yaml
kind: Deployment
apiVersion: apps/v1
metadata:
labels:
app: myserver-myapp
name: myserver-myapp-deployment-init
namespace: myserver
spec:
replicas: 1
selector:
matchLabels:
app: myserver-myapp-frontend
template:
metadata:
labels:
app: myserver-myapp-frontend
spec:
containers:
- name: myserver-myapp-container
image: nginx:1.20.0
volumeMounts:
- mountPath: "/usr/share/nginx/html/myserver"
name: myserver-data
- name: timezone
mountPath: /etc/localtime
initContainers:
- name: init-web-data
image: centos:7.9.2009
command: ['/bin/bash', '-c', "for i in `seq 1 5`; do echo '<h1>'$1 web page at $(date +%F-%T) '</h1>' >> /data/nginx/html/myserver/index.html;sleep 2; done"]
volumeMounts:
- mountPath: "/data/nginx/html/myserver"
name: myserver-data
- name: timezone
mountPath: /etc/localtime
- name: change-data-owner
image: busybox:1.28
command: ['/bin/sh', '-c', "/bin/chmod 644 /data/nginx/html/myserver/* -R"]
volumeMounts:
- mountPath: "/data/nginx/html/myserver"
name: myserver-data
- name: timezone
mountPath: /etc/localtime
volumes:
- name: myserver-data
hostPath:
path: /tmp/data/html
- name: timezone
hostPath:
path: /etc/localtime
---
kind: Service
apiVersion: v1
metadata:
labels:
app: myserver-myapp-service
name: myserver-myapp-service-name
namespace: myserver
spec:
type: NodePort
ports:
- name: http
port: 80
targetPort: 80
nodePort: 30028
selector:
app: myserver-myapp-frontend
root@k8s-ha2-deploy-239:~/case-example# kubectl apply -f init-container.yaml
root@k8s-ha2-deploy-239:~/case-example# kubectl get pod -n myserver -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
myserver-myapp-deployment-init-c4f8cfd-4b8jv 0/1 Init:0/2 0 3s 10.200.19.62 10.243.20.240 <none> <none>
5. Probe types, functions and differences, configuration parameters, use cases, and startupProbe / livenessProbe / readinessProbe examples
Probes are periodic diagnostics that kubelet performs on containers to keep the Pod in a running state. To run a diagnostic, kubelet calls a handler (also referred to as a hook) implemented by the container. The current handler types are:
- ExecAction: run a command inside the container; the diagnostic succeeds if the command exits with status 0.
- TCPSocketAction: perform a TCP check against the container's IP on the given port; the diagnostic succeeds if the port is open.
- HTTPGetAction: perform an HTTP GET against the container's IP on the given port and path; the diagnostic succeeds if the response status code is >= 200 and < 400.
- grpc: health check for applications that implement the gRPC health-checking protocol; the diagnostic succeeds when the reported status is "SERVING".
  Earlier approach: https://github.com/grpc-ecosystem/grpc-health-probe/
Probe results
- Success: the container passed the diagnostic.
- Failure: the container failed the diagnostic.
- Unknown: the diagnostic itself failed; no action is taken.
Pod restart policy: when a probe fails, the next step is decided by the Pod's restartPolicy.
- Always: when a container fails, Kubernetes restarts it automatically; the default for ReplicationController/ReplicaSet/Deployment workloads.
- OnFailure: when a container fails (stops with a non-zero exit code), Kubernetes restarts it automatically.
- Never: the container is never restarted regardless of its state; used for Job or CronJob workloads.
5.1 Probe types and functions
1. startupProbe: startup probe, introduced in Kubernetes v1.16
Determines whether the application inside the container has finished starting. If a startup probe is configured, all other probes are disabled until it succeeds. If the startup probe fails, kubelet kills the container and the container follows its restart policy. If no startup probe is provided, the default state is Success.
2. livenessProbe: liveness probe
Checks whether the container is still running. If the liveness probe fails, kubelet kills the container and the container is subject to its restart policy. If no liveness probe is provided, the default state is Success. livenessProbe controls whether the Pod's container is restarted.
3. readinessProbe: readiness probe
If the readiness probe fails, the endpoint controller removes the Pod's IP address from the endpoints of all Services that match the Pod. Before the initial delay the readiness state defaults to Failure. If no readiness probe is provided, the default state is Success. readinessProbe controls whether the Pod is added to the Service.
5.2 Probe configuration parameters
Official docs: https://kubernetes.io/zh-cn/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#configure-probes
initialDelaySeconds: 120
- Initial delay: how many seconds kubelet waits before the first probe. Default 0, minimum 0 (increase it for containers that start slowly).
periodSeconds: 60
- Probe interval: how often kubelet runs the probe, in seconds. Default 10, minimum 1.
timeoutSeconds: 5
- Per-probe timeout: how many seconds to wait before a single probe times out. Default 1, minimum 1.
successThreshold: 1
- Number of consecutive successes required to be considered successful again after a failure. Default 1; for liveness probes this must be 1; minimum 1.
failureThreshold: 3
- Number of consecutive failures required to be considered failed after having succeeded. For a liveness probe, giving up means restarting the container; for a readiness probe it means marking the Pod not ready. Default 3, minimum 1.
HTTP probes
HTTP probes allow extra fields on httpGet (a short fragment follows this list):
host: host name to connect to; defaults to the Pod IP. You can set the "Host" HTTP header instead.
scheme: scheme used to connect to the host (HTTP or HTTPS). Default "HTTP".
path: path of the HTTP request. Default "/".
httpHeaders: custom HTTP headers for the request. Duplicate headers are allowed.
port: number or name of the container port to access; a number must be in the range 1-65535.
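A minimal fragment (the path and Host value are illustrative) showing the extra httpGet fields in a probe definition:
readinessProbe:
  httpGet:
    path: /healthz
    port: 80
    scheme: HTTP
    httpHeaders:
    - name: Host
      value: www.example.com
  initialDelaySeconds: 5
  periodSeconds: 3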
5.3 Probe examples
5.3.1 exec probe example
root@k8s-ha2-deploy-239:~/probe# cat 1-exec-probe.yaml
kind: Deployment
apiVersion: apps/v1
metadata:
name: myserver-redis-deployment
namespace: myserver
spec:
replicas: 1
selector:
    matchLabels:
app: myserver-redis-label
template:
metadata:
labels:
app: myserver-redis-label
spec:
containers:
- name: myserver-redis-container
image: redis
ports:
- containerPort: 6379
livenessProbe:
exec:
command:
# - /apps/redis/bin/redis-cli
- /usr/local/bin/redis-cli
- quit
initialDelaySeconds: 10
periodSeconds: 3
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 3
---
apiVersion: v1
kind: Service
metadata:
name: myserver-redis-service
namespace: myserver
spec:
ports:
- name: http
port: 6379
targetPort: 6379
nodePort: 30016
type: NodePort
selector:
app: myserver-redis-label
# The container starts successfully
root@k8s-ha2-deploy-239:~/probe# kubectl apply -f 1-exec-probe.yaml
root@k8s-ha2-deploy-239:~/probe# kubectl get pod -n myserver | grep redis
myserver-redis-deployment-56f9754677-m7wnb 1/1 Running 0 75s
Modify the YAML file so that the command run by the liveness probe fails its check
root@k8s-ha2-deploy-239:~/probe# kubectl apply -f 1-exec-probe.yaml
root@k8s-ha2-deploy-239:~/probe# kubectl get pods -n myserver
NAME READY STATUS RESTARTS AGE
myserver-redis-deployment-7b7dcbc55c-tpjk2 1/1 Running 0 7s
root@k8s-ha2-deploy-239:~/probe# kubectl describe pod myserver-redis-deployment-7b7dcbc55c-tpjk2 -n myserver
5.3.2 TCP probe example
root@k8s-ha2-deploy-239:~/probe# cat 2-tcp-probe.yaml
kind: Deployment
apiVersion: apps/v1
metadata:
name: myserver-nginx-deployment
namespace: myserver
spec:
replicas: 1
selector:
matchLabels:
app: myserver-nginx-probe-label
template:
metadata:
labels:
app: myserver-nginx-probe-label
spec:
containers:
- name: myserver-nginx-container
image: nginx:1.20.2
ports:
- containerPort: 80
readinessProbe:
tcpSocket:
port: 80
#port: 8008
initialDelaySeconds: 5
periodSeconds: 3
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 3
---
kind: Service
apiVersion: v1
metadata:
name: myserver-nginx-service
namespace: myserver
spec:
ports:
- name: http
port: 88
targetPort: 80
nodePort: 30280
protocol: TCP
type: NodePort
selector:
app: myserver-nginx-probe-label
root@k8s-ha2-deploy-239:~/probe# kubectl apply -f 2-tcp-probe.yaml
# The Service is created successfully
root@k8s-ha2-deploy-239:~/probe# kubectl get svc -n myserver
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
myserver-nginx-service NodePort 10.100.138.49 <none> 88:30280/TCP 7s
root@k8s-ha2-deploy-239:~/probe# kubectl get ep -n myserver
NAME ENDPOINTS AGE
myserver-nginx-service 10.200.21.40:80 32s
Change the readinessProbe port to 8008
# After redeploying, the readiness check fails
root@k8s-ha2-deploy-239:~/probe# kubectl get pod -n myserver
NAME READY STATUS RESTARTS AGE
myserver-nginx-deployment-5948fcf8db-fd6ws 0/1 Running 0 5m20s
root@k8s-ha2-deploy-239:~/probe# kubectl get ep -n myserver
NAME ENDPOINTS AGE
myserver-nginx-service 118s
5.3.3 HTTP probe example
root@k8s-ha2-deploy-239:~/probe# cat 3-http-probe.yaml
kind: Deployment
apiVersion: apps/v1
metadata:
name: myserver-nginx-probe-deployment
namespace: myserver
spec:
replicas: 1
selector:
matchLabels:
app: myserver-nginx-probe-label
template:
metadata:
labels:
app: myserver-nginx-probe-label
spec:
containers:
- name: myserver-nginx-container
image: nginx:1.20.2
ports:
- containerPort: 80
livenessProbe:
httpGet:
# path: /monitor/monitor.html
path: /index.html
port: 80
initialDelaySeconds: 5
periodSeconds: 3
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 3
---
apiVersion: v1
kind: Service
metadata:
name: myserver-nginx-service
namespace: myserver
spec:
ports:
- name: http
port: 88
targetPort: 80
nodePort: 32080
protocol: TCP
type: NodePort
selector:
app: myserver-nginx-probe-label
# The container is running normally
root@k8s-ha2-deploy-239:~/probe# kubectl get pods -n myserver
NAME READY STATUS RESTARTS AGE
myserver-nginx-probe-deployment-86df66c5df-n9l2l 1/1 Running 0 54s
Use a nonexistent file as the probe path
# After redeploying, the liveness probe fails and the container keeps restarting
root@k8s-ha2-deploy-239:~/probe# kubectl get pod -n myserver
NAME READY STATUS RESTARTS AGE
myserver-nginx-probe-deployment-7ccf4c4cdb-f6cjr 1/1 Running 2 (4s ago) 29s
root@k8s-ha2-deploy-239:~/probe# kubectl get pod -n myserver
NAME READY STATUS RESTARTS AGE
myserver-nginx-probe-deployment-7ccf4c4cdb-f6cjr 0/1 CrashLoopBackOff 4 (18s ago) 106s
5.3.4 gRPC probe example
root@k8s-ha2-deploy-239:~/probe# cat 4-grpc-probe.yaml
kind: Deployment
apiVersion: apps/v1
metadata:
name: myserver-etcd-deployment
namespace: myserver
spec:
replicas: 1
selector:
matchLabels:
app: myserver-etcd-label
template:
metadata:
labels:
app: myserver-etcd-label
spec:
containers:
- name: myserver-etcd-container
image: registry.cn-hangzhou.aliyuncs.com/zhangshijie/etcd:3.5.1-0
command: ["/usr/local/bin/etcd", "--data-dir", "/var/lib/etcd", "--listen-client-urls", "http://0.0.0.0:2379", "--advertise-client-urls", "http://127.0.0.1:2379", "--log-level", "debug"]
ports:
- containerPort: 2379
livenessProbe:
grpc:
port: 2379
initialDelaySeconds: 10
periodSeconds: 3
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 3
---
apiVersion: v1
kind: Service
metadata:
name: myserver-etcd-service
namespace: myserver
spec:
ports:
- name: http
port: 2379
targetPort: 2379
nodePort: 32379
protocol: TCP
type: NodePort
selector:
app: myserver-etcd-label
root@k8s-ha2-deploy-239:~/probe# kubectl get pod -n myserver
NAME READY STATUS RESTARTS AGE
myserver-etcd-deployment-754f7c6fd7-fm28p 1/1 Running 0 19s
Change the gRPC probe port to 2380
# Redeploy
root@k8s-ha2-deploy-239:~/probe# kubectl get pod -n myserver
NAME READY STATUS RESTARTS AGE
myserver-etcd-deployment-86cfc67c59-7t5mp 0/1 CrashLoopBackOff 5 (8s ago) 3m32s
5.3.5 startupProbe-livenessProbe-readinessProbe example
root@k8s-ha2-deploy-239:~/probe# cat 5-startupProbe-livenessProbe-readinessProbe.yaml
kind: Deployment
apiVersion: apps/v1
metadata:
name: myserver-nginx-probe-deployment
namespace: myserver
spec:
replicas: 1
selector:
matchLabels:
app: myserver-nginx-probe-label
template:
metadata:
labels:
app: myserver-nginx-probe-label
spec:
terminationGracePeriodSeconds: 60
containers:
- name: myserver-nginx-container
        image: nginx:1.20.2
ports:
- containerPort: 80
startupProbe:
httpGet:
path: /index.html
port: 80
initialDelaySeconds: 120
failureThreshold: 3
periodSeconds: 3
readinessProbe:
httpGet:
path: /index.html
port: 80
initialDelaySeconds: 5
failureThreshold: 3
periodSeconds: 3
timeoutSeconds: 5
successThreshold: 1
livenessProbe:
httpGet:
path: /index.html
port: 80
initialDelaySeconds: 5
periodSeconds: 3
timeoutSeconds: 3
successThreshold: 1
failureThreshold: 3
---
apiVersion: v1
kind: Service
metadata:
name: myserver-nginx-service
namespace: myserver
spec:
ports:
- name: http
port: 81
targetPort: 80
nodePort: 30280
protocol: TCP
type: NodePort
selector:
    app: myserver-nginx-probe-label
# livenessProbe and readinessProbe only run after startupProbe has succeeded
# The container is running but the checks have not completed yet, so READY is still 0/1
root@k8s-ha2-deploy-239:~/probe# kubectl get pod -n myserver
NAME READY STATUS RESTARTS AGE
myserver-nginx-probe-deployment-c477f8484-clmhx 0/1 Running 0 30s
# All probes succeed and the Pod is fully ready
root@k8s-ha2-deploy-239:~/probe# kubectl get pod -n myserver
NAME READY STATUS RESTARTS AGE
myserver-nginx-probe-deployment-c477f8484-clmhx 1/1 Running 0 7m57s
5.3.6 postStart-preStop example
root@k8s-ha2-deploy-239:~/probe# cat 6-postStart-preStop.yaml
kind: Deployment
apiVersion: apps/v1
metadata:
name: myserver-lifecycle
labels:
app: myserver-lifecycle
namespace: myserver
spec:
replicas: 1
selector:
matchLabels:
app: myserver-lifecycle-label
template:
metadata:
labels:
app: myserver-lifecycle-label
spec:
      terminationGracePeriodSeconds: 60 # how long to wait for graceful termination
containers:
- name: myserver-lifecycle-container
image: tomcat:7.0.94-alpine
lifecycle:
postStart:
exec:
command: ["/bin/sh","-c","echo 'Hello from the postStart handler'>>/usr/local/tomcat/webapps/ROOT/index.html"]
preStop:
exec:
command:
- /bin/sh
- -c
- 'sleep 10000000'
#command: ["/usr/local/tomcat/bin/catalina.sh","stop"]
#command: ['/bin/sh','-c','/path/preStop.sh']
ports:
- name: http
containerPort: 8080
---
kind: Service
apiVersion: v1
metadata:
name: myserver-lifecycle-service
namespace: myserver
spec:
ports:
- name: http
port: 80
targetPort: 8080
nodePort: 32080
protocol: TCP
type: NodePort
selector:
app: myserver-lifecycle-label
# The postStart hook runs right after the container starts; the preStop hook runs before the container is finally removed
root@k8s-ha2-deploy-239:~/probe# kubectl apply -f 6-postStart-preStop.yaml
root@k8s-ha2-deploy-239:~/probe# kubectl get pods -n myserver
NAME READY STATUS RESTARTS AGE
myserver-lifecycle-6854fdbbc7-xjnsf 1/1 Running 0 21s
root@k8s-ha2-deploy-239:~/probe# curl http://10.243.20.240:32080
Hello from the postStart handler
# Delete the Pod; it is only terminated after the time defined by terminationGracePeriodSeconds has elapsed
root@k8s-ha2-deploy-239:~/probe# kubectl delete pod -n myserver myserver-lifecycle-6854fdbbc7-xjnsf
In Kubernetes, terminationGracePeriodSeconds is the total time between the SIGTERM being sent and the Pod being force-terminated, and this window includes the time spent in the preStop hook. If deleting the Pod waits the full 60 seconds (the configured terminationGracePeriodSeconds) but preStop does not seem to have run for its whole expected duration, the usual reasons are:
- The preStop command finishes very quickly: it completes well within the grace period, so it looks as if the full time was never needed.
- The command does not block: the example above uses a long sleep, but a command that does not block or returns quickly ends immediately.
- The command errors out or produces no output: if the preStop command fails to run properly, Kubernetes does not wait indefinitely and simply continues with cleanup.
- The hook overruns the grace period: once terminationGracePeriodSeconds is reached, Kubernetes moves on and force-kills the container regardless of whether preStop is still running.
To confirm that preStop behaves as expected, check the container logs and make sure the preStop command blocks for longer than terminationGracePeriodSeconds, or raise terminationGracePeriodSeconds so the hook has enough time to finish.
6. Kubernetes case study: PV/PVC overview and running a ZooKeeper cluster on Kubernetes with PVC-backed persistence
1. StatefulSet + storageClassName
2. Deployment-based
6.1 ZooKeeper cluster based on StatefulSet + storageClassName
6.1.1 Prepare the storage class
1. Create a ServiceAccount and grant it the permissions it needs in the cluster
root@k8s-ha2-deploy-239:~/case-example# cat rbac.yaml
apiVersion: v1
kind: Namespace
metadata:
name: nfs
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: nfs-client-provisioner
namespace: nfs
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: nfs-client-provisioner-runner
rules:
- apiGroups: [""]
resources: ["nodes"]
verbs: ["get", "list", "watch"]
- apiGroups: [""]
resources: ["persistentvolumes"]
verbs: ["get", "list", "watch", "create", "delete"]
- apiGroups: [""]
resources: ["persistentvolumeclaims"]
verbs: ["get", "list", "watch", "update"]
- apiGroups: ["storage.k8s.io"]
resources: ["storageclasses"]
verbs: ["get", "list", "watch"]
- apiGroups: [""]
resources: ["events"]
verbs: ["create", "update", "patch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: run-nfs-client-provisioner
subjects:
- kind: ServiceAccount
name: nfs-client-provisioner
namespace: nfs
roleRef:
kind: ClusterRole
name: nfs-client-provisioner-runner
apiGroup: rbac.authorization.k8s.io
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: leader-locking-nfs-client-provisioner
namespace: nfs
rules:
- apiGroups: [""]
resources: ["endpoints"]
verbs: ["get", "list", "watch", "create", "update", "patch"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: leader-locking-nfs-client-provisioner
namespace: nfs
subjects:
- kind: ServiceAccount
name: nfs-client-provisioner
namespace: nfs
roleRef:
kind: Role
name: leader-locking-nfs-client-provisioner
apiGroup: rbac.authorization.k8s.io
root@k8s-ha2-deploy-239:~/case-example# kubectl apply -f rbac.yaml
2. Create the StorageClass
root@k8s-ha2-deploy-239:~/case-example# cat storageclass.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: managed-nfs-storage
provisioner: k8s-sigs.io/nfs-subdir-external-provisioner # must match the PROVISIONER_NAME env variable of the NFS provisioner created below
reclaimPolicy: Retain # PV deletion policy; the default is Delete
mountOptions:
  - noresvport # if the NFS connection flaps, the Pod does not remount the share by itself; with this option the client remounts the NFS server
  - noatime # do not update inode access times when reading files; improves performance under high concurrency
parameters:
  archiveOnDelete: "true" # keep (archive) the data on the NFS server when the claim is deleted; the default is false
root@k8s-ha2-deploy-239:~/case-example# kubectl apply -f storageclass.yaml
storageclass.storage.k8s.io/managed-nfs-storage created
3. Create the NFS provisioner
(1) Create the export path on the NFS server
root@k8s-ha1-238:~# mkdir /data/volumes
root@k8s-ha1-238:~# cat /etc/exports
/data/k8sdata *(rw,no_root_squash)
/data/volumes *(rw,no_root_squash)
/data/zookeeperdata *(rw,no_root_squash)
root@k8s-ha1-238:~# systemctl restart nfs-server.service
(2) Create the NFS provisioner
root@k8s-ha2-deploy-239:~/case-example# cat nfs-provisioner.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: nfs-client-provisioner
labels:
app: nfs-client-provisioner
namespace: nfs
spec:
replicas: 1
strategy:
type: Recreate
selector:
matchLabels:
app: nfs-client-provisioner
template:
metadata:
labels:
app: nfs-client-provisioner
spec:
serviceAccountName: nfs-client-provisioner
containers:
- name: nfs-client-provisioner
image: registry.cn-qingdao.aliyuncs.com/zhangshijie/nfs-subdir-external-provisioner:v4.0.2
volumeMounts:
- name: nfs-client-root
mountPath: /persistentvolumes
env:
- name: PROVISIONER_NAME
value: k8s-sigs.io/nfs-subdir-external-provisioner
- name: NFS_SERVER
value: 10.243.20.238
- name: NFS_PATH
value: /data/volumes
volumes:
- name: nfs-client-root
nfs:
server: 10.243.20.238
path: /data/volumes
(3) Deploy the NFS provisioner
root@k8s-ha2-deploy-239:~/case-example# kubectl apply -f nfs-provisioner.yaml
deployment.apps/nfs-client-provisioner created
root@k8s-ha2-deploy-239:~/case-example# kubectl get storageclasses.storage.k8s.io
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
managed-nfs-storage k8s-sigs.io/nfs-subdir-external-provisioner Retain Immediate false 24h
6.1.2 Deploy the ZooKeeper cluster
root@k8s-ha2-deploy-239:~/zookeeperCase/StatefulSet# ls
zk-cluster-daemonset.yaml
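The zk-cluster-daemonset.yaml manifest itself is not reproduced here; the sketch below only illustrates, as an assumption, the pieces that drive the per-Pod PV/PVC behaviour seen next (the image and the ZooKeeper startup configuration are placeholders and would have to match a real ZooKeeper image):
apiVersion: v1
kind: Service
metadata:
  name: zookeeper-headless
spec:
  clusterIP: None
  selector:
    app: zookeeper
  ports:
  - name: server
    port: 2888
  - name: leader-election
    port: 3888
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: zookeeper
spec:
  serviceName: zookeeper-headless
  replicas: 3
  selector:
    matchLabels:
      app: zookeeper
  template:
    metadata:
      labels:
        app: zookeeper
    spec:
      containers:
      - name: zookeeper
        image: zookeeper:3.6          # placeholder; the image actually used is not shown in the source
        ports:
        - containerPort: 2181
        volumeMounts:
        - name: datadir
          mountPath: /var/lib/zookeeper
  volumeClaimTemplates:               # gives each replica its own PVC: datadir-zookeeper-0/1/2
  - metadata:
      name: datadir
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: managed-nfs-storage
      resources:
        requests:
          storage: 20Gi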
root@k8s-ha2-deploy-239:~/zookeeperCase/StatefulSet# kubectl apply -f zk-cluster-daemonset.yaml
# The Pods are spread across the three nodes
root@k8s-ha2-deploy-239:~/zookeeperCase/StatefulSet# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
net-test1 1/1 Running 118 (91m ago) 49d 10.200.21.63 10.243.20.242 <none> <none>
test-centos 1/1 Running 1 (21h ago) 19d 10.200.19.77 10.243.20.240 <none> <none>
zookeeper-0 1/1 Running 0 3m18s 10.200.21.73 10.243.20.242 <none> <none>
zookeeper-1 1/1 Running 0 3m16s 10.200.19.84 10.243.20.240 <none> <none>
zookeeper-2 1/1 Running 0 3m15s 10.200.143.35 10.243.20.241 <none> <none>
# Each Pod gets its own PV and PVC
root@k8s-ha2-deploy-239:~/zookeeperCase/StatefulSet# kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-40bd1139-8c1a-4f01-9d61-1fe2c1c592d1 20Gi RWO Retain Bound default/datadir-zookeeper-0 managed-nfs-storage 49m
pvc-73dac627-69d8-4320-a0f3-c20709a262cd 20Gi RWO Retain Bound default/datadir-zookeeper-1 managed-nfs-storage 45m
pvc-7e0335e4-bb16-49f1-97ba-da6742b7d48c 20Gi RWO Retain Bound default/datadir-zookeeper-2 managed-nfs-storage 4m13s
root@k8s-ha2-deploy-239:~/zookeeperCase/StatefulSet# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
datadir-zookeeper-0 Bound pvc-40bd1139-8c1a-4f01-9d61-1fe2c1c592d1 20Gi RWO managed-nfs-storage 49m
datadir-zookeeper-1 Bound pvc-73dac627-69d8-4320-a0f3-c20709a262cd 20Gi RWO managed-nfs-storage 45m
datadir-zookeeper-2 Bound pvc-7e0335e4-bb16-49f1-97ba-da6742b7d48c 20Gi RWO managed-nfs-storage 4m14s
# The corresponding data directories are created on the NFS server
root@k8s-ha1-238:/data/zookeeperdata# ls -l
total 12
drwxrwxrwx 3 root root 4096 Feb 27 15:09 default-datadir-zookeeper-0-pvc-40bd1139-8c1a-4f01-9d61-1fe2c1c592d1
drwxrwxrwx 3 root root 4096 Feb 27 15:09 default-datadir-zookeeper-1-pvc-73dac627-69d8-4320-a0f3-c20709a262cd
drwxrwxrwx 3 root root 4096 Feb 27 15:10 default-datadir-zookeeper-2-pvc-7e0335e4-bb16-49f1-97ba-da6742b7d48c
Check the state of the ZooKeeper cluster
# zookeeper-0 and zookeeper-2 are followers, zookeeper-1 is the leader
root@k8s-ha2-deploy-239:~# kubectl exec -it zookeeper-0 sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
$ /usr/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/bin/../etc/zookeeper/zoo.cfg
Mode: follower
$ exit
root@k8s-ha2-deploy-239:~# kubectl exec -it zookeeper-1 sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
$ /usr/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/bin/../etc/zookeeper/zoo.cfg
Mode: leader
$ exit
root@k8s-ha2-deploy-239:~# kubectl exec -it zookeeper-2 sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
$ /usr/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/bin/../etc/zookeeper/zoo.cfg
Mode: follower
ZooKeeper generates its per-Pod configuration dynamically
# The ZooKeeper ID must match the current instance
# Here the ID is 1, i.e. zookeeper-0 (this shell is attached to zookeeper-0)
$ cat /var/lib/zookeeper/data/myid
1
6.1.3 Verify the ZooKeeper cluster
1. Get the Service port information
root@k8s-ha2-deploy-239:~# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.100.0.1 <none> 443/TCP 54d
ng-deploy-880 NodePort 10.100.131.4 <none> 880:31880/TCP 42d
zookeeper-headless ClusterIP None <none> 2888/TCP,3888/TCP 42m
zookeeper-service NodePort 10.100.140.149 <none> 2181:32474/TCP 42m
2. Configure a client connection
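One way to verify connectivity, assuming the image ships zkCli.sh alongside zkServer.sh in /usr/bin (the client path is an assumption):
# Connect through the NodePort (32474 above) from a Pod or from any host that can reach the nodes
root@k8s-ha2-deploy-239:~# kubectl exec -it zookeeper-0 -- /usr/bin/zkCli.sh -server 10.243.20.240:32474
# Inside the zkCli shell, `ls /` should list the root znodes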
6.2 ZooKeeper cluster based on Deployments
# Build the zookeeper image
root@k8s-ha2-deploy-239:~/zookeeper-case-n79/2.deployment/k8s-data/dockerfile/web/magedu/zookeeper# bash build-command.sh v3.4.14
# Create the data directories on the NFS server
root@k8s-ha1-238:~# mkdir -p /data/k8sdata/magedu/zookeeper-datadir-1
root@k8s-ha1-238:~# mkdir -p /data/k8sdata/magedu/zookeeper-datadir-2
root@k8s-ha1-238:~# mkdir -p /data/k8sdata/magedu/zookeeper-datadir-3
# Create the PVs and PVCs
root@k8s-ha2-deploy-239:~/zookeeper-case-n79/2.deployment/k8s-data/yaml/magedu/zookeeper/pv# kubectl apply -f zookeeper-persistentvolume.yaml
persistentvolume/zookeeper-datadir-pv-1 created
persistentvolume/zookeeper-datadir-pv-2 created
persistentvolume/zookeeper-datadir-pv-3 created
root@k8s-ha2-deploy-239:~/zookeeper-case-n79/2.deployment/k8s-data/yaml/magedu/zookeeper/pv# kubectl apply -f zookeeper-persistentvolumeclaim.yaml
persistentvolumeclaim/zookeeper-datadir-pvc-1 created
persistentvolumeclaim/zookeeper-datadir-pvc-2 created
persistentvolumeclaim/zookeeper-datadir-pvc-3 created
root@k8s-ha2-deploy-239:~/zookeeper-case-n79/2.deployment/k8s-data/yaml/magedu/zookeeper/pv# kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
zookeeper-datadir-pv-1 20Gi RWO Retain Bound magedu/zookeeper-datadir-pvc-1 2m9s
zookeeper-datadir-pv-2 20Gi RWO Retain Bound magedu/zookeeper-datadir-pvc-2 2m9s
zookeeper-datadir-pv-3 20Gi RWO Retain Bound magedu/zookeeper-datadir-pvc-3 2m8s
root@k8s-ha2-deploy-239:~/zookeeper-case-n79/2.deployment/k8s-data/yaml/magedu/zookeeper/pv# kubectl get pvc
No resources found in default namespace.
root@k8s-ha2-deploy-239:~/zookeeper-case-n79/2.deployment/k8s-data/yaml/magedu/zookeeper/pv# kubectl get pvc -n magedu
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
zookeeper-datadir-pvc-1 Bound zookeeper-datadir-pv-1 20Gi RWO 39s
zookeeper-datadir-pvc-2 Bound zookeeper-datadir-pv-2 20Gi RWO 39s
zookeeper-datadir-pvc-3 Bound zookeeper-datadir-pv-3 20Gi RWO 39s
# Deploy the ZooKeeper services
root@k8s-ha2-deploy-239:~/zookeeper-case-n79/2.deployment/k8s-data/yaml/magedu/zookeeper# kubectl apply -f zookeeper.yaml
service/zookeeper created
service/zookeeper1 created
service/zookeeper2 created
service/zookeeper3 created
deployment.apps/zookeeper1 created
deployment.apps/zookeeper2 created
deployment.apps/zookeeper3 created
root@k8s-ha2-deploy-239:~/zookeeper-case-n79/2.deployment/k8s-data/yaml/magedu/zookeeper# kubectl get pod -n magedu
NAME READY STATUS RESTARTS AGE
zookeeper1-db5579778-hqpk5 1/1 Running 0 43s
zookeeper2-b8c4cb4fc-qljh4 1/1 Running 0 43s
zookeeper3-59d4544db8-lmgst 1/1 Running 1 (26s ago) 43s
# Check the cluster role of each instance; see the sketch below
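One way to check the roles, sketched under the assumption that zkServer.sh is on the PATH inside this image (adjust the path to wherever the image installs ZooKeeper):
root@k8s-ha2-deploy-239:~# kubectl exec -n magedu -it zookeeper1-db5579778-hqpk5 -- zkServer.sh status
root@k8s-ha2-deploy-239:~# kubectl exec -n magedu -it zookeeper2-b8c4cb4fc-qljh4 -- zkServer.sh status
root@k8s-ha2-deploy-239:~# kubectl exec -n magedu -it zookeeper3-59d4544db8-lmgst -- zkServer.sh status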
7. A dynamic/static-split web service from custom images, with NFS for data sharing and persistence
7.1 nginx manifest
root@k8s-ha2-deploy-239:~# cat nginx.yaml
kind: Deployment
apiVersion: apps/v1
metadata:
labels:
app: myserver-nginx-deployment-label
name: myserver-nginx-deployment
namespace: myserver
spec:
replicas: 1
selector:
matchLabels:
app: myserver-nginx-selector
template:
metadata:
labels:
app: myserver-nginx-selector
spec:
containers:
- name: myserver-nginx-container
image: registry.cn-hangzhou.aliyuncs.com/wuhaolam/myserver:alpine-nginx-v4
imagePullPolicy: Always
ports:
- containerPort: 80
protocol: TCP
name: http
resources:
limits:
cpu: 500m
memory: 2Gi
requests:
cpu: 500m
memory: 2Gi
volumeMounts:
- name: htmlfile
mountPath: /apps/nginx/html
readOnly: false
volumes:
- name: htmlfile
nfs:
server: 10.243.20.238
path: /data/k8sdata
---
kind: Service
apiVersion: v1
metadata:
labels:
app: myserver-nginx-service-label
name: myserver-nginx-service
namespace: myserver
spec:
type: NodePort
ports:
- name: http
port: 80
targetPort: 80
nodePort: 32180
protocol: TCP
selector:
app: myserver-nginx-selector
7.2 tomcat manifest
root@k8s-ha2-deploy-239:~# cat tomcat.yaml
kind: Deployment
apiVersion: apps/v1
metadata:
labels:
app: myserver-tomcat-deployment-label
name: myserver-tomcat-deployment
namespace: myserver
spec:
replicas: 1
selector:
matchLabels:
app: myserver-tomcat-selector
template:
metadata:
labels:
app: myserver-tomcat-selector
spec:
containers:
- name: myserver-tomcat-container
image: registry.cn-hangzhou.aliyuncs.com/wuhaolam/myserver:centos-tomcat-v1
imagePullPolicy: Always
ports:
- containerPort: 8080
protocol: TCP
name: http
resources:
limits:
cpu: 500m
memory: 2Gi
requests:
cpu: 500m
memory: 2Gi
volumeMounts:
- name: jspfile
mountPath: /data/tomcat/webapps/myapp
readOnly: false
volumes:
- name: jspfile
nfs:
server: 10.243.20.238
path: /data/k8sdata/static
---
kind: Service
apiVersion: v1
metadata:
labels:
app: myserver-tomcat-service-label
name: myserver-tomcat-service
namespace: myserver
spec:
type: ClusterIP
ports:
- name: http
port: 8080
targetPort: 8080
protocol: TCP
selector:
app: myserver-tomcat-selector
7.3 Prepare the NFS server and the shared files
7.3.1 NFS server configuration
root@k8s-ha1-238:~# apt -y install nfs-server
root@k8s-ha1-238:~# cat /etc/exports
/data/k8sdata *(rw,no_root_squash)
root@k8s-ha1-238:~# systemctl restart nfs-server.service
7.3.2 Files under the exported path
1. nginx html file
root@k8s-ha1-238:/data/k8sdata# cat index.html
<h1> nfs nginx web page</h1>
2. Tomcat jsp file
root@k8s-ha1-238:/data/k8sdata# mkdir static
root@k8s-ha1-238:/data/k8sdata# cat static/index.jsp
<%@page import="java.util.Enumeration"%>
<br />
host: <%try{out.println(""+java.net.InetAddress.getLocalHost().getHostName());}catch(Exception e){}%>
<br />
remoteAddr: <%=request.getRemoteAddr()%>
<br />
remoteHost: <%=request.getRemoteHost()%>
<br />
sessionId: <%=request.getSession().getId()%>
<br />
serverName:<%=request.getServerName()%>
<br />
scheme:<%=request.getScheme()%>
<br />
<%request.getSession().setAttribute("t1","t2");%>
<%
Enumeration en = request.getHeaderNames();
while(en.hasMoreElements()){
String hd = en.nextElement().toString();
out.println(hd+" : "+request.getHeader(hd));
out.println("<br />");
}
%>
3. Image file
root@k8s-ha1-238:/data/k8sdata# mkdir images
root@k8s-ha1-238:/data/k8sdata# ls images/
adventure-alps.jpg
7.4 Deploy the nginx and tomcat services
root@k8s-ha2-deploy-239:~# kubectl apply -f nginx.yaml
root@k8s-ha2-deploy-239:~# kubectl apply -f tomcat.yaml
root@k8s-ha2-deploy-239:~# kubectl get pod -n myserver
NAME READY STATUS RESTARTS AGE
myserver-nginx-deployment-845c67cb85-7ltmf 1/1 Running 0 23m
myserver-tomcat-deployment-787c56985-qdswq 1/1 Running 0 23m
7.5 Check the results
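A hypothetical check of the two paths, assuming the alpine-nginx-v4 image proxies /myapp to myserver-tomcat-service (the nginx.conf baked into that image is not shown here) and that 10.243.20.240 is one of the cluster nodes:
# Static page served by nginx from the NFS-backed html directory
root@k8s-ha2-deploy-239:~# curl http://10.243.20.240:32180/index.html
# Dynamic page served by tomcat from the NFS-backed myapp directory
root@k8s-ha2-deploy-239:~# curl http://10.243.20.240:32180/myapp/index.jsp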