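Kubernetes v1.20.8 Binary Installation (Part 7): containerd
1. Deployment
To simplify management, the containerd binaries are placed under the kubernetes/node directory.
1.1. Download
The stable release at the time of writing is 1.5.2.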
containerd_version="1.5.2"
wget -P /opt/software/kubernetes/tools/ https://github.com/containerd/containerd/releases/download/v${containerd_version}/cri-containerd-cni-${containerd_version}-linux-amd64.tar.gz
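Before unpacking, the downloaded archive can be checked against a checksum. The sketch below assumes that a `.sha256sum` file is published next to the archive on the release page (confirm this for your version) and that GNU `sha256sum` is installed. `verify_sha256` is a hypothetical helper name, not part of the original procedure.

```shell
#!/usr/bin/env bash
# verify_sha256 ARCHIVE SUMFILE
# Succeeds only if the SHA-256 digest of ARCHIVE equals the first field of SUMFILE.
# (Hypothetical helper; the checksum file name is an assumption.)
verify_sha256() {
  local archive="$1" sumfile="$2" expected actual
  expected="$(awk '{print $1; exit}' "$sumfile")"
  actual="$(sha256sum "$archive" | awk '{print $1}')"
  [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

# Example (assumes both files were downloaded into the tools directory):
#   cd /opt/software/kubernetes/tools/
#   verify_sha256 cri-containerd-cni-1.5.2-linux-amd64.tar.gz \
#                 cri-containerd-cni-1.5.2-linux-amd64.tar.gz.sha256sum \
#     && echo "checksum OK" || echo "checksum MISMATCH"
```

Comparing only the digest field avoids depending on the file name recorded inside the checksum file.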
1.2. Extract
containerd_version="1.5.2"
cd /opt/software/kubernetes/tools/ ; mkdir containerd ; cd containerd
tar -xf ../cri-containerd-cni-${containerd_version}-linux-amd64.tar.gz
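# The archive extracts into three directories: usr, etc, and opt. Reorganize them as shown below.
# The files in the commented-out lines are generated by hand in later steps, but can also be copied directly.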
cp -r etc/cni /opt/software/kubernetes/node/
cp -r opt/cni/bin /opt/software/kubernetes/node/cni
cp usr/local/bin/* /opt/software/kubernetes/node/bin/
cp usr/local/sbin/runc /opt/software/kubernetes/node/bin/
cp -r opt/containerd /opt/software/kubernetes/node/
mkdir /opt/software/kubernetes/node/containerd/{data,run}
# cp -r etc/crictl.yaml /opt/software/kubernetes/node/config
# cp -r etc/systemd/system/containerd.service /opt/software/kubernetes/node/service
Edit the subnet ranges in the cni/net.d/10-containerd-net.conflist configuration file:
cat >/opt/software/kubernetes/node/cni/net.d/10-containerd-net.conflist<<EOF
{
"cniVersion": "0.4.0",
"name": "containerd-net",
"plugins": [
{
"type": "bridge",
"bridge": "cni0",
"isGateway": true,
"ipMasq": true,
"promiscMode": true,
"ipam": {
"type": "host-local",
"ranges": [
[{
"subnet": "172.30.0.0/16"
}],
[{
"subnet": "2001:4860:4860::/64"
}]
],
"routes": [
{ "dst": "0.0.0.0/0" },
{ "dst": "::/0" }
]
}
},
{
"type": "portmap",
"capabilities": {"portMappings": true}
}
]
}
EOF
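To confirm that the subnet edit landed in the file, the configured ranges can be listed. This is a minimal sketch based on `grep` and `sed` text matching, not a JSON parser; `list_cni_subnets` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# list_cni_subnets FILE
# Print every "subnet" value found in the given conflist, one per line.
# (Hypothetical helper; relies on plain text matching of "subnet": "...".)
list_cni_subnets() {
  grep -o '"subnet"[[:space:]]*:[[:space:]]*"[^"]*"' "$1" \
    | sed 's/.*"\([^"]*\)"$/\1/'
}

# Example:
#   list_cni_subnets /opt/software/kubernetes/node/cni/net.d/10-containerd-net.conflist
```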
1.3. Configure containerd
Generate the default configuration file:
/opt/software/kubernetes/node/bin/containerd config default > /opt/software/kubernetes/node/config/containerd_config.toml
Change the following settings in containerd_config.toml and leave everything else as generated:
$ vim /opt/software/kubernetes/node/config/containerd_config.toml
root = "/data/containerd"
state = "/run/containerd"
[grpc]
address = "/run/containerd/containerd.sock"
[plugins]
[plugins."io.containerd.grpc.v1.cri"]
sandbox_image = "k8s.gcr.io/pause:3.5"
[plugins."io.containerd.grpc.v1.cri".cni]
bin_dir = "/opt/software/kubernetes/cni/bin"
conf_dir = "/opt/software/kubernetes/cni/net.d"
[plugins."io.containerd.internal.v1.opt"]
path = "/opt/software/kubernetes/containerd"
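- root: root directory for persistent data (default: "/var/lib/containerd").
- state: state directory (default: "/run/containerd"). If container state does not need to survive a reboot, leave it pointing there: /run is a tmpfs, so its contents are cleared on reboot. Otherwise, after a host reboot containerd reads the container records in the state directory and tries to reconnect to each container, but the container processes (containerd-shim) no longer exist. Each connection attempt then waits for a timeout (100 s), so containerd starts abnormally slowly after a reboot, with the delay growing in proportion to the number of containers.
- sandbox_image: the pause base image. If k8s.gcr.io is not reachable from your network, change it to registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.5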
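The sandbox_image change can also be scripted instead of edited in vim. The sketch below assumes GNU `sed` (for `-i`) and a generated config in which `sandbox_image` appears on a single line; `set_sandbox_image` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# set_sandbox_image FILE IMAGE
# Replace the quoted value of sandbox_image, keeping the line's indentation.
# (Hypothetical helper; assumes GNU sed and no '|' or '&' in IMAGE.)
set_sandbox_image() {
  local file="$1" image="$2"
  sed -i "s|^\([[:space:]]*sandbox_image[[:space:]]*=[[:space:]]*\).*|\1\"${image}\"|" "$file"
}

# Example:
#   set_sandbox_image /opt/software/kubernetes/node/config/containerd_config.toml \
#     registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.5
```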
1.4. Configure crictl
cat > /opt/software/kubernetes/node/config/crictl.yaml<<EOF
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
debug: false
EOF
1.5. Configure the systemd service unit
cat >/opt/software/kubernetes/node/service/containerd.service<<EOF
[Unit]
Description=containerd container runtime
Documentation=https://containerd.io
After=network.target
[Service]
Environment="PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/opt/software/kubernetes/bin"
ExecStartPre=-/sbin/modprobe overlay
ExecStartPre=-/sbin/modprobe br_netfilter
ExecStart=/opt/software/kubernetes/bin/containerd --config /opt/software/kubernetes/config/containerd_config.toml
Type=notify
Delegate=yes
KillMode=process
Restart=always
RestartSec=5
LimitNPROC=infinity
LimitCORE=infinity
LimitNOFILE=infinity
TasksMax=infinity
OOMScoreAdjust=-999
[Install]
WantedBy=multi-user.target
EOF
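- Environment: sets the PATH environment variable so that the containerd service can find and execute runc.
- Delegate: allows containerd and its runtimes to manage the cgroups of the containers they create. Without this option, systemd moves the processes into its own cgroups, and containerd can no longer obtain the containers' resource usage correctly.
- KillMode: controls how the containerd processes are killed. By default, systemd looks in the unit's cgroup and kills all of containerd's child processes, which is not what we want. Set KillMode to process so that upgrading or restarting containerd does not kill the running containers. The possible values are:
  - control-group (default): all processes in the control group are killed
  - process: only the main process is killed
  - mixed: the main process receives SIGTERM, and the child processes receive SIGKILL
  - none: no process is killed; only the service's stop command is run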
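To double-check that the generated unit file carries the settings described above, the values can be read back from the file. This is a small sketch with a hypothetical helper, `unit_value`, that does plain text matching on `Key=value` lines; it ignores drop-ins and line continuations.

```shell
#!/usr/bin/env bash
# unit_value FILE KEY
# Print the value of the last "KEY=value" line in a unit file.
# (Hypothetical helper; plain text matching only.)
unit_value() {
  local file="$1" key="$2"
  sed -n "s/^${key}=//p" "$file" | tail -n 1
}

# Example against the file written in this section:
#   unit=/opt/software/kubernetes/node/service/containerd.service
#   unit_value "$unit" KillMode   # expected to print: process
#   unit_value "$unit" Delegate   # expected to print: yes
# On a node where the unit is already loaded, systemd can report the
# effective values instead:
#   systemctl show containerd -p KillMode -p Delegate
```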
1.6. Distribute
hosts=(node01 node02 node03)
domain='k8s-host.com'
config_files=('config/containerd_config.toml' 'config/crictl.yaml')
cd /opt/software/kubernetes
for host in ${hosts[*]}
do
scp -r node/{bin,cni,containerd,service} ${host}.${domain}:/opt/software/kubernetes/
done
# Do not run this loop more than once: it overwrites configuration files that have since been modified on the nodes
for host in ${hosts[*]}
do
for file in ${config_files[*]}
do
scp -r node/${file} ${host}.${domain}:/opt/software/kubernetes/${file}
done
done
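One way to make the configuration copy safe to repeat is to skip files that already exist on the target. The sketch below is an alternative to the loop above, not the original procedure; it assumes passwordless ssh to the nodes, and `copy_config_once` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# copy_config_once SRC TARGET DEST
# Copy SRC to TARGET:DEST only when DEST does not exist on TARGET yet.
# (Hypothetical helper; assumes DEST contains no single quotes.)
copy_config_once() {
  local src="$1" target="$2" dest="$3"
  if ssh "$target" "test -e '$dest'"; then
    echo "skip ${target}:${dest} (already exists)"
  else
    scp "$src" "${target}:${dest}"
  fi
}

# Example, using the host names from this section:
#   cd /opt/software/kubernetes
#   for host in node01 node02 node03; do
#     for file in config/containerd_config.toml config/crictl.yaml; do
#       copy_config_once "node/${file}" "${host}.k8s-host.com" \
#         "/opt/software/kubernetes/${file}"
#     done
#   done
```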
1.7. Start the service
hosts=(node01 node02 node03)
domain='k8s-host.com'
for host in ${hosts[*]}
do
# Create symlinks
ssh root@${host}.${domain} "ln -s /opt/software/kubernetes/service/containerd.service /usr/lib/systemd/system/containerd.service "
ssh root@${host}.${domain} "ln -s /opt/software/kubernetes/config/crictl.yaml /etc/crictl.yaml"
# Enable at boot and start the service now
ssh root@${host}.${domain} "systemctl daemon-reload && systemctl enable containerd --now "
done
1.8. Verify the service
Run on every node:
source /etc/profile
$ crictl ps
$ crictl images
If verification fails, crictl reports:
WARN[0000] image connect using default endpoints: [unix:///var/run/dockershim.sock unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock]. As the default settings are now deprecated, you should set the endpoint instead.
ERRO[0002] connect endpoint 'unix:///var/run/dockershim.sock', make sure you are running as root and the endpoint has been started: context deadline exceeded
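The warning about default endpoints suggests that crictl did not pick up /etc/crictl.yaml, so two things are worth checking first: that the config file is readable and that the containerd socket exists. The sketch below is a hypothetical pre-check (`check_crictl_setup` is not a crictl feature); its default paths are the ones used in this guide.

```shell
#!/usr/bin/env bash
# check_crictl_setup [CONFIG] [SOCKET]
# Report a missing crictl config or containerd socket; return non-zero if
# either is absent. (Hypothetical helper.)
check_crictl_setup() {
  local config="${1:-/etc/crictl.yaml}"
  local socket="${2:-/run/containerd/containerd.sock}"
  local rc=0
  if [ ! -r "$config" ]; then
    echo "missing or unreadable crictl config: $config"
    rc=1
  fi
  if [ ! -S "$socket" ]; then
    echo "containerd socket not found: $socket"
    rc=1
  fi
  return "$rc"
}

# Example:
#   check_crictl_setup && crictl ps
```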
2. Issues
Issue 1: With containerd v1.4 and later (including 1.5), the containerd service starts slowly after a host reboot.
Jul 8 15:33:22 node02 containerd: time="2021-07-08T15:33:22.428227691+08:00" level=warning msg="cleaning up after shim disconnected" id=fbf2cf31a9bb1409892e1b4ed0c848264a16e6e24ba8e54c60393b1cabf4d7eb namespace=k8s.io
Jul 8 15:33:22 node02 containerd: time="2021-07-08T15:33:22.428320485+08:00" level=info msg="cleaning up dead shim"
Jul 8 15:33:22 node02 containerd: time="2021-07-08T15:33:22.447609860+08:00" level=warning msg="cleanup warnings time=\"2021-07-08T15:33:22+08:00\" level=info msg=\"starting signal loop\" namespace=k8s.io pid=2378\n"
The messages above are logged once every 100 s.
Related links:
- https://github.com/containerd/containerd/issues/5597
- https://github.com/cloudfoundry/garden-runc-release/issues/195
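Root cause:
The state parameter points to a non-temporary directory. After a reboot, containerd reads the container records in that state directory and tries to reconnect to each container, but the containers' own processes (containerd-shim-v2) no longer exist after the host reboot. This triggers containerd's connection timeout (100 s per container).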
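Whether a given state directory sits on a tmpfs can be checked from the shell. The sketch below assumes GNU coreutils `stat`; `state_dir_fs_type` is a hypothetical helper that falls back to the nearest existing parent directory when the path has not been created yet.

```shell
#!/usr/bin/env bash
# state_dir_fs_type DIR
# Print the filesystem type that holds DIR (or its nearest existing parent).
# (Hypothetical helper; assumes GNU stat.)
state_dir_fs_type() {
  local dir="$1"
  while [ ! -e "$dir" ] && [ "$dir" != "/" ]; do
    dir="$(dirname "$dir")"
  done
  stat -f -c %T "$dir"
}

# Example:
#   state_dir_fs_type /run/containerd
# A result of "tmpfs" means the directory is cleared on reboot; any other
# type means container records survive a reboot and can trigger the
# slow start described here.
```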
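Suggested fix:
state: state directory (default: "/run/containerd"). If container state does not need to be kept, leave it at the default; /run is a tmpfs and its contents are cleared on reboot.
If the state directory really must be persistent, read the two links above and also apply the following configuration:
Add TimeoutSec=0 to containerd.service to disable the timeout; otherwise the following errors are reported: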
containerd.service: start operation timed out. Terminating.
State 'stop-sigterm' timed out. Killing.
containerd.service: Killing process 1511 (containerd) with signal SIGKILL.
containerd.service: Main process exited, code=killed, status=9/KILL
containerd.service: Failed with result 'timeout'.
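The TimeoutSec=0 setting can be added without editing the distributed unit file by using a systemd drop-in. This is a sketch of that alternative, not the original procedure; the drop-in file name `10-timeout.conf` and the helper `write_timeout_dropin` are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# write_timeout_dropin [DROPIN_DIR]
# Write a drop-in that disables the start/stop timeout for containerd.service.
# (Hypothetical helper; the file name 10-timeout.conf is arbitrary.)
write_timeout_dropin() {
  local dropin_dir="${1:-/etc/systemd/system/containerd.service.d}"
  mkdir -p "$dropin_dir"
  cat > "${dropin_dir}/10-timeout.conf" <<'EOF'
[Service]
TimeoutSec=0
EOF
}

# Example (run as root on each node, then reload systemd):
#   write_timeout_dropin
#   systemctl daemon-reload && systemctl restart containerd
```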