kubernetes-v1.20.8 binary installation (7) - containerd

1. Deployment

For easier management, the containerd binaries are placed in the kubernetes/node directory.

1.1. Download

At the time of writing, the current stable release is 1.5.2.

containerd_version="1.5.2"
wget -P /opt/software/kubernetes/tools/ https://github.com/containerd/containerd/releases/download/v${containerd_version}/cri-containerd-cni-${containerd_version}-linux-amd64.tar.gz
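Before extracting, it is worth checking the download's integrity. A minimal sketch, assuming the release page also publishes a matching `cri-containerd-cni-${containerd_version}-linux-amd64.tar.gz.sha256sum` file in the standard `sha256sum` format (check the release assets first); `verify_tarball` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify a downloaded tarball against "<tarball>.sha256sum".
# Assumes the checksum file uses the "<hash>  <filename>" format produced by
# sha256sum and sits in the same directory as the tarball.
verify_tarball() {
  local tarball="$1" dir name
  dir="$(dirname "$tarball")"
  name="$(basename "$tarball")"
  # sha256sum -c resolves file names relative to the current directory,
  # so run it from the tarball's directory inside a subshell.
  ( cd "$dir" && sha256sum -c "${name}.sha256sum" )
}

# Usage (the .sha256sum URL is an assumption: tarball URL plus ".sha256sum"):
# wget -P /opt/software/kubernetes/tools/ https://github.com/containerd/containerd/releases/download/v${containerd_version}/cri-containerd-cni-${containerd_version}-linux-amd64.tar.gz.sha256sum
# verify_tarball /opt/software/kubernetes/tools/cri-containerd-cni-${containerd_version}-linux-amd64.tar.gz
```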

1.2. Extract

containerd_version="1.5.2"
cd /opt/software/kubernetes/tools/ && mkdir -p containerd && cd containerd
tar -xf ../cri-containerd-cni-${containerd_version}-linux-amd64.tar.gz

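# This extracts three directories: usr, etc and opt. Reorganize them as follows; the commented-out files are written by hand in later steps, but can also be copied directly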
cp -r etc/cni /opt/software/kubernetes/node/
cp -r opt/cni/bin /opt/software/kubernetes/node/cni
cp usr/local/bin/* /opt/software/kubernetes/node/bin/
cp usr/local/sbin/runc /opt/software/kubernetes/node/bin/
cp -r opt/containerd /opt/software/kubernetes/node/
mkdir /opt/software/kubernetes/node/containerd/{data,run}
# cp -r etc/crictl.yaml /opt/software/kubernetes/node/config
# cp -r etc/systemd/system/containerd.service /opt/software/kubernetes/node/service
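After reorganizing, a quick check that the node directory contains what the later steps rely on can save a failed service start. A minimal sketch; `check_node_layout` is hypothetical, and the path list is an assumption derived from the copy commands above (plus the runc v2 shim, which we assume ships in usr/local/bin), so adjust it to your layout.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify the reorganized node directory.
# The expected paths are assumptions based on the copy commands above.
check_node_layout() {
  local base="$1" missing=0 p
  local files=(bin/containerd bin/containerd-shim-runc-v2 bin/ctr bin/crictl bin/runc)
  local dirs=(cni/bin cni/net.d containerd)
  for p in "${files[@]}"; do
    [ -x "$base/$p" ] || { echo "missing executable: $base/$p" >&2; missing=1; }
  done
  for p in "${dirs[@]}"; do
    [ -d "$base/$p" ] || { echo "missing directory: $base/$p" >&2; missing=1; }
  done
  return "$missing"
}

# Usage: check_node_layout /opt/software/kubernetes/node
```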

Edit the subnet range in cni/net.d/10-containerd-net.conflist:

cat >/opt/software/kubernetes/node/cni/net.d/10-containerd-net.conflist<<EOF
{
  "cniVersion": "0.4.0",
  "name": "containerd-net",
  "plugins": [
    {
      "type": "bridge",
      "bridge": "cni0",
      "isGateway": true,
      "ipMasq": true,
      "promiscMode": true,
      "ipam": {
        "type": "host-local",
        "ranges": [
          [{
            "subnet": "172.30.0.0/16"
          }],
          [{
            "subnet": "2001:4860:4860::/64"
          }]
        ],
        "routes": [
          { "dst": "0.0.0.0/0" },
          { "dst": "::/0" }
        ]
      }
    },
    {
      "type": "portmap",
      "capabilities": {"portMappings": true}
    }
  ]
}

EOF
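Instead of rewriting the whole file, only the IPv4 range can be changed in place. A minimal sketch, assuming GNU sed (for `-i`) and the file layout shown above; `set_ipv4_subnet` is a hypothetical helper, and 10.244.0.0/16 is only an example value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace the IPv4 "subnet" value in a CNI conflist,
# leaving the IPv6 range untouched. Requires GNU sed for in-place editing.
set_ipv4_subnet() {
  local file="$1" subnet="$2"
  # Accept only a dotted-quad CIDR such as 10.244.0.0/16.
  if ! [[ "$subnet" =~ ^[0-9]{1,3}(\.[0-9]{1,3}){3}/[0-9]{1,2}$ ]]; then
    echo "invalid IPv4 CIDR: $subnet" >&2
    return 1
  fi
  # Only values made of digits and dots before the "/" match, so the
  # IPv6 entry (which contains ":") is not touched.
  sed -i -E 's#("subnet": *")[0-9.]+/[0-9]+"#\1'"$subnet"'"#' "$file"
}

# Usage (example value):
# set_ipv4_subnet /opt/software/kubernetes/node/cni/net.d/10-containerd-net.conflist 10.244.0.0/16
```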

1.3. Configure containerd

Generate the default configuration file:

/opt/software/kubernetes/node/bin/containerd config default > /opt/software/kubernetes/node/config/containerd_config.toml

Modify the following settings in containerd_config.toml and leave everything else unchanged:

$ vim /opt/software/kubernetes/node/config/containerd_config.toml
root = "/data/containerd"
state = "/run/containerd"


[grpc]
  address = "/run/containerd/containerd.sock"

[plugins]
  [plugins."io.containerd.grpc.v1.cri"]
    sandbox_image = "k8s.gcr.io/pause:3.5"
    
  [plugins."io.containerd.grpc.v1.cri".cni]
    bin_dir = "/opt/software/kubernetes/cni/bin"
    conf_dir = "/opt/software/kubernetes/cni/net.d"
    
  [plugins."io.containerd.internal.v1.opt"]
    path = "/opt/software/kubernetes/containerd"
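The same edits can also be applied non-interactively instead of with vim. A minimal sketch, assuming GNU sed; `set_toml_key` is a hypothetical helper that refuses to touch a key unless it occurs exactly once, because some keys (we expect `address` and `path`, for example) appear in several sections of the generated default. The `[grpc]` address above is already the default value, and the `opt` plugin's `path` is best edited by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set `key = "value"` in a TOML file, keeping indentation.
# Refuses to edit unless the key occurs exactly once. Requires GNU sed.
# The value must not contain '#', '&' or '\' (they are special to sed here).
set_toml_key() {
  local file="$1" key="$2" value="$3" count
  count="$(grep -cE "^[[:space:]]*${key} = " "$file")"
  if [ "$count" -ne 1 ]; then
    echo "refusing to edit '$key': found $count matches in $file" >&2
    return 1
  fi
  sed -i -E "s#^([[:space:]]*)${key} = .*#\1${key} = \"${value}\"#" "$file"
}

# Usage:
# cfg=/opt/software/kubernetes/node/config/containerd_config.toml
# set_toml_key "$cfg" root /data/containerd
# set_toml_key "$cfg" sandbox_image k8s.gcr.io/pause:3.5
# set_toml_key "$cfg" bin_dir /opt/software/kubernetes/cni/bin
# set_toml_key "$cfg" conf_dir /opt/software/kubernetes/cni/net.d
```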

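  • root: root directory for persistent data (default: "/var/lib/containerd")

  • state: state directory (default: "/run/containerd"). Unless you need to keep container state information, do not point it elsewhere: /run is a tmpfs, so its contents are cleared on reboot. If state points to a persistent directory, then after a host reboot containerd reads the container records in that directory and tries to reconnect to those containers. The containers, however, lost their processes (containerd-shim) during the reboot, so containerd cannot connect and waits for a connection timeout (100s). As a result, containerd starts very slowly after a host reboot, and the delay grows in proportion to the number of containers.

  • sandbox_image: the pause base image. If k8s.gcr.io is unreachable from your network, change it to registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.5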
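Whether the configured state directory is actually on tmpfs can be checked before starting the service. A minimal sketch using GNU coreutils `stat -f`; both helper names are hypothetical, and treating anything other than tmpfs as a warning simply mirrors the advice above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: report the filesystem type backing containerd's state
# directory and warn when it is not tmpfs. Requires GNU coreutils stat.
state_dir_fstype() {
  local dir="$1"
  # The state directory may not exist before containerd first starts,
  # so walk up to the nearest existing ancestor.
  while [ ! -e "$dir" ]; do
    dir="$(dirname "$dir")"
  done
  stat -f -c %T "$dir"
}

warn_if_persistent_state() {
  local dir="$1" fstype
  fstype="$(state_dir_fstype "$dir")"
  if [ "$fstype" != "tmpfs" ]; then
    echo "WARNING: state dir $dir is on $fstype, not tmpfs; stale shim records may survive a reboot" >&2
    return 1
  fi
  echo "$dir is on tmpfs"
}

# Usage: warn_if_persistent_state /run/containerd
```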

1.4. Configure crictl

cat > /opt/software/kubernetes/node/config/crictl.yaml<<EOF
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
debug: false
EOF

1.5. Configure the systemd service file

cat >/opt/software/kubernetes/node/service/containerd.service<<EOF
[Unit]
Description=containerd container runtime
Documentation=https://containerd.io
After=network.target

[Service]
Environment="PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/opt/software/kubernetes/bin"
ExecStartPre=-/sbin/modprobe overlay
ExecStartPre=-/sbin/modprobe br_netfilter
ExecStart=/opt/software/kubernetes/bin/containerd  --config /opt/software/kubernetes/config/containerd_config.toml
Type=notify
Delegate=yes
KillMode=process
Restart=always
RestartSec=5
LimitNPROC=infinity
LimitCORE=infinity
LimitNOFILE=infinity
TasksMax=infinity
OOMScoreAdjust=-999

[Install]
WantedBy=multi-user.target
EOF
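
  • Environment: sets the PATH environment variable so that the containerd service can find and execute the runc binary
  • Delegate: allows containerd and the runtime to manage the cgroups of the containers they create. Without this option, systemd moves the processes into its own cgroups, and containerd can no longer correctly obtain the containers' resource usage.
  • KillMode: controls how the containerd process is killed. By default, systemd looks up and kills all of containerd's child processes in the process's cgroup, which is definitely not what we want. KillMode accepts the following values:
    • control-group (default): all child processes in the current control group are killed
    • process: only the main process is killed
    • mixed: the main process receives SIGTERM, and the child processes receive SIGKILL
    • none: no process is killed; only the service's stop command is executed
    KillMode must therefore be set to process, which ensures that upgrading or restarting containerd does not kill the existing containers.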
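A quick sanity check that a unit file carries the directives discussed above. A minimal sketch; `check_unit_directives` is hypothetical and only greps the file text, so it does not prove systemd has loaded the unit. On a running node, `systemctl show containerd -p KillMode -p Delegate` reports the values systemd actually uses.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm the unit file contains the exact directive
# lines this article relies on (whole-line matches only).
check_unit_directives() {
  local unit="$1" rc=0 d
  for d in 'Type=notify' 'Delegate=yes' 'KillMode=process'; do
    grep -qx "$d" "$unit" || { echo "missing or different: $d" >&2; rc=1; }
  done
  return "$rc"
}

# Usage: check_unit_directives /opt/software/kubernetes/node/service/containerd.service
```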

1.6. Distribute

hosts=(node01 node02 node03)
domain='k8s-host.com'
config_files=('config/containerd_config.toml' 'config/crictl.yaml')
cd /opt/software/kubernetes
for host in "${hosts[@]}"
do
    scp -r node/{bin,cni,containerd,service} "${host}.${domain}:/opt/software/kubernetes/"
done

# Do not run this loop more than once: it overwrites configuration already edited on the nodes
for host in "${hosts[@]}"
do
    for file in "${config_files[@]}"
    do
        scp "node/${file}" "${host}.${domain}:/opt/software/kubernetes/${file}"
    done
done
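The second loop above must not be re-run because it overwrites node-specific edits. One way to make it safe to repeat is to copy a file only when it does not exist on the target yet. A minimal sketch, assuming root SSH access as in section 1.7; `push_if_absent` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a config file to a host only if it is not there yet.
# Note: an unreachable host also makes the `ssh ... test -e` check fail, in
# which case scp is attempted and fails as well.
push_if_absent() {
  local src="$1" host="$2" dst="$3"
  # `test -e` runs on the remote host; exit status 0 means the file exists.
  if ssh "root@${host}" test -e "$dst"; then
    echo "skip ${host}:${dst} (already present)"
  else
    scp "$src" "root@${host}:${dst}"
  fi
}

# Usage, replacing the second loop above:
# for host in "${hosts[@]}"; do
#   for file in "${config_files[@]}"; do
#     push_if_absent "node/${file}" "${host}.${domain}" "/opt/software/kubernetes/${file}"
#   done
# done
```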

1.7. Start the service

hosts=(node01 node02 node03)
domain='k8s-host.com'
for host in "${hosts[@]}"
do
    # Create the symlinks
    ssh root@${host}.${domain} "ln -s /opt/software/kubernetes/service/containerd.service /usr/lib/systemd/system/containerd.service"
    ssh root@${host}.${domain} "ln -s /opt/software/kubernetes/config/crictl.yaml /etc/crictl.yaml"
    # Enable at boot and start the service now
    ssh root@${host}.${domain} "systemctl daemon-reload && systemctl enable containerd --now"
done
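After the loop, each node can be checked for whether containerd actually came up. A minimal sketch reusing the `hosts`/`domain` convention above; `check_containerd_active` is hypothetical, and `systemctl is-active --quiet` exits 0 only when the unit is active. For a failed node, `journalctl -u containerd` on that host shows the reason.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the hosts on which containerd is not active.
# Arguments: domain, then one or more short host names.
check_containerd_active() {
  local domain="$1"; shift
  local host failed=()
  for host in "$@"; do
    if ! ssh "root@${host}.${domain}" systemctl is-active --quiet containerd; then
      failed+=("$host")
    fi
  done
  if [ "${#failed[@]}" -gt 0 ]; then
    echo "containerd not active on: ${failed[*]}" >&2
    return 1
  fi
}

# Usage: check_containerd_active k8s-host.com node01 node02 node03
```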

1.8. Verify the service

On every node:

source /etc/profile

$ crictl ps

$ crictl images

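If it fails, crictl reports errors like the following. Here crictl found no endpoint configuration, fell back to its default endpoints, and timed out connecting to the first one (dockershim.sock):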

WARN[0000] image connect using default endpoints: [unix:///var/run/dockershim.sock unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock]. As the default settings are now deprecated, you should set the endpoint instead. 
ERRO[0002] connect endpoint 'unix:///var/run/dockershim.sock', make sure you are running as root and the endpoint has been started: context deadline exceeded 
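Since the warning shows crictl falling back to its built-in default endpoints, a local preflight can check the pieces it needs before running crictl. A minimal sketch; `crictl_preflight` is hypothetical, and the default config path is the /etc/crictl.yaml symlink created in section 1.7.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that crictl has a config file with a unix
# runtime-endpoint and that the socket exists. Prints the socket path on success.
crictl_preflight() {
  local cfg="${1:-/etc/crictl.yaml}" sock
  if [ ! -r "$cfg" ]; then
    echo "no readable $cfg: crictl will fall back to its default endpoints" >&2
    return 1
  fi
  # Extract the socket path from a line such as:
  #   runtime-endpoint: unix:///run/containerd/containerd.sock
  sock="$(sed -n 's#^runtime-endpoint: *unix://##p' "$cfg")"
  if [ -z "$sock" ]; then
    echo "no unix runtime-endpoint in $cfg" >&2
    return 1
  fi
  if [ ! -S "$sock" ]; then
    echo "socket $sock does not exist: is containerd running?" >&2
    return 1
  fi
  echo "$sock"
}

# Usage: crictl_preflight && crictl ps
```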

2. Issues

Issue 1: with containerd v1.4 and later (including 1.5), the containerd service starts slowly after a host reboot

Jul  8 15:33:22 node02 containerd: time="2021-07-08T15:33:22.428227691+08:00" level=warning msg="cleaning up after shim disconnected" id=fbf2cf31a9bb1409892e1b4ed0c848264a16e6e24ba8e54c60393b1cabf4d7eb namespace=k8s.io
Jul  8 15:33:22 node02 containerd: time="2021-07-08T15:33:22.428320485+08:00" level=info msg="cleaning up dead shim"
Jul  8 15:33:22 node02 containerd: time="2021-07-08T15:33:22.447609860+08:00" level=warning msg="cleanup warnings time=\"2021-07-08T15:33:22+08:00\" level=info msg=\"starting signal loop\" namespace=k8s.io pid=2378\n"

The messages above are logged once every 100s.
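The messages repeat every 100s, and the root-cause analysis below attributes one such wait to each container recorded in the state directory, so the extra startup time grows linearly with the number of containers. A back-of-the-envelope sketch; the helper names are hypothetical, the 100s figure is the one observed in the log, and 90s is systemd's usual default start timeout (DefaultTimeoutStartSec), which may differ on your hosts.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a rough estimate of containerd's startup delay.
# Extra startup time in seconds for N stale containers (default 100s each).
estimate_startup_delay() {
  local containers="$1" per_container="${2:-100}"
  echo $(( containers * per_container ))
}

# Succeeds if the estimated delay exceeds the start timeout (default 90s).
exceeds_start_timeout() {
  local containers="$1" timeout="${2:-90}"
  [ "$(estimate_startup_delay "$containers")" -gt "$timeout" ]
}

# estimate_startup_delay 20    # prints 2000, i.e. roughly 33 minutes
# exceeds_start_timeout 1 && echo "even one stale container exceeds 90s"
```

Under these assumptions, even a single stale container can outlast a 90s start timeout, which is consistent with the timeout errors quoted in the fix below.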

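Related links:

Root cause:

The state parameter pointed to a non-temporary directory. As a result, when containerd restarted it read the container information in that state directory and tried to reconnect to those containers. After the host reboot, however, the containers had already lost their own processes (containerd-shim-runc-v2), which triggered containerd's connection-timeout wait (100s per container).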

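Suggested fix:

state: state directory (default: "/run/containerd"). Unless you need to keep container state information, do not point it elsewhere (/run is a tmpfs, so its contents are cleared on reboot).

If the state directory really must be persistent, read the descriptions in the two related links and also apply the following configuration:

Add TimeoutSec=0 to the [Service] section of containerd.service so that no timeout is enforced; otherwise errors like the following appear: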

containerd.service: start operation timed out. Terminating.

State 'stop-sigterm' timed out. Killing.
containerd.service: Killing process 1511 (containerd) with signal SIGKILL.
containerd.service: Main process exited, code=killed, status=9/KILL
containerd.service: Failed with result 'timeout'.
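Instead of editing the unit file itself, the same setting can be placed in a systemd drop-in, which is kept if containerd.service is later replaced during an upgrade. A minimal sketch; the drop-in name `10-timeout.conf` and the `write_timeout_dropin` helper are hypothetical, the directory argument exists only so the function can be tried outside /etc, and `systemctl daemon-reload` is still required afterwards.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a drop-in that disables containerd's start/stop
# timeout. TimeoutSec= sets both TimeoutStartSec= and TimeoutStopSec=.
write_timeout_dropin() {
  local unit_dir="${1:-/etc/systemd/system}"
  local dir="${unit_dir}/containerd.service.d"
  mkdir -p "$dir"
  cat > "${dir}/10-timeout.conf" <<'EOF'
[Service]
TimeoutSec=0
EOF
}

# Usage on a node: write_timeout_dropin && systemctl daemon-reload
```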

3. References

containerd official website

containerd GitHub

posted @ 2021-09-03 18:03  风吹蛋生丶