Running containers in a sandbox: the gVisor secure container runtime

1. System environment

This article is based on Kubernetes 1.22.2 and Ubuntu 18.04.

Server OS | Docker version | Kubernetes (k8s) version | gVisor version | containerd version | CPU architecture
Ubuntu 18.04.5 LTS | Docker version 20.10.14 | v1.22.2 | release-20220510.0 (spec 1.0.2-dev) | 1.6.4 | x86_64

Kubernetes cluster layout: k8scludes1 is the master node; k8scludes2 and k8scludes3 are worker nodes.

Server | OS version | CPU architecture | Processes | Role
k8scludes1/192.168.110.128 | Ubuntu 18.04.5 LTS | x86_64 | docker, kube-apiserver, etcd, kube-scheduler, kube-controller-manager, kubelet, kube-proxy, coredns, calico | k8s master node
k8scludes2/192.168.110.129 | Ubuntu 18.04.5 LTS | x86_64 | docker, kubelet, kube-proxy, calico | k8s worker node
k8scludes3/192.168.110.130 | Ubuntu 18.04.5 LTS | x86_64 | docker, kubelet, kube-proxy, calico | k8s worker node

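2. Preface

Container technology has greatly improved development and deployment efficiency, but container security remains a concern that cannot be ignored. Traditional Docker containers are fast and convenient, yet their isolation from the host is relatively weak. This article introduces gVisor, a container runtime that provides stronger isolation.

Running containers in a sandbox requires a working Kubernetes cluster. For installing and deploying a Kubernetes (k8s) cluster, see the blog post 《Ubuntu 安装部署Kubernetes(k8s)集群》 https://www.cnblogs.com/renshengdezheli/p/17632858.html.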

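3. Overview of secure container isolation

A secure container is a runtime technology that gives a containerized application a complete operating-system execution environment while isolating its execution from the host operating system. The application cannot access host resources directly, which adds a layer of protection between containers and the host, and between containers themselves. Kata Containers is another secure container runtime; for details, see the blog post 《以沙箱的方式运行容器:安全容器Kata Containers》.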

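4. Introduction to gVisor

gVisor is a lightweight container isolation technology developed by Google. It places an additional layer between the container and the host operating system: a user-space kernel that exposes a Linux-compatible system call interface, so that containers run in a more tightly controlled environment. System calls made inside the sandbox are intercepted and handled by gVisor itself, and gVisor makes only a limited set of calls to the host kernel on the container's behalf. This restricts the container's direct access to host resources and improves container security.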

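gVisor's interception layer adds some performance overhead, although it is lighter than running a full traditional virtual machine. The overhead is often quoted as around 10%, but it depends heavily on the workload: applications that make many system calls or do a lot of I/O pay the most, while CPU-bound code is affected far less. gVisor also supports multi-core concurrency, so it can take advantage of multi-core systems.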

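At the core of gVisor is a process called Sentry, which it starts for the application process, that is, for the user container. Sentry's main job is to provide what a traditional operating-system kernel provides: running user programs and executing system calls. In other words, Sentry is not a complete reimplementation of the Linux kernel in Go; it is a system component that plays the role of the kernel for the application process.

With this design in mind, it is easy to see why Sentry has to implement its own complete network stack in order to handle the application's network traffic. It then sends the resulting layer-2 frames directly to the Pod's network namespace that Kubernetes has set up.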

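5. Introduction to container runtimes

In container technology, a runtime is the software that manages the container lifecycle. Depending on how much functionality they provide, container runtimes are divided into low-level and high-level runtimes.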

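A low-level runtime is the tool that interacts directly with the operating-system kernel to create and run containers. It starts from an already unpacked container filesystem and is responsible for creating, starting, and stopping the container and for managing the processes inside it. Its main functions are:

  • Container lifecycle management: creating, running, pausing, resuming, stopping, and deleting containers.
  • Process and resource isolation: using control groups (cgroups) and namespaces to isolate and allocate resources.

Low-level runtimes include runC, lxc, gVisor, and Kata.

A high-level runtime sits on top of a low-level runtime and offers a higher level of abstraction and more management features. These typically include:

  • Image management: downloading, storing, and updating container images.
  • Storage management: managing container filesystems and volumes.
  • Network configuration: giving containers network interfaces and IP addresses, usually by calling network plugins.
  • Monitoring and logging: collecting container metrics and logs for analysis.

Orchestration features such as automated deployment and scaling, service discovery, and load balancing are provided by Kubernetes on top of the high-level runtime rather than by the runtime itself.

High-level runtimes include Docker, containerd, Podman, rkt, and CRI-O. A high-level runtime calls a low-level runtime to do the actual work.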

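Kubernetes does not manage containers itself; it calls a high-level runtime through the Container Runtime Interface (CRI). A runtime that does not speak CRI natively needs a shim (adapter) in between: for Docker this was dockershim, which was built into the kubelet and was removed in Kubernetes 1.24, while containerd ships its own CRI plugin.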

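In practice the two levels work together: the low-level runtime handles the container itself, and the high-level runtime builds more complex logic and management automation on top of it.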

6. Weaknesses of Docker containers

Check Docker's default runtime; it is currently runc.

root@k8scludes1:~# docker info | grep Runtime
 Runtimes: runc io.containerd.runc.v2 io.containerd.runtime.v1.linux
 Default Runtime: runc

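At the moment there is no nginx process on the host. Consider this question: if we use runc to start an nginx container on this host, so that nginx runs inside the container but not on the host itself, can the nginx processes be seen from the host?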

root@k8scludes1:~# ps -ef | grep nginx | grep -v grep

An nginx image is already available.

root@k8scludes1:~# docker images | grep nginx
nginx                                                             latest    605c77e624dd   5 months ago    141MB

Create a container from the nginx image. For details on creating containers, see the blog post 《一文搞懂docker容器基础:docker镜像管理,docker容器管理》.

root@k8scludes1:~# docker run -dit --name=nginxrongqi --restart=always nginx
7844b98cf01cc1b6ba05c575d284146c47cb3fb66e1fa61d6eeac696f0dbc1c3

root@k8scludes1:~# docker ps | grep nginx
7844b98cf01c   nginx                                               "/docker-entrypoint.…"   8 seconds ago    Up 6 seconds    80/tcp    nginxrongqi

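Now look for nginx processes on the host: they are visible.

Docker's default runtime is runc. A container created by runc shares the host kernel, and its processes appear on the host as ordinary processes. If the container has a vulnerability, an attacker can exploit it to compromise the host.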

root@k8scludes1:~# ps -ef | grep nginx 
root      45384  45337  0 15:33 pts/0    00:00:00 nginx: master process nginx -g daemon off;
systemd+  45465  45384  0 15:33 pts/0    00:00:00 nginx: worker process
systemd+  45466  45384  0 15:33 pts/0    00:00:00 nginx: worker process
systemd+  45467  45384  0 15:33 pts/0    00:00:00 nginx: worker process
systemd+  45468  45384  0 15:33 pts/0    00:00:00 nginx: worker process
root      46215   6612  0 15:34 pts/0    00:00:00 grep --color=auto nginx

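When a container runs in a sandbox, its processes are no longer visible on the host. runc does not run containers in a sandbox, so we need to configure the high-level runtime to call a different low-level runtime that does.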
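
The `ps -ef | grep nginx` check used in this section also matches the grep command itself, which makes the result easy to misread. The following is a minimal sketch of a helper that counts matching processes and ignores grep. The function name `count_procs` is hypothetical, and it reads `ps -ef`-style text on standard input so that it can also be tried on saved output.

```shell
# count_procs PATTERN: read `ps -ef`-style lines on stdin and print how many
# mention PATTERN in the command column, ignoring the grep process itself.
# (hypothetical helper name, for illustration only)
count_procs() {
    awk -v pat="$1" '
        NR == 1 && $1 == "UID" { next }                  # skip the header row if present
        {
            cmd = ""
            for (i = 8; i <= NF; i++) cmd = cmd " " $i   # the command starts at column 8
            if (cmd ~ /(^| )grep /) next                 # drop the grep helper line
            if (index(cmd, pat) > 0) n++
        }
        END { print n + 0 }
    '
}

# Usage on a live host:
#   ps -ef | count_procs nginx
```

A result of 0 means no matching process is visible on the host; any other number is the count of visible processes.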

7. Configuring Docker to use gVisor as the runtime

7.1 Install Docker

We install Docker on the client machine etcd2 (CentOS).

[root@etcd2 ~]# yum -y install docker-ce

Enable Docker at boot and start it now.

[root@etcd2 ~]# systemctl enable docker --now
Created symlink from /etc/systemd/system/multi-user.target.wants/docker.service to /usr/lib/systemd/system/docker.service.

[root@etcd2 ~]# systemctl status docker
● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
   Active: active (running) since 二 2022-06-07 11:07:18 CST; 7s ago
     Docs: https://docs.docker.com
 Main PID: 1231 (dockerd)
   Memory: 36.9M
   CGroup: /system.slice/docker.service
           └─1231 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock

Check the Docker version.

[root@etcd2 ~]# docker --version
Docker version 20.10.12, build e91ed57

Configure a Docker registry mirror.

[root@etcd2 ~]# vim /etc/docker/daemon.json

[root@etcd2 ~]# cat /etc/docker/daemon.json
{
"registry-mirrors": ["https://frz7i079.mirror.aliyuncs.com"] 
}

Restart Docker.

[root@etcd2 ~]# systemctl restart docker

Make bridged traffic pass through iptables and enable IP forwarding.

[root@etcd2 ~]# vim /etc/sysctl.d/k8s.conf 

[root@etcd2 ~]# cat /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1

Apply the settings.

[root@etcd2 ~]# sysctl -p /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1

Docker's default runtime is currently runc.

[root@etcd2 ~]# docker info | grep -i runtime
 Runtimes: runc io.containerd.runc.v2 io.containerd.runtime.v1.linux
 Default Runtime: runc

Next, configure Docker to use gVisor as its runtime.

7.2 Upgrade the kernel

Check the operating system version.

[root@etcd2 ~]# cat /etc/redhat-release 
CentOS Linux release 7.4.1708 (Core) 

Check the kernel version.

[root@etcd2 ~]# uname -r
3.10.0-693.el7.x86_64

gVisor supports x86_64 and ARM64 and requires Linux 4.14.77 or later. The current kernel is only 3.10.0, so it must be upgraded. A kernel can be upgraded offline or online; both methods are described in detail in the blog post 《centos7 离线升级/在线升级操作系统内核》.
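
The version requirement can also be checked by a script instead of by eye. This is a minimal sketch that assumes GNU `sort -V` is available (it is on CentOS 7 and Ubuntu 18.04); the function name `kernel_at_least` is hypothetical.

```shell
# kernel_at_least MIN [CURRENT]: succeed if CURRENT (default: `uname -r`)
# is at least MIN. Relies on GNU `sort -V` for version ordering.
kernel_at_least() {
    min="$1"
    cur="${2:-$(uname -r)}"
    cur="${cur%%-*}"        # strip the distro suffix, e.g. "-693.el7.x86_64"
    lowest=$(printf '%s\n%s\n' "$min" "$cur" | sort -V | head -n 1)
    [ "$lowest" = "$min" ]
}

if kernel_at_least 4.14.77; then
    echo "kernel $(uname -r) is new enough for gVisor"
else
    echo "kernel $(uname -r) is older than 4.14.77; upgrade first"
fi
```

Passing a second argument, for example `kernel_at_least 4.14.77 3.10.0-693.el7.x86_64`, checks a version string other than the running kernel.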

This article uses the offline method.

Update the installed packages with yum.

[root@etcd2 ~]# yum -y update

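Enable the ELRepo repository. ELRepo is a community repository for Enterprise Linux that supports Red Hat Enterprise Linux (RHEL) and other RHEL-based distributions (CentOS, Scientific Linux, Fedora, and so on). It focuses on hardware-related packages, including filesystem, graphics, network, sound, and webcam drivers.

Import the ELRepo public key.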

[root@etcd2 ~]#  rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org

Install the ELRepo yum repository.

[root@etcd2 ~]#  rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm

Download the kernel package from ELRepo. Without the ELRepo public key and yum repository from the previous steps, the kernel package cannot be downloaded.

[root@etcd2 ~]# wget https://elrepo.org/linux/kernel/el7/x86_64/RPMS/kernel-lt-5.4.160-1.el7.elrepo.x86_64.rpm

The Tsinghua mirror can be used for a direct download; this is the package installed below.

[root@etcd2 ~]# wget https://mirrors.tuna.tsinghua.edu.cn/elrepo/kernel/el7/x86_64/RPMS/kernel-lt-5.4.197-1.el7.elrepo.x86_64.rpm --no-check-certificate

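The kernel package is now downloaded.

kernel-ml is the mainline build and always tracks the latest mainline kernel; kernel-lt is the long-term support build with a longer support cycle. Choose the ml package if you want the newest kernel, and the lt package if you want stability and longer support.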

[root@etcd2 ~]# ll -h kernel-lt-5.4.197-1.el7.elrepo.x86_64.rpm*
-rw-r--r-- 1 root root 51M 6月   5 19:47 kernel-lt-5.4.197-1.el7.elrepo.x86_64.rpm

Install the kernel package.

[root@etcd2 ~]# rpm -ivh kernel-lt-5.4.197-1.el7.elrepo.x86_64.rpm
警告:kernel-lt-5.4.197-1.el7.elrepo.x86_64.rpm: 头V4 DSA/SHA256 Signature, 密钥 ID baadae52: NOKEY
准备中...                          ################################# [100%]
正在升级/安装...
   1:kernel-lt-5.4.197-1.el7.elrepo   ################################# [100%]

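After the kernel is installed, the default boot entry must be changed. A newly installed kernel is normally inserted at the top of the GRUB menu as entry 0, which pushes the old kernel down to entry 1, so set GRUB_DEFAULT=0 to make the first entry on the GRUB menu the default kernel.

In the stock grub file, GRUB_DEFAULT=saved.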

[root@etcd2 ~]# cat /etc/default/grub
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="gfxterm"
GRUB_CMDLINE_LINUX="rhgb quiet nomodeset"
GRUB_DISABLE_RECOVERY="true"

Set GRUB_DEFAULT=0.

[root@etcd2 ~]# vim /etc/default/grub

[root@etcd2 ~]# cat /etc/default/grub
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=0
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="gfxterm"
GRUB_CMDLINE_LINUX="rhgb quiet nomodeset"
GRUB_DISABLE_RECOVERY="true"

Set the default boot entry. grub2-set-default 0 serves the same purpose as GRUB_DEFAULT=0 in /etc/default/grub.

[root@etcd2 ~]# grub2-set-default 0

List all boot entries.

[root@etcd2 ~]# awk -F\' '$1=="menuentry " {print i++ " : " $2}' /boot/grub2/grub.cfg
0 : CentOS Linux 7 Rescue 12667e2174a8483e915fd89a3bc359fc (5.4.197-1.el7.elrepo.x86_64)
1 : CentOS Linux (5.4.197-1.el7.elrepo.x86_64) 7 (Core)
2 : CentOS Linux (3.10.0-693.el7.x86_64) 7 (Core)
3 : CentOS Linux (0-rescue-80c608ceab5342779ba1adc2ac29c213) 7 (Core)
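
In the listing above, entry 0 is a rescue entry and the regular 5.4.197 entry sits at index 1, so it can be safer to look the index up by kernel version than to assume 0. The following is a minimal sketch; the function name `grub_entry_index` is hypothetical, and the default path is the CentOS 7 BIOS location used in this article.

```shell
# grub_entry_index VERSION [GRUB_CFG]: print the 0-based index of the first
# non-rescue menuentry whose title mentions VERSION; fail if none matches.
# (hypothetical helper; the default path is the CentOS 7 BIOS location)
grub_entry_index() {
    awk -F"'" -v ver="$1" '
        $1 == "menuentry " {
            if (index($2, ver) > 0 && tolower($2) !~ /rescue/) {
                print i + 0        # i + 0 so that the first entry prints as 0
                found = 1
                exit
            }
            i++
        }
        END { exit !found }
    ' "${2:-/boot/grub2/grub.cfg}"
}

# Usage on the host, after the grub configuration has been regenerated:
#   grub2-set-default "$(grub_entry_index 5.4.197)"
```

Run it again after grub2-mkconfig, because regenerating the configuration can change the order of the entries.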

Regenerate the GRUB configuration file.

[root@etcd2 ~]# vim /boot/grub2/grub.cfg

[root@etcd2 ~]# grub2-mkconfig -o /boot/grub2/grub.cfg
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-5.4.197-1.el7.elrepo.x86_64
Found initrd image: /boot/initramfs-5.4.197-1.el7.elrepo.x86_64.img
Found linux image: /boot/vmlinuz-3.10.0-693.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-693.el7.x86_64.img
Found linux image: /boot/vmlinuz-0-rescue-12667e2174a8483e915fd89a3bc359fc
Found initrd image: /boot/initramfs-0-rescue-12667e2174a8483e915fd89a3bc359fc.img
Found linux image: /boot/vmlinuz-0-rescue-80c608ceab5342779ba1adc2ac29c213
Found initrd image: /boot/initramfs-0-rescue-80c608ceab5342779ba1adc2ac29c213.img
done

Reboot and check the kernel version.

[root@etcd2 ~]# reboot

The kernel has been upgraded.

[root@etcd2 ~]# uname -r
5.4.197-1.el7.elrepo.x86_64

[root@etcd2 ~]# uname -rs
Linux 5.4.197-1.el7.elrepo.x86_64

7.3 Install gVisor

Check the CPU architecture.

[root@etcd2 ~]# uname -m
x86_64

Download runsc and containerd-shim-runsc-v1 together with their checksum files, runsc.sha512 and containerd-shim-runsc-v1.sha512.

[root@etcd2 ~]# wget https://storage.googleapis.com/gvisor/releases/release/latest/x86_64/runsc https://storage.googleapis.com/gvisor/releases/release/latest/x86_64/runsc.sha512 https://storage.googleapis.com/gvisor/releases/release/latest/x86_64/containerd-shim-runsc-v1 https://storage.googleapis.com/gvisor/releases/release/latest/x86_64/containerd-shim-runsc-v1.sha512

[root@etcd2 ~]# ll -h runsc* containerd-shim*
-rw-r--r-- 1 root root 25M 5月  17 00:22 containerd-shim-runsc-v1
-rw-r--r-- 1 root root 155 5月  17 00:22 containerd-shim-runsc-v1.sha512
-rw-r--r-- 1 root root 38M 5月  17 00:22 runsc
-rw-r--r-- 1 root root 136 5月  17 00:22 runsc.sha512

Use sha512sum to verify that the files are intact.

[root@etcd2 ~]# sha512sum -c runsc.sha512 -c containerd-shim-runsc-v1.sha512
runsc: 确定
containerd-shim-runsc-v1: 确定

[root@etcd2 ~]# cat *sha512
f24834bbd4d14d0d0827e31276ff74a1e08b7ab366c4a30fe9c30d656c1ec5cbfc2544fb06698b4749791e0c6f80e6d16ec746963ff6ecebc246dc6e5b2f34ba  containerd-shim-runsc-v1
e5bc1c46d021246a69174aae71be93ff49661ff08eb6a957f7855f36076b44193765c966608d11a99f14542612438634329536d88fccb4b12bdd9bf2af20557f  runsc
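
When these steps are scripted, the checksum result should decide whether the binary gets installed at all. The following is a minimal sketch that assumes each checksum file sits next to its binary in the current directory, as in the download above; the function name `install_verified` is hypothetical.

```shell
# install_verified FILE DEST_DIR: install FILE into DEST_DIR with mode 0755,
# but only if FILE.sha512 (in the current directory) matches.
# (hypothetical helper name, for illustration only)
install_verified() {
    file="$1"
    dest="$2"
    if ! sha512sum -c "$file.sha512" >/dev/null 2>&1; then
        echo "checksum mismatch for $file, not installing" >&2
        return 1
    fi
    install -m 0755 "$file" "$dest/"
}

# Usage, from the download directory:
#   install_verified runsc /usr/local/bin &&
#   install_verified containerd-shim-runsc-v1 /usr/local/bin
```

This combines the verification with the chmod and mv steps that follow, so a corrupted download never reaches /usr/local/bin.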

Make the files executable.

[root@etcd2 ~]# chmod a+rx runsc containerd-shim-runsc-v1

Move the files to /usr/local/bin.

[root@etcd2 ~]# mv runsc containerd-shim-runsc-v1 /usr/local/bin

Install gVisor.

[root@etcd2 ~]# /usr/local/bin/runsc install
2022/06/07 13:04:16 Added runtime "runsc" with arguments [] to "/etc/docker/daemon.json".

After the installation, /etc/docker/daemon.json contains a new runtimes entry named runsc with "path": "/usr/local/bin/runsc".

Note: the key runsc under "runtimes" in /etc/docker/daemon.json is only a name and can be changed, for example to gvisor.

[root@etcd2 ~]# cat /etc/docker/daemon.json
{
    "registry-mirrors": [
        "https://frz7i079.mirror.aliyuncs.com"
    ],
    "runtimes": {
        "runsc": {
            "path": "/usr/local/bin/runsc"
        }
    }
}

Reload the systemd configuration and restart Docker.

[root@etcd2 ~]# systemctl daemon-reload ;systemctl restart docker 

Check the runtimes. runsc now appears in the Runtimes list, which means Docker can use gVisor as a runtime.

[root@etcd2 ~]# docker info | grep -i runtime
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc runsc
 Default Runtime: runc

Check the runsc version.

[root@etcd2 ~]# runsc --version
runsc version release-20220510.0
spec: 1.0.2-dev

7.4 Make gVisor Docker's default runtime

Check the Docker status.

[root@etcd2 ~]# systemctl status docker
● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
   Active: active (running) since 二 2022-06-07 17:02:18 CST; 12min ago
     Docs: https://docs.docker.com
 Main PID: 1109 (dockerd)
   Memory: 130.7M
   CGroup: /system.slice/docker.service
           └─1109 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock

Docker's start-up parameters are as follows:

[root@etcd2 ~]# cat /usr/lib/systemd/system/docker.service | grep ExecStart
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock

According to the dockerd help, --default-runtime sets Docker's default runtime.

[root@etcd2 ~]# dockerd --help | grep default-runtime
      --default-runtime string                  Default OCI runtime for containers (default "runc")

Now modify ExecStart in the Docker unit file so that Docker uses runsc as its default runtime.

[root@etcd2 ~]# vim /usr/lib/systemd/system/docker.service

# --default-runtime runsc sets Docker's default runtime to gVisor
[root@etcd2 ~]# cat /usr/lib/systemd/system/docker.service | grep ExecStart
ExecStart=/usr/bin/dockerd --default-runtime runsc -H fd:// --containerd=/run/containerd/containerd.sock

Reload the systemd configuration and restart Docker.

[root@etcd2 ~]# systemctl daemon-reload ; systemctl restart docker

Docker's default runtime is now gVisor.

[root@etcd2 ~]# docker info | grep -i runtime
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc runsc
 Default Runtime: runsc
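
Editing the unit file under /usr/lib/systemd/system works, but a package upgrade may overwrite it. As an alternative, dockerd also accepts a default-runtime key in /etc/docker/daemon.json. The following is a minimal sketch, not the method used in this article: the function name `write_daemon_json` is hypothetical, and the mirror URL and runsc path are simply the values shown earlier.

```shell
# write_daemon_json PATH: write a daemon.json that registers runsc and makes
# it the default runtime. The mirror URL and runsc path are the ones used
# earlier in this article; adjust both for your host.
write_daemon_json() {
    cat > "$1" <<'EOF'
{
    "registry-mirrors": [
        "https://frz7i079.mirror.aliyuncs.com"
    ],
    "default-runtime": "runsc",
    "runtimes": {
        "runsc": {
            "path": "/usr/local/bin/runsc"
        }
    }
}
EOF
}

# Usage: write to a scratch file first, review it, then copy it over
# /etc/docker/daemon.json and restart Docker:
#   write_daemon_json ./daemon.json.new
```

Set the default runtime in only one place. dockerd refuses to start when the same option is given both as a command-line flag and in daemon.json, so remove --default-runtime from ExecStart if you switch to this approach.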

7.5 Create a container in Docker with gVisor as the runtime

Pull the nginx image.

[root@etcd2 ~]# docker pull hub.c.163.com/library/nginx:latest
latest: Pulling from library/nginx
5de4b4d551f8: Pull complete 
d4b36a5e9443: Pull complete 
0af1f0713557: Pull complete 
Digest: sha256:f84932f738583e0169f94af9b2d5201be2dbacc1578de73b09a6dfaaa07801d6
Status: Downloaded newer image for hub.c.163.com/library/nginx:latest
hub.c.163.com/library/nginx:latest

[root@etcd2 ~]# docker images 
REPOSITORY                    TAG       IMAGE ID       CREATED        SIZE
hub.c.163.com/library/nginx   latest    46102226f2fd   5 years ago    109MB

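Create a container from the nginx image. Because runsc is now the default runtime, the container is created by gVisor.

If gVisor is installed but Docker's default runtime is still runc, pass --runtime=runsc to create the container with gVisor: docker run -dit --runtime=runsc --name=nginxweb --restart=always hub.c.163.com/library/nginx:latest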

[root@etcd2 ~]# docker run -dit  --name=nginxweb --restart=always hub.c.163.com/library/nginx:latest
9a7b9091d0d07052ae972b480687e7a345ae22e0e4968e91133b1ad6ac1d5b3a

List the containers.

[root@etcd2 ~]# docker ps
CONTAINER ID   IMAGE                                COMMAND                  CREATED         STATUS          PORTS     NAMES
9a7b9091d0d0   hub.c.163.com/library/nginx:latest   "nginx -g 'daemon of…"   3 minutes ago   Up 3 minutes    80/tcp    nginxweb
bc99f286802f   quay.io/calico/node:v2.6.12          "start_runit"            3 months ago    Up 19 seconds             calico-node

gVisor runs the container in a sandbox, so the processes inside the container are not visible on the host.

[root@etcd2 ~]# ps -ef | grep nginx
root       9031   2916  0 17:54 pts/1    00:00:00 grep --color=auto nginx

Delete the container.

[root@etcd2 ~]# docker rm -f nginxweb
nginxweb

[root@etcd2 ~]# docker ps
CONTAINER ID   IMAGE                         COMMAND         CREATED        STATUS         PORTS     NAMES
bc99f286802f   quay.io/calico/node:v2.6.12   "start_runit"   3 months ago   Up 7 seconds             calico-node

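8. Configuring containerd to use gVisor as the runtime

8.1 Install containerd

If you know Docker but not containerd, see the blog post 《在centos下使用containerd管理容器:5分钟从docker转型到containerd》, which explains it in detail.

We install containerd on the client machine ubuntuk8sclient (Ubuntu).

Update the package index.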

root@ubuntuk8sclient:~#  apt-get update

Install containerd.

root@ubuntuk8sclient:~# apt-get -y install containerd.io cri-tools 

Enable containerd at boot and start it now.

root@ubuntuk8sclient:~# systemctl enable containerd --now

Check the containerd status.

root@ubuntuk8sclient:~# systemctl is-active containerd
active

root@ubuntuk8sclient:~# systemctl status containerd
● containerd.service - containerd container runtime
   Loaded: loaded (/lib/systemd/system/containerd.service; enabled; vendor preset: enabled)
   Active: active (running) since Sat 2022-06-04 15:54:08 CST; 58min ago
     Docs: https://containerd.io
 Main PID: 722 (containerd)
    Tasks: 8
   CGroup: /system.slice/containerd.service
           └─722 /usr/bin/containerd

The containerd configuration file is /etc/containerd/config.toml.

root@ubuntuk8sclient:~# ll -h /etc/containerd/config.toml
-rw-r--r-- 1 root root 886 May  4 17:04 /etc/containerd/config.toml

The configuration file shipped with the package has the following content:

root@ubuntuk8sclient:~# cat /etc/containerd/config.toml
#   Copyright 2018-2022 Docker Inc.

#   Licensed under the Apache License, Version 2.0 (the "License");
#   you may not use this file except in compliance with the License.
#   You may obtain a copy of the License at

#       http://www.apache.org/licenses/LICENSE-2.0

#   Unless required by applicable law or agreed to in writing, software
#   distributed under the License is distributed on an "AS IS" BASIS,
#   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#   See the License for the specific language governing permissions and
#   limitations under the License.

disabled_plugins = ["cri"]

#root = "/var/lib/containerd"
#state = "/run/containerd"
#subreaper = true
#oom_score = 0

#[grpc]
#  address = "/run/containerd/containerd.sock"
#  uid = 0
#  gid = 0

#[debug]
#  address = "/run/containerd/debug.sock"
#  uid = 0
#  gid = 0
#  level = "info"

A full default configuration can be generated with containerd config default > /etc/containerd/config.toml. The generated file is considerably longer.

root@ubuntuk8sclient:~# containerd config default > /etc/containerd/config.toml

root@ubuntuk8sclient:~# vim /etc/containerd/config.toml 

containerd config dump shows the current configuration.

root@ubuntuk8sclient:~# containerd config dump
disabled_plugins = []
imports = ["/etc/containerd/config.toml"]
oom_score = 0
plugin_dir = ""
required_plugins = []
root = "/var/lib/containerd"
......
......
  address = ""
  gid = 0
  uid = 0

Check the containerd version.

root@ubuntuk8sclient:~# containerd --version
containerd containerd.io 1.6.4 212e8b6fa2f44b9c21b2798135fc6fb7c53efc16

root@ubuntuk8sclient:~# containerd -v
containerd containerd.io 1.6.4 212e8b6fa2f44b9c21b2798135fc6fb7c53efc16

Edit the configuration file to add the Alibaba Cloud registry mirror.

root@ubuntuk8sclient:~# vim /etc/containerd/config.toml 

root@ubuntuk8sclient:~# grep endpoint /etc/containerd/config.toml
    endpoint = "https://frz7i079.mirror.aliyuncs.com"

Change SystemdCgroup = false to SystemdCgroup = true.

root@ubuntuk8sclient:~# vim /etc/containerd/config.toml 

root@ubuntuk8sclient:~# grep SystemdCgroup -B 11 /etc/containerd/config.toml 
          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
            BinaryName = ""
            CriuImagePath = ""
            CriuPath = ""
            CriuWorkPath = ""
            IoGid = 0
            IoUid = 0
            NoNewKeyring = false
            NoPivotRoot = false
            Root = ""
            ShimCgroup = ""
            SystemdCgroup = true

The default sandbox image, k8s.gcr.io/pause:3.6, cannot be reached from this network.

root@ubuntuk8sclient:~# grep sandbox_image /etc/containerd/config.toml
    sandbox_image = "k8s.gcr.io/pause:3.6"

Change the sandbox image to a reachable copy on Alibaba Cloud.

root@ubuntuk8sclient:~# vim /etc/containerd/config.toml 

root@ubuntuk8sclient:~# grep sandbox_image /etc/containerd/config.toml
    sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.6"

Reload the systemd configuration and restart containerd.

root@ubuntuk8sclient:~# systemctl daemon-reload ; systemctl restart containerd

containerd has two client tools, ctr and crictl. Before using crictl, run crictl config runtime-endpoint unix:///var/run/containerd/containerd.sock; otherwise crictl prints warnings.

root@ubuntuk8sclient:~# crictl config runtime-endpoint unix:///var/run/containerd/containerd.sock
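
crictl keeps this setting in a small YAML file, /etc/crictl.yaml by default. When provisioning a node from a script, the file can also be written directly. The following is a minimal sketch that uses the socket path from this article; the function name `write_crictl_yaml` and the timeout and debug values are assumptions, not something the article sets.

```shell
# write_crictl_yaml PATH: write a crictl config that points both endpoints at
# the containerd socket used in this article.
# (hypothetical helper; timeout and debug are illustrative values)
write_crictl_yaml() {
    cat > "$1" <<'EOF'
runtime-endpoint: unix:///var/run/containerd/containerd.sock
image-endpoint: unix:///var/run/containerd/containerd.sock
timeout: 10
debug: false
EOF
}

# Usage on the host (as root):
#   write_crictl_yaml /etc/crictl.yaml
```

Setting image-endpoint as well as runtime-endpoint keeps crictl from guessing the image service socket.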

Show the containerd information.

root@ubuntuk8sclient:~# crictl info 
{
  "status": {
    "conditions": [
      {
        "type": "RuntimeReady",
        "status": true,
        "reason": "",
        "message": ""
      },
......
    "enableUnprivilegedPorts": false,
    "enableUnprivilegedICMP": false,
    "containerdRootDir": "/var/lib/containerd",
    "containerdEndpoint": "/run/containerd/containerd.sock",
    "rootDir": "/var/lib/containerd/io.containerd.grpc.v1.cri",
    "stateDir": "/run/containerd/io.containerd.grpc.v1.cri"
  },
  "golang": "go1.17.9",
  "lastCNILoadStatus": "cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config",
  "lastCNILoadStatus.default": "cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"
}

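containerd has the concept of namespaces, which Docker does not. Images pulled and containers created in the default namespace are not visible in other namespaces. When a containerd node joins a Kubernetes cluster, Kubernetes uses the k8s.io namespace.

List the namespaces.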

root@ubuntuk8sclient:~# ctr ns list
NAME         LABELS 
moby                
plugins.moby        

List the images.

root@ubuntuk8sclient:~# ctr i list
REF TYPE DIGEST SIZE PLATFORMS LABELS 

root@ubuntuk8sclient:~# crictl images
IMAGE               TAG                 IMAGE ID            SIZE

Pull an image with crictl.

root@ubuntuk8sclient:~# crictl pull nginx
Image is up to date for sha256:0e901e68141fd02f237cf63eb842529f8a9500636a9419e3cf4fb986b8fe3d5d

root@ubuntuk8sclient:~# crictl images
IMAGE                     TAG                 IMAGE ID            SIZE
docker.io/library/nginx   latest              0e901e68141fd       56.7MB

For more on ctr and crictl, see the blog post 《在centos下使用containerd管理容器:5分钟从docker转型到containerd》.

ctr and crictl are not very convenient for everyday use, so nerdctl is recommended. nerdctl is a command-line client for containerd that is largely compatible with the Docker CLI and is used in much the same way.

nerdctl needs two packages: nerdctl-0.20.0-linux-amd64.tar.gz and cni-plugins-linux-amd64-v1.1.1.tgz.

Download nerdctl-0.20.0-linux-amd64.tar.gz from https://github.com/containerd/nerdctl/releases

Download the CNI plugins package cni-plugins-linux-amd64-v1.1.1.tgz from https://github.com/containernetworking/plugins/releases

root@ubuntuk8sclient:~# ll -h cni-plugins-linux-amd64-v1.1.1.tgz nerdctl-0.20.0-linux-amd64.tar.gz 
-rw-r--r-- 1 root root  35M Jun  5 12:19 cni-plugins-linux-amd64-v1.1.1.tgz
-rw-r--r-- 1 root root 9.8M Jun  5 12:15 nerdctl-0.20.0-linux-amd64.tar.gz

Extract both archives.

root@ubuntuk8sclient:~# tar xf nerdctl-0.20.0-linux-amd64.tar.gz -C /usr/local/bin/

root@ubuntuk8sclient:~# ls /usr/local/bin/
containerd-rootless-setuptool.sh  containerd-rootless.sh  nerdctl

root@ubuntuk8sclient:~# mkdir -p /opt/cni/bin

root@ubuntuk8sclient:~# tar xf cni-plugins-linux-amd64-v1.1.1.tgz -C /opt/cni/bin/

root@ubuntuk8sclient:~# ls /opt/cni/bin/
bandwidth  bridge  dhcp  firewall  host-device  host-local  ipvlan  loopback  macvlan  portmap  ptp  sbr  static  tuning  vlan  vrf

To enable tab completion for nerdctl, add source <(nerdctl completion bash) to /etc/profile.

root@ubuntuk8sclient:~# vim /etc/profile

root@ubuntuk8sclient:~# cat /etc/profile | head -3
# /etc/profile: system-wide .profile file for the Bourne shell (sh(1))
# and Bourne compatible shells (bash(1), ksh(1), ash(1), ...).
source <(nerdctl completion bash)

root@ubuntuk8sclient:~# nerdctl completion bash

Apply the changes in /etc/profile.

root@ubuntuk8sclient:~# source /etc/profile
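
One caveat: /etc/profile is also read by login shells other than bash, and source <(...) is bash-only syntax. The following is a minimal sketch of a more defensive variant that loads the completion only under bash and only when nerdctl is installed. The function name `write_nerdctl_completion` and the suggested target /etc/profile.d/nerdctl-completion.sh are assumptions, not part of the original setup.

```shell
# write_nerdctl_completion PATH: write a small snippet that loads nerdctl
# completion only when running under bash and nerdctl is installed.
# eval "$(...)" is used instead of `source <(...)` so that a plain POSIX sh
# can still parse the file without a syntax error.
write_nerdctl_completion() {
    cat > "$1" <<'EOF'
# Load nerdctl tab completion (bash only, and only if nerdctl exists).
if [ -n "${BASH_VERSION:-}" ] && command -v nerdctl >/dev/null 2>&1; then
    eval "$(nerdctl completion bash)"
fi
EOF
}

# Usage on the host (as root):
#   write_nerdctl_completion /etc/profile.d/nerdctl-completion.sh
```

On Ubuntu, /etc/profile sources the *.sh files in /etc/profile.d, so the snippet is picked up at the next login without editing /etc/profile itself.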

List the images.

root@ubuntuk8sclient:~# nerdctl images 
REPOSITORY    TAG    IMAGE ID    CREATED    PLATFORM    SIZE    BLOB SIZE

List the namespaces.

root@ubuntuk8sclient:~# nerdctl ns list
NAME            CONTAINERS    IMAGES    VOLUMES    LABELS
default         0             0         0              
k8s.io          0             4         0              
moby            0             0         0              
plugins.moby    0             0         0    

nerdctl commands closely mirror docker commands: replacing docker with nerdctl in a command works in most cases.

Pull an image.

root@ubuntuk8sclient:~# nerdctl pull hub.c.163.com/library/nginx:latest
                
root@ubuntuk8sclient:~# nerdctl images
REPOSITORY                     TAG       IMAGE ID        CREATED           PLATFORM       SIZE         BLOB SIZE
hub.c.163.com/library/nginx    latest    8eeb06742b41    22 seconds ago    linux/amd64    115.5 MiB    41.2 MiB

Show the containerd information.

root@ubuntuk8sclient:~# nerdctl info 

8.2 Install gVisor

Note: gVisor supports x86_64 and ARM64 and requires Linux 4.14.77 or later. This machine runs 4.15.0-112-generic, so no kernel upgrade is needed. If you do need to upgrade the kernel, see the blog post 《centos7 离线升级/在线升级操作系统内核》.

root@ubuntuk8sclient:~# uname -r
4.15.0-112-generic

Download the gVisor binaries.

root@ubuntuk8sclient:~# wget https://storage.googleapis.com/gvisor/releases/release/latest/x86_64/runsc https://storage.googleapis.com/gvisor/releases/release/latest/x86_64/runsc.sha512 https://storage.googleapis.com/gvisor/releases/release/latest/x86_64/containerd-shim-runsc-v1 https://storage.googleapis.com/gvisor/releases/release/latest/x86_64/containerd-shim-runsc-v1.sha512   

root@ubuntuk8sclient:~# ll -h runsc* containerd-shim*
-rwxr-xr-x 1 root root 25M Jun  7 18:24 containerd-shim-runsc-v1*
-rw-r--r-- 1 root root 155 Jun  7 18:24 containerd-shim-runsc-v1.sha512
-rwxr-xr-x 1 root root 38M Jun  7 18:24 runsc*
-rw-r--r-- 1 root root 136 Jun  7 18:24 runsc.sha512

Verify the files.

root@ubuntuk8sclient:~# sha512sum -c runsc.sha512 -c containerd-shim-runsc-v1.sha512
runsc: OK
containerd-shim-runsc-v1: OK

root@ubuntuk8sclient:~# cat *sha512
f24834bbd4d14d0d0827e31276ff74a1e08b7ab366c4a30fe9c30d656c1ec5cbfc2544fb06698b4749791e0c6f80e6d16ec746963ff6ecebc246dc6e5b2f34ba  containerd-shim-runsc-v1
e5bc1c46d021246a69174aae71be93ff49661ff08eb6a957f7855f36076b44193765c966608d11a99f14542612438634329536d88fccb4b12bdd9bf2af20557f  runsc

Make the files executable and move them to /usr/local/bin.

root@ubuntuk8sclient:~# chmod a+rx runsc containerd-shim-runsc-v1

root@ubuntuk8sclient:~# mv runsc containerd-shim-runsc-v1 /usr/local/bin

At this point containerd supports only one runtime, runc.

root@ubuntuk8sclient:~# crictl info | grep -A10 runtimes
      "runtimes": {
        "runc": {
          "runtimeType": "io.containerd.runc.v2",
          "runtimePath": "",
          "runtimeEngine": "",
          "PodAnnotations": [],
          "ContainerAnnotations": [],
          "runtimeRoot": "",
          "options": {
            "BinaryName": "",
            "CriuImagePath": "",

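8.3 Configure containerd to support gVisor

First edit the configuration file so that containerd supports more than one runtime.

The file already contains plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc. Add a new section, plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runsc, so that containerd supports gVisor. The runtime_type value selects the shim, which here is the containerd-shim-runsc-v1 binary downloaded earlier.

An earlier version of this configuration used runtime_type = "containerd-shim-runsc-v1". Later testing showed that this value works when creating containers directly in containerd but causes problems under Kubernetes; the correct value is runtime_type = "io.containerd.runsc.v1".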

root@ubuntuk8sclient:~# vim /etc/containerd/config.toml 

root@ubuntuk8sclient:~# cat /etc/containerd/config.toml | grep -A27 "containerd.runtimes.runc"
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
          base_runtime_spec = ""
          cni_conf_dir = ""
          cni_max_conf_num = 0
          container_annotations = []
          pod_annotations = []
          privileged_without_host_devices = false
          runtime_engine = ""
          runtime_path = ""
          runtime_root = ""
          runtime_type = "io.containerd.runc.v2"

          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
            BinaryName = ""
            CriuImagePath = ""
            CriuPath = ""
            CriuWorkPath = ""
            IoGid = 0
            IoUid = 0
            NoNewKeyring = false
            NoPivotRoot = false
            Root = ""
            ShimCgroup = ""
            SystemdCgroup = true
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runsc]
          base_runtime_spec = ""
          cni_conf_dir = ""
          cni_max_conf_num = 0
          container_annotations = []
          pod_annotations = []
          privileged_without_host_devices = false
          runtime_engine = ""
          runtime_path = ""
          runtime_root = ""
          runtime_type = "io.containerd.runsc.v1"
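
The listing above spells out every default, but only the section header and runtime_type are needed for the runsc entry. The following is a minimal sketch that appends such a stanza once. It assumes a version 2 configuration, as produced by containerd config default in containerd 1.6, and the function name `add_runsc_runtime` is hypothetical.

```shell
# add_runsc_runtime CONFIG: append a minimal runsc runtime stanza to a
# containerd config (version 2 layout) unless one is already present.
# (hypothetical helper name, for illustration only)
add_runsc_runtime() {
    cfg="$1"
    if grep -qF 'containerd.runtimes.runsc]' "$cfg"; then
        echo "runsc runtime already configured in $cfg"
        return 0
    fi
    cat >> "$cfg" <<'EOF'

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runsc]
  runtime_type = "io.containerd.runsc.v1"
EOF
}

# Usage on the host (as root), followed by a containerd restart:
#   add_runsc_runtime /etc/containerd/config.toml
```

Because the stanza uses a fully qualified TOML table header, it can be appended at the end of the file; the guard keeps a second run from defining the same table twice.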

Reload the systemd configuration and restart containerd.

root@ubuntuk8sclient:~# systemctl daemon-reload ;systemctl restart containerd

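containerd now supports two runtimes, runc and runsc. (The output below was captured while runtime_type was still set to the earlier value containerd-shim-runsc-v1.)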

root@ubuntuk8sclient:~# crictl info | grep -A36 runtimes
      "runtimes": {
        "runc": {
          "runtimeType": "io.containerd.runc.v2",
          "runtimePath": "",
          "runtimeEngine": "",
          "PodAnnotations": [],
          "ContainerAnnotations": [],
          "runtimeRoot": "",
          "options": {
            "BinaryName": "",
            "CriuImagePath": "",
            "CriuPath": "",
            "CriuWorkPath": "",
            "IoGid": 0,
            "IoUid": 0,
            "NoNewKeyring": false,
            "NoPivotRoot": false,
            "Root": "",
            "ShimCgroup": "",
            "SystemdCgroup": true
          },
          "privileged_without_host_devices": false,
          "baseRuntimeSpec": "",
          "cniConfDir": "",
          "cniMaxConfNum": 0
        },
        "runsc": {
          "runtimeType": "containerd-shim-runsc-v1",
          "runtimePath": "",
          "runtimeEngine": "",
          "PodAnnotations": [],
          "ContainerAnnotations": [],
          "runtimeRoot": "",
          "options": null,
          "privileged_without_host_devices": false,
          "baseRuntimeSpec": "",
          "cniConfDir": "",

List the containers.

root@ubuntuk8sclient:~# nerdctl ps 
CONTAINER ID    IMAGE    COMMAND    CREATED    STATUS    PORTS    NAMES

List the images.

root@ubuntuk8sclient:~# nerdctl images
REPOSITORY                      TAG                                                                 IMAGE ID        CREATED         PLATFORM       SIZE         BLOB SIZE
hub.c.163.com/library/nginx     latest                                                              8eeb06742b41    2 days ago      linux/amd64    115.5 MiB    41.2 MiB
sha256                          e5bc191dff1f971254305a0dbc58c4145c783e34090bbd4360a36d7447fe3ef2    8eeb06742b41    2 days ago      linux/amd64    115.5 MiB    41.2 MiB

Create a container from the nginx image; runc is used by default.

root@ubuntuk8sclient:~# nerdctl run -d --name=nginxweb --restart=always hub.c.163.com/library/nginx:latest
bdef5e3fa6e6fb7c08f4df19810a42c81b7bc1bf7a16b3beaca53508ac4cedab

List the containers.

root@ubuntuk8sclient:~# nerdctl ps
CONTAINER ID    IMAGE                                 COMMAND                   CREATED          STATUS    PORTS    NAMES
bdef5e3fa6e6    hub.c.163.com/library/nginx:latest    "nginx -g daemon off;"    4 seconds ago    Up                 nginxweb    

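A container that containerd creates with the default runc runtime shares the host kernel, and its processes appear on the host as ordinary processes. If the container has a vulnerability, an attacker can exploit it to compromise the host.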

root@ubuntuk8sclient:~# ps -ef | grep nginx
root       6540   6505  0 21:36 ?        00:00:00 nginx: master process nginx -g daemon off;
systemd+   6625   6540  0 21:36 ?        00:00:00 nginx: worker process
root       6634   6251  0 21:36 pts/1    00:00:00 grep --color=auto nginx

Delete the container.

root@ubuntuk8sclient:~# nerdctl rm -f nginxweb
nginxweb

Once the container is deleted, the nginx processes disappear from the host.

root@ubuntuk8sclient:~# nerdctl ps
CONTAINER ID    IMAGE    COMMAND    CREATED    STATUS    PORTS    NAMES

root@ubuntuk8sclient:~# ps -ef | grep nginx
root       6726   6251  0 21:38 pts/1    00:00:00 grep --color=auto nginx

8.4 Create a container in containerd with gVisor as the runtime

Create a container; --runtime=runsc tells containerd to use gVisor as the runtime.

root@ubuntuk8sclient:~# nerdctl run -d --runtime=runsc --name=nginxweb --restart=always hub.c.163.com/library/nginx:latest
8ea86e8936374efbb626d11f79a9cb79fb32d9a44fafd71c02556a5ae842cac7

With gVisor as the runtime, containerd runs the container in a sandbox, so the processes inside the container are not visible on the host.

root@ubuntuk8sclient:~# nerdctl ps
CONTAINER ID    IMAGE                                 COMMAND                   CREATED           STATUS    PORTS    NAMES
8ea86e893637    hub.c.163.com/library/nginx:latest    "nginx -g daemon off;"    51 seconds ago    Up                 nginxweb    

root@ubuntuk8sclient:~# ps -ef | grep nginx
root       7153   6251  0 21:41 pts/1    00:00:00 grep --color=auto nginx

In this setup, the running container could not be removed with nerdctl rm -f.

root@ubuntuk8sclient:~# nerdctl rm -f nginxweb
WARN[0000] failed to delete task 8ea86e8936374efbb626d11f79a9cb79fb32d9a44fafd71c02556a5ae842cac7  error="unknown error after kill: runsc did not terminate successfully: exit status 128: sandbox is not running\n: unknown"
WARN[0000] failed to remove container "8ea86e8936374efbb626d11f79a9cb79fb32d9a44fafd71c02556a5ae842cac7"  error="cannot delete running task 8ea86e8936374efbb626d11f79a9cb79fb32d9a44fafd71c02556a5ae842cac7: failed precondition"
WARN[0000] failed to remove container "8ea86e8936374efbb626d11f79a9cb79fb32d9a44fafd71c02556a5ae842cac7"  error="cannot delete running task 8ea86e8936374efbb626d11f79a9cb79fb32d9a44fafd71c02556a5ae842cac7: failed precondition"
WARN[0000] failed to release name store for container "8ea86e8936374efbb626d11f79a9cb79fb32d9a44fafd71c02556a5ae842cac7"  error="cannot delete running task 8ea86e8936374efbb626d11f79a9cb79fb32d9a44fafd71c02556a5ae842cac7: failed precondition"
FATA[0000] cannot delete running task 8ea86e8936374efbb626d11f79a9cb79fb32d9a44fafd71c02556a5ae842cac7: failed precondition 

Stop the container first, then remove it.

root@ubuntuk8sclient:~# nerdctl stop nginxweb
nginxweb

root@ubuntuk8sclient:~# nerdctl rm nginxweb

root@ubuntuk8sclient:~# nerdctl ps
CONTAINER ID    IMAGE    COMMAND    CREATED    STATUS    PORTS    NAMES

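9. In a k8s environment, configure containerd as the high-level runtime, with gVisor as containerd's low-level runtime

9.1 Joining the ubuntuk8sclient node to the k8s cluster

Note: when docker is the high-level runtime for k8s, gVisor cannot be used as docker's low-level runtime. gVisor works as docker's low-level runtime only on a standalone host.

The current environment is an existing k8s cluster with 1 master and 2 workers, all three using docker as the high-level runtime. We now add a new worker node that uses containerd as the high-level runtime, with gVisor as containerd's low-level runtime.

Now join the ubuntuk8sclient machine to the k8s cluster; its CONTAINER-RUNTIME is containerd.

List the cluster nodes.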

root@k8scludes1:~# kubectl get nodes -o wide
NAME         STATUS   ROLES                  AGE   VERSION   INTERNAL-IP       EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION       CONTAINER-RUNTIME
k8scludes1   Ready    control-plane,master   55d   v1.22.2   192.168.110.128   <none>        Ubuntu 18.04.5 LTS   4.15.0-112-generic   docker://20.10.14
k8scludes2   Ready    <none>                 55d   v1.22.2   192.168.110.129   <none>        Ubuntu 18.04.5 LTS   4.15.0-112-generic   docker://20.10.14
k8scludes3   Ready    <none>                 55d   v1.22.2   192.168.110.130   <none>        Ubuntu 18.04.5 LTS   4.15.0-112-generic   docker://20.10.14

First configure the IP-to-hostname mappings on every machine (ubuntuk8sclient is shown as the example).

root@ubuntuk8sclient:~# vim /etc/hosts

root@ubuntuk8sclient:~# cat /etc/hosts
127.0.0.1	localhost
127.0.1.1	tom
192.168.110.139 ubuntuk8sclient
192.168.110.128 k8scludes1
192.168.110.129 k8scludes2
192.168.110.130 k8scludes3

# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

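Configure the package sources as follows. The last three lines add the Kubernetes and docker-ce repositories (the final one is commented out).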

root@ubuntuk8sclient:~# cat /etc/apt/sources.list
deb http://mirrors.aliyun.com/ubuntu/ bionic main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic main restricted universe multiverse

deb http://mirrors.aliyun.com/ubuntu/ bionic-security main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-security main restricted universe multiverse

deb http://mirrors.aliyun.com/ubuntu/ bionic-updates main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-updates main restricted universe multiverse

deb http://mirrors.aliyun.com/ubuntu/ bionic-proposed main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-proposed main restricted universe multiverse

deb http://mirrors.aliyun.com/ubuntu/ bionic-backports main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-backports main restricted universe multiverse

deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
deb [arch=amd64] https://mirrors.aliyun.com/docker-ce/linux/ubuntu bionic stable
# deb-src [arch=amd64] https://mirrors.aliyun.com/docker-ce/linux/ubuntu bionic stable

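apt-key.gpg is the public key of the k8s deb repository; load it with apt-key add apt-key.gpg.

The key can normally be downloaded and loaded with: curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add - ; apt-get update. That Google URL is not reachable from this environment, so we download the apt-key.gpg file from elsewhere and load it locally.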

root@ubuntuk8sclient:~# cat apt-key.gpg | apt-key add -
OK

Update the package index.

root@ubuntuk8sclient:~# apt-get update

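The Linux swapoff command turns off the system swap area. If swap is left on, kubeadm reports an error when it initializes Kubernetes: "[ERROR Swap]: running with swap on is not supported. Please disable swap". The sed command also removes the swap entries from /etc/fstab so that swap stays off after a reboot.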

root@ubuntuk8sclient:~# swapoff -a ;sed -i '/swap/d' /etc/fstab

root@ubuntuk8sclient:~# cat /etc/fstab
# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
/dev/mapper/tom--vg-root /               ext4    errors=remount-ro 0       1
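
Note that sed -i '/swap/d' deletes every line that contains the string swap, including comments and any entry whose path merely contains that word. A more conservative variant, sketched below under the assumption that commenting out the entries is acceptable, only touches active entries whose filesystem type (third field) is swap, and keeps a backup. The function name comment_out_swap and the .bak suffix are choices made for this illustration.

```shell
# comment_out_swap FILE: prefix '#' to every active fstab entry whose
# filesystem type (third field) is "swap". The original is kept as FILE.bak.
comment_out_swap() {
    cp "$1" "$1.bak" &&
    awk '$1 !~ /^#/ && $3 == "swap" { print "#" $0; next } { print }' \
        "$1.bak" > "$1"
}

# Example call (hypothetical usage, run as root after swapoff -a):
#   comment_out_swap /etc/fstab
```

Commented-out entries are easy to restore later, whereas deleted lines have to be reconstructed by hand.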

Check the containerd version.

root@ubuntuk8sclient:~# containerd -v
containerd containerd.io 1.6.4 212e8b6fa2f44b9c21b2798135fc6fb7c53efc16

The image registry.aliyuncs.com/google_containers/pause:3.6 must be pulled in advance.

root@ubuntuk8sclient:~# cat /etc/containerd/config.toml | grep pause
    pause_threshold = 0.02
    sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.6"

Pull the image.

root@ubuntuk8sclient:~# nerdctl pull registry.aliyuncs.com/google_containers/pause:3.6

List the image.

root@ubuntuk8sclient:~# nerdctl images | grep pause
registry.aliyuncs.com/google_containers/pause    3.6                                                                 3d380ca88645    3 days ago    linux/amd64    672.0 KiB    294.7 KiB

root@ubuntuk8sclient:~# crictl images | grep pause
registry.aliyuncs.com/google_containers/pause   3.6                 6270bb605e12e       302kB

Set the containerd namespace that nerdctl uses by default to k8s.io.

root@ubuntuk8sclient:~# cat /etc/nerdctl/nerdctl.toml | head -3
namespace = "k8s.io"

Load the overlay and br_netfilter kernel modules.

root@ubuntuk8sclient:~# cat > /etc/modules-load.d/containerd.conf <<EOF 
> overlay 
> br_netfilter 
> EOF

root@ubuntuk8sclient:~# cat /etc/modules-load.d/containerd.conf
overlay 
br_netfilter 

root@ubuntuk8sclient:~# modprobe overlay

root@ubuntuk8sclient:~# modprobe br_netfilter

Make bridged traffic pass through iptables and ip6tables, and enable IP forwarding.

root@ubuntuk8sclient:~# cat <<EOF> /etc/sysctl.d/k8s.conf 
> net.bridge.bridge-nf-call-ip6tables = 1 
> net.bridge.bridge-nf-call-iptables = 1 
> net.ipv4.ip_forward = 1 
> EOF

Apply the settings.

root@ubuntuk8sclient:~# sysctl -p /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
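
The heredoc above writes each line with a trailing space, which sysctl accepted here. To double-check what a sysctl-style file assigns to a key, a small helper along these lines can be used. It is only a sketch: sysctl_conf_value is a hypothetical name, and the parser handles plain key = value lines only.

```shell
# sysctl_conf_value FILE KEY: print the value assigned to KEY in a
# sysctl-style "key = value" file, ignoring surrounding whitespace.
# If KEY appears more than once, the last assignment wins.
sysctl_conf_value() {
    awk -F= -v key="$2" '
        {
            k = $1; v = $2
            gsub(/^[ \t]+|[ \t]+$/, "", k)   # trim the key
            gsub(/^[ \t]+|[ \t]+$/, "", v)   # trim the value
            if (k == key) value = v
        }
        END { if (value != "") print value }
    ' "$1"
}

# Example call (hypothetical usage):
#   sysctl_conf_value /etc/sysctl.d/k8s.conf net.ipv4.ip_forward
```

The helper only inspects the file. The values the kernel is actually using are the ones that sysctl -p printed above.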

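Communication between k8s nodes requires a CNI network plugin. Pull the calico images in advance; their version must match the calico version on the other three k8s nodes.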

root@ubuntuk8sclient:~# nerdctl pull docker.io/calico/cni:v3.22.2

root@ubuntuk8sclient:~# nerdctl pull docker.io/calico/pod2daemon-flexvol:v3.22.2
                               
root@ubuntuk8sclient:~# nerdctl pull docker.io/calico/node:v3.22.2
                        
root@ubuntuk8sclient:~# nerdctl pull docker.io/calico/kube-controllers:v3.22.2
                               
root@ubuntuk8sclient:~# nerdctl images | grep calico
calico/cni                                       v3.22.2                                                             757d06fe361c    4 minutes ago     linux/amd64    227.1 MiB    76.8 MiB
calico/kube-controllers                          v3.22.2                                                             751f1a8ba0af    20 seconds ago    linux/amd64    128.1 MiB    52.4 MiB
calico/node                                      v3.22.2                                                             41aac6d0a440    2 minutes ago     linux/amd64    194.2 MiB    66.5 MiB
calico/pod2daemon-flexvol                        v3.22.2                                                             413c5ebad6a5    3 minutes ago     linux/amd64    19.0 MiB     8.0 MiB

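Install kubelet, kubeadm, and kubectl.

  • kubelet is the Kubernetes node agent; it runs on every node;
  • kubeadm is a tool for quickly setting up a Kubernetes (k8s) cluster. It provides the kubeadm init and kubeadm join commands, and performs the actions needed to get a minimum viable cluster up and running;
  • kubectl is the command-line tool for Kubernetes clusters; it is used to manage the cluster itself and to deploy containerized applications onto it.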
root@ubuntuk8sclient:~# apt-get -y install kubelet=1.22.2-00 kubeadm=1.22.2-00 kubectl=1.22.2-00

Enable kubelet at boot and start it now.

root@ubuntuk8sclient:~# systemctl enable kubelet --now

On the k8s master node, print the join command (including the token) that a worker node uses to join the cluster.

root@k8scludes1:~# kubeadm token create --print-join-command
kubeadm join 192.168.110.128:6443 --token rwau00.plx8xdksa8zdnfrn --discovery-token-ca-cert-hash sha256:3f401b6187ed44ff8f4b50aa6453cf3eacc3b86d6a72e3bf2caba02556cb918e 

Join the ubuntuk8sclient node to the k8s cluster.

root@ubuntuk8sclient:~# kubeadm join 192.168.110.128:6443 --token rwau00.plx8xdksa8zdnfrn --discovery-token-ca-cert-hash sha256:3f401b6187ed44ff8f4b50aa6453cf3eacc3b86d6a72e3bf2caba02556cb918e
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

Check on the k8s master node: ubuntuk8sclient has joined the cluster, and its CONTAINER-RUNTIME is containerd://1.6.4.

root@k8scludes1:~# kubectl get node -o wide
NAME              STATUS   ROLES                  AGE   VERSION   INTERNAL-IP       EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION       CONTAINER-RUNTIME
k8scludes1        Ready    control-plane,master   55d   v1.22.2   192.168.110.128   <none>        Ubuntu 18.04.5 LTS   4.15.0-112-generic   docker://20.10.14
k8scludes2        Ready    <none>                 55d   v1.22.2   192.168.110.129   <none>        Ubuntu 18.04.5 LTS   4.15.0-112-generic   docker://20.10.14
k8scludes3        Ready    <none>                 55d   v1.22.2   192.168.110.130   <none>        Ubuntu 18.04.5 LTS   4.15.0-112-generic   docker://20.10.14
ubuntuk8sclient   Ready    <none>                 87s   v1.22.2   192.168.110.139   <none>        Ubuntu 18.04.5 LTS   4.15.0-112-generic   containerd://1.6.4

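Next, configure containerd to support multiple runtimes so that it can use gVisor.

The original content is plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc; add plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runsc so that containerd supports gVisor. Here runtime_type = "containerd-shim-runsc-v1" is the name of the containerd-shim-runsc-v1 file we downloaded (section 9.4 shows that containerd rejects this value, and it is changed to io.containerd.runsc.v1).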

root@ubuntuk8sclient:~# cat /etc/containerd/config.toml | grep -A27 "containerd.runtimes.runc"
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
          base_runtime_spec = ""
          cni_conf_dir = ""
          cni_max_conf_num = 0
          container_annotations = []
          pod_annotations = []
          privileged_without_host_devices = false
          runtime_engine = ""
          runtime_path = ""
          runtime_root = ""
          runtime_type = "io.containerd.runc.v2"

          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
            BinaryName = ""
            CriuImagePath = ""
            CriuPath = ""
            CriuWorkPath = ""
            IoGid = 0
            IoUid = 0
            NoNewKeyring = false
            NoPivotRoot = false
            Root = ""
            ShimCgroup = ""
            SystemdCgroup = true
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runsc]
          base_runtime_spec = ""
          cni_conf_dir = ""
          cni_max_conf_num = 0
          container_annotations = []
          pod_annotations = []
          privileged_without_host_devices = false
          runtime_engine = ""
          runtime_path = ""
          runtime_root = ""
          runtime_type = "containerd-shim-runsc-v1"

Reload the systemd configuration and restart containerd.

root@ubuntuk8sclient:~# systemctl daemon-reload ;systemctl restart containerd  

containerd now lists two runtimes: runc and runsc.

root@ubuntuk8sclient:~# crictl info | grep -A36 runtimes
      "runtimes": {
        "runc": {
          "runtimeType": "io.containerd.runc.v2",
          "runtimePath": "",
          "runtimeEngine": "",
          "PodAnnotations": [],
          "ContainerAnnotations": [],
          "runtimeRoot": "",
          "options": {
            "BinaryName": "",
            "CriuImagePath": "",
            "CriuPath": "",
            "CriuWorkPath": "",
            "IoGid": 0,
            "IoUid": 0,
            "NoNewKeyring": false,
            "NoPivotRoot": false,
            "Root": "",
            "ShimCgroup": "",
            "SystemdCgroup": true
          },
          "privileged_without_host_devices": false,
          "baseRuntimeSpec": "",
          "cniConfDir": "",
          "cniMaxConfNum": 0
        },
        "runsc": {
          "runtimeType": "containerd-shim-runsc-v1",
          "runtimePath": "",
          "runtimeEngine": "",
          "PodAnnotations": [],
          "ContainerAnnotations": [],
          "runtimeRoot": "",
          "options": null,
          "privileged_without_host_devices": false,
          "baseRuntimeSpec": "",
          "cniConfDir": "",

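9.2 Configuring kubelet to support gVisor

Configure kubelet so that it can use gVisor as containerd's low-level runtime: change the kubelet arguments so that kubelet talks to containerd over CRI and can therefore use runsc as a runtime.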

root@ubuntuk8sclient:~# cat > /etc/systemd/system/kubelet.service.d/0-cri-containerd.conf <<EOF 
> [Service] 
> Environment="KUBELET_EXTRA_ARGS=--container-runtime=remote --runtime-request-timeout=15m --container-runtime-endpoint=unix:///run/containerd/containerd.sock" 
> EOF


root@ubuntuk8sclient:~# cat /etc/systemd/system/kubelet.service.d/0-cri-containerd.conf
[Service] 
Environment="KUBELET_EXTRA_ARGS=--container-runtime=remote --runtime-request-timeout=15m  --container-runtime-endpoint=unix:///run/containerd/containerd.sock" 

Reload the systemd configuration and restart kubelet.

root@ubuntuk8sclient:~# systemctl daemon-reload ; systemctl restart kubelet

root@ubuntuk8sclient:~# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─0-cri-containerd.conf, 10-kubeadm.conf
   Active: active (running) since Sat 2022-06-11 18:00:31 CST; 14s ago
     Docs: https://kubernetes.io/docs/home/
 Main PID: 31685 (kubelet)
    Tasks: 13 (limit: 1404)
   CGroup: /system.slice/kubelet.service
           └─31685 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime=remote --con

Everything is ready; now create a pod.

Give the ubuntuk8sclient node a label: con=gvisor.

root@k8scludes1:~# kubectl label nodes ubuntuk8sclient con=gvisor
node/ubuntuk8sclient labeled

root@k8scludes1:~# kubectl get node -l con=gvisor
NAME              STATUS   ROLES    AGE   VERSION
ubuntuk8sclient   Ready    <none>   29m   v1.22.2

Create a directory for the files.

root@k8scludes1:~# mkdir containerd-gvisor

root@k8scludes1:~# cd containerd-gvisor/

Edit the pod manifest. nodeSelector con: gvisor schedules the pod onto the ubuntuk8sclient node, and the pod is created from the nginx image.

root@k8scludes1:~/containerd-gvisor# vim pod.yaml 

root@k8scludes1:~/containerd-gvisor# cat pod.yaml 
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: podtest
  name: podtest
spec:
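  #When the container has to be shut down, kill it immediately instead of waiting for the default 30-second graceful termination period.
  terminationGracePeriodSeconds: 0
  #nodeSelector con: gvisor schedules the pod onto the ubuntuk8sclient node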
  nodeSelector:
    con: gvisor
  containers:
  - image: hub.c.163.com/library/nginx:latest
    #imagePullPolicy: IfNotPresent uses the local image if it already exists; otherwise the image is pulled from the remote registry
    imagePullPolicy: IfNotPresent
    name: podtest
    resources: {}
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}

Create the pod.

root@k8scludes1:~/containerd-gvisor# kubectl apply -f pod.yaml 
pod/podtest created

root@k8scludes1:~/containerd-gvisor# kubectl get pod -o wide
NAME      READY   STATUS    RESTARTS   AGE   IP             NODE              NOMINATED NODE   READINESS GATES
podtest   1/1     Running   0          16s   10.244.228.1   ubuntuk8sclient   <none>           <none>

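After the pod is created, check on ubuntuk8sclient whether the host can see the nginx processes running in the container. The host does see the pod's nginx processes, which shows that the pod was created with runc, the default low-level runtime.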

root@ubuntuk8sclient:~# ps -ef | grep nginx
root      38308  38227  0 18:15 ?        00:00:00 nginx: master process nginx -g daemon off;
systemd+  38335  38308  0 18:15 ?        00:00:00 nginx: worker process
root      39009  27377  0 18:17 pts/1    00:00:00 grep --color=auto nginx

Delete the pod.

root@k8scludes1:~/containerd-gvisor# kubectl delete pod podtest 
pod "podtest" deleted

Once the pod is deleted, the nginx processes are gone from the host as well.

root@ubuntuk8sclient:~# ps -ef | grep nginx
root      40044  27377  0 18:20 pts/1    00:00:00 grep --color=auto nginx

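9.3 Creating a RuntimeClass

Creating a pod with gVisor in k8s requires a RuntimeClass.

RuntimeClass is a feature for selecting the container runtime configuration, which is used to run a Pod's containers. You can set a different RuntimeClass on different Pods to balance performance against security. For example, if part of your workload needs a high level of information security assurance, you might choose to schedule those Pods so that they run in a container runtime that uses hardware virtualization. You then benefit from the extra isolation of the alternative runtime, at the cost of some additional overhead.

You can also use RuntimeClass to run different Pods with the same container runtime but with different settings.

Note that RuntimeClass is cluster-scoped; it is not restricted to a namespace.

List the runtimeclasses.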

root@k8scludes1:~/containerd-gvisor# kubectl get runtimeclass
No resources found

Edit the RuntimeClass manifest. The handler field takes the name of the runtime; to use gVisor, set it to runsc.

root@k8scludes1:~/containerd-gvisor# vim myruntimeclass.yaml

#Create the runtimeclass, specifying runsc
root@k8scludes1:~/containerd-gvisor# cat myruntimeclass.yaml 
# RuntimeClass is defined in the node.k8s.io API group
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  # The name used to reference this RuntimeClass
  # RuntimeClass is a cluster-scoped resource
  name: myruntimeclass
# The name of the corresponding CRI configuration
#handler: myconfiguration
#Note: handler takes the name of the runtime; to use gVisor, set it to runsc
handler: runsc

Create the runtimeclass.

root@k8scludes1:~/containerd-gvisor# kubectl apply -f myruntimeclass.yaml 
runtimeclass.node.k8s.io/myruntimeclass created

root@k8scludes1:~/containerd-gvisor# kubectl get runtimeclass
NAME             HANDLER   AGE
myruntimeclass   runsc     20s
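
The same manifest can also be generated from the shell and piped straight to kubectl, which is convenient when several handlers each need a RuntimeClass. This is a sketch only: runtimeclass_manifest is a hypothetical helper, and the handler passed in must match a runtime name configured in containerd (runsc in this article).

```shell
# runtimeclass_manifest NAME HANDLER: print a RuntimeClass manifest.
# NAME is the RuntimeClass name; HANDLER is the runtime name configured
# in containerd's CRI plugin.
runtimeclass_manifest() {
    cat <<EOF
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: $1
handler: $2
EOF
}

# Example call on a machine with cluster access (hypothetical usage):
#   runtimeclass_manifest myruntimeclass runsc | kubectl apply -f -
```

Printing the manifest first and applying it in a second step makes it easy to review the YAML before it reaches the cluster.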

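9.4 Creating a pod with gVisor

Once RuntimeClasses are configured for the cluster, you can use one by specifying runtimeClassName in the Pod spec.

The runtimeClassName setting tells the kubelet to use the named RuntimeClass to run the pod. If the named RuntimeClass does not exist, or the CRI cannot run the corresponding handler, the pod enters the Failed terminal phase. Look at the corresponding events for the error message. If runtimeClassName is not specified, the default RuntimeHandler is used, which is equivalent to the behavior when the RuntimeClass feature is disabled.

Edit the pod manifest. runtimeClassName: myruntimeclass tells k8s to run the pod with the runsc handler defined in myruntimeclass.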

root@k8scludes1:~/containerd-gvisor# vim pod.yaml 

root@k8scludes1:~/containerd-gvisor# cat pod.yaml 
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: podtest
  name: podtest
spec:
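  #When the container has to be shut down, kill it immediately instead of waiting for the default 30-second graceful termination period.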
  terminationGracePeriodSeconds: 0
  runtimeClassName: myruntimeclass
  nodeSelector:
    con: gvisor
  containers:
  - image: hub.c.163.com/library/nginx:latest
    #imagePullPolicy: IfNotPresent uses the local image if it already exists; otherwise the image is pulled from the remote registry
    imagePullPolicy: IfNotPresent
    name: podtest
    resources: {}
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}

Create the pod.

root@k8scludes1:~/containerd-gvisor# kubectl apply -f pod.yaml 
pod/podtest created

Check the pod; it has not been created successfully.

root@k8scludes1:~/containerd-gvisor# kubectl get pod -o wide
NAME      READY   STATUS              RESTARTS   AGE   IP       NODE              NOMINATED NODE   READINESS GATES
podtest   0/1     ContainerCreating   0          24s   <none>   ubuntuk8sclient   <none>           <none>

Describe the pod. The message "invalid runtime name containerd-shim-runsc-v1, correct runtime name should be either format like io.containerd.runc.v1 or a full path to the binary: unknown" tells us that containerd-shim-runsc-v1 is not a valid runtime name format.

root@k8scludes1:~/containerd-gvisor# kubectl describe pod podtest 
Name:         podtest
Namespace:    minsvcbug
Priority:     0
Node:         ubuntuk8sclient/192.168.110.139

    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              con=gvisor
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                 From               Message
  ----     ------                  ----                ----               -------
  Normal   Scheduled               49s                 default-scheduler  Successfully assigned minsvcbug/podtest to ubuntuk8sclient
  Warning  FailedCreatePodSandBox  22s (x25 over 47s)  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to start shim: failed to resolve runtime path: invalid runtime name containerd-shim-runsc-v1, correct runtime name should be either format like `io.containerd.runc.v1` or a full path to the binary: unknown

Delete the pod.

root@k8scludes1:~/containerd-gvisor# kubectl delete pod podtest 
pod "podtest" deleted

Back on ubuntuk8sclient, edit the containerd configuration file. The runtime_type for runsc should not be containerd-shim-runsc-v1; it should be runtime_type = "io.containerd.runsc.v1".

root@ubuntuk8sclient:~# vim /etc/containerd/config.toml 

root@ubuntuk8sclient:~# grep runtime_type /etc/containerd/config.toml
        runtime_type = ""
          runtime_type = "io.containerd.runc.v2"
          runtime_type = "io.containerd.runsc.v1"
        runtime_type = ""
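
grep runtime_type prints every match but not the table each one belongs to. A sketch that prints the runtime_type of one named runtime table is shown below. It assumes the CRI plugin layout shown earlier in this section (tables named plugins."io.containerd.grpc.v1.cri".containerd.runtimes.NAME), and runtime_type_of is a hypothetical helper name, not a containerd command.

```shell
# runtime_type_of FILE NAME: print the runtime_type configured for the
# runtime NAME (for example runsc) in a containerd config.toml.
runtime_type_of() {
    awk -v name="$2" '
        BEGIN { pat = "containerd\\.runtimes\\." name "\\][ \t]*$" }
        /^[ \t]*\[/ {
            # A new TOML table starts: remember whether it is the one we want.
            in_table = ($0 ~ pat)
            next
        }
        in_table && /^[ \t]*runtime_type[ \t]*=/ {
            # The value is the text between the first pair of double quotes.
            split($0, parts, "\"")
            print parts[2]
            exit
        }
    ' "$1"
}

# Example call (hypothetical usage):
#   runtime_type_of /etc/containerd/config.toml runsc
```

Running a check like this before restarting containerd would show at a glance whether the runsc entry carries io.containerd.runsc.v1 or the rejected shim binary name.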

Reload the systemd configuration and restart containerd.

root@ubuntuk8sclient:~# systemctl daemon-reload ;systemctl restart containerd

Create the pod again.

root@k8scludes1:~/containerd-gvisor# kubectl apply -f pod.yaml 
pod/podtest created

The pod is created successfully.

root@k8scludes1:~/containerd-gvisor# kubectl get pod -o wide
NAME      READY   STATUS    RESTARTS   AGE   IP              NODE              NOMINATED NODE   READINESS GATES
podtest   1/1     Running   0          10s   10.244.228.27   ubuntuk8sclient   <none>           <none>

List the nginx container on the host.

root@ubuntuk8sclient:~# nerdctl ps | grep podtest
d4604b2b8b39    registry.aliyuncs.com/google_containers/pause:3.6             "/pause"                  46 seconds ago    Up                 k8s://minsvcbug/podtest                            
dcb76b70a98e    hub.c.163.com/library/nginx:latest                            "nginx -g daemon off;"    45 seconds ago    Up                 k8s://minsvcbug/podtest/podtest                    

gVisor runs the container in a sandbox, so the processes inside the container are no longer visible from the host.

root@ubuntuk8sclient:~# ps -ef | grep nginx
root     111683  27377  0 02:36 pts/1    00:00:00 grep --color=auto nginx

Delete the pod.

root@k8scludes1:~/containerd-gvisor# kubectl delete pod podtest 
pod "podtest" deleted

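10. Summary

As a secure container runtime, gVisor introduces a sandbox that gives fine-grained control over container processes and effectively improves container security. gVisor may add some performance overhead compared with traditional container technology, but its security advantages are enough to make up for it.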

posted @ 2024-06-20 10:25  人生的哲理