A Brief Look at How Docker Mounts GPUs
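Background
For Docker and most other Linux containers, Cgroups are the main mechanism for imposing constraints, while Namespaces are the main mechanism for changing what a process can see.
What Docker starts is just a process, nothing more.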
Isolation (Namespace)
When code calls clone, passing flags such as CLONE_NEWPID/CLONE_NEWNS/CLONE_NEWUTS/CLONE_NEWNET/CLONE_NEWIPC starts an isolated process.
Simply put, a Namespace is a sleight of hand:
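- PID: a process sees only the processes in its own Namespace
- Mount: only the mount points of the current Namespace are visible
- UTS: hostname and domain name
- IPC: System V IPC objects and POSIX message queues
- Network: only the network devices of the current Namespace are visible
- User: user and group IDs
- System time is not namespaced: if the system time is changed inside one container, it changes for the host and for every other container on that host. (Linux 5.6 added a time Namespace, but it only offsets the monotonic and boot-time clocks, not the wall clock.)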
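To make the "sleight of hand" concrete, here is a minimal sketch that lists the Namespaces a process belongs to by reading the symlinks under /proc/<pid>/ns. Two processes share a Namespace exactly when the inode numbers match. The function names are my own and purely illustrative; the sketch assumes a Linux host.

```python
import os


def parse_ns_link(link):
    """Split a /proc/<pid>/ns symlink target such as 'pid:[4026531836]'
    into (namespace type, inode number)."""
    ns_type, _, rest = link.partition(":")
    return ns_type, int(rest.strip("[]"))


def list_namespaces(pid="self"):
    """Return {entry name: inode} for the namespaces of a process (Linux only)."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    for name in sorted(os.listdir(ns_dir)):
        try:
            _, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, name)))
        except (OSError, ValueError):
            continue  # entry unreadable without extra privileges, or unexpected format
        result[name] = inode
    return result


if __name__ == "__main__" and os.path.isdir("/proc/self/ns"):
    for name, inode in list_namespaces().items():
        print(f"{name:20s} {inode}")
```

Running it once on the host and once inside a container should print different inode numbers for pid, mnt, uts, ipc and net, while a Namespace that was not unshared keeps the host's number.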
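Limits (Cgroup)
Cgroup stands for Linux Control Group. Its main job is to cap the resources a group of processes may use, including CPU, memory, disk, and network bandwidth.
Cgroups expose their interface to users as a filesystem: they are organized as files and directories under /sys/fs/cgroup.
Specify the limits when starting the container: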
docker run -it --cpu-period=100000 --cpu-quota=20000 ubuntu /bin/bash
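After the container starts, we can confirm the limits by reading the resource-limit files of the "docker" control group in the CPU subsystem of the Cgroups filesystem (this is the cgroup v1 layout with the cgroupfs driver; the path differs under the systemd driver or cgroup v2):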
$ cat /sys/fs/cgroup/cpu/docker/5d5c9f67d/cpu.cfs_period_us
100000
$ cat /sys/fs/cgroup/cpu/docker/5d5c9f67d/cpu.cfs_quota_us
20000
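The two values read together: the container may run for cpu.cfs_quota_us microseconds in every cpu.cfs_period_us microseconds, so 20000 / 100000 means 20% of one CPU. A small sketch of that arithmetic follows; the helper names are illustrative, and the file names are the cgroup v1 ones shown above.

```python
import os


def cpu_limit(quota_us, period_us):
    """Return the CPU limit as a number of CPUs, or None when unlimited.

    In cgroup v1 a quota of -1 means no limit is set.
    """
    if quota_us < 0:
        return None
    return quota_us / period_us


def read_cpu_limit(cgroup_dir):
    """Read cpu.cfs_quota_us and cpu.cfs_period_us from a cgroup v1 directory."""
    with open(os.path.join(cgroup_dir, "cpu.cfs_quota_us")) as f:
        quota = int(f.read().strip())
    with open(os.path.join(cgroup_dir, "cpu.cfs_period_us")) as f:
        period = int(f.read().strip())
    return cpu_limit(quota, period)


if __name__ == "__main__":
    # Values from the container started above.
    print(cpu_limit(20000, 100000))
```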
GPU Mounting Experiments
Using nvidia-docker2
In short, with nvidia-docker2 you can use the GPUs with almost no effort; all that is needed is to set the runtime to nvidia:
cat /etc/docker/daemon.json
{
"default-runtime": "nvidia",
"runtimes": {
"nvidia": {
"path": "/usr/bin/nvidia-container-runtime",
"runtimeArgs": []
}
},
"exec-opts": ["native.cgroupdriver=systemd"]
}
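As a quick sanity check, the following sketch parses a daemon.json like the one above and reports which runtime Docker would use by default and which binary backs it. The function name is my own; the only fact it relies on is that Docker falls back to runc when default-runtime is absent.

```python
import json


def default_runtime(daemon_json_text):
    """Return (runtime name, runtime binary path or None) from daemon.json text."""
    cfg = json.loads(daemon_json_text)
    name = cfg.get("default-runtime", "runc")  # Docker's built-in default is runc
    path = cfg.get("runtimes", {}).get(name, {}).get("path")
    return name, path


if __name__ == "__main__":
    sample = """
    {
      "default-runtime": "nvidia",
      "runtimes": {
        "nvidia": {"path": "/usr/bin/nvidia-container-runtime", "runtimeArgs": []}
      },
      "exec-opts": ["native.cgroupdriver=systemd"]
    }
    """
    print(default_runtime(sample))
```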
After starting a container, running nvidia-smi shows all the GPU cards:
[root@localhost] docker run -it 98b41a1e975d bash
root@6db1dd28459d:/notebooks# nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79 Driver Version: 410.79 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... On | 00000000:8A:00.0 Off | 0 |
| N/A 40C P0 57W / 300W | 4053MiB / 16130MiB | 4% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-SXM2... On | 00000000:8B:00.0 Off | 0 |
| N/A 38C P0 40W / 300W | 0MiB / 16130MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla V100-SXM2... On | 00000000:8C:00.0 Off | 0 |
| N/A 42C P0 46W / 300W | 0MiB / 16130MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla V100-SXM2... On | 00000000:8D:00.0 Off | 0 |
| N/A 39C P0 40W / 300W | 0MiB / 16130MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 4 Tesla V100-SXM2... On | 00000000:B3:00.0 Off | 0 |
| N/A 39C P0 42W / 300W | 0MiB / 16130MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 Tesla V100-SXM2... On | 00000000:B4:00.0 Off | 0 |
| N/A 41C P0 57W / 300W | 7279MiB / 16130MiB | 4% Default |
+-------------------------------+----------------------+----------------------+
| 6 Tesla V100-SXM2... On | 00000000:B5:00.0 Off | 0 |
| N/A 40C P0 45W / 300W | 0MiB / 16130MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 Tesla V100-SXM2... On | 00000000:B6:00.0 Off | 0 |
| N/A 41C P0 44W / 300W | 0MiB / 16130MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
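NVIDIA_DRIVER_CAPABILITIES controls which driver libraries are mounted into the container, and NVIDIA_VISIBLE_DEVICES restricts the container to specific GPU cards. For details, see the article on configuring resources through environment variables in nvidia-docker.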
[root@localhost cuda-9.0]# docker run -it --env NVIDIA_DRIVER_CAPABILITIES="compute,utility" --env NVIDIA_VISIBLE_DEVICES=0,1 98b41a1e975d bash
root@97bf127ff83a:/notebooks# nvidia-smi
Tue Oct 15 09:29:45 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79 Driver Version: 410.79 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... On | 00000000:8A:00.0 Off | 0 |
| N/A 39C P0 57W / 300W | 4053MiB / 16130MiB | 3% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-SXM2... On | 00000000:8B:00.0 Off | 0 |
| N/A 37C P0 40W / 300W | 0MiB / 16130MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
Using GPUs with Stock Docker
Using GPUs with stock Docker turned out to be full of pitfalls. First, the runtime has to be switched back to the default:
[root@localhost ~]# cat /etc/docker/daemon.json
{
"exec-opts": ["native.cgroupdriver=systemd"]
}
After restarting the docker service, try mounting the GPU device directly:
docker run --device /dev/nvidia0:/dev/nvidia0 -it 98b41a1e975d bash
root@a85d5e5f69d9:/notebooks# nvidia-smi
bash: nvidia-smi: command not found
root@a85d5e5f69d9:/notebooks# ll /dev/|grep nvidia
crw-rw-rw- 1 root root 195, 0 Oct 15 06:06 nvidia0
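nvidia-smi does not exist in the container, so we can bind-mount the host directory that contains nvidia-smi straight into it (this shadows the container's own /usr/bin, which is tolerable only for an experiment):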
[root@localhost cuda-9.0]# docker run --device /dev/nvidia0:/dev/nvidia0 -v /usr/bin/:/usr/bin -it 98b41a1e975d bash
root@cf29b4477304:/notebooks# nvidia-smi
NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.
Please also try adding directory that contains libnvidia-ml.so to your system PATH.
libnvidia-ml.so cannot be found. libnvidia-ml.so is the Nvidia Management Library (NVML for short), and it is part of the Nvidia driver. nvidia-smi manages the GPUs by calling libnvidia-ml.so, so we need to mount it as well:
[root@localhost cuda-9.0]# docker run --device /dev/nvidia0:/dev/nvidia0 -v /usr/bin/:/usr/bin -v /usr/lib64:/usr/lib64 -it 98b41a1e975d bash
root@ee39b2b3b1a4:/notebooks# nvidia-smi
Failed to initialize NVML: Unknown Error
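Now NVML fails to initialize. The NVML library talks to the Nvidia driver, so could that communication be blocked? Let's look at which Nvidia kernel modules are loaded and which device nodes they expose, and consider whether all of them need to be mapped into the container: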
[root@localhost cuda-9.0]# lsmod|grep nvidia
nvidia_drm 39843 0
nvidia_modeset 1036498 1 nvidia_drm
nvidia_uvm 786729 0
nvidia 16594443 77 nvidia_modeset,nvidia_uvm
ipmi_msghandler 46608 3 ipmi_devintf,nvidia,ipmi_si
drm_kms_helper 163265 2 ast,nvidia_drm
drm 370825 5 ast,ttm,drm_kms_helper,nvidia_drm
i2c_core 40756 6 ast,drm,i2c_i801,drm_kms_helper,i2c_algo_bit,nvidia
[root@localhost cuda-9.0]# ll /dev/|grep nvidia
crw-rw-rw- 1 root root 195, 0 Jul 23 10:56 nvidia0
crw-rw-rw- 1 root root 195, 1 Jul 23 10:56 nvidia1
crw-rw-rw- 1 root root 195, 2 Jul 23 10:56 nvidia2
crw-rw-rw- 1 root root 195, 3 Jul 23 10:56 nvidia3
crw-rw-rw- 1 root root 195, 4 Jul 23 10:56 nvidia4
crw-rw-rw- 1 root root 195, 5 Jul 23 10:56 nvidia5
crw-rw-rw- 1 root root 195, 6 Jul 23 10:56 nvidia6
crw-rw-rw- 1 root root 195, 7 Jul 23 10:56 nvidia7
crw-rw-rw- 1 root root 195, 255 Jul 23 10:56 nvidiactl
crw-rw-rw- 1 root root 195, 254 Jul 23 10:56 nvidia-modeset
crw-rw-rw- 1 root root 237, 0 Jul 23 10:56 nvidia-uvm
crw-rw-rw- 1 root root 237, 1 Jul 23 10:56 nvidia-uvm-tools
Based on this, let's try again and map /dev/nvidiactl, /dev/nvidia-uvm, /dev/nvidia-uvm-tools, and /dev/nvidia-modeset in as well:
[root@localhost cuda-9.0]# docker run --device /dev/nvidia0:/dev/nvidia0 --device /dev/nvidiactl:/dev/nvidiactl --device /dev/nvidia-uvm:/dev/nvidia-uvm --device /dev/nvidia-uvm-tools:/dev/nvidia-uvm-tools --device /dev/nvidia-modeset:/dev/nvidia-modeset -v /usr/bin/:/usr/bin -v /usr/lib64:/usr/lib64 -it 98b41a1e975d bash
root@bc21e395d885:/notebooks# nvidia-smi
Tue Oct 15 09:47:26 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79 Driver Version: 410.79 CUDA Version: N/A |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... On | 00000000:8A:00.0 Off | 0 |
| N/A 37C P0 44W / 300W | 0MiB / 16130MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
At last we get the output we were hoping for. This exploration got me thinking: how does nvidia-docker pull this off? Could it also be --device plus mounting the Nvidia driver files?
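The manual steps above boil down to "one --device flag per device node, one -v flag per host directory". The sketch below assembles that command line from two lists. It is only an illustration of the experiment, not how nvidia-docker is implemented, and build_docker_run is a name I made up.

```python
import shlex


def build_docker_run(image, command, devices=(), binds=()):
    """Build the argv for `docker run`, mapping each device node and
    bind-mounting each host directory to the same path in the container."""
    argv = ["docker", "run"]
    for dev in devices:
        argv += ["--device", f"{dev}:{dev}"]
    for path in binds:
        argv += ["-v", f"{path}:{path}"]
    argv += ["-it", image, command]
    return argv


if __name__ == "__main__":
    argv = build_docker_run(
        "98b41a1e975d",
        "bash",
        devices=[
            "/dev/nvidia0",
            "/dev/nvidiactl",
            "/dev/nvidia-uvm",
            "/dev/nvidia-uvm-tools",
            "/dev/nvidia-modeset",
        ],
        binds=["/usr/bin", "/usr/lib64"],
    )
    # Print a copy-pasteable shell command instead of executing it.
    print(shlex.join(argv))
```

Adding a second GPU is then just a matter of appending /dev/nvidia1 to the device list.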
How nvidia-docker Works
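Just as we guessed, this is indeed what nvidia-docker does. nvidia-container-runtime wraps runc and registers a pre-start hook that runs before the container's own process starts. The hook calls nvidia-container-cli, which works out which GPU devices, library files, and executables need to be mapped and mounts them into the container once its namespaces exist but before the user process runs, so the GPU environment is ready when the container starts.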
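To illustrate the idea of "wrapping runc", here is a rough sketch of what adding a prestart hook to an OCI runtime spec (the config.json that runc consumes) could look like. The hooks.prestart array with path and args fields comes from the OCI runtime spec; the hook path and its "prestart" argument are my assumptions about a typical nvidia-docker2 installation, and the function is a simplification, not the actual nvidia-container-runtime code.

```python
import copy

# Assumed install location of the hook binary; adjust for your system.
HOOK_PATH = "/usr/bin/nvidia-container-runtime-hook"


def inject_prestart_hook(spec, hook_path=HOOK_PATH):
    """Return a copy of an OCI runtime spec with a prestart hook added.

    The input spec is left untouched, and the hook is added only once.
    """
    new_spec = copy.deepcopy(spec)
    prestart = new_spec.setdefault("hooks", {}).setdefault("prestart", [])
    if not any(hook.get("path") == hook_path for hook in prestart):
        # args[0] is the program name by convention; "prestart" is assumed.
        prestart.append({"path": hook_path, "args": [hook_path, "prestart"]})
    return new_spec


if __name__ == "__main__":
    spec = {
        "ociVersion": "1.0.0",
        "process": {"env": ["NVIDIA_VISIBLE_DEVICES=0,1"]},
    }
    print(inject_prestart_hook(spec)["hooks"])
```

After the modified spec is handed to runc, runc itself executes the hook at the right moment, which is why the wrapper does not need to do any mounting on its own.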
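Installing the Nvidia Driver
I ran into many problems while testing, mainly because I was unfamiliar with the various driver components Nvidia ships and did not know which layer each belongs to, so things got confusing. Here is what I sorted out.
Nvidia GPU software falls into two categories: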
- Nvidia driver
- CUDA Toolkit
Nvidia driver
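Installation methods:
- Download an installer such as NVIDIA-Linux-x86_64-384.59.run and run it directly. After installation, all files are placed under /usr/local/nvidia by default, which is why most tutorials use docker -v /usr/local/nvidia:/usr/local/nvidia.
- Alternatively, install through rpm: once the repository is configured, run yum install cuda-drivers-410.79-1 (adjust the version yourself). With this method the files land under /usr/bin and /usr/lib64 by default.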
I unpacked the main rpm packages of the Nvidia driver to see what is inside:
Libraries:
nvidia-driver-410.79-1.el7.x86_64.rpm 29MB core driver
./usr/lib64/nvidia/xorg/libglxserver_nvidia.so 15M
./usr/lib64/xorg/modules/drivers/nvidia_drv.so 7.5M
nvidia-driver-libs-410.79-1.el7.x86_64.rpm 44MB core libraries
./etc/ld.so.conf.d/nvidia-x86_64.conf
./usr/lib64/libEGL_nvidia.so.410.79 1008K
./usr/lib64/libGLESv1_CM_nvidia.so.410.79 59K
./usr/lib64/libGLESv2_nvidia.so.410.79 109K
./usr/lib64/libGLX_nvidia.so.410.79 1.3M
./usr/lib64/libnvidia-cbl.so.410.79 363K
./usr/lib64/libnvidia-cfg.so.410.79 176K
./usr/lib64/libnvidia-eglcore.so.410.79 25M
./usr/lib64/libnvidia-glcore.so.410.79 26M
./usr/lib64/libnvidia-glsi.so.410.79 568K
./usr/lib64/libnvidia-glvkspirv.so.410.79 14M
./usr/lib64/libnvidia-rtcore.so.410.79 26M
./usr/lib64/libnvidia-tls.so.410.79 15K
./usr/lib64/libnvoptix.so.410.79 34M
./usr/lib64/vdpau/libvdpau_nvidia.so.410.79 965K
./usr/share/glvnd/egl_vendor.d/10_nvidia.json
nvidia-driver-NVML-410.79-1.el7.x86_64.rpm 560K Nvidia Management Library
./usr/lib64/libnvidia-ml.so.410.79 1.5M
nvidia-driver-cuda-libs-410.79-1.el7.x86_64.rpm 33M Nvidia CUDA API Driver?
./usr/lib64/libcuda.so.410.79 15M
./usr/lib64/libnvcuvid.so.410.79 2.7M
./usr/lib64/libnvidia-compiler.so.410.79 46M
./usr/lib64/libnvidia-encode.so.410.79 165K
./usr/lib64/libnvidia-fatbinaryloader.so.410.79 286K
./usr/lib64/libnvidia-opencl.so.410.79 28M
./usr/lib64/libnvidia-ptxjitcompiler.so.410.79 12M
Executables:
nvidia-driver-cuda-410.79-1.el7.x86_64.rpm 394K MPS and nvidia-smi, the commonly used commands
./usr/bin/nvidia-cuda-mps-control
./usr/bin/nvidia-cuda-mps-server
./usr/bin/nvidia-debugdump
./usr/bin/nvidia-smi
nvidia-modprobe-410.79-1.el7.x86_64.rpm 71K loads the Nvidia kernel modules and creates the /dev/nvidia* device nodes
./usr/bin/nvidia-modprobe
Rarely used:
nvidia-libXNVCtrl-devel-410.79-1.el7.x86_64 62K development files for the NV-CONTROL X extension client library
./usr/include/NVCtrl
./usr/include/NVCtrl/NVCtrl.h
./usr/include/NVCtrl/NVCtrlLib.h
./usr/include/NVCtrl/nv_control.h
./usr/lib64/libXNVCtrl.so
dkms-nvidia-410.79-1.el7.x86_64.rpm 12M kernel module source
Registers the NVIDIA kernel module with DKMS so that it is rebuilt when the kernel changes
nvidia-driver-NvFBCOpenGL-410.79-1.el7.x86_64.rpm 135K framebuffer capture (NvFBC) and frame readback (NvIFR) libraries
./usr/lib64/libnvidia-fbc.so.1
./usr/lib64/libnvidia-fbc.so.410.79
./usr/lib64/libnvidia-ifr.so.1
./usr/lib64/libnvidia-ifr.so.410.79
CUDA Toolkit
Installation:
wget http://developer.download.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda_10.1.243_418.87.00_linux.run
sudo sh cuda_10.1.243_418.87.00_linux.run
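When it finishes, the toolkit should be installed under /usr/local/cuda-9.0/ (adjust the version; the installer above would create /usr/local/cuda-10.1/).
/usr/local/cuda-9.0/lib64/ contains all the CUDA Toolkit library files. From the top layer to the bottom, the stack is:
- libcublas.so, libcufft.so: CUDA libraries
- libcudart.so: CUDA runtime
- libcuda.so: CUDA driver API (part of the Nvidia driver)
- nvidia driver (user mode) (part of the Nvidia driver)
- nvidia driver (kernel mode) (part of the Nvidia driver)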
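Note that /usr/local/cuda-9.0/lib64/stubs contains libcuda.so and similar files whose names are identical to the ones provided by the Nvidia driver. The libraries under stubs are not real implementations: they are link-time placeholders that allow CUDA programs to be built on a machine without the driver installed, and they must not be loaded at run time.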
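Because libcuda.so belongs to the driver rather than the toolkit, one way to check that a working driver library is reachable (on the host or inside a container) is to load it and ask for its version. The sketch below does this with ctypes and the driver API call cuDriverGetVersion, which as far as I know can be called without cuInit. The helper names are mine, and treating any non-zero status as "no usable driver" is a simplification.

```python
import ctypes


def format_cuda_version(raw):
    """cuDriverGetVersion encodes the version as 1000 * major + 10 * minor."""
    return f"{raw // 1000}.{(raw % 1000) // 10}"


def driver_cuda_version(lib_name="libcuda.so.1"):
    """Return the CUDA version supported by the installed driver, or None
    if the library cannot be loaded or the call fails."""
    try:
        lib = ctypes.CDLL(lib_name)
        get_version = lib.cuDriverGetVersion
    except (OSError, AttributeError):
        return None  # library missing, or it lacks the expected symbol
    version = ctypes.c_int(0)
    status = get_version(ctypes.byref(version))
    if status != 0:  # 0 is CUDA_SUCCESS
        return None
    return format_cuda_version(version.value)


if __name__ == "__main__":
    print(driver_cuda_version() or "no usable libcuda.so.1 found")
```

If this prints a version on the host but not inside the container, the driver libraries have most likely not been mounted into the container.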