Deploying a deep learning server with Docker on Ubuntu 16.04: CUDA + cuDNN + SSH
1. First install Ubuntu, the NVIDIA driver, and Docker on the server
Environment used in this article:
Server Ubuntu version: 16.04
CUDA version supported by the NVIDIA driver: 10.2
Docker version: 20.10.7
References:
https://blog.csdn.net/qq_39638989/article/details/121275230
2. SSH into the server, create the host directory that will be mounted into the Docker container, and set its permissions
ssh user_name@server_ip
mkdir -p /mnt/sda0/00000
sudo chmod 777 /mnt/sda0/00000/
3. Pull the image that matches the operating system and GPU driver from Docker Hub
NVIDIA CUDA image repository:
https://hub.docker.com/r/nvidia/cuda
CUDA versions supported by each NVIDIA driver:
https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#title-resolved-issues
The image used in this article is nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04. Pull it with:
docker pull nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04
List the pulled images
sudo docker images
4. Change into the mount directory created above and create a container from the image
cd /mnt/sda0/00000/
docker run -itd --name guider -v ${PWD}:/mnt -p 10001:22 -p 10002:8080 --gpus all -e NVIDIA_DRIVER_CAPABILITIES=compute,utility -e NVIDIA_VISIBLE_DEVICES=all --privileged=true --shm-size 32G nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04 /bin/bash
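run creates and starts a container
-itd runs the container interactively with a terminal (-it) and in the background (-d)
--name guider sets the container name
-v ${PWD}:/mnt mounts the current directory at /mnt inside the container
-p 10001:22 -p 10002:8080 maps container ports 22 and 8080 to host ports 10001 and 10002
--gpus all -e NVIDIA_DRIVER_CAPABILITIES=compute,utility -e NVIDIA_VISIBLE_DEVICES=all gives the container access to the host GPUs
--shm-size sets the size of the container's shared memory
nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04 is the image and tag to use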
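The flags above can be collected into a small wrapper. This is only a sketch under assumptions, not part of the original setup: the function name build_run_cmd and its argument order are made up for illustration, and it only prints the command (a dry run) so it can be checked before it is executed.

```shell
#!/bin/sh
# Hypothetical helper: print the `docker run` command from step 4 without running it.
# Arguments: container name, host directory, host SSH port, host web port, image.
build_run_cmd() {
    name="$1"
    host_dir="$2"
    ssh_port="$3"
    web_port="$4"
    image="$5"
    # echo joins its arguments with single spaces, giving one command line
    echo "docker run -itd --name $name" \
        "-v $host_dir:/mnt" \
        "-p $ssh_port:22 -p $web_port:8080" \
        "--gpus all" \
        "-e NVIDIA_DRIVER_CAPABILITIES=compute,utility" \
        "-e NVIDIA_VISIBLE_DEVICES=all" \
        "--privileged=true --shm-size 32G" \
        "$image /bin/bash"
}

# Print the command with the values used in this article
build_run_cmd guider /mnt/sda0/00000 10001 10002 nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04
```

Once the printed line looks right, it can be pasted into the shell or passed to sh -c.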
Check the container status
docker ps -a
References:
https://www.jianshu.com/p/8f38a63b86cc
https://blog.csdn.net/weixin_43590796/article/details/114848742
https://blog.csdn.net/weixin_44966641/article/details/123930747
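5. Enter the container and configure it: install vim, ssh, and sudo, and create a user
docker exec -it guider /bin/bash
exec runs a command inside a running container
-it runs that command interactively with a terminal
guider is the container name
/bin/bash is the command to run, here a bash shell
Update the package sources and install vim, ssh, and sudo
5.1 Update the package index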
apt-get update
5.2 Fix the GPG error:
https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu1804/x86_64 InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC
Import the key ID that the error message reports after NO_PUBKEY:
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys A4B469963BF863CC
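If several keys are missing, the IDs can be pulled out of the apt output instead of copied by hand. A minimal sketch, assuming the error format shown above; the function name missing_keys is hypothetical, and the sample input is the message quoted in this step.

```shell
#!/bin/sh
# Hypothetical helper: read apt output on stdin and print each missing key ID once.
missing_keys() {
    grep -o 'NO_PUBKEY [0-9A-F]*' | awk '{print $2}' | sort -u
}

# Sample input: the error message from step 5.2
echo "InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC" | missing_keys

# Inside the container the same filter could be fed from apt itself (not executed here):
#   apt-get update 2>&1 | missing_keys
```

Each printed ID can then be passed to the apt-key command above.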
5.3 Switch the package sources to a mirror
cp /etc/apt/sources.list /etc/apt/sources.list.bak
sed -i 's/archive.ubuntu.com/mirrors.ustc.edu.cn/g' /etc/apt/sources.list
apt update
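Before editing /etc/apt/sources.list in place, the substitution can be tried on a scratch copy. The sketch below is an illustration only: the two sample lines are assumed typical Ubuntu 18.04 entries, not taken from the actual container, and the dots in the pattern are escaped so they match literally.

```shell
#!/bin/sh
# Try the mirror substitution on a scratch file instead of /etc/apt/sources.list.
tmp_list=$(mktemp)
cat > "$tmp_list" <<'EOF'
deb http://archive.ubuntu.com/ubuntu/ bionic main restricted
deb http://security.ubuntu.com/ubuntu/ bionic-security main restricted
EOF

# Without -i, sed writes the result to stdout and leaves the file unchanged
rewritten=$(sed 's/archive\.ubuntu\.com/mirrors.ustc.edu.cn/g' "$tmp_list")
echo "$rewritten"
rm -f "$tmp_list"
```

Note that the pattern only touches archive.ubuntu.com entries; lines pointing at security.ubuntu.com stay as they are.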
5.4 Install vim, ssh, and sudo
apt-get install vim
apt-get install openssh-server
apt-get install sudo
References:
https://blog.csdn.net/weixin_45722313/article/details/121117394
https://blog.csdn.net/hello_1995/article/details/109222650
6. Start the ssh service automatically when the Docker container starts
6.1 Create start_ssh.sh and make it executable
touch /root/start_ssh.sh
vim /root/start_ssh.sh
#!/bin/bash
LOGTIME=$(date "+%Y-%m-%d %H:%M:%S")
echo "[$LOGTIME] startup run..." >>/root/start_ssh.log
service ssh start >>/root/start_ssh.log
chmod +x /root/start_ssh.sh
6.2 Add the start_ssh.sh script to the startup file
vim /root/.bashrc
# startup run
if [ -f /root/start_ssh.sh ]; then
    /root/start_ssh.sh
fi
6.3 Exit the container, restart it, enter it again, and check that the ssh service is running
exit
docker restart guider
docker ps -a
docker exec -it guider /bin/bash
ps -e |grep ssh
References:
https://blog.csdn.net/qq_38603541/article/details/124028994
http://www.wjhsh.net/bigben0123-p-3184115.html
https://www.cnblogs.com/zhongzhaoxie/p/13064433.html
7. Create an administrator user
adduser username
adduser username sudo
id username
su username
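8. Verify that the administrator user can log in remotely over ssh, then create work and data directories in the mounted directory for development
ssh user_name@server_ip -p port_id
user_name is the administrator user name
server_ip is the server IP address
port_id is the host port mapped to the container's ssh port 22
Create the work and data directories in the mounted directory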
mkdir -p /mnt/work
mkdir -p /mnt/data
9. Export the container and update the image
9.1 Export the container
docker ps -a
docker export guider > guider-v1.tar
gzip -k guider-v1.tar
9.2 Import the container snapshot
gzip -d -k guider-v1.tar.gz
docker images
docker import guider-v1.tar guider:v1
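The -k flag matters here because gzip normally deletes its input. A small sketch of the round trip, using a scratch file as a stand-in for guider-v1.tar (the file names and the temporary directory are illustrative only):

```shell
#!/bin/sh
# Show what -k does, with a small scratch file in place of a real container snapshot.
work=$(mktemp -d)
printf 'stand-in for a container snapshot\n' > "$work/demo.tar"
cp "$work/demo.tar" "$work/original.tar"   # reference copy for the final comparison

gzip -k "$work/demo.tar"          # creates demo.tar.gz and keeps demo.tar
rm "$work/demo.tar"               # pretend only the .gz was copied to another machine
gzip -d -k "$work/demo.tar.gz"    # restores demo.tar and keeps demo.tar.gz

# Compare the restored file with the reference copy
if cmp -s "$work/demo.tar" "$work/original.tar"; then
    result="round trip OK"
else
    result="mismatch"
fi
echo "$result"
rm -rf "$work"
```

Without -k, each gzip call would leave only one of the two files behind.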
References:
https://www.linuxcool.com/gzip
https://blog.csdn.net/zfw_666666/article/details/124670125
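Remove the old container (its name is reused in step 10)
docker rm -f guider
Remove an image that is no longer needed (keep guider:v1 if it will be run in step 10)
docker rmi guider:v1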
10. Add the CUDA environment variables for the administrator user
docker run -itd --name guider -v /mnt/sda0/00000:/mnt -p 10001:22 -p 10002:8080 --gpus all -e NVIDIA_DRIVER_CAPABILITIES=compute,utility -e NVIDIA_VISIBLE_DEVICES=all --privileged=true --shm-size 8G guider:v1 /bin/bash
ssh user_name@server_ip -p port_id
vim .bashrc
export PATH=/usr/local/cuda-10.2/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-10.2/lib64:$LD_LIBRARY_PATH
export C_INCLUDE_PATH=/usr/local/cuda-10.2/include:$C_INCLUDE_PATH
export CPLUS_INCLUDE_PATH=/usr/local/cuda-10.2/include:$CPLUS_INCLUDE_PATH
source .bashrc
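Because .bashrc is sourced repeatedly, plain export lines keep adding the same directory, and they leave a trailing colon when the variable was empty. One possible alternative is sketched below; the helper name prepend_path and the CUDA_HOME variable are assumptions for illustration and are not part of the original setup.

```shell
#!/bin/sh
# Hypothetical helper: prepend DIR to the colon-separated variable VAR, skipping duplicates.
prepend_path() {
    _var="$1"
    _dir="$2"
    eval "_cur=\${$_var:-}"              # read the current value of the named variable
    case ":$_cur:" in
        *":$_dir:"*) return 0 ;;         # already present, nothing to do
    esac
    if [ -n "$_cur" ]; then
        eval "$_var=\"\$_dir:\$_cur\""
    else
        eval "$_var=\"\$_dir\""          # avoid a trailing colon on an empty variable
    fi
    export "$_var"
}

CUDA_HOME=/usr/local/cuda-10.2           # assumed name; the path is the one used above
prepend_path PATH "$CUDA_HOME/bin"
prepend_path LD_LIBRARY_PATH "$CUDA_HOME/lib64"
prepend_path C_INCLUDE_PATH "$CUDA_HOME/include"
prepend_path CPLUS_INCLUDE_PATH "$CUDA_HOME/include"
echo "$PATH"
```

Sourcing this block a second time leaves all four variables unchanged.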
References:
https://blog.csdn.net/qq_36814762/article/details/122374053
https://blog.csdn.net/Willen_/article/details/103489485
11. Install Anaconda
11.1 Install Anaconda
sh Anaconda3-2022.10-Linux-x86_64.sh
11.2 Add the Tsinghua mirror channels for Anaconda
conda config --show channels
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge/
conda config --set show_channel_urls yes
vim .condarc
conda clean -i
conda create -n myenv numpy
References:
https://blog.csdn.net/run_success/article/details/124841938
http://t.zoukankan.com/BlairGrowing-p-15265922.html
https://mirror.tuna.tsinghua.edu.cn/help/anaconda/
11.3 Set the USTC mirror for pip
pip config set global.index-url https://pypi.mirrors.ustc.edu.cn/simple/
vim .config/pip/pip.conf or pip config list
References:
https://mirrors4.tuna.tsinghua.edu.cn/help/anaconda/
12. Install opencv-python
sudo apt-get update
sudo apt-get install ffmpeg libsm6 libxext6
pip install opencv-python
python
import cv2
print(cv2.__version__)
References:
https://www.jianshu.com/p/6f7e2ccd146a
https://blog.csdn.net/weixin_42990464/article/details/125203404
13. Install paddle
13.1 Create and activate a conda environment
conda create --name paddle python=3.9
conda env list
conda activate paddle
conda install paddlepaddle-gpu==2.3.2 cudatoolkit=10.2 --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/
13.2 Verify paddle
python
import paddle
paddle.utils.run_check()
13.3 Remove paddle
conda activate base
conda env remove -n paddle
14. Install pytorch
14.1 Create and activate a conda environment
conda create --name pytorch python=3.9
conda env list
conda activate pytorch
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=10.2 -c pytorch
14.2 Verify pytorch
python
import torch
torch.cuda.is_available()
print(torch.__version__)
print(torch.version.cuda)
print(torch.backends.cudnn.version())
torch.cuda.nccl.version()
14.3 Remove pytorch
conda activate base
conda env remove -n pytorch
conda clean -p       # remove unused packages (recommended)
conda clean -t       # remove cached package tarballs
conda clean -y --all # remove all cached packages and caches
References:
https://blog.csdn.net/Pin_BOY/article/details/120479861
14.4 Install pytorch with pip
pip install torch==1.8.2 torchvision==0.9.2 torchaudio==0.8.2 --extra-index-url https://download.pytorch.org/whl/lts/1.8/cu102
References:
https://pytorch.org/get-started/locally/
http://t.zoukankan.com/jimlau-p-13260269.html
https://zhuanlan.zhihu.com/p/388212600
https://www.cnpython.com/qa/1295282
15. Install opencv
15.1 Download the sources from the official repository, then build and install them
sudo apt update && sudo apt install -y cmake g++ wget unzip pkg-config
wget -O opencv.zip https://github.com/opencv/opencv/archive/4.6.0.zip
wget -O opencv_contrib.zip https://github.com/opencv/opencv_contrib/archive/4.6.0.zip
unzip opencv.zip
unzip opencv_contrib.zip
mkdir -p build && cd build
cmake -DOPENCV_EXTRA_MODULES_PATH=../opencv_contrib-4.6.0/modules ../opencv-4.6.0 -DOPENCV_GENERATE_PKGCONFIG=YES -DBUILD_opencv_world=ON
make -j12
sudo make install
15.2 Edit the environment variables in .bashrc
vim .bashrc
export PATH=/usr/local/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
export C_INCLUDE_PATH=/usr/local/include/opencv4:$C_INCLUDE_PATH
export CPLUS_INCLUDE_PATH=/usr/local/include/opencv4:$CPLUS_INCLUDE_PATH
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:$PKG_CONFIG_PATH
source .bashrc
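15.3 Write a test program, then compile and test it
vim test.cpp
#include <iostream>
#include "opencv2/opencv.hpp"
int main(int argc, char **argv)
{
    // Read test.jpg from the current directory
    cv::Mat image = cv::imread("test.jpg");
    if (image.empty()) {
        std::cerr << "failed to read test.jpg" << std::endl;
        return 1;
    }
    // Write the image back out to confirm that encoding works
    cv::imwrite("save.jpg", image);
    return 0;
}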
g++ test.cpp `pkg-config --cflags --libs opencv4`
References:
https://zhuanlan.zhihu.com/p/455045315
https://docs.opencv.org/4.x/d7/d9f/tutorial_linux_install.html
https://blog.csdn.net/qq_38505858/article/details/117780774
https://blog.csdn.net/u013798145/article/details/121698120
16. Export the container and update the image
16.1 Export the container
docker ps -a
docker export guider > guider-v2.tar
gzip -k guider-v2.tar
16.2 Import the container snapshot
gzip -d -k guider-v2.tar.gz
docker images
docker import guider-v2.tar guider:v2
17. Create and run a container
mkdir -p /home/ubuntu/SDA/docker/08760
docker run -itd --name 08760 \
-v /home/ubuntu/SDA/docker/08760:/home/guider/work \
-p 10001:22 -p 10002:8080 \
--gpus all -e NVIDIA_DRIVER_CAPABILITIES=compute,utility -e NVIDIA_VISIBLE_DEVICES=all \
--privileged=true --shm-size 32G guider:v2 /bin/bash
18. Install OpenCV 4.6.0
conda search opencv
conda install opencv=4.6.0
To use it, add the following to CMakeLists.txt, where /home/xxx/envs/opencv/share/OpenCV is the directory in which opencv is installed inside the conda environment
set(OpenCV_DIR /home/xxx/envs/opencv/share/OpenCV)
References:
https://blog.csdn.net/guanjing_dream/article/details/121074834
https://www.dandelioncloud.cn/article/details/1507667653236957186
https://zhuanlan.zhihu.com/p/573341843
https://blog.csdn.net/weixin_44327262/article/details/105860213
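19. Display a graphical interface from a local Docker container
19.1 Install the X11 utilities on the Linux system that will display the graphical interface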
sudo apt-get install x11-xserver-utils
xhost +
19.2 Start and enter the docker container
docker run -itd --name guider -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY=unix$DISPLAY -e GDK_SCALE -e GDK_DPI_SCALE --net=host guider:v2 /bin/bash
docker exec -it guider /bin/bash
sudo apt-get install xarclock
xarclock
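-v /tmp/.X11-unix:/tmp/.X11-unix \ # share the host's X11 unix socket directory
-e DISPLAY=unix$DISPLAY \ # set the DISPLAY environment variable
-e GDK_SCALE \ # these two environment variables affect display scaling; not examined in detail
-e GDK_DPI_SCALE
--net=host sets the network mode so that the container shares the host's IP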
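The X11 part of the command only works when DISPLAY is set on the host, and the socket directory is /tmp/.X11-unix with a capital X. A minimal sketch that assembles just those flags and refuses an empty DISPLAY; the function name x11_flags is hypothetical, and nothing is started here.

```shell
#!/bin/sh
# Hypothetical helper: print the X11-related `docker run` flags for a given DISPLAY value.
x11_flags() {
    display="$1"
    if [ -z "$display" ]; then
        echo "DISPLAY is empty; no X server to connect to" >&2
        return 1
    fi
    printf '%s\n' "-v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY=unix$display --net=host"
}

# Example with an explicit display; on a desktop session "$DISPLAY" would be passed instead
x11_flags ":0"
```

The printed flags can be placed between docker run and the image name.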
References:
https://zhuanlan.zhihu.com/p/460494660
https://www.csdn.net/tags/MtTaEgysMDIxMDYtYmxvZwO0O0OO0O0O.html
https://blog.csdn.net/lxyoucan/article/details/121679346
20. Display a graphical interface from a Docker container on the server
References:
https://www.cnblogs.com/jcchen1987/p/10553930.html
https://blog.csdn.net/ywxuan/article/details/118462658
https://www.modb.pro/db/212354
21. Installing yolov5 fails with ImportError: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.22' not found
sudo find / -name 'libstdc++.so*'
sudo rm /usr/lib/gcc/x86_64-linux-gnu/7/libstdc++.so
sudo rm /usr/lib/x86_64-linux-gnu/libstdc++.so*
sudo cp /home/guider/anaconda3/lib/libstdc++.so.6.0.29 /usr/lib/gcc/x86_64-linux-gnu/7/
sudo cp /home/guider/anaconda3/lib/libstdc++.so.6.0.29 /usr/lib/x86_64-linux-gnu/
cd /usr/lib/x86_64-linux-gnu/
sudo ln -s libstdc++.so.6.0.29 libstdc++.so.6
sudo ln -s libstdc++.so.6.0.29 libstdc++.so
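Since these commands replace a system library, it can help to confirm first which copy actually carries the missing version string. A rough sketch, assuming a plain substring search of the library file is good enough for this check; the function name has_glibcxx is hypothetical, and the two paths are the ones used in this step.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the library file contains the given GLIBCXX version string.
has_glibcxx() {
    lib="$1"
    version="$2"
    # -a reads the binary as text, -F matches the string literally, -q prints nothing
    grep -a -F -q "GLIBCXX_$version" "$lib"
}

for lib in /usr/lib/x86_64-linux-gnu/libstdc++.so.6 /home/guider/anaconda3/lib/libstdc++.so.6.0.29; do
    if [ ! -e "$lib" ]; then
        echo "$lib: not found"
    elif has_glibcxx "$lib" 3.4.22; then
        echo "$lib: provides GLIBCXX_3.4.22"
    else
        echo "$lib: lacks GLIBCXX_3.4.22"
    fi
done
```

If the system copy lacks the string and the Anaconda copy provides it, the replacement above addresses the reported error.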
References:
https://blog.csdn.net/weixin_36488777/article/details/116897183
https://blog.csdn.net/wenroudebaozi/article/details/107564647/
https://blog.csdn.net/qq_36396104/article/details/88774797
https://blog.51cto.com/u_12630471/3705832
22. Access port 6006 remotely
References:
https://zhuanlan.zhihu.com/p/508422931
https://zhuanlan.zhihu.com/p/57630633