Deploying a Deep Learning Server with Docker on Ubuntu 16.04 + CUDA + cuDNN + SSH

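1. First install Ubuntu, the NVIDIA driver, and Docker on the server

The environment used in this article:

Server Ubuntu version: 16.04

CUDA version supported by the NVIDIA driver: 10.2

Docker version: 20.10.7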

 

References:

https://blog.csdn.net/qq_39638989/article/details/121275230

 

2. SSH into the server, create the host directory that will be mounted into the Docker container, and set its permissions

ssh user_name@server_ip

mkdir -p /mnt/sda0/00000

sudo chmod 777 /mnt/sda0/00000/

 

3. Pull the image that matches the operating system and GPU driver from Docker Hub

NVIDIA CUDA image page:

https://hub.docker.com/r/nvidia/cuda

 

CUDA versions supported by each NVIDIA driver:

https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#title-resolved-issues

 

The image used in this article is nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04. Pull it with the following command:

docker pull nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04

 

List the pulled images

sudo docker images

 

4. Change into the host directory created above and create a container from the image

cd /mnt/sda0/00000/

docker run -itd --name guider -v ${PWD}:/mnt -p 10001:22 -p 10002:8080 --gpus all -e NVIDIA_DRIVER_CAPABILITIES=compute,utility -e NVIDIA_VISIBLE_DEVICES=all  --privileged=true --shm-size 32G nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04 /bin/bash

 

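run creates and starts a container

-itd keeps STDIN open (-i), allocates a terminal (-t), and runs the container in the background (-d)

--name guider sets the name of the new container

-v ${PWD}:/mnt mounts the current directory at /mnt inside the container

-p 10001:22 -p 10002:8080 maps container ports 22 and 8080 to host ports 10001 and 10002

--gpus all -e NVIDIA_DRIVER_CAPABILITIES=compute,utility -e NVIDIA_VISIBLE_DEVICES=all gives the container access to the host GPUs

--privileged=true runs the container with extended privileges

--shm-size sets the size of the shared memory (/dev/shm)

nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04 is the image name and tag to use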
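When several containers are created on the same server, only the container name, the two host ports, and the image usually change. The following is a minimal sketch of a helper that prints the command for review instead of running it. The function name build_run_cmd and its argument order are assumptions made for illustration; the flags are the ones explained above.

```shell
# Hypothetical helper: print (do not run) the docker run command from step 4.
# Arguments: container name, host SSH port, host web port, image tag.
build_run_cmd() {
    local name="$1" ssh_port="$2" web_port="$3" image="$4"
    # \${PWD} is printed literally so that it expands when the command is run.
    printf '%s ' "docker run -itd --name ${name} -v \${PWD}:/mnt"
    printf '%s ' "-p ${ssh_port}:22 -p ${web_port}:8080"
    printf '%s ' "--gpus all -e NVIDIA_DRIVER_CAPABILITIES=compute,utility -e NVIDIA_VISIBLE_DEVICES=all"
    printf '%s\n' "--privileged=true --shm-size 32G ${image} /bin/bash"
}
```

For example, build_run_cmd guider 10001 10002 nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04 prints the command used in this article, which can then be checked and pasted into the terminal from the mount directory.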

 

Check the container status

docker ps -a

References:

https://www.jianshu.com/p/8f38a63b86cc

https://blog.csdn.net/weixin_43590796/article/details/114848742

https://blog.csdn.net/weixin_44966641/article/details/123930747

 

5. Enter the container to configure it: install vim, ssh, and sudo, and create a user

docker exec -it guider /bin/bash

 

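exec runs a command inside a running container

-it runs the command interactively with a terminal

guider is the container name

/bin/bash is the command to run, which gives a bash shell inside the container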

 

Update the package sources and install vim, ssh, and sudo

5.1 Update the package index

apt-get update

 

5.2 Fix the GPG error:

https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu1804/x86_64  InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC

Import the public key reported after NO_PUBKEY in the error message (substitute your own key ID if it differs):

apt-key adv --keyserver keyserver.ubuntu.com --recv-keys A4B469963BF863CC
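If several repositories report missing keys, copying each ID by hand is error-prone. The sketch below reads apt output and prints every key ID that follows NO_PUBKEY, once each. The function name extract_missing_keys is an assumption made for illustration.

```shell
# Hypothetical helper: read apt output on stdin and print each missing
# public key ID (the hex string after NO_PUBKEY) exactly once.
extract_missing_keys() {
    grep -o 'NO_PUBKEY [0-9A-F][0-9A-F]*' | awk '{print $2}' | sort -u
}

# Usage inside the container (needs network access):
#   apt-get update 2>&1 | extract_missing_keys | while read -r key; do
#       apt-key adv --keyserver keyserver.ubuntu.com --recv-keys "$key"
#   done
```

Run apt-get update again afterwards to confirm that the error is gone.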

 

5.3 Switch to a faster package mirror

cp /etc/apt/sources.list /etc/apt/sources.list.bak

sed -i 's/archive.ubuntu.com/mirrors.ustc.edu.cn/g' /etc/apt/sources.list

apt update
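Before editing /etc/apt/sources.list in place, it can be useful to preview what the substitution will change. This is a small sketch under the assumption that only archive.ubuntu.com entries should be redirected; the function name to_ustc_mirror is hypothetical, and the dots are escaped so that they match literally.

```shell
# Hypothetical filter: rewrite archive.ubuntu.com to the USTC mirror.
# Reads sources.list content on stdin and writes the result to stdout.
to_ustc_mirror() {
    sed 's/archive\.ubuntu\.com/mirrors.ustc.edu.cn/g'
}

# Preview the change without touching the file:
#   to_ustc_mirror < /etc/apt/sources.list | diff /etc/apt/sources.list -
```

Entries for other hosts, such as security.ubuntu.com, pass through unchanged.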

 

5.4 Install vim, ssh, and sudo

apt-get install vim

apt-get install openssh-server

apt-get install sudo

 

References:

https://blog.csdn.net/weixin_45722313/article/details/121117394

https://blog.csdn.net/hello_1995/article/details/109222650

 

6. Start the ssh service automatically when the Docker container starts

6.1 Create the file start_ssh.sh and make it executable

touch /root/start_ssh.sh

 

vim /root/start_ssh.sh

#!/bin/bash

LOGTIME=$(date "+%Y-%m-%d %H:%M:%S")

echo "[$LOGTIME] startup run..." >>/root/start_ssh.log

service ssh start >>/root/start_ssh.log

 

chmod +x /root/start_ssh.sh

 

6.2 Run start_ssh.sh from the shell startup file

vim /root/.bashrc

# startup run

if [ -f /root/start_ssh.sh ]; then

      /root/start_ssh.sh

fi

 

6.3 Exit the container, restart it, enter it again, and check that the ssh service is running

exit

docker restart guider

docker ps -a

docker exec -it guider /bin/bash

ps -e |grep ssh
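The same check can be run from the host without opening an interactive shell. The sketch below reads ps -e output and reports whether an sshd process is listed; the function name sshd_status is an assumption made for illustration.

```shell
# Hypothetical helper: read `ps -e` output on stdin and print "running"
# if a process named sshd is listed, otherwise print "stopped".
sshd_status() {
    if awk '$NF == "sshd" { found = 1 } END { exit !found }'; then
        echo running
    else
        echo stopped
    fi
}

# Usage from the host:
#   docker exec guider ps -e | sshd_status
```

Because .bashrc is only read when a bash shell starts, a "stopped" result right after docker restart may simply mean that no shell has started in the container yet.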

 

References:

https://blog.csdn.net/qq_38603541/article/details/124028994

http://www.wjhsh.net/bigben0123-p-3184115.html

https://www.cnblogs.com/zhongzhaoxie/p/13064433.html

 

7. Create an administrator user

adduser username

adduser username sudo

id username

su username

 

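8. Verify that the administrator user can log in remotely over ssh, then create the working and data directories so that development happens in the mounted directory

ssh user_name@server_ip -p port_id

user_name is the administrator user name

server_ip is the server IP address

port_id is the host port that the container's ssh port 22 is mapped to (10001 in this article)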

 

Create the working and data directories in the mounted directory

mkdir -p /mnt/work

mkdir -p /mnt/data

 

9. Export the container as an archive and create an updated image from it

9.1 Export the container

docker ps -a

docker export guider > guider-v1.tar

gzip -k guider-v1.tar

 

9.2 Import the container snapshot

gzip -d -k guider-v1.tar.gz

docker images

docker import guider-v1.tar guider:v1
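The export and compression steps can also be combined so that no uncompressed tar file is left on disk. This is a sketch that assumes docker import reads the gzip-compressed archive directly, which its documentation lists among the accepted formats. The function names are hypothetical.

```shell
# Hypothetical helpers around docker export / docker import.

# snapshot_file guider v1  ->  guider-v1.tar.gz
snapshot_file() {
    echo "$1-$2.tar.gz"
}

# export_snapshot guider v1: stream the container filesystem through gzip.
export_snapshot() {
    docker export "$1" | gzip > "$(snapshot_file "$1" "$2")"
}

# import_snapshot guider v1: create image guider:v1 from the archive.
import_snapshot() {
    docker import "$(snapshot_file "$1" "$2")" "$1:$2"
}
```

For example, export_snapshot guider v1 followed by import_snapshot guider v1 produces the same guider:v1 image as the commands above.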

 

References:

https://www.linuxcool.com/gzip

https://blog.csdn.net/zfw_666666/article/details/124670125

 

Remove the container

docker rm -f guider

Remove an image that is no longer needed (guider:v1 itself is still required by step 10)

docker rmi guider:v1

 

10. Add the CUDA environment variables for the administrator user

docker run -itd --name guider -v /mnt/sda0/00000:/mnt -p 10001:22 -p 10002:8080 --gpus all -e NVIDIA_DRIVER_CAPABILITIES=compute,utility -e NVIDIA_VISIBLE_DEVICES=all  --privileged=true --shm-size 8G guider:v1 /bin/bash

 

ssh user_name@server_ip -p port_id

 

vim .bashrc

export PATH=/usr/local/cuda-10.2/bin:$PATH

export LD_LIBRARY_PATH=/usr/local/cuda-10.2/lib64:$LD_LIBRARY_PATH

export C_INCLUDE_PATH=/usr/local/cuda-10.2/include:$C_INCLUDE_PATH

export CPLUS_INCLUDE_PATH=/usr/local/cuda-10.2/include:$CPLUS_INCLUDE_PATH

source .bashrc
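An image created with docker import does not carry over the environment variables defined in the original image, which is presumably why they are set by hand here. The sketch below generates the four export lines from a single install prefix, so that a different CUDA version only needs one argument changed. The function name cuda_env_lines is an assumption made for illustration.

```shell
# Hypothetical helper: print the export lines for a CUDA install prefix.
# \$PATH etc. are printed literally so they expand when .bashrc is read.
cuda_env_lines() {
    local prefix="$1"
    cat <<EOF
export PATH=${prefix}/bin:\$PATH
export LD_LIBRARY_PATH=${prefix}/lib64:\$LD_LIBRARY_PATH
export C_INCLUDE_PATH=${prefix}/include:\$C_INCLUDE_PATH
export CPLUS_INCLUDE_PATH=${prefix}/include:\$CPLUS_INCLUDE_PATH
EOF
}

# Usage:
#   cuda_env_lines /usr/local/cuda-10.2 >> ~/.bashrc
#   . ~/.bashrc && nvcc --version
```

If nvcc --version prints the CUDA 10.2 release, the PATH entry is working.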

 

References:

https://blog.csdn.net/qq_36814762/article/details/122374053

https://blog.csdn.net/Willen_/article/details/103489485

 

11. Install Anaconda

11.1 Install Anaconda

bash Anaconda3-2022.10-Linux-x86_64.sh

 

11.2 Add the Tsinghua mirror channels for conda

conda config --show channels

conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/

conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/

conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge/

conda config --set show_channel_urls yes

 

vim .condarc

 

conda clean -i

conda create -n myenv numpy

 

References:

https://blog.csdn.net/run_success/article/details/124841938

http://t.zoukankan.com/BlairGrowing-p-15265922.html

https://mirror.tuna.tsinghua.edu.cn/help/anaconda/

 

11.3 Set the USTC mirror for pip

pip config set global.index-url https://pypi.mirrors.ustc.edu.cn/simple/

vim .config/pip/pip.conf

pip config list

 

References:

https://mirrors4.tuna.tsinghua.edu.cn/help/anaconda/

 

12. Install opencv-python

sudo apt-get update

sudo apt-get install ffmpeg libsm6 libxext6

 

pip install opencv-python

 

python

import cv2

print(cv2.__version__)

 

References:

https://www.jianshu.com/p/6f7e2ccd146a

https://blog.csdn.net/weixin_42990464/article/details/125203404

 

 

13. Install paddle

13.1 Create and activate a conda virtual environment

conda create --name paddle python=3.9

conda env list

conda activate paddle 

conda install paddlepaddle-gpu==2.3.2 cudatoolkit=10.2 --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/

 

13.2 Verify paddle

python

import paddle

paddle.utils.run_check()

 

13.3 Remove the paddle environment

conda activate base

conda env remove -n paddle

 

References:

https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/conda/linux-conda.html

 

14. Install pytorch

14.1 Create and activate a conda virtual environment

conda create --name pytorch python=3.9

conda env list

conda activate pytorch

conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=10.2 -c pytorch

 

14.2 Verify pytorch

python

import torch

torch.cuda.is_available()

print(torch.__version__)

print(torch.version.cuda)

print(torch.backends.cudnn.version())

torch.cuda.nccl.version()

 

14.3 Remove the pytorch environment

conda activate base

conda env remove -n pytorch

 

conda clean -p        # remove unused packages (recommended)

conda clean -t        # remove cached package tarballs

conda clean -y --all  # remove all downloaded packages and caches

 

References:

https://blog.csdn.net/Pin_BOY/article/details/120479861

 

14.4 Install pytorch with pip

pip install torch==1.8.2 torchvision==0.9.2 torchaudio==0.8.2 --extra-index-url https://download.pytorch.org/whl/lts/1.8/cu102

 

References:

https://pytorch.org/get-started/locally/

http://t.zoukankan.com/jimlau-p-13260269.html

https://zhuanlan.zhihu.com/p/388212600

https://www.cnpython.com/qa/1295282

 

15. Install opencv

15.1 Download the source archives from the official site, then build and install them

sudo apt update && sudo apt install -y cmake g++ wget unzip pkg-config

 

wget -O opencv.zip https://github.com/opencv/opencv/archive/4.6.0.zip

wget -O opencv_contrib.zip https://github.com/opencv/opencv_contrib/archive/4.6.0.zip

unzip opencv.zip

unzip opencv_contrib.zip

 

mkdir -p build && cd build

 

cmake -DOPENCV_EXTRA_MODULES_PATH=../opencv_contrib-4.6.0/modules ../opencv-4.6.0 -DOPENCV_GENERATE_PKGCONFIG=YES -DBUILD_opencv_world=ON

 

make -j12

sudo make install

 

15.2 Add the environment variables to .bashrc

vim .bashrc

export PATH=/usr/local/bin:$PATH

export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH

export C_INCLUDE_PATH=/usr/local/include/opencv4:$C_INCLUDE_PATH

export CPLUS_INCLUDE_PATH=/usr/local/include/opencv4:$CPLUS_INCLUDE_PATH

export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:$PKG_CONFIG_PATH

source .bashrc

 

15.3 Write a test program, then compile and run it

vim test.cpp

 

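#include <iostream>
#include "opencv2/opencv.hpp"

int main(int argc, char **argv)
{
    // imread returns an empty Mat if test.jpg is missing or unreadable.
    cv::Mat image = cv::imread("test.jpg");
    if (image.empty()) {
        std::cerr << "failed to read test.jpg" << std::endl;
        return 1;
    }
    // Write the image back out to confirm that encoding works.
    cv::imwrite("save.jpg", image);
    return 0;
}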

 

g++ test.cpp `pkg-config --cflags --libs opencv4`

 

References:

https://zhuanlan.zhihu.com/p/455045315

https://docs.opencv.org/4.x/d7/d9f/tutorial_linux_install.html

https://blog.csdn.net/qq_38505858/article/details/117780774

https://blog.csdn.net/u013798145/article/details/121698120

 

16. Export the container as an archive and create an updated image from it

16.1 Export the container

docker ps -a

docker export guider > guider-v2.tar

gzip -k guider-v2.tar

 

16.2 Import the container snapshot

gzip -d -k guider-v2.tar.gz

docker images

docker import guider-v2.tar guider:v2

 

17. Create and run the container

mkdir -p /home/ubuntu/SDA/docker/08760

 

docker run -itd --name 08760 \

-v /home/ubuntu/SDA/docker/08760:/home/guider/work \

-p 10001:22 -p 10002:8080 \

--gpus all -e NVIDIA_DRIVER_CAPABILITIES=compute,utility -e NVIDIA_VISIBLE_DEVICES=all \

--privileged=true --shm-size 32G guider:v2 /bin/bash

 

18. Install OpenCV 4.6.0

conda search opencv

conda install opencv=4.6.0

 

To use it, add the following to CMakeLists.txt, where /home/xxx/envs/opencv/share/OpenCV is the directory in which opencv is installed inside the virtual environment

set(OpenCV_DIR /home/xxx/envs/opencv/share/OpenCV)

 

References:

https://blog.csdn.net/guanjing_dream/article/details/121074834

https://www.dandelioncloud.cn/article/details/1507667653236957186

https://zhuanlan.zhihu.com/p/573341843

https://blog.csdn.net/weixin_44327262/article/details/105860213

 

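19. Display a graphical interface from a local Docker container

19.1 Install the X11 utilities on the Linux system that will display the graphical interface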

sudo apt-get install x11-xserver-utils

xhost +

 

19.2 Start and enter the Docker container

docker run -itd --name guider -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY=unix$DISPLAY -e GDK_SCALE -e GDK_DPI_SCALE --net=host guider:v2 /bin/bash

 

docker exec -it guider /bin/bash

 

sudo apt-get install xarclock

xarclock

 

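-v /tmp/.X11-unix:/tmp/.X11-unix shares the host's X11 Unix socket with the container

-e DISPLAY=unix$DISPLAY sets the DISPLAY environment variable inside the container

-e GDK_SCALE -e GDK_DPI_SCALE pass through two environment variables that affect display scaling (not examined in detail)

--net=host sets the network mode so that the container shares the host's network and IP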

 

References:

https://zhuanlan.zhihu.com/p/460494660

https://www.csdn.net/tags/MtTaEgysMDIxMDYtYmxvZwO0O0OO0O0O.html

https://blog.csdn.net/lxyoucan/article/details/121679346

 

20. Display a graphical interface from Docker on the server

References:

https://www.cnblogs.com/jcchen1987/p/10553930.html

https://blog.csdn.net/ywxuan/article/details/118462658

https://www.modb.pro/db/212354

 

21. Installing yolov5 fails with ImportError: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.22' not found

 

sudo find / -name 'libstdc++.so*'

 

sudo rm /usr/lib/gcc/x86_64-linux-gnu/7/libstdc++.so

sudo rm /usr/lib/x86_64-linux-gnu/libstdc++.so*

 

sudo cp /home/guider/anaconda3/lib/libstdc++.so.6.0.29 /usr/lib/gcc/x86_64-linux-gnu/7/

sudo cp /home/guider/anaconda3/lib/libstdc++.so.6.0.29 /usr/lib/x86_64-linux-gnu/

 

cd /usr/lib/x86_64-linux-gnu/

sudo ln -s libstdc++.so.6.0.29 libstdc++.so.6

sudo ln -s libstdc++.so.6.0.29 libstdc++.so
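To see which copy of libstdc++ actually provides the missing symbol version, before or after the replacement, the version tags embedded in each library can be listed with strings. The sketch below assumes that strings (from binutils) is installed; the function name has_glibcxx is hypothetical.

```shell
# Hypothetical helper: read `strings <library>` output on stdin and succeed
# only if the exact tag GLIBCXX_<version> is present.
has_glibcxx() {
    grep -qxF "GLIBCXX_$1"
}

# Usage:
#   strings /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | has_glibcxx 3.4.22 \
#       && echo present || echo missing
```

Running the same check against /home/guider/anaconda3/lib/libstdc++.so.6.0.29 shows whether the Anaconda copy contains the required version.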

 

References:

https://blog.csdn.net/weixin_36488777/article/details/116897183

https://blog.csdn.net/wenroudebaozi/article/details/107564647/

https://blog.csdn.net/qq_36396104/article/details/88774797

https://blog.51cto.com/u_12630471/3705832


 

22. Access port 6006 remotely

References:

https://zhuanlan.zhihu.com/p/508422931

https://zhuanlan.zhihu.com/p/57630633

 

 

posted @ 2022-11-09 12:15  盛夏夜