Setting up a Docker-based development and training environment with PyCharm debugging
The steps are as follows:
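I. Install Ubuntu 20.04, the NVIDIA driver, CUDA, and cuDNN
1. Install the GPU display driver:
sudo ./NVIDIA-Linux-x86_64-460.39.run -no-x-check -no-nouveau-check -no-opengl-files
Installing without the OpenGL files is what avoids the login-loop problem (see "Login loop after installing the NVIDIA driver on Ubuntu 16.04").
-no-x-check: do not check whether an X server is running during installation
-no-nouveau-check: do not check whether the nouveau driver is in use during installation
-no-opengl-files: install only the driver files, not the OpenGL files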
2. Install CUDA:
CUDA download link: https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu1804/x86_64/
First download the CUDA package that matches the driver version and save it in the current directory:
cuda-repo-ubuntu1804-11-2-local_11.2.1-460.32.03-1_amd64.deb
# Install CUDA
sudo dpkg -i cuda-repo-ubuntu1804-11-2-local_11.2.1-460.32.03-1_amd64.deb
sudo apt-key add /var/cuda-repo-ubuntu1804-11-2-local/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda
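After installation, set the environment variables. Open ~/.bashrc:
gedit ~/.bashrc
and add these lines at the end:
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export PATH=$PATH:/usr/local/cuda/bin
Then run the following command so the new variables take effect immediately:
source ~/.bashrc
Check whether CUDA was installed successfully:
nvcc -V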
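The environment setup above can also be written so that sourcing ~/.bashrc repeatedly does not keep growing the variables. The following is a minimal sketch, not the original author's method: CUDA_HOME and prepend_path are illustrative names, lib64 is assumed to be the 64-bit library directory, and the CUDA directories are placed in front of the existing entries instead of after them.

```shell
#!/bin/sh
# Sketch: add the CUDA directories to the search paths only once.
# CUDA_HOME and prepend_path are illustrative names, not part of CUDA.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda}"

# prepend_path VALUE DIR: print VALUE with DIR in front, unless DIR is already listed.
prepend_path() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;
        *)        printf '%s\n' "$2${1:+:$1}" ;;
    esac
}

PATH="$(prepend_path "$PATH" "$CUDA_HOME/bin")"
LD_LIBRARY_PATH="$(prepend_path "${LD_LIBRARY_PATH:-}" "$CUDA_HOME/lib64")"
export PATH LD_LIBRARY_PATH

# Report whether the compiler is now reachable.
if command -v nvcc >/dev/null 2>&1; then
    nvcc -V
else
    echo "nvcc not found under $CUDA_HOME/bin" >&2
fi
```

Placing these lines at the end of ~/.bashrc has the same effect as the two export lines, with the difference that opening nested shells leaves PATH unchanged.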
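II. Install Docker on the host
NVIDIA-Docker2 installation and common commands
Prerequisite: configure a domestic (China) apt mirror
1. Back up sources.list
sudo cp /etc/apt/sources.list /etc/apt/sources.list.bak
2. Edit sources.list
sudo gedit /etc/apt/sources.list
3. Replace the contents with the mirror entries. The entries below are for Ubuntu 18.04 (bionic); for other releases, use the matching codename.
Aliyun: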
deb http://mirrors.aliyun.com/ubuntu/ bionic main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-security main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-updates main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-proposed main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-backports main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-security main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-updates main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-proposed main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-backports main restricted universe multiverse
4. Update the package index
sudo apt update
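Because the mirror entries differ between releases only in the codename, they can be generated instead of typed by hand. The sketch below assumes the Aliyun mirror URL from the list above; gen_sources and MIRROR are illustrative names, and the fallback to bionic applies only when lsb_release is unavailable.

```shell
#!/bin/sh
# Sketch: print mirror entries for a given Ubuntu codename.
# gen_sources and MIRROR are illustrative names.
MIRROR="${MIRROR:-http://mirrors.aliyun.com/ubuntu/}"

gen_sources() {
    codename="$1"
    for kind in deb deb-src; do
        # The empty suite is the base release; the others are its pockets.
        for suite in "" -security -updates -proposed -backports; do
            printf '%s %s %s%s main restricted universe multiverse\n' \
                "$kind" "$MIRROR" "$codename" "$suite"
        done
    done
}

# Use the running system's codename when lsb_release is available, else bionic.
codename="$(lsb_release -cs 2>/dev/null || echo bionic)"
gen_sources "$codename"
```

After backing up the original file, the output could be written with `gen_sources bionic | sudo tee /etc/apt/sources.list`, followed by `sudo apt update`.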
5. Install Docker
The system used here is Ubuntu 18.04.
Remove any Docker packages already on the system:
sudo apt-get remove docker docker-engine docker.io
Update the package index:
sudo apt update
Install the dependencies:
sudo apt install apt-transport-https ca-certificates curl software-properties-common
Add Docker's official GPG key:
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
On a cloud desktop this command failed with: curl: (35) gnutls_handshake() failed: Error in the push function. gpg: no valid OpenPGP data found.
Fix: sudo apt-get install build-essential fakeroot dpkg-dev libcurl4-openssl-dev
sudo apt-key fingerprint 0EBFCD88
Add the Docker repository. First check the release codename (bionic or xenial) with lsb_release -cs, then run:
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu bionic stable"
or let the shell fill in the codename:
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
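Refresh the package index:
sudo apt update
List the Docker versions available for installation:
apt-cache policy docker-ce
If a list of versions is shown, Docker can be installed normally.
Install Docker:
sudo apt install docker-ce
Test:
docker --version
sudo docker run hello-world
On the first run Docker prints "Unable to find image 'hello-world:latest' locally" and pulls the image; the installation works if the container then prints "Hello from Docker!".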
6. Install NVIDIA-Docker
1. Add the GPG key
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
Problem encountered: gpg: no valid OpenPGP data found.
Fix: run the following commands in order:
wget https://download.docker.com/linux/ubuntu/gpg
sudo apt-key add gpg
curl -s -L https://nvidia.github.io/nvidia-docker/ubuntu18.04/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
2. Install NVIDIA-Docker
Method 1:
1) Install nvidia-container-runtime (required when installing nvidia-docker2)
Add the package repositories:
curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
sudo apt-get update
Uncomment the experimental entries in the list file, then update again and install:
sudo sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-runtime.list
sudo apt-get update
sudo apt-get install nvidia-container-runtime
2) Install the nvidia-docker2 package and reload the Docker daemon configuration
sudo apt-get install nvidia-docker2
sudo pkill -SIGHUP dockerd
Method 2:
Download the following four packages directly:
libnvidia-container1_1.0.1-1_amd64.deb
libnvidia-container-tools_1.0.1-1_amd64.deb
nvidia-container-runtime_3.1.4-1_amd64.deb
nvidia-container-toolkit_1.0.5-1_amd64.deb
Then install them:
sudo dpkg -i *.deb
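Test the installation:
docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
This usually fails. Run nvidia-smi on the host to check the current CUDA version (11.0 here), then run the image tag that matches it:
sudo nvidia-docker run --rm nvidia/cuda:11.0-devel nvidia-smi
If Docker starts pulling the image as shown below and then prints the nvidia-smi table, the installation succeeded:
Unable to find image 'nvidia/cuda:11.0-devel' locally
11.0-devel: Pulling from nvidia/cuda
3ff22d22a855 Downloading ...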
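Matching the image tag to the host's CUDA version can be scripted by reading the version from the nvidia-smi banner. This is only a sketch: cuda_version_from_smi is an illustrative helper, the sample banner line uses made-up values in the layout nvidia-smi prints, and whether a tag of the form X.Y-devel exists for a given version has to be checked on Docker Hub.

```shell
#!/bin/sh
# Sketch: derive a CUDA image tag from the nvidia-smi banner.
# cuda_version_from_smi is an illustrative helper name.

# Reads nvidia-smi output on stdin and prints the first "CUDA Version" value.
cuda_version_from_smi() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Sample banner line in the layout nvidia-smi prints (values are illustrative).
sample='| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |'
version="$(printf '%s\n' "$sample" | cuda_version_from_smi)"
echo "nvidia/cuda:${version}-devel"

# On a real host, feed the live output instead:
#   version="$(nvidia-smi | cuda_version_from_smi)"
```

The printed name can then be passed to the test command in place of the hard-coded tag.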
The Ubuntu 18.04 base image is on Docker Hub at https://hub.docker.com/_/ubuntu; download it from the command line with docker pull ubuntu:18.04
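III. Set up sudo in the container
Run apt-get update. If it fails, switch the package source in /etc/apt/sources.list.
Make sure the entries match the release (18.04 vs. 16.04). In my own tests the Aliyun mirror updated normally, while the Tsinghua mirror failed to download. To replace the file: the image has no vi or vim, and they cannot be installed through sudo either, so map a host folder into the container and copy in a prepared sources.list.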
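The folder-mapping workaround can be summarized as a short command sequence. The sketch below only prints the commands instead of running them, so it can be reviewed first; HOST_DIR, the /mnt/host mount point, and print_copy_steps are illustrative assumptions, and the last three commands are meant to be typed inside the container.

```shell
#!/bin/sh
# Sketch: list the commands for copying a prepared sources.list into a container.
# HOST_DIR, the /mnt/host mount point, and print_copy_steps are illustrative.
HOST_DIR="${HOST_DIR:-$PWD/apt-config}"

print_copy_steps() {
    image="$1"
    # 1) On the host: start the container with the host folder mapped to /mnt/host.
    printf 'docker run -it -v %s:/mnt/host %s /bin/bash\n' "$HOST_DIR" "$image"
    # 2) Inside the container: back up the old list, copy the new one, update.
    printf 'cp /etc/apt/sources.list /etc/apt/sources.list.bak\n'
    printf 'cp /mnt/host/sources.list /etc/apt/sources.list\n'
    printf 'apt-get update\n'
}

print_copy_steps ubuntu:18.04
```

The prepared sources.list is expected to sit in HOST_DIR on the host before the container is started.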
IV. Install CUDA and cuDNN in the image
The Docker image does not need the NVIDIA driver; only CUDA and cuDNN have to be installed.
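V. Build the Docker image (install the software packages)
1. Install Miniconda
Download the installer first and place it in the folder mapped into the container: Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
During installation, one prompt asks whether conda should be initialized at every shell start; answer no. Accept the default (yes) for everything else.
After installation, set the environment variable. Open ~/.bashrc:
vim ~/.bashrc
and add at the end of the file (adjust the path to the actual install prefix):
export PATH="/miniconda3/bin:$PATH"
Apply the change:
source ~/.bashrc
If entering python reports that the command is not found, install Python first:
sudo apt-get install python
Set Python 3.8 from Miniconda as the default python3:
sudo update-alternatives --install /usr/bin/python3 python3 /miniconda3/bin/python3.8 1
Enter python; if the interpreter starts normally, the configuration succeeded.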
2. Install PyTorch and torchvision
Download the installation files first, then install them with pip.
If pip is not installed, install it first: sudo apt-get install python-pip
Then install torch and torchvision in order:
Link: https://download.pytorch.org/whl/torch_stable.html
pip install torch-1.7.0-cp38-cp38-manylinux1_x86_64.whl
pip install torchvision-0.8.0-cp38-cp38-manylinux1_x86_64.whl
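A wheel installs only on the Python version named in its file name (cp38 means CPython 3.8), so it is worth checking the tag before running pip. The sketch below reads the tag from the file name, relying on the standard wheel naming scheme in which the Python tag is the third field from the end; wheel_python_tag is an illustrative helper name.

```shell
#!/bin/sh
# Sketch: read the Python tag from a wheel file name before installing it.
# wheel_python_tag is an illustrative helper name.

# Wheel names end in -<python tag>-<abi tag>-<platform tag>.whl, so the
# Python tag is the third dash-separated field counted from the end.
wheel_python_tag() {
    basename "$1" .whl | awk -F- '{print $(NF-2)}'
}

for whl in torch-1.7.0-cp38-cp38-manylinux1_x86_64.whl \
           torchvision-0.8.0-cp38-cp38-manylinux1_x86_64.whl; do
    echo "$whl -> $(wheel_python_tag "$whl")"
done
```

The result can be compared with the interpreter's own tag, for example the output of `python3 -c 'import sys; print("cp%d%d" % sys.version_info[:2])'`.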
3. Connect PyCharm to Docker
For the configuration, see the article "Configuring a PyCharm development and training environment with Docker".
4. Install OpenCV
pip install opencv-python
To confirm the installation, enter python to start the interpreter, then run:
>>> import cv2
This may fail with: ImportError: libGL.so.1: cannot open shared object file: No such file or directory
Fix: sudo apt-get install libgl1-mesa-glx
5. Install yacs and scikit-image (skimage)
pip install yacs scikit-image
Installing scikit-image also installs matplotlib and scipy.
6. Install apex
git clone https://github.com/NVIDIA/apex
cd apex
python3 setup.py install
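VI. Save the Docker image to a file
Commit the container to an image (-a sets the author; replace the container ID and image name with your own):
sudo docker commit -a "name" <container_id> <image_name>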
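The commit step produces an image in the local Docker store; writing that image to a file is a separate step, for which docker save -o is the usual command. The sketch below chains the two and, by default, only prints the commands. DRY_RUN, run, snapshot_container, and the container, image, and archive names are all illustrative placeholders.

```shell
#!/bin/sh
# Sketch: commit a container to an image, then export the image to a tar file.
# DRY_RUN, run, and snapshot_container are illustrative names.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, anything else = execute

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# snapshot_container AUTHOR CONTAINER IMAGE ARCHIVE
snapshot_container() {
    run docker commit -a "$1" "$2" "$3" &&
    run docker save -o "$4" "$3"
}

snapshot_container "name" my_container my_image:v1 my_image_v1.tar
```

On another machine the archive can be restored with `docker load -i my_image_v1.tar`. Setting DRY_RUN=0 executes the commands; they may need sudo depending on how Docker is set up.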