Installing CUDA, cuDNN and nvidia-docker on Ubuntu
This post is based on the article "Installing CUDA 10.1 and cuDNN v7.6.5 on Ubuntu 18.04".
Pre-installation checks
lspci | grep -i nvidia
Lists the available NVIDIA devices:
01:00.0 VGA compatible controller: NVIDIA Corporation GP106 [GeForce GTX 1060 6GB] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GP106 High Definition Audio Controller (rev a1)
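If this check is scripted, a minimal sketch like the one below can count the NVIDIA GPUs in the lspci output. The function name count_nvidia_gpus is a hypothetical helper. It assumes desktop cards are listed as "VGA compatible controller" and laptop GPUs as "3D controller", and it deliberately skips the card's audio function.

```shell
# Sketch (hypothetical helper): count NVIDIA display devices in `lspci`
# output read from stdin. Desktop cards show up as "VGA compatible
# controller"; laptop GPUs usually appear as "3D controller" instead.
# The card's HDMI "Audio device" function is not counted.
count_nvidia_gpus() {
    grep -Ei '(VGA compatible controller|3D controller): NVIDIA' | wc -l | tr -d ' '
}

# Usage on a real machine:
#   lspci | count_nvidia_gpus
```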
uname -m && cat /etc/*release
Shows the operating system information (a 64-bit Ubuntu 20.04 system):
x86_64
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.2 LTS"
NAME="Ubuntu"
VERSION="20.04.2 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.2 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
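The nvidia-docker commands later in this post build their repository name from the ID and VERSION_ID fields shown above ($ID$VERSION_ID, i.e. ubuntu20.04 here). A small sketch of the same expression as a reusable function; os_distribution and its optional file argument are hypothetical, and sourcing the file in a subshell keeps its variables out of the calling shell.

```shell
# Sketch (hypothetical helper): print "$ID$VERSION_ID", e.g. "ubuntu20.04",
# from an os-release file. The optional argument is the file path and
# defaults to /etc/os-release. The file is sourced in a subshell so its
# variables do not leak into the caller.
os_distribution() {
    (
        . "${1:-/etc/os-release}"
        echo "$ID$VERSION_ID"
    )
}

# Usage: distribution=$(os_distribution)
```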
gcc --version
Checks whether gcc is installed. Version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
uname -r
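Linux kernel version: 5.8.0-50-generic
CUDA and cuDNN versions to install
Based on the problems I ran into on Windows, the versions that work with the GTX 1060 are CUDA 10.1.105_418 and cuDNN v7.6.5.32 for CUDA 10.1.
Installing CUDA
Download CUDA 10.1.105_418. Since there is no package for Ubuntu 20.04, I chose the Ubuntu 18.10 package. Following the download page, run these commands: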
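The gcc version matters because each CUDA release supports only a range of GCC versions, which NVIDIA lists in the CUDA Installation Guide for Linux. A sketch for extracting the GCC major version so it can be compared against that list; gcc_major is a hypothetical helper, and it assumes the version number is the last field of the first line, as in the Ubuntu output above.

```shell
# Sketch (hypothetical helper): extract the major version from the first
# line of `gcc --version`, e.g. "gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0"
# gives "9". Assumes the version is the last field of that line.
gcc_major() {
    printf '%s\n' "$1" | head -n 1 | awk '{ print $NF }' | cut -d. -f1
}

# Usage: gcc_major "$(gcc --version)"
```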
sudo dpkg -i cuda-repo-ubuntu1810-10-1-local-10.1.105-418.39_1.0-1_amd64.deb
/* Output printed by the first command
Selecting previously unselected package cuda-repo-ubuntu1810-10-1-local-10.1.105-418.39.
(Reading database ... 186150 files and directories currently installed.)
Preparing to unpack cuda-repo-ubuntu1810-10-1-local-10.1.105-418.39_1.0-1_amd64.deb ...
Unpacking cuda-repo-ubuntu1810-10-1-local-10.1.105-418.39 (1.0-1) ...
Setting up cuda-repo-ubuntu1810-10-1-local-10.1.105-418.39 (1.0-1) ...
The public CUDA GPG key does not appear to be installed.
To install the key, run this command:
sudo apt-key add /var/cuda-repo-10-1-local-10.1.105-418.39/7fa2af80.pub
*/
sudo apt-key add /var/cuda-repo-10-1-local-10.1.105-418.39/7fa2af80.pub
sudo apt-get update
sudo apt-get install cuda
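When the installation is scripted, the key path does not have to be typed by hand: it appears in the dpkg message shown above. A sketch, where cuda_repo_key_path is a hypothetical helper and the /var/cuda-repo-*.pub pattern is assumed from that message.

```shell
# Sketch (hypothetical helper): pull the repository key path out of the
# message printed by `dpkg -i cuda-repo-*.deb` (read from stdin), so a
# script does not hardcode /var/cuda-repo-.../7fa2af80.pub.
cuda_repo_key_path() {
    grep -o '/var/cuda-repo-[^ ]*\.pub' | head -n 1
}

# Usage in a hypothetical install script (2>&1 in case the message
# goes to stderr):
#   key=$(sudo dpkg -i cuda-repo-*.deb 2>&1 | cuda_repo_key_path)
#   sudo apt-key add "$key"
```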
Then reboot.
Checking the CUDA installation
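After rebooting, run nvidia-smi to get the GPU information. Running nvcc -V at this point fails, because nvcc is not yet on PATH, and the shell suggests "sudo apt install nvidia-cuda-toolkit". Do not do this: the local CUDA installation already includes an nvcc that matches it, and installing nvidia-cuda-toolkit from the online repositories can cause a version conflict between the toolkit and CUDA that breaks the CUDA environment. (I once carelessly installed nvidia-cuda-toolkit on a machine; afterwards the nvidia-smi command stopped working, the machine could no longer use the NVIDIA GPU at all, and I had to reinstall the CUDA environment.)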
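The 418.39 in the cuda-repo package name is the bundled driver version, and NVIDIA's CUDA 10.1 release notes list 418.39 as the minimum Linux driver for CUDA 10.1.105. A sketch that compares the driver version reported by nvidia-smi with that minimum; version_ge is a hypothetical helper that relies on GNU sort -V for version ordering.

```shell
# Sketch (hypothetical helper): succeed if version $1 >= version $2,
# using GNU `sort -V` (version sort), e.g. version_ge 460.73.01 418.39.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Minimum Linux driver for CUDA 10.1.105, per NVIDIA's release notes.
MIN_DRIVER_CUDA_10_1=418.39

# Usage on a machine with the driver loaded:
#   drv=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   version_ge "$drv" "$MIN_DRIVER_CUDA_10_1" && echo "driver OK"
```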
Next, add the CUDA bin directory, which contains nvcc, to the PATH environment variable:
vim ~/.bashrc
# add the line: export PATH="/usr/local/cuda-10.1/bin:$PATH"
source ~/.bashrc
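Editing ~/.bashrc by hand works, but running the same setup twice appends the export line twice. A sketch of an idempotent alternative; add_line_once is a hypothetical helper that appends a line only if that exact line is not already in the file.

```shell
# Sketch (hypothetical helper): append a line to a file (e.g. ~/.bashrc)
# only if that exact line is not already present, so re-running the setup
# does not keep adding the same export.
add_line_once() {
    local file=$1
    local line=$2
    # -x: whole-line match, -F: fixed string, -q: no output
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage, with the line from the step above (single quotes keep $PATH
# unexpanded, so it is evaluated each time ~/.bashrc is sourced):
#   add_line_once ~/.bashrc 'export PATH="/usr/local/cuda-10.1/bin:$PATH"'
#   source ~/.bashrc
```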
Running nvcc -V again now gives:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Fri_Feb__8_19:08:17_PST_2019
Cuda compilation tools, release 10.1, V10.1.105
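To check the result in a script rather than by eye, the release number can be pulled out of the nvcc -V output. A sketch, where nvcc_release is a hypothetical helper that assumes the "Cuda compilation tools, release X.Y, ..." line format shown above.

```shell
# Sketch (hypothetical helper): extract the CUDA release (e.g. "10.1")
# from `nvcc -V` output on stdin, using the line
# "Cuda compilation tools, release X.Y, VX.Y.Z".
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Usage:
#   nvcc -V | nvcc_release
#   command -v nvcc    # shows which nvcc on PATH is being used
```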
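Installing cuDNN
Download cudnn-10.1-linux-x64-v7.6.5.32.tgz (cuDNN for Linux) from the NVIDIA website, then extract it and copy the headers and libraries into the CUDA directory: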
tar -xzvf cudnn-10.1-linux-x64-v7.6.5.32.tgz
sudo cp cuda/include/cudnn*.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn* # grant read permission to all users
vim ~/.bashrc
# add the line: export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
source ~/.bashrc
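To confirm which cuDNN version ended up in /usr/local/cuda, the version macros in the copied header can be read. A sketch, where cudnn_version is a hypothetical helper; it assumes the cuDNN 7.x layout, in which CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL are defined in cudnn.h.

```shell
# Sketch (hypothetical helper): print "MAJOR.MINOR.PATCHLEVEL" from the
# CUDNN_MAJOR / CUDNN_MINOR / CUDNN_PATCHLEVEL defines in a header.
# cuDNN 7.x keeps them in cudnn.h (the default path below); cuDNN 8.x
# moved them to cudnn_version.h, so pass that file for newer releases.
cudnn_version() {
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { if (major != "") print major "." minor "." patch }
    ' "${1:-/usr/local/cuda/include/cudnn.h}"
}

# Usage: cudnn_version
```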
Installing nvidia-docker
Following the "Docker > Getting Started > Installing on Ubuntu and Debian" documentation, run the following commands:
curl https://get.docker.com | sh \
&& sudo systemctl --now enable docker
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
&& curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker
sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
sudo docker images
/*
REPOSITORY TAG IMAGE ID CREATED SIZE
nvidia/cuda 11.0-base 2ec708416bb8 8 months ago 122MB
*/
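The nvidia-docker2 package registers the NVIDIA runtime with Docker through /etc/docker/daemon.json. A sketch for checking that before running a GPU container; has_nvidia_runtime is a hypothetical helper, and it does a plain text search for the runtime binary name rather than parsing the JSON.

```shell
# Sketch (hypothetical helper): succeed if a Docker daemon config file
# mentions the "nvidia-container-runtime" binary. Plain text search, not a
# JSON parse. Defaults to /etc/docker/daemon.json.
has_nvidia_runtime() {
    grep -q '"nvidia-container-runtime"' "${1:-/etc/docker/daemon.json}" 2>/dev/null
}

# Usage: has_nvidia_runtime && echo "nvidia runtime registered"
```

Alternatively, sudo docker info lists the registered runtimes.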
Trying it on a RedmiBook 14
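I referred to "Win10+MX250+CUDA10.1+cuDNN+Pytorch1.4: the full installation and testing process (painful)", and the CUDA and cuDNN packages were the same ones used in this post. Following the steps in this post gave the correct result, with one problem along the way: running nvidia-smi failed with "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running." I fixed it by setting an administrator password in the BIOS and then disabling Secure Boot (Secure Boot most likely prevented the unsigned nvidia kernel module from loading).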
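Whether Secure Boot is still on can be checked from Ubuntu without entering the BIOS, via mokutil --sb-state. A sketch that interprets its output; secure_boot_enabled is a hypothetical helper, and the "SecureBoot enabled" / "SecureBoot disabled" strings are the ones mokutil prints on an EFI system.

```shell
# Sketch (hypothetical helper): read `mokutil --sb-state` output on stdin
# and succeed (exit 0) when it reports "SecureBoot enabled".
secure_boot_enabled() {
    grep -q '^SecureBoot enabled'
}

# Usage (mokutil is in the Ubuntu package of the same name):
#   mokutil --sb-state | secure_boot_enabled && echo "Secure Boot is on"
```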
Created on Wednesday, May 5, 2021, 19:41:19 CST; last modified on July 19, 2021, at 14:44.