Ubuntu 20.04 CUDA Environment Setup

CUDA Environment Setup

Summary

The first step in CUDA programming is setting up a CUDA environment. Windows does not support many of NVIDIA's tools as well as Linux does, so we use Linux, specifically Ubuntu 20.04.

0. Before You Start

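If you have just installed Ubuntu 20.04, first take care of some basics, such as switching to a faster package mirror and installing essential dependencies.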

0.1 Switch the Package Mirror

Back up the sources file:

sudo cp /etc/apt/sources.list /etc/apt/sources.list.backup

Then edit /etc/apt/sources.list:

sudo gedit /etc/apt/sources.list

Replace its contents with one of the following mirror lists:

# NetEase (163) mirror

# Source-code (deb-src) mirrors are commented out by default to speed up apt update; uncomment them if needed
deb http://mirrors.163.com/ubuntu/ focal main restricted universe multiverse
deb http://mirrors.163.com/ubuntu/ focal-security main restricted universe multiverse
deb http://mirrors.163.com/ubuntu/ focal-updates main restricted universe multiverse
deb http://mirrors.163.com/ubuntu/ focal-backports main restricted universe multiverse
# deb-src http://mirrors.163.com/ubuntu/ focal main restricted universe multiverse
# deb-src http://mirrors.163.com/ubuntu/ focal-security main restricted universe multiverse
# deb-src http://mirrors.163.com/ubuntu/ focal-updates main restricted universe multiverse
# deb-src http://mirrors.163.com/ubuntu/ focal-backports main restricted universe multiverse
# Pre-release (proposed) repository; enabling it is not recommended
# deb http://mirrors.163.com/ubuntu/ focal-proposed main restricted universe multiverse
# deb-src http://mirrors.163.com/ubuntu/ focal-proposed main restricted universe multiverse

Or

# Alibaba Cloud mirror

deb http://mirrors.aliyun.com/ubuntu/ focal main restricted universe multiverse
#deb-src http://mirrors.aliyun.com/ubuntu/ focal main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ focal-security main restricted universe multiverse
#deb-src http://mirrors.aliyun.com/ubuntu/ focal-security main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ focal-updates main restricted universe multiverse
#deb-src http://mirrors.aliyun.com/ubuntu/ focal-updates main restricted universe multiverse
# Pre-release (proposed) repository; enabling it is not recommended
#deb http://mirrors.aliyun.com/ubuntu/ focal-proposed main restricted universe multiverse
#deb-src http://mirrors.aliyun.com/ubuntu/ focal-proposed main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ focal-backports main restricted universe multiverse
#deb-src http://mirrors.aliyun.com/ubuntu/ focal-backports main restricted universe multiverse

Or

# Tsinghua University (TUNA) mirror

# Source-code (deb-src) mirrors are commented out by default to speed up apt update; uncomment them if needed
deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal main restricted universe multiverse
# deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal main restricted universe multiverse
deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-updates main restricted universe multiverse
# deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-updates main restricted universe multiverse
deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-backports main restricted universe multiverse
# deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-backports main restricted universe multiverse
deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-security main restricted universe multiverse
# deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-security main restricted universe multiverse

# Pre-release (proposed) repository; enabling it is not recommended
# deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-proposed main restricted universe multiverse
# deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal-proposed main restricted universe multiverse

Or

# USTC mirror

deb https://mirrors.ustc.edu.cn/ubuntu/ focal main restricted universe multiverse
deb-src https://mirrors.ustc.edu.cn/ubuntu/ focal main restricted universe multiverse
deb https://mirrors.ustc.edu.cn/ubuntu/ focal-updates main restricted universe multiverse
deb-src https://mirrors.ustc.edu.cn/ubuntu/ focal-updates main restricted universe multiverse
deb https://mirrors.ustc.edu.cn/ubuntu/ focal-backports main restricted universe multiverse
deb-src https://mirrors.ustc.edu.cn/ubuntu/ focal-backports main restricted universe multiverse
deb https://mirrors.ustc.edu.cn/ubuntu/ focal-security main restricted universe multiverse
deb-src https://mirrors.ustc.edu.cn/ubuntu/ focal-security main restricted universe multiverse
# Pre-release (proposed) repository; enabling it is not recommended
# deb https://mirrors.ustc.edu.cn/ubuntu/ focal-proposed main restricted universe multiverse
# deb-src https://mirrors.ustc.edu.cn/ubuntu/ focal-proposed main restricted universe multiverse

Save and exit, then refresh the package index and upgrade:

sudo apt-get update
sudo apt-get upgrade
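If you would rather not paste a whole file, the mirror switch can also be scripted. The sketch below is a hypothetical helper (switch_ubuntu_mirror is not an Ubuntu tool); it assumes the stock sources.list points at archive.ubuntu.com or security.ubuntu.com, possibly with a country prefix such as cn., and it keeps a .backup copy only the first time it runs.

```shell
# Hypothetical helper: point an apt sources file at another Ubuntu mirror.
# Rewrites the stock archive.ubuntu.com / security.ubuntu.com URLs
# (including country prefixes such as cn.archive.ubuntu.com).
switch_ubuntu_mirror() {
  local file="$1" mirror="${2%/}"   # strip a trailing slash from the mirror URL
  # Keep only the first backup, so re-running never overwrites the original copy
  [ -e "$file.backup" ] || cp "$file" "$file.backup"
  sed -i -E "s#https?://([a-z]+\.)?(archive|security)\.ubuntu\.com/ubuntu/?#${mirror}/#g" "$file"
}
```

On the real system, run it from a root shell (sudo -i), for example switch_ubuntu_mirror /etc/apt/sources.list https://mirrors.tuna.tsinghua.edu.cn/ubuntu, and then run apt-get update.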

0.2 Install Essential Tools

sudo apt-get update
sudo apt-get install g++
sudo apt-get install gcc
sudo apt-get install make

0.3 Switch the System Language to English

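In a text console it does not seem possible to type Chinese, which makes entering directories with Chinese names awkward, and the folder that holds the NVIDIA driver should have an English name as well. Switching the system language to English avoids both problems.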

This can be changed directly in the system Settings; reboot afterwards and you are done.

1. Overview of the Steps

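  1. Determine the GPU model and the CUDA version to install
  2. Determine a driver that satisfies both the GPU model and the CUDA version
  3. Disable Ubuntu's built-in open-source graphics driver (nouveau)
  4. Switch the display manager
  5. Uninstall the current graphics driver, then download and install the new one
  6. Install CUDA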

Next, we go through these steps one at a time to set up the CUDA environment we need.

2. Determine the GPU Model and Current Driver Version

Check the GPU model:

lspci | grep -i nvidia

Check the NVIDIA driver version (this file exists only while an NVIDIA kernel module is loaded):

cat /proc/driver/nvidia/version
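To use that version in a script, it can be pulled out of the file's first line. This is a minimal sketch; nvidia_driver_version is a hypothetical helper, and it assumes the first line has the usual form "NVRM version: NVIDIA UNIX x86_64 Kernel Module  <version>  <build date>". Once the driver works, nvidia-smi --query-gpu=driver_version --format=csv,noheader reports the same number directly.

```shell
# Hypothetical helper: print the driver version from text in the
# /proc/driver/nvidia/version format, read on stdin.
nvidia_driver_version() {
  # Keep only the "NVRM version:" line, then print its first dotted number
  grep '^NVRM version:' | grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1
}

# On a machine with the driver loaded:
#   nvidia_driver_version < /proc/driver/nvidia/version
```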

3. Determine the Required Driver

Running CUDA applications requires at least one CUDA-capable GPU and a driver that is compatible with the CUDA Toolkit.

The driver version must satisfy the requirements of both the GPU model and the CUDA version.

Version compatibility table: https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html

Driver downloads: https://www.nvidia.cn/geforce/drivers/

Lab machine: GTX 1080 Ti, CUDA 11.5, so the driver version must be >= 495.29.05.
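The ">= 495.29.05" check can be done mechanically with GNU sort -V, which orders dotted version strings numerically. version_ge is a hypothetical helper; the minimum is the CUDA 11.5 requirement quoted above, and 495.46 is the driver installed later in this guide.

```shell
# Hypothetical helper: succeed if version $1 >= version $2.
# GNU sort -V orders dotted versions numerically (495.29.05 < 495.46).
version_ge() {
  # The smaller version sorts first; if that is $2, then $1 >= $2
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

min_driver=495.29.05   # minimum driver for CUDA 11.5, from the requirement above
installed=495.46       # the driver installed later in this guide
if version_ge "$installed" "$min_driver"; then
  echo "driver $installed is new enough"
else
  echo "driver $installed is too old, need >= $min_driver"
fi
```

A plain string comparison would get cases like 495.100 vs 495.46 wrong, which is why the sketch relies on version-aware sorting.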

On the website, select your operating system and GPU model and download the matching driver. My selection is shown below.

[Figure: driver download selection]

The download is a single installer file:

NVIDIA-Linux-x86_64-495.46.run

4. Switch the Display Manager

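Ubuntu 20.04's default display manager is gdm, but gdm has some compatibility problems with the NVIDIA driver: everything may look fine right after the driver is installed, and then one day you boot to a black screen (don't ask how I know...). So we switch to the lightdm display manager: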

sudo apt-get install lightdm

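Right after lightdm is installed you will be asked to choose the default display manager; select lightdm. (If you miss the prompt, sudo dpkg-reconfigure lightdm brings it back.)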
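To confirm which display manager is now the default: on Debian and Ubuntu the choice is, to my knowledge, stored as a program path in /etc/X11/default-display-manager (for example /usr/sbin/lightdm). current_dm is a hypothetical helper built on that assumption.

```shell
# Hypothetical helper: print the default display manager's name, as recorded
# in the given file (default /etc/X11/default-display-manager).
current_dm() {
  basename "$(cat "${1:-/etc/X11/default-display-manager}")"
}

# After switching, this should print "lightdm":
#   current_dm
```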

5. Uninstall and Reinstall the Driver

With the driver downloaded, the next step is to disable the open-source driver and, if an older NVIDIA driver was installed before, uninstall it.

Disable the nouveau driver

nouveau is the open-source graphics driver that ships with Ubuntu; it must be disabled before the NVIDIA driver can be installed. Disable it as follows:

sudo vi /etc/modprobe.d/blacklist.conf

Append the following at the end of the file:

blacklist nouveau
options nouveau modeset=0
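The same edit can be made safe to re-run, so that repeating it never duplicates the lines. blacklist_nouveau is a hypothetical helper; it takes the config file path as an argument so it can be tried on a scratch file first. On the real system it must run as root against /etc/modprobe.d/blacklist.conf, and update-initramfs is still needed afterwards.

```shell
# Hypothetical helper: append the nouveau blacklist lines to a modprobe
# config file, skipping lines that are already present (safe to re-run).
blacklist_nouveau() {
  local conf="$1" line
  touch "$conf"
  # If the file does not end with a newline, add one before appending
  [ -s "$conf" ] && [ -n "$(tail -c 1 "$conf")" ] && printf '\n' >> "$conf"
  for line in 'blacklist nouveau' 'options nouveau modeset=0'; do
    # -x: match the whole line, -F: treat the text as a fixed string
    grep -qxF "$line" "$conf" || printf '%s\n' "$line" >> "$conf"
  done
}
```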

Then run:

sudo update-initramfs -u

Then reboot:

reboot

After rebooting, run the following command. If it prints nothing, nouveau has been disabled successfully:

lsmod | grep nouveau

Uninstall the old driver

If an older driver was installed on this computer, uninstall it first.

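While the graphical interface is running, the GPU is in use and uninstalling the driver fails. So close the graphical interface and update the driver from a text console (tty mode).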

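All of the following steps (including installing the new driver) must be done in a text console. Press the following keys to switch to one, then log in with your Ubuntu username and password: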

Ctrl+Alt+F1

Stop the X Window service (the display manager) with the following command; otherwise the driver cannot be installed:

sudo service lightdm stop

Run the following three commands to remove the existing driver:

sudo apt-get remove --purge 'nvidia*'
sudo chmod +x NVIDIA-Linux-x86_64-xxx.xx.run
sudo ./NVIDIA-Linux-x86_64-xxx.xx.run --uninstall

Note: NVIDIA-Linux-x86_64-xxx.xx.run here is the new driver you just downloaded; in other words, the new driver's installer can be used to uninstall the old driver.

Install the new driver

Change into the directory containing the driver and run the installer; accepting the defaults throughout is fine:

sudo chmod +x NVIDIA-Linux-x86_64-495.46.run
sudo ./NVIDIA-Linux-x86_64-495.46.run

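During installation you will get a "pre-install script failed" error. Don't worry: just check the options below. The message is only there to make you confirm that you really want to install.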

Some of the options during installation:

The distribution-provided pre-install script failed! Are you sure you want to continue? Choose Yes to continue.

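Would you like to register the kernel module sources with DKMS? This will allow DKMS to automatically build a new module, if you install a different kernel later. Choose No to continue. (With No, the kernel module is not rebuilt automatically after a kernel update; see "NVIDIA-SMI has failed" below.)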

I don't remember the exact question here; the option to choose is: Install without signing.

The question was roughly: Install NVIDIA's 32-bit compatibility libraries? Choose No to continue.

Would you like to run the nvidia-xconfig utility to automatically update your X configuration file so that the NVIDIA X driver will be used when you restart X? Any pre-existing X configuration file will be backed up. Choose Yes to continue.

After installation finishes, start the X Window service again:

sudo service lightdm start

Finally, reboot the system:

reboot

The remaining steps are done in the graphical interface.

After the system restarts, open a terminal and run the following command to check whether the driver is set up correctly:

nvidia-smi

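If the output looks roughly like the following (this is an image from the web, but the result looks much the same), the driver is set up correctly: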

[Figure: example nvidia-smi output]
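nvidia-smi can also print selected fields as CSV, which is easier to check in a script than the full table, for instance to confirm that every GPU on a multi-GPU machine like this one is visible. The --query-gpu and --format flags are standard nvidia-smi options; summarize_gpus is a hypothetical helper that assumes the "name, driver_version" line format they produce.

```shell
# Hypothetical helper: summarize "name, driver_version" lines read on stdin,
# the format printed by:
#   nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
summarize_gpus() {
  awk -F', ' '
    NF >= 2 { print "GPU " (n + 0) ": " $1 " (driver " $2 ")"; n++ }
    END     { print n + 0 " GPU(s) found" }
  '
}

# Usage:
#   nvidia-smi --query-gpu=name,driver_version --format=csv,noheader | summarize_gpus
```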

With the driver configured, we can install CUDA.

6. Uninstall the Current CUDA

If CUDA was installed before, uninstall it first; otherwise skip this step.

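This uses the uninstaller that ships with CUDA, so locate the one for your version. For CUDA 10.1 and later it is bin/cuda-uninstaller; older releases used a script named like bin/uninstall_cuda_X.Y.pl:

sudo /usr/local/cuda-11.4/bin/cuda-uninstaller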

Uninstalling may leave some directories behind; in my case the previous install was CUDA 11.4. They can be deleted as well:

sudo rm -rf /usr/local/cuda-11.4
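Before deleting anything, it is worth listing which toolkit versions are actually present. The runfile installer normally also creates a /usr/local/cuda symlink pointing at the active version, so the sketch reports that too. list_cuda_toolkits is a hypothetical helper; the prefix argument exists only so it can be tried on a scratch directory.

```shell
# Hypothetical helper: list CUDA toolkit directories (cuda-X.Y) under a
# prefix (default /usr/local) and show where the "cuda" symlink points.
list_cuda_toolkits() {
  local prefix="${1:-/usr/local}" d
  for d in "$prefix"/cuda-*; do
    [ -d "$d" ] && echo "toolkit: $d"
  done
  if [ -L "$prefix/cuda" ]; then
    echo "cuda -> $(readlink "$prefix/cuda")"
  fi
  return 0
}
```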

7. Install CUDA

Download CUDA: https://developer.nvidia.cn/cuda-downloads

[Figure: CUDA download page selection]

After the download finishes, make the file executable:

chmod +x cuda_11.5.1_495.29.05_linux.run

Run the installer:

sudo ./cuda_11.5.1_495.29.05_linux.run

Installer options:

(Accept the license terms; you must accept to continue.)
accept/decline/quit: accept
(Do not install the driver here: a newer driver is already installed, and the bundled 495.29.05 driver is older, which can lead to a login loop.)
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 495.29.05?
(y)es/(n)o/(q)uit: n

Install the CUDA 11.5 Toolkit? (This must be installed.)
(y)es/(n)o/(q)uit: y

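Leave the remaining options at their defaults. (Recent runfile installers, including 11.x as far as I know, show these choices as a checkbox menu instead of yes/no prompts; in that case deselect the Driver entry and keep CUDA Toolkit selected.)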

After installation, set up the environment variables by appending the following to the end of ~/.bashrc.

First open ~/.bashrc (no sudo is needed, since the file belongs to your user):

gedit ~/.bashrc

Then append these three lines:

export CUDA_HOME=/usr/local/cuda-11.5
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export PATH=${CUDA_HOME}/bin:${PATH}

Save and exit, then run source ~/.bashrc to apply the changes.
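When setting up several machines, the same lines can be appended by a script that does nothing if it has already run. add_cuda_env is a hypothetical helper; the marker comments it uses to detect an earlier run are my own convention, and the version argument defaults to 11.5 to match this guide. It uses the ${LD_LIBRARY_PATH:+:...} form so an existing library path is kept rather than overwritten.

```shell
# Hypothetical helper: append CUDA environment variables to an rc file
# (default ~/.bashrc) unless a previous run already added them.
add_cuda_env() {
  local rc="${1:-$HOME/.bashrc}" ver="${2:-11.5}"
  # The marker line identifies a block written by an earlier run
  if grep -qF '# >>> CUDA environment >>>' "$rc" 2>/dev/null; then
    return 0
  fi
  # $ver expands now; the escaped \$ parts are written literally
  cat >> "$rc" <<EOF
# >>> CUDA environment >>>
export CUDA_HOME=/usr/local/cuda-$ver
export LD_LIBRARY_PATH=\${CUDA_HOME}/lib64\${LD_LIBRARY_PATH:+:\${LD_LIBRARY_PATH}}
export PATH=\${CUDA_HOME}/bin:\${PATH}
# <<< CUDA environment <<<
EOF
}
```

Usage: add_cuda_env, then source ~/.bashrc.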
You can check the installed version with nvcc -V:

test@test:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver     
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
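A common pitfall is an older nvcc earlier on PATH. This minimal sketch extracts the release number from the "Cuda compilation tools, release X.Y, ..." line shown above so it can be compared with the version you meant to install; cuda_release is a hypothetical helper.

```shell
# Hypothetical helper: print the CUDA release (e.g. 11.5) from `nvcc -V` output on stdin.
cuda_release() {
  # Match the "Cuda compilation tools, release X.Y, VX.Y.Z" line
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\),.*/\1/p'
}

# Usage: warn if the nvcc on PATH is not the expected version
#   [ "$(nvcc -V | cuda_release)" = 11.5 ] || echo "unexpected nvcc on PATH"
```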

8. Test the Installation

Run the following commands:

cd /usr/local/cuda-11.5/samples/1_Utilities/deviceQuery
sudo make
./deviceQuery

If everything works, the output looks like this:

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 2 CUDA Capable device(s)

Device 0: "NVIDIA GeForce GTX 1080 Ti"
  CUDA Driver Version / Runtime Version          11.5 / 11.5
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 11178 MBytes (11721506816 bytes)
  (028) Multiprocessors, (128) CUDA Cores/MP:    3584 CUDA Cores
  GPU Max Clock rate:                            1645 MHz (1.64 GHz)
  Memory Clock rate:                             5505 Mhz
  Memory Bus Width:                              352-bit
  L2 Cache Size:                                 2883584 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total shared memory per multiprocessor:        98304 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 21 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

Device 1: "NVIDIA GeForce GTX 1080 Ti"
  CUDA Driver Version / Runtime Version          11.5 / 11.5
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 11170 MBytes (11712659456 bytes)
  (028) Multiprocessors, (128) CUDA Cores/MP:    3584 CUDA Cores
  GPU Max Clock rate:                            1645 MHz (1.64 GHz)
  Memory Clock rate:                             5505 Mhz
  Memory Bus Width:                              352-bit
  L2 Cache Size:                                 2883584 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total shared memory per multiprocessor:        98304 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 33 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
> Peer access from NVIDIA GeForce GTX 1080 Ti (GPU0) -> NVIDIA GeForce GTX 1080 Ti (GPU1) : Yes
> Peer access from NVIDIA GeForce GTX 1080 Ti (GPU1) -> NVIDIA GeForce GTX 1080 Ti (GPU0) : Yes

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.5, CUDA Runtime Version = 11.5, NumDevs = 2
Result = PASS
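The two lines that matter are the summary line with NumDevs and "Result = PASS". This sketch checks them automatically, which is handy when re-testing after a driver or kernel update; check_device_query is a hypothetical helper that assumes the output format shown above.

```shell
# Hypothetical helper: read deviceQuery output on stdin, print the device
# count from the summary line, and succeed only on "Result = PASS".
check_device_query() {
  awk '
    /NumDevs = / { n = $0; sub(/.*NumDevs = /, "", n); print "devices: " n }
    /^Result = PASS/ { pass = 1 }
    END { exit (pass ? 0 : 1) }
  '
}

# Usage (from the deviceQuery directory):
#   ./deviceQuery | check_device_query && echo "CUDA runtime OK"
```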


Problems Encountered

Black screen

After updating the graphics driver I got a black screen, and switching to lightdm did not help either. The following method solved it.

Solution:

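sudo nano /etc/default/grub
Find GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
and change it to GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nomodeset"
Save and exit, run sudo update-grub so the new setting is written to the boot configuration, then reboot.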
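The same edit as a one-step script, which does nothing if nomodeset is already present. add_nomodeset is a hypothetical helper; it assumes the value is written in double quotes as in the stock file, and it only touches the GRUB_CMDLINE_LINUX_DEFAULT line. For the real file, run it from a root shell and follow it with update-grub.

```shell
# Hypothetical helper: add "nomodeset" to GRUB_CMDLINE_LINUX_DEFAULT in a
# grub defaults file (default /etc/default/grub), unless already there.
add_nomodeset() {
  local f="${1:-/etc/default/grub}"
  if grep -q '^GRUB_CMDLINE_LINUX_DEFAULT=.*nomodeset' "$f"; then
    return 0
  fi
  # Insert " nomodeset" just before the closing quote of the value
  sed -i 's/^\(GRUB_CMDLINE_LINUX_DEFAULT="[^"]*\)"/\1 nomodeset"/' "$f"
}

# Afterwards (as root): update-grub && reboot
```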

Stuck at boot after restarting

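After installing the graphics driver, the machine hung at boot and never reached the desktop. In my case this was caused by a conflict between the integrated GPU and the discrete GPU. The fix was to enter the BIOS, disable the integrated graphics, and reboot.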

(Sorry, I forgot to take photos when this happened.)

Still, if you run into a similar problem, a web search will turn up plenty of guides.

NVIDIA-SMI has failed

After the environment had been working for a while, the following problem appeared.

Running this command in a terminal:

nvidia-smi

produces this error:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

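This is most likely caused by an Ubuntu kernel update, although other causes are possible. Because the driver was installed without DKMS registration, its kernel module is not rebuilt for a newly installed kernel. We first treat it as a kernel-update problem; if that does not help, look for other causes.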

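First reboot and, at the GRUB boot menu, choose the second entry (Advanced options for Ubuntu). Then select the previous kernel version (not the recovery mode entry) to boot. If the problem is gone after booting, great; otherwise, further investigation is needed.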
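Before rolling back, the kernel-update theory can be checked from the file system. This sketch lists every kernel under /lib/modules and whether an nvidia kernel module file (nvidia.ko, possibly compressed) exists for it, marking the running kernel with "*". nvidia_module_report is a hypothetical helper, and it assumes the module sits somewhere under /lib/modules/<kernel version>/, which is typically where both the .run installer and DKMS put it. If the running kernel shows nvidia=no while an older one shows yes, the driver was built only for the old kernel.

```shell
# Hypothetical helper: for each kernel under a modules root (default
# /lib/modules), report whether an nvidia kernel module file exists.
# The running kernel (default: uname -r) is marked with "*".
nvidia_module_report() {
  local modroot="${1:-/lib/modules}" running="${2:-$(uname -r)}"
  local kdir k status mark
  for kdir in "$modroot"/*/; do
    [ -d "$kdir" ] || continue
    k=$(basename "$kdir")
    # nvidia.ko may be compressed (for example nvidia.ko.xz) on some systems
    if [ -n "$(find "$kdir" -name 'nvidia.ko*' -print -quit)" ]; then
      status=yes
    else
      status=no
    fi
    mark=' '
    [ "$k" = "$running" ] && mark='*'
    printf '%s %s nvidia=%s\n' "$mark" "$k" "$status"
  done
  return 0
}
```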

posted @ 2022-05-04 15:15 LLW_NEU