Installing CUDA on WSL2: A Complete Record of Pitfalls and Debugging

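💡 Preface and summary:

Install the NVIDIA driver on the Windows host, not inside WSL.
Before installing the CUDA Toolkit inside WSL, check the official release notes to confirm that the toolkit version is supported by your driver version.
You don't need the latest Ubuntu; 20.04 is enough.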

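First attempt: install and add the environment variable

Install from the official site, choosing the options that match your own setup: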

https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=deb_local

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.2.0/local_installers/cuda-repo-ubuntu2204-12-2-local_12.2.0-535.54.03-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-12-2-local_12.2.0-535.54.03-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-12-2-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda

Checking for nvcc shows that the command is not found, even though nvcc does exist under /usr/local/cuda/bin:

root@DESKTOP-PO8BKKM:~# nvcc --version
Command 'nvcc' not found, but can be installed with:
apt install nvidia-cuda-toolkit

So that directory has to be added to PATH, after which nvcc works:

root@DESKTOP-PO8BKKM:~# export PATH=$PATH:/usr/local/cuda/bin
root@DESKTOP-PO8BKKM:~# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Jun_13_19:16:58_PDT_2023
Cuda compilation tools, release 12.2, V12.2.91
Build cuda_12.2.r12.2/compiler.32965470_0

Fixing the NVIDIA driver problem

root@DESKTOP-PO8BKKM:~# nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

I tried to fix it by following this article: "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver: the most complete solution, in detail!"

root@DESKTOP-PO8BKKM:~# ls /usr/src | grep nvidia
nvidia-535.54.03

root@DESKTOP-PO8BKKM:~# sudo apt-get install dkms
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
dkms is already the newest version (2.8.7-2ubuntu2.2).
dkms set to manually installed.
0 upgraded, 0 newly installed, 0 to remove and 60 not upgraded.

root@DESKTOP-PO8BKKM:~# sudo dkms install -m nvidia -v 535.54.03
Error! Your kernel headers for kernel 5.10.16.3-microsoft-standard-WSL2 cannot be found.
Please install the linux-headers-5.10.16.3-microsoft-standard-WSL2 package or use the --kernelsourcedir option to tell DKMS where it's located.

Trying to uninstall CUDA

Reference: http://www.manongjc.com/detail/62-afswonvqlgmvots.html

sudo apt-get purge 'nvidia*'
sudo apt-get autoremove
sudo apt-get autoclean
sudo rm -rf /usr/local/cuda*

Reinstalling with the wsl-ubuntu build of CUDA

wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.2.0/local_installers/cuda-repo-wsl-ubuntu-12-2-local_12.2.0-1_amd64.deb
sudo dpkg -i cuda-repo-wsl-ubuntu-12-2-local_12.2.0-1_amd64.deb
sudo cp /var/cuda-repo-wsl-ubuntu-12-2-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda

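Still no luck, haha.

Installing the driver directly on Windows: it then works inside WSL2 too

Reference: "WSL Ubuntu GPU driver error: NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver."

For reasons I haven't figured out, running plain nvidia-smi shows ERR! in the Fan column, whereas nvidia-smi.exe reports everything normally.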
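Looking back, the dkms attempt above could never have worked: the kernel in the error message is a WSL kernel, and on WSL the GPU driver comes from the Windows host. A minimal sketch of a guard that could be run before touching any Linux driver packages is below. The function name is_wsl_kernel is made up for this example, and matching on "microsoft" in the kernel release string is an assumption based on the kernel name shown in the dkms error.

```shell
# Succeed if the given kernel release string looks like a WSL kernel.
# Hypothetical helper; the "microsoft" match is an assumption based on
# names such as 5.10.16.3-microsoft-standard-WSL2.
is_wsl_kernel() {
    case "$1" in
        *microsoft*|*Microsoft*) return 0 ;;
        *) return 1 ;;
    esac
}

if is_wsl_kernel "$(uname -r)"; then
    echo "WSL kernel: install the NVIDIA driver on Windows, not with apt/dkms here"
    # /usr/lib/wsl/lib is where WSL exposes the Windows driver's libraries
    ls /usr/lib/wsl/lib 2>/dev/null | grep -i -E 'cuda|nvidia' \
        || echo "no driver libraries visible yet"
fi
```

If the last command lists nothing, the Windows-side driver is probably missing or too old, which matches what I saw before installing it.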

Remember to add the environment variables to .bashrc:

export PATH=$PATH:/usr/local/cuda/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/lib/x86_64-linux-gnu
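One small annoyance with plain export lines is that every time .bashrc is sourced again the same directory gets appended once more, and an empty LD_LIBRARY_PATH ends up with a leading colon. A possible variant that avoids both is sketched here; append_once is a helper name invented for this example, not something CUDA or Ubuntu provides.

```shell
# Append a directory to a colon-separated list unless it is already there.
# $1 = current list (may be empty), $2 = directory to add; prints the result.
# Hypothetical helper for illustration.
append_once() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;          # already present: unchanged
        *)        printf '%s\n' "${1:+$1:}$2" ;; # add, without a leading colon
    esac
}

export PATH="$(append_once "$PATH" /usr/local/cuda/bin)"
export LD_LIBRARY_PATH="$(append_once "${LD_LIBRARY_PATH:-}" /usr/local/cuda/lib64)"
```

Further directories, such as /usr/lib/x86_64-linux-gnu, can be added with the same call.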

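Fixing the mismatch between driver version and CUDA version

Running a CUDA program fails with an error saying the driver version is insufficient: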

ERROR: /home/rthete/CUDA_Freshman/3_sum_arrays/sum_arrays.cu:38,code:35,reason:CUDA driver version is insufficient for CUDA runtime version

Check the official release notes: https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html

It turns out I had installed CUDA 12.2 on top of driver 474.44, and the two clearly don't match.
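The comparison itself is easy to script once the minimum driver version has been read off the release notes table. The sketch below assumes GNU sort with -V (version sort), which Ubuntu ships. version_ge is a name made up here, and the value 525.00 is only a placeholder, not a number taken from NVIDIA's table.

```shell
# Succeed if version $1 is greater than or equal to version $2.
# Hypothetical helper; relies on GNU sort's -V (version sort).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

driver="474.44"    # the driver version from this post
required="525.00"  # placeholder: look up the real minimum in the release notes

if version_ge "$driver" "$required"; then
    echo "driver $driver satisfies minimum $required"
else
    echo "driver $driver is older than $required: use an older toolkit or update the driver"
fi
```

Doing this check before downloading a multi-gigabyte installer would have saved me one full install and uninstall cycle.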

After uninstalling with the purge commands shown earlier, reinstall with CUDA 11.4:

wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.4.0/local_installers/cuda-repo-wsl-ubuntu-11-4-local_11.4.0-1_amd64.deb
sudo dpkg -i cuda-repo-wsl-ubuntu-11-4-local_11.4.0-1_amd64.deb
sudo apt-key add /var/cuda-repo-wsl-ubuntu-11-4-local/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda

sudo apt-get -y install cuda then runs into another problem:

rthete@DESKTOP-PO8BKKM:~$ sudo apt-get -y install cuda
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 libcufile-11-4 : Depends: liburcu6 but it is not installable
E: Unable to correct problems, you have held broken packages.

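Reference: "WSL2 + Python: assorted tinkering"

CUDA 11.4 depends on liburcu6, but the Ubuntu 22.04 repositories no longer carry that package, so it has to be downloaded and installed by hand. To avoid the hassle, it is best to stay on Ubuntu 20.04.

Install the missing library manually with the commands below; after that, sudo apt-get -y install cuda goes through:

wget http://archive.ubuntu.com/ubuntu/pool/main/libu/liburcu/liburcu6_0.11.1-2_amd64.deb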
sudo dpkg -i liburcu6_0.11.1-2_amd64.deb

Fixing the gcc version problem

CUDA is now installed properly, but compiling hello_world.cu with make still fails, this time with "unsupported GNU version":

rthete@DESKTOP-PO8BKKM:~/CUDA_Freshman/0_hello_world/build$ make
[ 50%] Building NVCC (Device) object CMakeFiles/hello_world.dir/hello_world_generated_hello_world.cu.o
In file included from /usr/local/cuda/include/cuda_runtime.h:83,
                 from <command-line>:
/usr/local/cuda/include/crt/host_config.h:139:2: error: #error -- unsupported GNU version! gcc versions later than 10 are not supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.
  139 | #error -- unsupported GNU version! gcc versions later than 10 are not supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.
      |  ^~~~~
CMake Error at hello_world_generated_hello_world.cu.o.cmake:220 (message):
  Error generating
  /home/rthete/CUDA_Freshman/0_hello_world/build/CMakeFiles/hello_world.dir//./hello_world_generated_hello_world.cu.o

make[2]: *** [CMakeFiles/hello_world.dir/build.make:77: CMakeFiles/hello_world.dir/hello_world_generated_hello_world.cu.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:82: CMakeFiles/hello_world.dir/all] Error 2
make: *** [Makefile:91: all] Error 2

Fixed by following these references:

"CUDA compile error: unsupported GNU version! gcc versions later than 10 are not supported!"

"Ubuntu 18.04: switching gcc versions / fixing the error: unsupported GNU version, gcc later than 10 are not supported"

First check the gcc version; it turns out to be gcc-11:

rthete@DESKTOP-PO8BKKM:~/CUDA_Freshman/0_hello_world/build$ ls /usr/bin/gcc* -l
lrwxrwxrwx 1 root root  6 Aug  5  2021 /usr/bin/gcc -> gcc-11
lrwxrwxrwx 1 root root 23 Jan 16 18:17 /usr/bin/gcc-11 -> x86_64-linux-gnu-gcc-11
lrwxrwxrwx 1 root root  9 Aug  5  2021 /usr/bin/gcc-ar -> gcc-ar-11
lrwxrwxrwx 1 root root 26 Jan 16 18:17 /usr/bin/gcc-ar-11 -> x86_64-linux-gnu-gcc-ar-11
lrwxrwxrwx 1 root root  9 Aug  5  2021 /usr/bin/gcc-nm -> gcc-nm-11
lrwxrwxrwx 1 root root 26 Jan 16 18:17 /usr/bin/gcc-nm-11 -> x86_64-linux-gnu-gcc-nm-11
lrwxrwxrwx 1 root root 13 Aug  5  2021 /usr/bin/gcc-ranlib -> gcc-ranlib-11
lrwxrwxrwx 1 root root 30 Jan 16 18:17 /usr/bin/gcc-ranlib-11 -> x86_64-linux-gnu-gcc-ranlib-11

Install gcc-10:

sudo apt-get install gcc-10
sudo apt-get install g++-10

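Repoint the symlinks:

# Remove the existing gcc symlink
sudo rm /usr/bin/gcc
# Create a new symlink from gcc to gcc-10
sudo ln -s /usr/bin/gcc-10 /usr/bin/gcc
# Remove the existing g++ symlink
sudo rm /usr/bin/g++
# Create a new symlink from g++ to g++-10
sudo ln -s /usr/bin/g++-10 /usr/bin/g++

Compilation then succeeds.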
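Repointing /usr/bin/gcc changes the default compiler for everything on the system. When calling nvcc directly, a less invasive option may be to leave the symlinks alone and hand nvcc an older host compiler through its -ccbin option. The sketch below only builds that flag. ccbin_flag is a helper name invented here, and the limit of 10 is taken from the error message above.

```shell
# Print a -ccbin flag when the default gcc is newer than the toolkit accepts.
# $1 = default gcc major version, $2 = newest major version nvcc supports.
# Hypothetical helper for illustration.
ccbin_flag() {
    if [ "$1" -gt "$2" ]; then
        printf '%s\n' "-ccbin g++-$2"
    fi
}

# Intended use (needs nvcc and g++-10 installed, so left commented out):
#   gcc_major=$(gcc -dumpversion | cut -d. -f1)
#   nvcc $(ccbin_flag "$gcc_major" 10) hello_world.cu -o hello_world
```

I have not tried this with the CMake build from this post, so treat it as an alternative to test rather than a confirmed fix.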

Fixing: compiles without errors but still cannot use the GPU

Running the compiled hello_world binary as a test prints only the CPU line:

rthete@DESKTOP-PO8BKKM:~/test$ ./hello_world
CPU：Hello World!

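Reference: "CUDA Programming (3): Hello world, adding an error-handling module to the program"

With error checking added as in that article, running the program shows the error: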

rthete@DESKTOP-PO8BKKM:~/test$ ./hello_world
CPU：Hello World!
CUDA Error: no kernel image is available for execution on the device

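Look up your GPU's compute capability on the NVIDIA website:

"Your GPU Compute Capability"

To run CUDA programs on this machine, -arch sm_35 has to be passed to nvcc. The GPU here has a low compute capability, and nvcc in CUDA 11.x otherwise builds for sm_52 by default, so the binary contains no kernel image the device can execute.

nvcc -arch sm_35 hello_world.cu -o hello_world

It then runs correctly: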

rthete@DESKTOP-PO8BKKM:~/test$ nvcc -arch sm_35 hello_world.cu -o hello_world
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
rthete@DESKTOP-PO8BKKM:~/test$ ./hello_world
CPU：Hello World!
GPU：Hello World!
GPU：Hello World!
GPU：Hello World!
GPU：Hello World!
GPU：Hello World!
GPU：Hello World!
GPU：Hello World!
GPU：Hello World!
GPU：Hello World!
GPU：Hello World!
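The sm_XX value is just the compute capability with the dot removed, so the flag can be derived from whatever number the NVIDIA page lists for your card. A tiny sketch, with arch_flag being a name made up for this example:

```shell
# Turn a compute capability such as "3.5" into nvcc's -arch value.
# Hypothetical helper for illustration.
arch_flag() {
    printf '%s\n' "-arch=sm_$(printf '%s' "$1" | tr -d '.')"
}

arch_flag 3.5

# Intended use (needs nvcc installed, so left commented out):
#   nvcc $(arch_flag 3.5) hello_world.cu -o hello_world
```

Note that a given toolkit only accepts the architectures it still supports, so the result should be checked against the toolkit's documentation.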

For convenience, the same build written as a simple CMakeLists.txt looks like this:

cmake_minimum_required(VERSION 3.4)

project(test_cuda)

set(CUDA_NVCC_FLAGS -arch=sm_35;-G;-g)

find_package(CUDA)

CUDA_ADD_EXECUTABLE(hello_world hello_world.cu)

posted @ 2023-07-13 16:46 rthete
