Installing CUDA on WSL2: A Complete Record of Pitfalls and Debugging
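💡 Preface and summary:
- Install the NVIDIA driver on the Windows host, not inside WSL.
- Before installing the CUDA Toolkit inside WSL, check the official documentation and make sure the toolkit version matches your driver version.
- Ubuntu does not need to be the latest release; 20.04 is enough.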
First attempt: installing and adding the environment variable
- Install from the official site, choosing the build for your own system

    wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
    sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
    wget https://developer.download.nvidia.com/compute/cuda/12.2.0/local_installers/cuda-repo-ubuntu2204-12-2-local_12.2.0-535.54.03-1_amd64.deb
    sudo dpkg -i cuda-repo-ubuntu2204-12-2-local_12.2.0-535.54.03-1_amd64.deb
    sudo cp /var/cuda-repo-ubuntu2204-12-2-local/cuda-*-keyring.gpg /usr/share/keyrings/
    sudo apt-get update
    sudo apt-get -y install cuda
- Checking for nvcc shows that it is not found, even though nvcc does exist under /usr/local/cuda/bin

    root@DESKTOP-PO8BKKM:~# nvcc --version
    Command 'nvcc' not found, but can be installed with:
    apt install nvidia-cuda-toolkit

So the directory only needs to be added to PATH, after which nvcc works:

    root@DESKTOP-PO8BKKM:~# export PATH=$PATH:/usr/local/cuda/bin
    root@DESKTOP-PO8BKKM:~# nvcc --version
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2023 NVIDIA Corporation
    Built on Tue_Jun_13_19:16:58_PDT_2023
    Cuda compilation tools, release 12.2, V12.2.91
    Build cuda_12.2.r12.2/compiler.32965470_0
Fixing the NVIDIA driver problem
    root@DESKTOP-PO8BKKM:~# nvidia-smi
    NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
- Tried the fix from this article: "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. The most complete solution, in detail!"

    root@DESKTOP-PO8BKKM:~# ls /usr/src | grep nvidia
    nvidia-535.54.03
    root@DESKTOP-PO8BKKM:~# sudo apt-get install dkms
    Reading package lists... Done
    Building dependency tree... Done
    Reading state information... Done
    dkms is already the newest version (2.8.7-2ubuntu2.2).
    dkms set to manually installed.
    0 upgraded, 0 newly installed, 0 to remove and 60 not upgraded.
    root@DESKTOP-PO8BKKM:~# sudo dkms install -m nvidia -v 535.54.03
    Error! Your kernel headers for kernel 5.10.16.3-microsoft-standard-WSL2 cannot be found.
    Please install the linux-headers-5.10.16.3-microsoft-standard-WSL2 package or use the --kernelsourcedir option to tell DKMS where it's located.
- Tried uninstalling CUDA
Reference: http://www.manongjc.com/detail/62-afswonvqlgmvots.html

    sudo apt-get purge nvidia*
    sudo apt-get autoremove
    sudo apt-get autoclean
    sudo rm -rf /usr/local/cuda*
- Tried reinstalling, this time with the wsl-ubuntu build of CUDA

    wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
    sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
    wget https://developer.download.nvidia.com/compute/cuda/12.2.0/local_installers/cuda-repo-wsl-ubuntu-12-2-local_12.2.0-1_amd64.deb
    sudo dpkg -i cuda-repo-wsl-ubuntu-12-2-local_12.2.0-1_amd64.deb
    sudo cp /var/cuda-repo-wsl-ubuntu-12-2-local/cuda-*-keyring.gpg /usr/share/keyrings/
    sudo apt-get update
    sudo apt-get -y install cuda

Still no luck, haha.
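- Installed the driver directly on Windows, and it then worked inside WSL2 as well
Reference: "WSL Ubuntu graphics driver error: NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver."
For reasons I have not figured out, running plain nvidia-smi shows ERR! in the fan column, whereas nvidia-smi.exe reports everything normally.
- Remember to add the environment variables to .bashrc

    export PATH=$PATH:/usr/local/cuda/bin
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/lib/x86_64-linux-gnu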
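One side effect of plain export lines in .bashrc is that nested shells, or re-sourcing the file, keep appending the same directories. A minimal sketch of an idempotent variant follows; append_once is a hypothetical helper name, not something the CUDA installer provides, and the directories are the ones used above.

```shell
# append_once LIST DIR: print LIST with DIR appended, unless DIR is already an entry.
# (append_once is a hypothetical helper name chosen for this sketch.)
append_once() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;      # already present: leave the list unchanged
        "::")     printf '%s\n' "$2" ;;      # empty list: the result is just the directory
        *)        printf '%s\n' "$1:$2" ;;   # otherwise append after a colon
    esac
}

PATH="$(append_once "$PATH" /usr/local/cuda/bin)"
LD_LIBRARY_PATH="$(append_once "${LD_LIBRARY_PATH:-}" /usr/local/cuda/lib64)"
LD_LIBRARY_PATH="$(append_once "$LD_LIBRARY_PATH" /usr/lib/x86_64-linux-gnu)"
export PATH LD_LIBRARY_PATH
```

Wrapping the list in colons before matching is what lets a single pattern catch the directory at the start, middle, or end of the list.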
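Fixing the mismatch between driver version and CUDA version
Running a CUDA program fails with an error saying the driver version is insufficient:

    ERROR: /home/rthete/CUDA_Freshman/3_sum_arrays/sum_arrays.cu:38,code:35,reason:CUDA driver version is insufficient for CUDA runtime version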
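- Check the official release notes: https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html
It turns out I had installed CUDA 12.2 on top of driver 474.44, which is too old for that toolkit, so the two clearly do not match.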
- Uninstall as in the uninstall step above, then install CUDA 11.4 instead

    wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
    sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
    wget https://developer.download.nvidia.com/compute/cuda/11.4.0/local_installers/cuda-repo-wsl-ubuntu-11-4-local_11.4.0-1_amd64.deb
    sudo dpkg -i cuda-repo-wsl-ubuntu-11-4-local_11.4.0-1_amd64.deb
    sudo apt-key add /var/cuda-repo-wsl-ubuntu-11-4-local/7fa2af80.pub
    sudo apt-get update
    sudo apt-get -y install cuda
- Running sudo apt-get -y install cuda hit yet another problem

    rthete@DESKTOP-PO8BKKM:~$ sudo apt-get -y install cuda
    Reading package lists... Done
    Building dependency tree... Done
    Reading state information... Done
    Some packages could not be installed. This may mean that you have
    requested an impossible situation or if you are using the unstable
    distribution that some required packages have not yet been created
    or been moved out of Incoming.
    The following information may help to resolve the situation:

    The following packages have unmet dependencies:
     libcufile-11-4 : Depends: liburcu6 but it is not installable
    E: Unable to correct problems, you have held broken packages.

CUDA 11.4 depends on liburcu6, but Ubuntu 22.04 no longer carries that package in its repositories, so it has to be downloaded and installed by hand. To avoid the hassle, it is better to just use Ubuntu 20.04.
After installing the missing library manually, sudo apt-get -y install cuda completes normally:

    wget http://archive.ubuntu.com/ubuntu/pool/main/libu/liburcu/liburcu6_0.11.1-2_amd64.deb
    sudo dpkg -i liburcu6_0.11.1-2_amd64.deb
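The driver check from this section can be done before downloading anything. Below is a rough sketch that compares a driver version against a minimum; version_ge is a hypothetical helper, it relies on GNU sort -V, and the two minimum values are assumptions that should be confirmed against the release notes linked above rather than taken from here.

```shell
# version_ge A B: succeed when dotted version A is greater than or equal to B.
# Relies on GNU sort -V (version sort); version_ge is a hypothetical helper name.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

driver="474.44"       # the Windows driver version from this post
min_cuda12="528.33"   # ASSUMPTION: minimum Windows driver for CUDA 12.x, check the release notes
min_cuda11="452.39"   # ASSUMPTION: minimum Windows driver for CUDA 11.x, check the release notes

for min in "$min_cuda12" "$min_cuda11"; do
    if version_ge "$driver" "$min"; then
        echo "driver $driver satisfies minimum $min"
    else
        echo "driver $driver is older than minimum $min"
    fi
done
```

With these assumed thresholds the script reports 474.44 as too old for the 12.x line and acceptable for the 11.x line, which is consistent with what happened above.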
Fixing the gcc version problem
CUDA is now installed properly, but compiling hello_world.cu with make still fails, this time with "unsupported GNU version":
    rthete@DESKTOP-PO8BKKM:~/CUDA_Freshman/0_hello_world/build$ make
    [ 50%] Building NVCC (Device) object CMakeFiles/hello_world.dir/hello_world_generated_hello_world.cu.o
    In file included from /usr/local/cuda/include/cuda_runtime.h:83,
                     from <command-line>:
    /usr/local/cuda/include/crt/host_config.h:139:2: error: #error -- unsupported GNU version! gcc versions later than 10 are not supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.
      139 | #error -- unsupported GNU version! gcc versions later than 10 are not supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.
          |  ^~~~~
    CMake Error at hello_world_generated_hello_world.cu.o.cmake:220 (message):
      Error generating
      /home/rthete/CUDA_Freshman/0_hello_world/build/CMakeFiles/hello_world.dir//./hello_world_generated_hello_world.cu.o
    make[2]: *** [CMakeFiles/hello_world.dir/build.make:77: CMakeFiles/hello_world.dir/hello_world_generated_hello_world.cu.o] Error 1
    make[1]: *** [CMakeFiles/Makefile2:82: CMakeFiles/hello_world.dir/all] Error 2
    make: *** [Makefile:91: all] Error 2
Fixed with the help of these articles:
"CUDA compile error: unsupported GNU version! gcc versions later than 10 are not supported!"
"Ubuntu 18.04: switching gcc versions / fixing error -- unsupported GNU version gcc later than 10 are not supported"
- First check which gcc is installed; it turns out to be gcc-11

    rthete@DESKTOP-PO8BKKM:~/CUDA_Freshman/0_hello_world/build$ ls /usr/bin/gcc* -l
    lrwxrwxrwx 1 root root 6 Aug 5 2021 /usr/bin/gcc -> gcc-11
    lrwxrwxrwx 1 root root 23 Jan 16 18:17 /usr/bin/gcc-11 -> x86_64-linux-gnu-gcc-11
    lrwxrwxrwx 1 root root 9 Aug 5 2021 /usr/bin/gcc-ar -> gcc-ar-11
    lrwxrwxrwx 1 root root 26 Jan 16 18:17 /usr/bin/gcc-ar-11 -> x86_64-linux-gnu-gcc-ar-11
    lrwxrwxrwx 1 root root 9 Aug 5 2021 /usr/bin/gcc-nm -> gcc-nm-11
    lrwxrwxrwx 1 root root 26 Jan 16 18:17 /usr/bin/gcc-nm-11 -> x86_64-linux-gnu-gcc-nm-11
    lrwxrwxrwx 1 root root 13 Aug 5 2021 /usr/bin/gcc-ranlib -> gcc-ranlib-11
    lrwxrwxrwx 1 root root 30 Jan 16 18:17 /usr/bin/gcc-ranlib-11 -> x86_64-linux-gnu-gcc-ranlib-11
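- Install gcc-10

    sudo apt-get install gcc-10
    sudo apt-get install g++-10

Then repoint the symlinks:

    # Remove the existing gcc symlink
    sudo rm /usr/bin/gcc
    # Create a new symlink from gcc to gcc-10
    sudo ln -s /usr/bin/gcc-10 /usr/bin/gcc
    # Remove the existing g++ symlink
    sudo rm /usr/bin/g++
    # Create a new symlink from g++ to g++-10
    sudo ln -s /usr/bin/g++-10 /usr/bin/g++

After that the build succeeds.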
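Repointing /usr/bin/gcc changes the compiler for everything on the system. A less invasive alternative, which this post did not try, is to hand nvcc an older host compiler through its -ccbin option. The sketch below assumes hello_world.cu is in the current directory; pick_host_cxx is a hypothetical helper, and the lower bound of 5 is an arbitrary floor for the search.

```shell
# pick_host_cxx MAX: print the path of the newest g++-N on PATH with N <= MAX,
# or fail if none is found. (Hypothetical helper; 5 is an arbitrary lower bound.)
pick_host_cxx() {
    v="$1"
    while [ "$v" -ge 5 ]; do
        if command -v "g++-$v" >/dev/null 2>&1; then
            command -v "g++-$v"
            return 0
        fi
        v=$((v - 1))
    done
    return 1
}

# The error above says gcc versions later than 10 are rejected, so cap the search at 10.
if host_cxx="$(pick_host_cxx 10)" && command -v nvcc >/dev/null 2>&1 && [ -f hello_world.cu ]; then
    nvcc -ccbin "$host_cxx" hello_world.cu -o hello_world
else
    echo "need nvcc, hello_world.cu, and a g++ of version 10 or older" >&2
fi
```

This leaves the system default compiler untouched, at the cost of having to pass the option (or the equivalent CMake setting) on every build.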
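Fixing the case where compilation succeeds but the GPU is still not used
Running the compiled hello_world binary as a test prints only the CPU line:

    rthete@DESKTOP-PO8BKKM:~/test$ ./hello_world
    CPU:Hello World!

Reference: "CUDA Programming (3): Hello world, adding an error-handling module to the program"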
- With error checking added as in the reference, running the program shows the actual error:

    rthete@DESKTOP-PO8BKKM:~/test$ ./hello_world
    CPU:Hello World!
    CUDA Error: no kernel image is available for execution on the device
- Look up the compute capability of your own GPU on the NVIDIA website:
- For the CUDA program to run, -arch sm_35 has to be passed, because this machine's GPU has a low compute capability and nvcc's default target is newer than that, so the default build contains no kernel image the device can execute

    nvcc -arch sm_35 hello_world.cu -o hello_world

Now it runs correctly:

    rthete@DESKTOP-PO8BKKM:~/test$ nvcc -arch sm_35 hello_world.cu -o hello_world
    nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
    rthete@DESKTOP-PO8BKKM:~/test$ ./hello_world
    CPU:Hello World!
    GPU:Hello World!
    GPU:Hello World!
    GPU:Hello World!
    GPU:Hello World!
    GPU:Hello World!
    GPU:Hello World!
    GPU:Hello World!
    GPU:Hello World!
    GPU:Hello World!
    GPU:Hello World!
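- Written as a simple CMakeLists.txt for more convenient builds, it looks like this:

    cmake_minimum_required(VERSION 3.4)
    project(test_cuda)
    # Find the CUDA toolkit before setting the nvcc flags
    find_package(CUDA REQUIRED)
    # Target sm_35; -G and -g add device and host debug information
    set(CUDA_NVCC_FLAGS -arch=sm_35;-G;-g)
    CUDA_ADD_EXECUTABLE(hello_world hello_world.cu)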
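The -arch value used in the steps above follows directly from the compute capability looked up on the NVIDIA site: drop the dot and prefix sm_. A tiny sketch of that mapping, with arch_flag as a hypothetical helper name:

```shell
# arch_flag CC: turn a compute capability such as "3.5" into the nvcc flag "-arch=sm_35".
# (arch_flag is a hypothetical helper name used only in this sketch.)
arch_flag() {
    printf '%s\n' "-arch=sm_$(printf '%s' "$1" | tr -d '.')"
}

arch_flag 3.5    # prints -arch=sm_35
```

It could then be used as nvcc $(arch_flag 3.5) hello_world.cu -o hello_world, or the resulting value pasted into CUDA_NVCC_FLAGS in the CMakeLists.txt.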