Installing CUDA on WSL2: a full record of the pitfalls and debugging

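💡 Preface and summary of the installation:

  • Install the NVIDIA driver on the Windows host, not inside WSL
  • Before installing the CUDA Toolkit inside WSL, check the official release notes and make sure the toolkit version matches the driver version
  • Ubuntu does not have to be the latest release; 20.04 is enough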
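
The first point can be checked from inside the distribution before any driver package is touched. The sketch below assumes that a WSL kernel carries "microsoft" in its release string, as in the 5.10.16.3-microsoft-standard-WSL2 kernel that appears later in this post; the helper name is_wsl_kernel is made up for illustration.

```shell
# Return success if the given kernel release string looks like a WSL kernel.
# Assumption: WSL kernels carry "microsoft" in their release string.
is_wsl_kernel() {
    case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
        *microsoft*) return 0 ;;
        *) return 1 ;;
    esac
}

if is_wsl_kernel "$(uname -r)"; then
    echo "WSL detected: install the NVIDIA driver on Windows, not in this distro."
else
    echo "Not WSL: the usual Linux driver instructions apply."
fi
```

Taking the release string as an argument keeps the check easy to try against any kernel name.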

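First attempt: installing and adding the environment variable

  1. Install from the official site, choosing the version that fits your setup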

    https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=deb_local

    wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
    sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
    wget https://developer.download.nvidia.com/compute/cuda/12.2.0/local_installers/cuda-repo-ubuntu2204-12-2-local_12.2.0-535.54.03-1_amd64.deb
    sudo dpkg -i cuda-repo-ubuntu2204-12-2-local_12.2.0-535.54.03-1_amd64.deb
    sudo cp /var/cuda-repo-ubuntu2204-12-2-local/cuda-*-keyring.gpg /usr/share/keyrings/
    sudo apt-get update
    sudo apt-get -y install cuda
  2. nvcc is not found, even though it exists under /usr/local/cuda/bin

    root@DESKTOP-PO8BKKM:~# nvcc --version
    Command 'nvcc' not found, but can be installed with:
    apt install nvidia-cuda-toolkit

    So /usr/local/cuda/bin just needs to be added to PATH, and then it works.

    root@DESKTOP-PO8BKKM:~# export PATH=$PATH:/usr/local/cuda/bin
    root@DESKTOP-PO8BKKM:~# nvcc --version
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2023 NVIDIA Corporation
    Built on Tue_Jun_13_19:16:58_PDT_2023
    Cuda compilation tools, release 12.2, V12.2.91
    Build cuda_12.2.r12.2/compiler.32965470_0
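
Since the toolkit release has to be compared against the driver later on, it can help to pull just the release number out of the nvcc --version output shown above. This is a minimal sketch; the function name cuda_release_from_nvcc is hypothetical, and it assumes the "release X.Y," wording seen in that output.

```shell
# Extract the toolkit release (e.g. "12.2") from `nvcc --version` output on stdin.
cuda_release_from_nvcc() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

if command -v nvcc >/dev/null 2>&1; then
    nvcc --version | cuda_release_from_nvcc
else
    echo "nvcc is not on PATH; check /usr/local/cuda/bin" >&2
fi
```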

Fixing the NVIDIA driver problem

root@DESKTOP-PO8BKKM:~# nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
  1. Tried the fix from this post: "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver: the most complete solution, in detail!"

    root@DESKTOP-PO8BKKM:~# ls /usr/src | grep nvidia
    nvidia-535.54.03
    root@DESKTOP-PO8BKKM:~# sudo apt-get install dkms
    Reading package lists... Done
    Building dependency tree... Done
    Reading state information... Done
    dkms is already the newest version (2.8.7-2ubuntu2.2).
    dkms set to manually installed.
    0 upgraded, 0 newly installed, 0 to remove and 60 not upgraded.
    root@DESKTOP-PO8BKKM:~# sudo dkms install -m nvidia -v 535.54.03
    Error! Your kernel headers for kernel 5.10.16.3-microsoft-standard-WSL2 cannot be found.
    Please install the linux-headers-5.10.16.3-microsoft-standard-WSL2 package or use the --kernelsourcedir option to tell DKMS where it's located.
  2. Try uninstalling CUDA

    Reference: http://www.manongjc.com/detail/62-afswonvqlgmvots.html

    sudo apt-get purge 'nvidia*'
    sudo apt-get autoremove
    sudo apt-get autoclean
    sudo rm -rf /usr/local/cuda*
  3. Try reinstalling with the wsl-ubuntu build of CUDA

    wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
    sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
    wget https://developer.download.nvidia.com/compute/cuda/12.2.0/local_installers/cuda-repo-wsl-ubuntu-12-2-local_12.2.0-1_amd64.deb
    sudo dpkg -i cuda-repo-wsl-ubuntu-12-2-local_12.2.0-1_amd64.deb
    sudo cp /var/cuda-repo-wsl-ubuntu-12-2-local/cuda-*-keyring.gpg /usr/share/keyrings/
    sudo apt-get update
    sudo apt-get -y install cuda

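    Still no luck, haha.

  4. Install the driver directly on Windows instead: this works, and the GPU is then usable inside WSL2 too

    Reference: "WSL Ubuntu GPU driver error: NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver."

    For reasons I don't understand, plain nvidia-smi shows ERR! in the fan column, while nvidia-smi.exe reports everything normally.

  5. Remember to add the environment variables to .bashrc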

    export PATH=$PATH:/usr/local/cuda/bin
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/lib/x86_64-linux-gnu
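
Because .bashrc is read by every new interactive shell, plain export lines like the ones in step 5 add the same directories again in each nested shell. The sketch below is one way to make them idempotent; append_once is a hypothetical helper, not something CUDA provides.

```shell
# Append a directory to a colon-separated list unless it is already present.
# usage: append_once "$PATH" /usr/local/cuda/bin
append_once() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;
        *) printf '%s\n' "${1:+$1:}$2" ;;
    esac
}

PATH=$(append_once "$PATH" /usr/local/cuda/bin)
LD_LIBRARY_PATH=$(append_once "${LD_LIBRARY_PATH:-}" /usr/local/cuda/lib64)
LD_LIBRARY_PATH=$(append_once "$LD_LIBRARY_PATH" /usr/lib/x86_64-linux-gnu)
export PATH LD_LIBRARY_PATH
```

The directories are the same three that the export lines above use; only the duplicate check is new.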

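Fixing the mismatch between the driver version and the CUDA version

Running a CUDA program fails with an error indicating that the driver version does not match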

ERROR: /home/rthete/CUDA_Freshman/3_sum_arrays/sum_arrays.cu:38,code:35,reason:CUDA driver version is insufficient for CUDA runtime version
  1. Check the official release notes: https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html

    It turns out I had installed CUDA 12.2 on top of driver 474.44, which is clearly a mismatch

  2. Uninstall as described in step 2 of the previous section, then install CUDA 11.4

    wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
    sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
    wget https://developer.download.nvidia.com/compute/cuda/11.4.0/local_installers/cuda-repo-wsl-ubuntu-11-4-local_11.4.0-1_amd64.deb
    sudo dpkg -i cuda-repo-wsl-ubuntu-11-4-local_11.4.0-1_amd64.deb
    sudo apt-key add /var/cuda-repo-wsl-ubuntu-11-4-local/7fa2af80.pub
    sudo apt-get update
    sudo apt-get -y install cuda
  3. sudo apt-get -y install cuda then hit another problem

    rthete@DESKTOP-PO8BKKM:~$ sudo apt-get -y install cuda
    Reading package lists... Done
    Building dependency tree... Done
    Reading state information... Done
    Some packages could not be installed. This may mean that you have
    requested an impossible situation or if you are using the unstable
    distribution that some required packages have not yet been created
    or been moved out of Incoming.
    The following information may help to resolve the situation:
    
    The following packages have unmet dependencies:
     libcufile-11-4 : Depends: liburcu6 but it is not installable
    E: Unable to correct problems, you have held broken packages.

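    Reference: "All kinds of tinkering with WSL2 + Python"

    Setting up CUDA requires liburcu6, but the Ubuntu 22.04 repositories no longer carry this package, so it has to be downloaded and installed by hand. To avoid the hassle, it is better to just use Ubuntu 20.04.

    After installing the missing library manually, install cuda works normally

    wget http://archive.ubuntu.com/ubuntu/pool/main/libu/liburcu/liburcu6_0.11.1-2_amd64.deb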
    sudo dpkg -i liburcu6_0.11.1-2_amd64.deb
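
The mismatch found in step 1 comes down to a version comparison, which can be scripted before choosing a toolkit. The sketch below relies on sort -V from GNU coreutils; version_ge is a hypothetical helper. The required value is only illustrative: 535.54.03 is the driver version embedded in the 12.2 installer filename used earlier, and the actual minimum for a Windows driver should be read from the release notes table.

```shell
# Return success if version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

driver=474.44       # driver version from this post
required=535.54.03  # illustrative only: taken from the 12.2 installer filename

if version_ge "$driver" "$required"; then
    echo "driver $driver is new enough"
else
    echo "driver $driver is older than $required: pick an older toolkit or update the driver"
fi
```

With the values above the script takes the second branch, which matches the "driver version is insufficient" error.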

Fixing the gcc version problem

CUDA is now installed properly, but building hello_world.cu with make still fails, this time with unsupported GNU version

rthete@DESKTOP-PO8BKKM:~/CUDA_Freshman/0_hello_world/build$ make
[ 50%] Building NVCC (Device) object CMakeFiles/hello_world.dir/hello_world_generated_hello_world.cu.o
In file included from /usr/local/cuda/include/cuda_runtime.h:83,
                 from <command-line>:
/usr/local/cuda/include/crt/host_config.h:139:2: error: #error -- unsupported GNU version! gcc versions later than 10 are not supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.
  139 | #error -- unsupported GNU version! gcc versions later than 10 are not supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.
      |  ^~~~~
CMake Error at hello_world_generated_hello_world.cu.o.cmake:220 (message):
  Error generating
  /home/rthete/CUDA_Freshman/0_hello_world/build/CMakeFiles/hello_world.dir//./hello_world_generated_hello_world.cu.o

make[2]: *** [CMakeFiles/hello_world.dir/build.make:77: CMakeFiles/hello_world.dir/hello_world_generated_hello_world.cu.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:82: CMakeFiles/hello_world.dir/all] Error 2
make: *** [Makefile:91: all] Error 2

Resolved with the help of these posts:

"CUDA compile error: unsupported GNU version! gcc versions later than 10 are not supported!"

"Ubuntu 18.04: switching gcc versions / fixing error -- unsupported GNU version gcc later than 10 are not supported"

  1. First check the gcc version: it turns out to be gcc-11

    rthete@DESKTOP-PO8BKKM:~/CUDA_Freshman/0_hello_world/build$ ls /usr/bin/gcc* -l
    lrwxrwxrwx 1 root root  6 Aug  5  2021 /usr/bin/gcc -> gcc-11
    lrwxrwxrwx 1 root root 23 Jan 16 18:17 /usr/bin/gcc-11 -> x86_64-linux-gnu-gcc-11
    lrwxrwxrwx 1 root root  9 Aug  5  2021 /usr/bin/gcc-ar -> gcc-ar-11
    lrwxrwxrwx 1 root root 26 Jan 16 18:17 /usr/bin/gcc-ar-11 -> x86_64-linux-gnu-gcc-ar-11
    lrwxrwxrwx 1 root root  9 Aug  5  2021 /usr/bin/gcc-nm -> gcc-nm-11
    lrwxrwxrwx 1 root root 26 Jan 16 18:17 /usr/bin/gcc-nm-11 -> x86_64-linux-gnu-gcc-nm-11
    lrwxrwxrwx 1 root root 13 Aug  5  2021 /usr/bin/gcc-ranlib -> gcc-ranlib-11
    lrwxrwxrwx 1 root root 30 Jan 16 18:17 /usr/bin/gcc-ranlib-11 -> x86_64-linux-gnu-gcc-ranlib-11
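  2. Install gcc-10

    sudo apt-get install gcc-10
    sudo apt-get install g++-10

    Repoint the symlinks

    # Remove the existing gcc symlink
    sudo rm /usr/bin/gcc
    # Create a new symlink from gcc to gcc-10
    sudo ln -s /usr/bin/gcc-10 /usr/bin/gcc
    # Remove the existing g++ symlink
    sudo rm /usr/bin/g++
    # Create a new symlink from g++ to g++-10
    sudo ln -s /usr/bin/g++-10 /usr/bin/g++

    Compilation then works normally.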
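
Repointing /usr/bin/gcc changes the compiler for everything on the system. A less invasive alternative, not the approach taken in this post, is to leave the system default alone and pass an older host compiler to nvcc through its -ccbin option. The sketch below only prints the resulting command; newest_supported_gcc is a hypothetical helper, and the installed versions 9, 10 and 11 are assumed for the example.

```shell
# Print the newest gcc major version that does not exceed LIMIT.
# usage: newest_supported_gcc LIMIT VERSION...
newest_supported_gcc() {
    limit=$1
    shift
    best=""
    for v in "$@"; do
        if [ "$v" -le "$limit" ]; then
            if [ -z "$best" ] || [ "$v" -gt "$best" ]; then
                best=$v
            fi
        fi
    done
    if [ -n "$best" ]; then
        echo "$best"
    else
        return 1
    fi
}

# Assumed example: gcc 9, 10 and 11 are installed, and the toolkit rejects anything newer than 10.
ver=$(newest_supported_gcc 10 9 10 11)
echo "nvcc -ccbin g++-$ver hello_world.cu -o hello_world"
```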

Fixing the case where compilation succeeds but the GPU is still not used

Running the compiled hello_world binary as a test prints only the CPU line:

rthete@DESKTOP-PO8BKKM:~/test$ ./hello_world
CPU:Hello World!

Reference: "CUDA Programming (3): Adding an error-handling module to Hello World"

  1. With error handling added, running the program shows the error:

    rthete@DESKTOP-PO8BKKM:~/test$ ./hello_world
    CPU:Hello World!
    CUDA Error: no kernel image is available for execution on the device
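  2. Look up the GPU's compute capability on the NVIDIA website:

    "Your GPU Compute Capability"

  3. For the CUDA program to run, it has to be compiled with -arch sm_35, because this machine's GPU has a low compute capability and the default build contains no kernel image for it

    nvcc -arch sm_35 hello_world.cu -o hello_world

    It then runs correctly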

    rthete@DESKTOP-PO8BKKM:~/test$ nvcc -arch sm_35 hello_world.cu -o hello_world
    nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
    rthete@DESKTOP-PO8BKKM:~/test$ ./hello_world
    CPU:Hello World!
    GPU:Hello World!
    GPU:Hello World!
    GPU:Hello World!
    GPU:Hello World!
    GPU:Hello World!
    GPU:Hello World!
    GPU:Hello World!
    GPU:Hello World!
    GPU:Hello World!
    GPU:Hello World!
  4. Written as a simple CMakeLists.txt for convenience, it looks like this:

    cmake_minimum_required(VERSION 3.4)
    
    project(test_cuda)
    
    set(CUDA_NVCC_FLAGS -arch=sm_35;-G;-g)
    
    find_package(CUDA)
    
    CUDA_ADD_EXECUTABLE(hello_world hello_world.cu)
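
The sm_35 value used in steps 3 and 4 is simply the compute capability from the NVIDIA page with the dot removed. The sketch below shows that mapping; cc_to_arch is a hypothetical helper, and 3.5 is the capability implied by the sm_35 flag in this post.

```shell
# Convert a compute capability such as "3.5" into an nvcc architecture name ("sm_35").
cc_to_arch() {
    printf 'sm_%s\n' "$(printf '%s' "$1" | tr -d '.')"
}

arch=$(cc_to_arch 3.5)
echo "nvcc -arch $arch hello_world.cu -o hello_world"
```

For a different GPU, only the number passed to cc_to_arch changes.
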
posted @ 2023-07-13 16:46  rthete