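Installing CUDA and PyTorch on Ubuntu 16.04
1. Installing CUDA
PyTorch can run without a GPU, but to use an NVIDIA GPU you need to install CUDA.
Check whether CUDA is already installed:
lintong@master:~$ nvcc -V
The program 'nvcc' is currently not installed. You can install it by typing:
sudo apt install nvidia-cuda-toolkit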
Check the GPU model. Here it is a GTX 1050 Ti:
lspci | grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GP107GL High Definition Audio Controller (rev a1)
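The same lookup can be scripted. This is a minimal sketch that assumes `lspci` output in the format shown above; the helper name `nvidia_gpu_names` is hypothetical and not part of any tool.

```python
import re

def nvidia_gpu_names(lspci_output):
    """Return the bracketed model names of NVIDIA VGA/3D controllers."""
    names = []
    for line in lspci_output.splitlines():
        is_gpu = "VGA compatible controller" in line or "3D controller" in line
        if "NVIDIA" in line and is_gpu:
            match = re.search(r"\[([^\]]+)\]", line)
            if match:
                names.append(match.group(1))
    return names

# Sample text copied from the lspci output in this post.
sample = (
    "01:00.0 VGA compatible controller: NVIDIA Corporation GP107 "
    "[GeForce GTX 1050 Ti] (rev a1)\n"
    "01:00.1 Audio device: NVIDIA Corporation GP107GL "
    "High Definition Audio Controller (rev a1)\n"
)
print(nvidia_gpu_names(sample))
```

The audio function of the card is skipped on purpose, since only the display controller line carries the model name in brackets.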
Check whether the NVIDIA GPU driver is installed. Here the driver version is 430.64, and the highest CUDA version it supports is 10.1:
nvidia-smi
Sun Oct 23 20:27:21 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.64       Driver Version: 430.64       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 105...  Off  | 00000000:01:00.0  On |                  N/A |
| 40%   29C    P8    N/A / 100W |    370MiB /  4036MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1106      G   /usr/lib/xorg/Xorg                           259MiB |
|    0     28186      G   compiz                                       106MiB |
|    0     28455      G   ...AAAAAAAAAAAACAAAAAAAAAA= --shared-files     1MiB |
+-----------------------------------------------------------------------------+
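The two values that matter in this output are the driver version and the CUDA version in the header line. As a rough sketch, they can be pulled out of captured `nvidia-smi` text with a regular expression; the function name `parse_smi_header` is made up for this example, and the pattern assumes the header layout shown above.

```python
import re

def parse_smi_header(smi_output):
    """Extract (driver_version, cuda_version) from nvidia-smi text, or None."""
    match = re.search(
        r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)", smi_output
    )
    return (match.group(1), match.group(2)) if match else None

# Header line as printed by the 430.64 driver in this post.
header = "| NVIDIA-SMI 430.64       Driver Version: 430.64       CUDA Version: 10.1     |"
print(parse_smi_header(header))
```

The "CUDA Version" reported here is the highest version the driver supports, not the version of any installed toolkit.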
Download the runfile installer from NVIDIA's official CUDA Toolkit archive:
https://developer.nvidia.com/cuda-toolkit-archive
Run the installer:
sudo sh cuda_10.1.243_418.87.00_linux.run
When prompted, choose continue, then enter accept at the license prompt.
Deselect the Driver option, then choose Install.
When the installation finishes, the installer prints a summary:
===========
= Summary =
===========

Driver:   Not Selected
Toolkit:  Installed in /usr/local/cuda-10.1/
Samples:  Installed in /home/lintong/, but missing recommended libraries

Please make sure that
 -   PATH includes /usr/local/cuda-10.1/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-10.1/lib64, or, add /usr/local/cuda-10.1/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-10.1/bin

Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-10.1/doc/pdf for detailed information on setting up CUDA.
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 418.00 is required for CUDA 10.1 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
    sudo <CudaInstaller>.run --silent --driver

Logfile is /var/log/cuda-installer.log
Add the following to ~/.bashrc or /etc/profile, then source the file:
# cuda
export PATH=/usr/local/cuda-10.1/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
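To confirm that the exports took effect in the current shell, the two variables can be inspected from Python. This is only a sketch: `dir_on_search_path` is a hypothetical helper, and the prefix `/usr/local/cuda-10.1` is taken from the installer summary in this post.

```python
import os

def dir_on_search_path(directory, path_value):
    """True if `directory` is an entry of an os.pathsep-separated path string."""
    wanted = os.path.normpath(directory)
    entries = [entry for entry in path_value.split(os.pathsep) if entry]
    return any(os.path.normpath(entry) == wanted for entry in entries)

cuda_home = "/usr/local/cuda-10.1"  # install prefix reported by the installer
print(dir_on_search_path(cuda_home + "/bin", os.environ.get("PATH", "")))
print(dir_on_search_path(cuda_home + "/lib64", os.environ.get("LD_LIBRARY_PATH", "")))
```

Both lines should print True in a shell where the profile file has been sourced; a False usually means the file was edited but not yet sourced.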
Verify that the installation succeeded:
lintong@master:~/下载$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
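If the toolkit version is needed in a script, it can be read from the last line of this output. A minimal sketch, assuming the "release X.Y" wording shown above; `nvcc_release` is a name invented for this example.

```python
import re

def nvcc_release(nvcc_output):
    """Return the toolkit release (such as '10.1') from `nvcc -V` text, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None

# Last line of the nvcc -V output in this post.
sample = "Cuda compilation tools, release 10.1, V10.1.243"
print(nvcc_release(sample))
```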
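Disable Nouveau: edit /etc/modprobe.d/blacklist.conf and add:
blacklist nouveau
options nouveau modeset=0
Once the change has taken effect (after the initramfs update and reboot that follow), the command below prints nothing if Nouveau was disabled successfully:
lsmod | grep nouveau
Update the initramfs and reboot:
sudo update-initramfs -u
sudo reboot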
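After the reboot, the `lsmod | grep nouveau` check can also be expressed in Python. The sketch below works on captured `lsmod` text; `nouveau_loaded` is a hypothetical helper, and the sample module lines are illustrative rather than taken from the author's machine.

```python
def nouveau_loaded(lsmod_output):
    """True if a module named exactly 'nouveau' appears in `lsmod` output."""
    for line in lsmod_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "nouveau":
            return True
    return False

# Illustrative lsmod excerpts (not from the original post).
before = "Module                  Size  Used by\nnouveau              1716224  1\n"
after = "Module                  Size  Used by\nnvidia              35307520  50\n"
print(nouveau_loaded(before), nouveau_loaded(after))
```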
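2. Installing the NVIDIA driver
After the reboot the NVIDIA driver was gone: nvidia-smi no longer worked and the Ubuntu graphical login failed, so the driver has to be reinstalled from a text terminal.
Stop the graphical session:
sudo service lightdm stop
Remove the existing driver (the pattern is quoted so the shell does not expand it against files in the current directory):
sudo apt-get remove 'nvidia-*'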
Download the latest NVIDIA driver; at the time of writing this was version 515.76:
https://www.nvidia.com/Download/index.aspx
Download and install the driver:
wget https://us.download.nvidia.cn/XFree86/Linux-x86_64/515.76/NVIDIA-Linux-x86_64-515.76.run
sudo chmod +x ./NVIDIA-Linux-x86_64-515.76.run
sudo ./NVIDIA-Linux-x86_64-515.76.run -no-x-check -no-nouveau-check -no-opengl-files
Choices to make during installation:
1. There appears to already be a driver installed on your system (version: 515.76). As part of installing this driver (version: 515.76), the existing driver will be uninstalled. Are you sure you want to continue?
   Continue installation / Abort installation (choose Continue installation; this prompt appears when reinstalling)
2. The distribution-provided pre-install script failed! Are you sure you want to continue?
   Continue installation / Abort installation (choose Continue installation)
3. Would you like to register the kernel module sources with DKMS? This will allow DKMS to automatically build a new module, if you install a different kernel later.
   Yes / No (choose No)
4. Install NVIDIA's 32-bit compatibility libraries?
   Yes / No (choose No)
5. Installation of the kernel module for the NVIDIA Accelerated Graphics Driver for Linux-x86_64 (version 515.76) is now complete.
   OK
6. Would you like to run the nvidia-xconfig utility to automatically update your X configuration file so that the NVIDIA X driver will be used when you restart X? Any pre-existing X configuration file will be backed up.
   Yes / No (choose Yes)
Reboot, or start the graphical session again:
sudo service lightdm start
The installation succeeded and the Ubuntu desktop works again:
nvidia-smi
Tue Oct 25 23:51:22 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.76       Driver Version: 515.76       CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| 40%   36C    P0    N/A / 100W |    371MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1106      G   /usr/lib/xorg/Xorg                369MiB |
+-----------------------------------------------------------------------------+
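3. Installing cuDNN
Reference: https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html#installlinux
cuDNN is a GPU-accelerated library of primitives for deep neural networks. It is available from the official archive below; downloading requires an NVIDIA account.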
https://developer.nvidia.com/rdp/cudnn-archive
Downloaded file (tar archive): cudnn-linux-x86_64-8.5.0.96_cuda10-archive.tar.xz
Install:
sudo tar -xvf cudnn-linux-x86_64-8.5.0.96_cuda10-archive.tar.xz
sudo cp cudnn-*-archive/include/cudnn*.h /usr/local/cuda/include
sudo cp -P cudnn-*-archive/lib/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
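One way to confirm that the headers were copied is to read the version macros from the header file. The sketch below assumes the cuDNN 8 layout, in which `cudnn_version.h` defines `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL`, and that the file ended up under `/usr/local/cuda/include`; the function name is hypothetical.

```python
import re

def cudnn_version_from_header(header_text):
    """Return 'MAJOR.MINOR.PATCHLEVEL' from cudnn_version.h text, or None."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        pattern = r"^\s*#define\s+" + name + r"\s+(\d+)"
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            return None  # a macro is missing, so the header is not as assumed
        parts.append(match.group(1))
    return ".".join(parts)

# Illustrative header excerpt matching the 8.5.0 archive used in this post.
sample = "#define CUDNN_MAJOR 8\n#define CUDNN_MINOR 5\n#define CUDNN_PATCHLEVEL 0\n"
print(cudnn_version_from_header(sample))
```

On a real system the text would come from something like `open("/usr/local/cuda/include/cudnn_version.h").read()`, with that path being an assumption based on the copy commands above.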
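Alternatively, install from the Debian local repository package. Downloaded file: cudnn-local-repo-ubuntu1604-8.5.0.96_1.0-1_amd64.deb
Install:
sudo dpkg -i ./cudnn-local-repo-ubuntu1604-8.5.0.96_1.0-1_amd64.deb
According to the install guide referenced above, this step enables a local apt repository; the cuDNN packages themselves are then installed with apt as described there.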
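Verify that cuDNN is available. This check goes through PyTorch, so PyTorch (section 4) must already be installed; note that prebuilt PyTorch packages ship their own copy of cuDNN, so the result reflects what PyTorch can use:
python3.6
Python 3.6.13 (default, Feb 20 2021, 21:42:50)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from torch.backends import cudnn
>>> print(cudnn.is_available())
True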
4. Installing PyTorch
Official PyTorch installation guide:
https://pytorch.org/get-started/locally/
Use PyTorch to verify that the GPU is usable.
If you see the warning below, the NVIDIA driver is too old for the installed PyTorch build and a newer driver is needed. Here it appeared with the old 430.64 driver and went away after installing 515.76, the latest version at the time:
python3.6
Python 3.6.13 (default, Feb 20 2021, 21:42:50)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.cuda.is_available())
/home/lintong/.local/lib/python3.6/site-packages/torch/cuda/__init__.py:80: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 10010). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:112.)
  return torch._C._cuda_getDeviceCount() > 0
False
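The "found version 10010" in this warning is the CUDA version supported by the driver, packed into one integer as 1000 * major + 10 * minor. A small worked example of the decoding follows; the function name `decode_cuda_version` is invented here.

```python
def decode_cuda_version(encoded):
    """Decode CUDA's integer version (1000 * major + 10 * minor) to 'major.minor'."""
    major = encoded // 1000
    minor = (encoded % 1000) // 10
    return "{}.{}".format(major, minor)

print(decode_cuda_version(10010))  # value reported in the warning
print(decode_cuda_version(11070))  # encoding of CUDA 11.7
```

So 10010 corresponds to CUDA 10.1, which matches what `nvidia-smi` reported for the 430.64 driver, while the 515.76 driver reports 11.7.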
After installing driver 515.76:
python3.6
Python 3.6.13 (default, Feb 20 2021, 21:42:50)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.cuda.is_available())
True
>>> print(torch.cuda.device_count())
1
>>> print(torch.cuda.get_device_name(0))
NVIDIA GeForce GTX 1050 Ti
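The interactive checks above can be collected into one small script. This is a sketch rather than a definitive diagnostic: `gpu_report` is a hypothetical helper, and it returns None when PyTorch is not importable instead of raising.

```python
def gpu_report():
    """Return a dict describing CUDA visibility from PyTorch, or None if torch is missing."""
    try:
        import torch
    except ImportError:
        return None
    report = {
        "torch_version": torch.__version__,
        "built_for_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        "devices": [],
    }
    if report["cuda_available"]:
        # Collect the name of every GPU that PyTorch can see.
        report["devices"] = [
            torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())
        ]
    return report

print(gpu_report())
```

Comparing `built_for_cuda` with the CUDA version shown by `nvidia-smi` is a quick way to spot the driver mismatch described earlier.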
This article is published only on cnblogs (博客园) and tonglin0325's blog. Author: tonglin0325. When reposting, please cite the original link: https://www.cnblogs.com/tonglin0325/p/5736920.html