Installing CUDA and PyTorch on Ubuntu 16.04

1. Install CUDA

Reference: Installing CUDA on Ubuntu

PyTorch can run without a GPU, but using an NVIDIA GPU requires CUDA to be installed.

Check whether CUDA is already installed:

lintong@master:~$ nvcc -V
The program 'nvcc' is currently not installed. You can install it by typing:
sudo apt install nvidia-cuda-toolkit

Check the GPU model. Here it is a GTX 1050 Ti:

lspci | grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GP107GL High Definition Audio Controller (rev a1)

Check whether the NVIDIA GPU driver is installed. Here the driver version is 430.64, and the highest CUDA version it supports is 10.1:

nvidia-smi
Sun Oct 23 20:27:21 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.64       Driver Version: 430.64       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 105...  Off  | 00000000:01:00.0  On |                  N/A |
| 40%   29C    P8    N/A / 100W |    370MiB /  4036MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1106      G   /usr/lib/xorg/Xorg                           259MiB |
|    0     28186      G   compiz                                       106MiB |
|    0     28455      G   ...AAAAAAAAAAAACAAAAAAAAAA= --shared-files     1MiB |
+-----------------------------------------------------------------------------+
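The two values that matter in this output are the driver version and the highest supported CUDA version, both in the header row. A minimal sketch of extracting them from captured nvidia-smi text follows; the helper name parse_nvidia_smi_header is hypothetical and not part of any NVIDIA tool, and the regular expression assumes the header layout shown above.

```python
import re
from typing import Optional, Tuple


def parse_nvidia_smi_header(text: str) -> Optional[Tuple[str, str]]:
    """Return (driver_version, cuda_version) found in nvidia-smi output.

    Hypothetical helper: it assumes the header contains
    'Driver Version: X' followed by 'CUDA Version: Y', as in the output above.
    Returns None when the pattern is not found.
    """
    match = re.search(
        r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)", text
    )
    if match is None:
        return None
    return match.group(1), match.group(2)


# Header row copied from the nvidia-smi output above.
header = "| NVIDIA-SMI 430.64       Driver Version: 430.64       CUDA Version: 10.1     |"
print(parse_nvidia_smi_header(header))
```

The "CUDA Version" field here is the newest CUDA release the driver can run, not the version of any installed toolkit.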

Download the runfile installer for CUDA from the official archive:

https://developer.nvidia.com/cuda-toolkit-archive

 

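Run the installer: choose Continue, enter accept, deselect the Driver option (a driver is already installed), then choose Install. The command and the summary printed once installation completes: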

sudo sh cuda_10.1.243_418.87.00_linux.run
===========
= Summary =
===========

Driver:   Not Selected
Toolkit:  Installed in /usr/local/cuda-10.1/
Samples:  Installed in /home/lintong/, but missing recommended libraries

Please make sure that
 -   PATH includes /usr/local/cuda-10.1/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-10.1/lib64, or, add /usr/local/cuda-10.1/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-10.1/bin

Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-10.1/doc/pdf for detailed information on setting up CUDA.
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 418.00 is required for CUDA 10.1 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
    sudo <CudaInstaller>.run --silent --driver

Logfile is /var/log/cuda-installer.log
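The warning in this summary states that CUDA 10.1 needs a driver of at least version 418.00. A small sketch of that comparison follows, assuming driver versions are dot-separated integers; the function names are made up for illustration. The versions are compared as integer tuples because a plain string comparison would order "99.0" after "418.00".

```python
from typing import Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '430.64' into (430, 64)."""
    return tuple(int(part) for part in version.split("."))


def driver_satisfies(installed: str, required: str) -> bool:
    """True when the installed driver version is at least the required one."""
    return version_tuple(installed) >= version_tuple(required)


# 430.64 is the driver found earlier; 418.00 is the minimum named in the summary.
print(driver_satisfies("430.64", "418.00"))
```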

Add the following to ~/.bashrc or /etc/profile, then source the file:

# cuda
export PATH=/usr/local/cuda-10.1/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
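The ${VAR:+:${VAR}} form in these exports appends a colon and the old value only when the variable is already set and non-empty, so no stray empty entry is left when LD_LIBRARY_PATH was previously unset. The following Python sketch mimics that rule for illustration only; prepend_dir is a hypothetical name, and the shell does this expansion itself.

```python
def prepend_dir(new_dir: str, current: str) -> str:
    """Mimic NEW${VAR:+:${VAR}}: append ':' + current only if current is non-empty."""
    return new_dir + (":" + current if current else "")


# Variable already set: the old value is kept after the new directory.
print(prepend_dir("/usr/local/cuda-10.1/bin", "/usr/bin:/bin"))
# Variable unset or empty: no trailing colon is produced.
print(prepend_dir("/usr/local/cuda-10.1/lib64", ""))
```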

Verify that the installation succeeded:

lintong@master:~/下载$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
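To confirm from a script that the toolkit on PATH is the expected release, the release number can be read from the last line of this output. This is a sketch that assumes the "release X.Y" wording shown above; parse_nvcc_release is a hypothetical helper.

```python
import re
from typing import Optional


def parse_nvcc_release(text: str) -> Optional[str]:
    """Return the 'X.Y' release number from `nvcc -V` output, or None if absent."""
    match = re.search(r"release\s+(\d+\.\d+)", text)
    return match.group(1) if match else None


# Last line of the nvcc -V output above.
sample = "Cuda compilation tools, release 10.1, V10.1.243"
print(parse_nvcc_release(sample))
```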

Disable Nouveau: edit /etc/modprobe.d/blacklist.conf and add:

blacklist nouveau
options nouveau modeset=0

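If the following command prints nothing, Nouveau has been disabled (the change takes effect after the update and reboot below):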

lsmod | grep nouveau
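The same check can be done on captured lsmod text by comparing the first column of each row, which is slightly stricter than grep because it ignores rows where nouveau only appears in the "Used by" column. This is a sketch; module_loaded is a hypothetical helper, and the sample text is made up for illustration, not taken from the machine in this post.

```python
def module_loaded(lsmod_output: str, name: str) -> bool:
    """True when `name` appears as a module name (first column) in lsmod output."""
    return any(
        bool(fields) and fields[0] == name
        for fields in (line.split() for line in lsmod_output.splitlines())
    )


# Made-up lsmod output for illustration only.
sample = """Module                  Size  Used by
nouveau              1716224  1
ttm                   106496  1 nouveau
"""
print(module_loaded(sample, "nouveau"))
```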

Update the initramfs and reboot:

sudo update-initramfs -u
sudo reboot

2. Install the NVIDIA driver

After the reboot the NVIDIA driver was gone: nvidia-smi no longer worked and the Ubuntu graphical login failed, so the driver has to be reinstalled from a terminal.

Stop the graphical interface:

sudo service lightdm stop

Remove the existing driver:

sudo apt-get remove 'nvidia-*'

Find the latest NVIDIA driver on the download page. The version used here is 515.76:

https://www.nvidia.com/Download/index.aspx

Download and install the NVIDIA driver:

wget https://us.download.nvidia.cn/XFree86/Linux-x86_64/515.76/NVIDIA-Linux-x86_64-515.76.run
sudo chmod +x ./NVIDIA-Linux-x86_64-515.76.run
sudo ./NVIDIA-Linux-x86_64-515.76.run -no-x-check -no-nouveau-check -no-opengl-files

Choices to make during installation:

1. There appears to already be a driver installed on your system (version:
  515.76).  As part of installing this driver (version: 515.76), the existing
  driver will be uninstalled.  Are you sure you want to continue?
  Continue installation      Abort installation
(Choose Continue installation. This prompt appears when reinstalling.)
2. The distribution-provided pre-install script failed!  Are you sure you want
  to continue?
  Continue installation      Abort installation
(Choose Continue installation.)
3. Would you like to register the kernel module sources with DKMS? This will
  allow DKMS to automatically build a new module, if you install a different
  kernel later.
  Yes                       No
(Choose No.)
4. Install NVIDIA's 32-bit compatibility libraries?
  Yes                       No
(Choose No.)
5. Installation of the kernel module for the NVIDIA Accelerated Graphics Driver
  for Linux-x86_64 (version 515.76) is now complete.
  OK
6. Would you like to run the nvidia-xconfig utility to automatically update
  your X configuration file so that the NVIDIA X driver will be used when you
  restart X?  Any pre-existing X configuration file will be backed up.
  Yes                       No
(Choose Yes.)

Reboot, or start the graphical interface:

sudo service lightdm start

The installation succeeded, and the Ubuntu graphical interface is back to normal:

nvidia-smi
Tue Oct 25 23:51:22 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.76       Driver Version: 515.76       CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| 40%   36C    P0    N/A / 100W |    371MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1106      G   /usr/lib/xorg/Xorg                369MiB |
+-----------------------------------------------------------------------------+

 

3. Install cuDNN

Reference: https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html#installlinux

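cuDNN is a GPU-accelerated library of primitives for deep neural networks. It can be downloaded from the official archive below; downloading requires a registered NVIDIA account.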

https://developer.nvidia.com/rdp/cudnn-archive

Installing from the tar archive. Downloaded file: cudnn-linux-x86_64-8.5.0.96_cuda10-archive.tar.xz

Install:

sudo tar  -xvf cudnn-linux-x86_64-8.5.0.96_cuda10-archive.tar.xz
sudo cp cudnn-*-archive/include/cudnn*.h /usr/local/cuda/include
sudo cp -P cudnn-*-archive/lib/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*

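Alternatively, installing from the Debian package. Downloaded file: cudnn-local-repo-ubuntu1604-8.5.0.96_1.0-1_amd64.deb

Install (this registers the local repository; see the install guide referenced above for any remaining apt-get steps):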

sudo dpkg -i ./cudnn-local-repo-ubuntu1604-8.5.0.96_1.0-1_amd64.deb

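Verify that cuDNN is available. This check goes through PyTorch, so PyTorch must already be installed (see section 4):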

python3.6
Python 3.6.13 (default, Feb 20 2021, 21:42:50)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from torch.backends import cudnn
>>> print(cudnn.is_available())
True

4. Install PyTorch

Official PyTorch installation guide:

https://pytorch.org/get-started/locally/

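Use PyTorch to verify that the GPU is usable.

If the warning below appears, the NVIDIA driver is too old and a newer one must be installed. Here it was caused by the old 430.64 driver; after reinstalling with version 515.76 the warning no longer appears.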

python3.6
Python 3.6.13 (default, Feb 20 2021, 21:42:50)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.cuda.is_available())
/home/lintong/.local/lib/python3.6/site-packages/torch/cuda/__init__.py:80: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 10010). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at  ../c10/cuda/CUDAFunctions.cpp:112.)
  return torch._C._cuda_getDeviceCount() > 0
False
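The "found version 10010" in this warning is the CUDA version supported by the driver, encoded as 1000 * major + 10 * minor, so 10010 corresponds to the CUDA 10.1 reported by nvidia-smi for the 430.64 driver. A short sketch of the decoding, with decode_cuda_version as a hypothetical helper name:

```python
def decode_cuda_version(encoded: int) -> str:
    """Decode an integer of the form 1000 * major + 10 * minor into 'major.minor'."""
    major = encoded // 1000
    minor = (encoded % 1000) // 10
    return f"{major}.{minor}"


# 10010 is the value quoted in the warning above.
print(decode_cuda_version(10010))
```

Under the same encoding, the CUDA 11.7 reported by the 515.76 driver would appear as 11070.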

After installing version 515.76:

python3.6
Python 3.6.13 (default, Feb 20 2021, 21:42:50)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.cuda.is_available())
True
>>> print(torch.cuda.device_count())
1
>>> print(torch.cuda.get_device_name(0))
NVIDIA GeForce GTX 1050 Ti
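The three checks in this session can be gathered into one function for reuse in scripts. This is a sketch rather than an official PyTorch recipe: gpu_report and its dictionary keys are made up for illustration, and it returns early when PyTorch is not installed instead of raising.

```python
import importlib.util


def gpu_report() -> dict:
    """Collect basic CUDA facts from PyTorch, if PyTorch is installed."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}

    import torch  # imported lazily so the function also works without PyTorch

    report = {
        "torch_installed": True,
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }
    if report["cuda_available"]:
        # Same call as in the session above, applied to every visible device.
        report["device_names"] = [
            torch.cuda.get_device_name(i) for i in range(report["device_count"])
        ]
    return report


print(gpu_report())
```

On the machine described here, the report would be expected to show cuda_available as True with one device once the 515.76 driver is in place.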

  

posted @ 2016-08-04 15:33  tonglin0325