Installing the GPU version of TensorFlow on Ubuntu 20.04 (NVIDIA GTX 1060)

1 Install the NVIDIA driver

$ ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 ==
modalias : pci:v000010DEd00001C03sv00001B4Csd000011D7bc03sc00i00
vendor   : NVIDIA Corporation
model    : GP106 [GeForce GTX 1060 6GB]
driver   : nvidia-driver-450-server - distro non-free
driver   : nvidia-driver-450 - distro non-free
driver   : nvidia-driver-390 - distro non-free
driver   : nvidia-driver-460 - distro non-free recommended
driver   : nvidia-driver-418-server - distro non-free
driver   : xserver-xorg-video-nouveau - distro free builtin

Install a specific driver version. The recommended one is usually fine; here I installed version 450.
sudo apt install nvidia-driver-450
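
If you want to pick the recommended driver from a script, the listing above can be parsed along the following lines. This is only a sketch: the function name recommended_driver and the shortened SAMPLE text are my own, and it assumes the output keeps the "driver : <package> - <tags>" layout shown above.

```python
import re

# Shortened copy of the `ubuntu-drivers devices` listing shown above.
SAMPLE = """\
driver   : nvidia-driver-450 - distro non-free
driver   : nvidia-driver-460 - distro non-free recommended
driver   : xserver-xorg-video-nouveau - distro free builtin
"""

def recommended_driver(listing):
    """Return the package on the 'driver' line tagged 'recommended', or None."""
    for line in listing.splitlines():
        match = re.match(r"\s*driver\s*:\s*(\S+)\s+-\s+(.*)$", line)
        if match and "recommended" in match.group(2).split():
            return match.group(1)
    return None

print(recommended_driver(SAMPLE))
```

The returned package name could then be passed to apt install instead of typing the version by hand.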

Reboot after the installation:
sudo reboot

After logging back in, run nvidia-smi to see basic information about the GPU and confirm that this driver version was installed successfully.

$ nvidia-smi
Sun Feb 21 16:58:51 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.102.04   Driver Version: 450.102.04   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 106...  Off  | 00000000:01:00.0  On |                  N/A |
|  0%   57C    P8    10W / 120W |    567MiB /  6075MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       875      G   /usr/lib/xorg/Xorg                194MiB |
|    0   N/A  N/A      1188      G   /usr/bin/kwin_x11                 116MiB |
|    0   N/A  N/A      1190      G   /usr/bin/plasmashell               41MiB |
|    0   N/A  N/A      1492      G   /usr/bin/plasma-discover           16MiB |
|    0   N/A  N/A      3595      G   /usr/lib/firefox/firefox            1MiB |
|    0   N/A  N/A      3719      G   /usr/lib/firefox/firefox            1MiB |
|    0   N/A  N/A      4053      G   ...gAAAAAAAAA --shared-files      188MiB |
+-----------------------------------------------------------------------------+
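
Note that the "CUDA Version: 11.0" field in this header is the highest CUDA version the driver supports, not the toolkit installed on the machine, so it does not conflict with installing CUDA 10.1 afterwards. A rough sketch of that check follows; the helper names are hypothetical, and the comparison assumes a driver accepts any toolkit no newer than the version it reports.

```python
import re

# Header line copied from the nvidia-smi output above.
HEADER = "| NVIDIA-SMI 450.102.04   Driver Version: 450.102.04   CUDA Version: 11.0     |"

def parse_smi_header(line):
    """Extract (driver_version, max_cuda_version) as strings, None if absent."""
    driver = re.search(r"Driver Version:\s*([\d.]+)", line)
    cuda = re.search(r"CUDA Version:\s*([\d.]+)", line)
    return (driver.group(1) if driver else None,
            cuda.group(1) if cuda else None)

def driver_supports(toolkit, driver_cuda):
    """True if the toolkit version is not newer than what the driver reports."""
    def as_tuple(version):
        return tuple(int(part) for part in version.split("."))
    return as_tuple(toolkit) <= as_tuple(driver_cuda)

driver_version, max_cuda = parse_smi_header(HEADER)
print(driver_version, max_cuda, driver_supports("10.1", max_cuda))
```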

2 Install CUDA 10.1

Install the toolkit from the Ubuntu repositories, then check the version with nvcc:

sudo apt install nvidia-cuda-toolkit

$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
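
To confirm from a script that the toolkit really is release 10.1, the last line of the nvcc output can be parsed. A small sketch, with toolkit_release as a hypothetical helper name and the sample text copied from the output above:

```python
import re

# nvcc -V output as shown above.
NVCC_OUTPUT = """\
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
"""

def toolkit_release(text):
    """Return the (major, minor) toolkit release found in nvcc -V output, or None."""
    match = re.search(r"release\s+(\d+)\.(\d+)", text)
    if match is None:
        return None
    return (int(match.group(1)), int(match.group(2)))

print(toolkit_release(NVCC_OUTPUT))
```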

Note that on Ubuntu 20.04 the apt package installs CUDA under /usr/lib/cuda rather than the usual /usr/local/cuda:

$ whereis cuda
cuda: /usr/lib/cuda /usr/include/cuda.h
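
Because the location differs between installation methods, scripts that need the CUDA directory may want to probe for it instead of hard-coding one path. A minimal sketch; the candidate list and the name find_cuda_root are assumptions of mine, not something the packages provide.

```python
import os

# /usr/lib/cuda is what whereis reported above; /usr/local/cuda is the
# location commonly used by NVIDIA's own installer (listed as a fallback).
CANDIDATES = ["/usr/lib/cuda", "/usr/local/cuda"]

def find_cuda_root(candidates=CANDIDATES):
    """Return the first candidate that exists as a directory, or None."""
    for path in candidates:
        if os.path.isdir(path):
            return path
    return None

print(find_cuda_root())
```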

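3 Install a cuDNN version compatible with CUDA 10.1

Download the archive cudnn-10.1-linux-x64-v7.6.5.32.tgz from:
https://developer.nvidia.com/rdp/cudnn-archive
The download requires logging in to an NVIDIA account. Choose cuDNN 7.6.5; other versions may not work (I tried 8.0.5 and TensorFlow failed to run). Extract the archive, which unpacks into a cuda/ directory, then copy the files into the CUDA directory: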

$ sudo cp cuda/include/cudnn.h /usr/lib/cuda/include/
$ sudo cp cuda/lib64/libcudnn* /usr/lib/cuda/lib64/
$ sudo chmod a+r /usr/lib/cuda/include/cudnn.h /usr/lib/cuda/lib64/libcudnn*
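
To double-check which cuDNN version ended up in place, the version macros in the copied header can be read. The sketch below assumes cudnn.h defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL, as the 7.x headers do; the function name and SAMPLE text are illustrative only.

```python
import re

def cudnn_version(header_text):
    """Return (major, minor, patch) from cudnn.h text, or None if a macro is missing."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"^\s*#define\s+" + name + r"\s+(\d+)",
                          header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)

# Illustrative excerpt of what the macros look like for cuDNN 7.6.5.
SAMPLE = """\
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 6
#define CUDNN_PATCHLEVEL 5
"""

print(cudnn_version(SAMPLE))
```

On a real system, pass in the contents of /usr/lib/cuda/include/cudnn.h instead of SAMPLE.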

4 Set the CUDA environment variables

$ echo 'export LD_LIBRARY_PATH=/usr/lib/cuda/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
$ echo 'export LD_LIBRARY_PATH=/usr/lib/cuda/include:$LD_LIBRARY_PATH' >> ~/.bashrc
$ source ~/.bashrc
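
A quick way to confirm the new setting is visible to the current shell is to check LD_LIBRARY_PATH from Python. This is a sketch with a hypothetical helper, on_library_path; it only inspects the variable and does not test whether the libraries actually load.

```python
import os

def on_library_path(directory, ld_library_path):
    """True if directory is one of the colon-separated LD_LIBRARY_PATH entries."""
    entries = [os.path.normpath(p) for p in ld_library_path.split(":") if p]
    return os.path.normpath(directory) in entries

current = os.environ.get("LD_LIBRARY_PATH", "")
print(on_library_path("/usr/lib/cuda/lib64", current))
```

If this prints False in a new terminal, the export lines were probably not appended to ~/.bashrc or the file was not re-read.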

5 Verify the installation

$ python3              
Python 3.8.5 (default, Sep  4 2020, 07:30:14) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf               
>>> tf.config.list_physical_devices("GPU")
2021-02-21 17:43:50.205210: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-02-21 17:43:50.234635: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-21 17:43:50.234911: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX 1060 6GB computeCapability: 6.1
coreClock: 1.7335GHz coreCount: 10 deviceMemorySize: 5.93GiB deviceMemoryBandwidth: 178.99GiB/s
2021-02-21 17:43:50.235095: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-02-21 17:43:50.236187: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-02-21 17:43:50.237281: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2021-02-21 17:43:50.237489: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2021-02-21 17:43:50.238605: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2021-02-21 17:43:50.239236: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2021-02-21 17:43:50.241550: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-02-21 17:43:50.241657: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-21 17:43:50.241960: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-21 17:43:50.242156: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
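
The lines to look for in this log are the "Successfully opened dynamic library" messages: if one of the CUDA libraries or libcudnn is missing, the GPU list usually comes back empty. The sketch below scans a captured log for them. The EXPECTED set is simply the libraries that appear in my log above, so treat it as an assumption tied to this TensorFlow and CUDA 10.1 combination; the function names are hypothetical.

```python
import re

# Libraries opened in the log above; other TensorFlow builds may expect others.
EXPECTED = {
    "libcuda.so.1", "libcudart.so.10.1", "libcublas.so.10", "libcufft.so.10",
    "libcurand.so.10", "libcusolver.so.10", "libcusparse.so.10", "libcudnn.so.7",
}

def opened_libraries(log_text):
    """Return the set of library names reported as successfully opened."""
    return set(re.findall(r"Successfully opened dynamic library (\S+)", log_text))

def missing_libraries(log_text, expected=EXPECTED):
    """Return the expected libraries that never appear as opened, sorted by name."""
    return sorted(set(expected) - opened_libraries(log_text))

SAMPLE_LOG = """\
I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
"""

print(missing_libraries(SAMPLE_LOG))
```

An empty result for the full log means every library in EXPECTED was loaded.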

References:
https://towardsdatascience.com/installing-tensorflow-gpu-in-ubuntu-20-04-4ee3ca4cb75d
https://cyfeng.science/2020/05/02/ubuntu-install-nvidia-driver-cuda-cudnn-suits/

posted @ 2021-02-21 08:55  codeRhythm