cuda-cudnn

Checking the server's GPU information



## Install lspci
yum -y install pciutils-3.5.1-3.el7.x86_64

View the graphics card information and GPU model on Linux:
lspci | grep -i vga
17:00.0 VGA compatible controller: NVIDIA Corporation Device 1e04 (rev a1)
65:00.0 VGA compatible controller: NVIDIA Corporation Device 1e04 (rev a1)

lspci -v -s   17:00.0
17:00.0 VGA compatible controller: NVIDIA Corporation Device 1e04 (rev a1) (prog-if 00 [VGA controller])
        Subsystem: ZOTAC International (MCO) Ltd. Device 2503
        Flags: bus master, fast devsel, latency 0, IRQ 68, NUMA node 0
        Memory at b4000000 (32-bit, non-prefetchable) [size=16M]
        Memory at 380060000000 (64-bit, prefetchable) [size=256M]
        Memory at 380070000000 (64-bit, prefetchable) [size=32M]
        I/O ports at 7000 [size=128]
        [virtual] Expansion ROM at b5000000 [disabled] [size=512K]
        Capabilities: <access denied>
        Kernel driver in use: nvidia
        Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
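
In the verbose output above, the "Kernel modules" line lists every module that could drive the card (including nouveau), while "Kernel driver in use" names the one actually loaded. A minimal sketch for pulling that field out of saved `lspci -v` text; the helper name `kernel_driver_in_use` is hypothetical, and the sample is a shortened copy of the output above.

```python
import re

# Shortened copy of the `lspci -v -s 17:00.0` output shown above.
SAMPLE = """\
17:00.0 VGA compatible controller: NVIDIA Corporation Device 1e04 (rev a1) (prog-if 00 [VGA controller])
        Kernel driver in use: nvidia
        Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
"""


def kernel_driver_in_use(lspci_verbose_text):
    """Return the driver named on the 'Kernel driver in use:' line, or None."""
    match = re.search(
        r"^\s*Kernel driver in use:\s*(\S+)", lspci_verbose_text, re.MULTILINE
    )
    return match.group(1) if match else None


print(kernel_driver_in_use(SAMPLE))
```

If this reports nouveau instead of nvidia, the proprietary driver is not the one bound to the card.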


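For NVIDIA GPUs, you can use:
lspci | grep -i nvidia
Driver version (may be inaccurate and may not match what nvidia-smi reports; dpkg is only available on Debian/Ubuntu systems):
dpkg --list | grep 'nvidia-'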



Look up the GPU model by PCI ID

lspci | grep -i vga
17:00.0 VGA compatible controller: NVIDIA Corporation Device 1e04 (rev a1)
65:00.0 VGA compatible controller: NVIDIA Corporation Device 1e04 (rev a1)

http://pci-ids.ucw.cz/mods/PC/10de?action=help?help=pci
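
lspci prints "Device 1e04" rather than a model name when its local ID database has no entry for the card, so the hexadecimal ID is what you look up on the page linked above (10de in that URL is NVIDIA's vendor ID). A small sketch that extracts the IDs from saved lspci output; the function name `unnamed_device_ids` is hypothetical.

```python
import re

# Copy of the `lspci | grep -i vga` output shown above.
SAMPLE = """\
17:00.0 VGA compatible controller: NVIDIA Corporation Device 1e04 (rev a1)
65:00.0 VGA compatible controller: NVIDIA Corporation Device 1e04 (rev a1)
"""


def unnamed_device_ids(lspci_text):
    """Map PCI slot -> hex device ID for lines where lspci printed 'Device xxxx'."""
    pattern = re.compile(r"^(\S+) .*\bDevice ([0-9a-fA-F]{4})\b", re.MULTILINE)
    return {slot: device.lower() for slot, device in pattern.findall(lspci_text)}


for slot, device_id in unnamed_device_ids(SAMPLE).items():
    print(slot, device_id)
```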

NVIDIA driver

https://download.nvidia.com/XFree86/Linux-x86_64/435.21/

CUDA version compatible with the driver

https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html
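
The release notes list a minimum driver version for each CUDA release, so the check boils down to a version comparison. A rough sketch, assuming 410.48 as the minimum for CUDA 10.0 (that is the driver the CUDA 10.0 installer offers later in this post) and 435.21 as the installed driver from the download link above; the release notes remain the authoritative source, and both helper names are hypothetical.

```python
def parse_version(text):
    """Turn a dotted version such as '435.21' into a comparable tuple (435, 21)."""
    return tuple(int(part) for part in text.split("."))


def driver_is_new_enough(installed, minimum):
    """True when the installed driver version is at least the required minimum."""
    return parse_version(installed) >= parse_version(minimum)


# Assumed values: installed driver 435.21, minimum 410.48 for CUDA 10.0.
print(driver_is_new_enough("435.21", "410.48"))
```

Comparing tuples of integers avoids the pitfalls of comparing version strings lexically or as floats.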

Download CUDA and cuDNN

cuda
https://developer.nvidia.com/cuda-toolkit-archive

cudnn
https://developer.download.nvidia.cn/compute/machine-learning/repos/

CUDA and cuDNN versions

cat /usr/local/cuda/version.txt

cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2

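Testing CUDA:

  1. Build the samples
    Change into the Samples installation directory and run make in a terminal there; the build takes roughly ten minutes.
  2. Test after the build finishes
    Under Samples, find the bin/x86_64/linux/release/ directory and change into it
    Run the deviceQuery program: sudo ./deviceQuery
    Check the output, paying particular attention to the last line: PASS means the test succeeded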

Testing the GPU in TensorFlow (python3):

import tensorflow as tf

# TensorFlow 1.x API: log which device each op is placed on
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print('tensorflow version: %s \n' % (tf.__version__))
print('tensorflow path: %s \n' % (tf.__path__))
print("GPU Available: %s \n" % (tf.test.is_gpu_available()))
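
The snippet above uses TensorFlow 1.x calls (tf.Session, tf.ConfigProto); tf.test.is_gpu_available is deprecated in TensorFlow 2.x. As a sketch of an equivalent check under the assumption that TensorFlow 2.1 or later is installed, tf.config.list_physical_devices can be used instead. The wrapper name `gpu_report` is hypothetical.

```python
def gpu_report():
    """Describe the GPUs TensorFlow can see, or explain why it cannot be checked."""
    try:
        import tensorflow as tf
    except ImportError:
        return "tensorflow is not installed"
    # Assumes TensorFlow >= 2.1, where this function is part of the stable API.
    gpus = tf.config.list_physical_devices("GPU")
    return "tensorflow %s sees %d GPU(s)" % (tf.__version__, len(gpus))


print(gpu_report())
```

A count of zero on a machine with a working driver usually points to a mismatch between the TensorFlow build and the installed CUDA or cuDNN version.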

Uninstalling CUDA and cuDNN

Installed from deb packages
sudo apt-get remove --auto-remove nvidia-cuda-toolkit
sudo apt-get remove --auto-remove 'cudnn*'

Uninstall cuDNN
sudo rm -rf /usr/local/cuda/include/cudnn.h
sudo rm -rf /usr/local/cuda/lib64/libcudnn*

Installed from the .run file
sudo /usr/local/cuda-8.0/bin/uninstall_cuda_8.0.pl
sudo rm -rf /usr/local/cuda-8.0/

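Installing from cudaxxxxx.run

(Whether you accept the license terms; you must accept to continue the installation)
accept/decline/quit: accept

(Do not install the driver here, because the latest driver is already installed; otherwise an older graphics driver may be installed, which can cause a login loop)
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 410.48?
(y)es/(n)o/(q)uit: n

Install the CUDA 10.0 Toolkit? (Whether to install CUDA 10; this must be installed)
(y)es/(n)o/(q)uit: y

Enter Toolkit Location (installation path; press Enter to use the default)
 [ default is /usr/local/cuda-10.0 ]:  

Do you want to install a symbolic link at /usr/local/cuda? (Agree to create the symlink)
(y)es/(n)o/(q)uit: y

Install the CUDA 10.0 Samples? (No need to install the samples; they are already present)
(y)es/(n)o/(q)uit: n

Installing the CUDA Toolkit in /usr/local/cuda-10.0 ... (installation starts)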


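After installation, configure the environment variables by appending the following lines to the end of ~/.bashrc (for example with vim ~/.bashrc):

export CUDA_HOME=/usr/local/cuda-10.0
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export PATH=${CUDA_HOME}/bin:${PATH}
Finally, run source ~/.bashrc to make them take effect.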
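
After sourcing ~/.bashrc it can be worth confirming that the new entries are really present in the current shell. A minimal sketch, assuming the default install location /usr/local/cuda-10.0 from the installer prompts and a Linux-style ":" separator; the function name `missing_cuda_paths` is hypothetical.

```python
import os


def missing_cuda_paths(env, cuda_home="/usr/local/cuda-10.0"):
    """Return the names of variables that lack the expected CUDA directory."""
    missing = []
    path_entries = env.get("PATH", "").split(":")
    lib_entries = env.get("LD_LIBRARY_PATH", "").split(":")
    if cuda_home + "/bin" not in path_entries:
        missing.append("PATH")
    if cuda_home + "/lib64" not in lib_entries:
        missing.append("LD_LIBRARY_PATH")
    return missing


# An empty list means both variables contain the CUDA 10.0 directories.
print(missing_cuda_paths(dict(os.environ)))
```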

You can check the installed version with nvcc -V:

test@test:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
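
When the version is needed in a script rather than read by eye, the last line of the nvcc output carries it. A small sketch that parses the text shown above; the helper name `parse_nvcc_release` is hypothetical.

```python
import re

# Copy of the `nvcc -V` output shown above.
SAMPLE = """\
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
"""


def parse_nvcc_release(nvcc_output):
    """Return (release, full_version) from `nvcc -V` text, or None if absent."""
    match = re.search(r"release (\d+\.\d+), V(\d+\.\d+\.\d+)", nvcc_output)
    return match.groups() if match else None


print(parse_nvcc_release(SAMPLE))
```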





Verifying the installation
Run the following commands:
cd /usr/local/cuda-10.0/samples/1_Utilities/deviceQuery
make
./deviceQuery
Expected output under normal conditions:

./deviceQuery Starting...

cudnn

Download cudnn-10.0-linux-x64-v7.4.2.24.tgz, then extract it with the following command:

tar -zxvf cudnn-10.0-linux-x64-v7.4.2.24.tgz 
Extraction produces the following files:

cuda/include/cudnn.h
cuda/NVIDIA_SLA_cuDNN_Support.txt
cuda/lib64/libcudnn.so
cuda/lib64/libcudnn.so.7
cuda/lib64/libcudnn.so.7.4.2
cuda/lib64/libcudnn_static.a

Copy these files into the CUDA directory with the following two commands:

cp cuda/lib64/* /usr/local/cuda-10.0/lib64/
cp cuda/include/* /usr/local/cuda-10.0/include/
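
To confirm the copy landed where expected, one option is to check for each file from the archive listing under the CUDA directory. A rough sketch, assuming the cuDNN 7.4.2 file names listed above and the default /usr/local/cuda-10.0 location; the names `EXPECTED` and `missing_cudnn_files` are hypothetical.

```python
from pathlib import Path

# Files from the archive listing above, relative to the CUDA root
# (the license text file is not copied by the two cp commands).
EXPECTED = [
    "include/cudnn.h",
    "lib64/libcudnn.so",
    "lib64/libcudnn.so.7",
    "lib64/libcudnn.so.7.4.2",
    "lib64/libcudnn_static.a",
]


def missing_cudnn_files(cuda_root="/usr/local/cuda-10.0"):
    """Return the expected cuDNN files that do not exist under cuda_root."""
    root = Path(cuda_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


# An empty list means every expected file is in place.
print(missing_cudnn_files())
```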

Once the copy is complete, check the cuDNN version with the following command:

cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
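
The grep above prints the CUDNN_MAJOR define plus the two lines after it, which in cuDNN 7 are CUDNN_MINOR and CUDNN_PATCHLEVEL. A sketch that assembles those into a single version string; `SAMPLE_HEADER` is a hypothetical excerpt matching the 7.4.2 archive used here, and `cudnn_version` is a hypothetical helper. Newer cuDNN releases (8 and later) keep these defines in cudnn_version.h instead.

```python
import re

# Hypothetical excerpt of cudnn.h for cuDNN 7.4.2.
SAMPLE_HEADER = """\
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 4
#define CUDNN_PATCHLEVEL 2
"""


def cudnn_version(header_text):
    """Return 'major.minor.patch' from cuDNN header text, or None if incomplete."""
    fields = {}
    for name in ("MAJOR", "MINOR", "PATCHLEVEL"):
        match = re.search(
            r"^#define\s+CUDNN_%s\s+(\d+)" % name, header_text, re.MULTILINE
        )
        if match is None:
            return None
        fields[name] = int(match.group(1))
    return "%d.%d.%d" % (fields["MAJOR"], fields["MINOR"], fields["PATCHLEVEL"])


print(cudnn_version(SAMPLE_HEADER))
```

To run it against the real header, pass the contents of /usr/local/cuda/include/cudnn.h instead of the sample.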


https://cloud.tencent.com/developer/article/1382703

Testing after CUDA installation

cd  /usr/local/cuda/samples
sudo   make  

cd  /usr/local/cuda/samples/bin/x86_64/linux/release

sudo  ./deviceQuery
Result = PASS

sudo  ./bandwidthTest
Result = PASS
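
Both samples report success with a "Result = PASS" line, so the check is easy to automate. A minimal sketch, assuming the samples were built into the release directory used above; the helper names `sample_passed` and `run_sample` are hypothetical, and binaries that have not been built are skipped.

```python
import os
import subprocess

# Directory used in the commands above.
RELEASE_DIR = "/usr/local/cuda/samples/bin/x86_64/linux/release"


def sample_passed(output_text):
    """True when any output line is exactly 'Result = PASS'."""
    return any(line.strip() == "Result = PASS" for line in output_text.splitlines())


def run_sample(binary_path):
    """Run one sample binary and report whether its output contains PASS."""
    completed = subprocess.run(
        [binary_path],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return sample_passed(completed.stdout)


for name in ("deviceQuery", "bandwidthTest"):
    path = os.path.join(RELEASE_DIR, name)
    if os.path.exists(path):
        print(name, "PASS" if run_sample(path) else "FAIL")
    else:
        print(name, "not built at", path)
```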

Alternatively, test with:

cd /usr/local/cuda-10.0/samples/1_Utilities/deviceQuery
make
./deviceQuery

Check the CUDA version

nvcc --version  # or
nvcc -V  # or
cat /usr/local/cuda/version.txt

cudnn

cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2

Uninstall CUDA

sudo /usr/local/cuda-8.0/bin/uninstall_cuda_8.0.pl
sudo rm -rf /usr/local/cuda-8.0/

End-to-end deep learning environment setup: CUDA, cuDNN, and NVIDIA driver installation
https://www.linuxidc.com/Linux/2017-12/149577.htm

posted @ 2019-11-06 18:11  Lust4Life