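I. Environment

1. Operating system: CentOS Linux release 7.4.1708 (Core)

2. GPU: NVIDIA GTX 1080 Ti, 11 GB

II. Installing the NVIDIA GPU driver

1. On the official site (http://www.geforce.cn/drivers), find and download the driver for your GPU model. The download is a file with the .run extension (for example, NVIDIA-Linux-x86_64-384.98.run).

2. Install gcc and the kernel-related packages: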

yum install kernel-devel kernel-doc kernel-headers gcc\* glibc\*  glibc-\*

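    Note: before installing the kernel packages, check that the running kernel version matches the version of kernel-devel/kernel-doc/kernel-headers to be installed. The two must be identical; otherwise the driver's kernel module will fail to build later.

# Check the running kernel version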

[root@localhost ~]# uname -a
Linux localhost.localdomain 3.10.0-693.11.1.el7.x86_64 #1 SMP Mon Dec 4 23:52:40 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

[root@localhost ~]# yum list | grep kernel-
kernel-devel.x86_64 3.10.0-693.11.1.el7 @updates
kernel-doc.noarch 3.10.0-693.11.1.el7 @updates
kernel-headers.x86_64 3.10.0-693.11.1.el7 @updates
kernel-tools.x86_64 3.10.0-693.11.1.el7 @updates

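    If the versions differ, there are two ways to fix it:

    Method 1: upgrade the kernel so that it matches the packages (the upgrade procedure is not covered here).

    Method 2: install the kernel-devel/kernel-doc/kernel-headers packages that match the running kernel, for example: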

yum install "kernel-devel-uname-r == $(uname -r)"
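The version comparison described in this step can also be scripted. The following is a minimal sketch rather than part of the original procedure: has_matching_devel is a hypothetical helper name, and the sketch assumes that the rpm query format shown prints one kernel-devel version per line in the same form as uname -r.

```shell
#!/bin/sh
# Sketch: succeed only if one of the installed kernel-devel versions equals
# the running kernel release. has_matching_devel is a hypothetical helper.
# $1    = running kernel release, e.g. the output of `uname -r`
# stdin = installed versions, one per line
has_matching_devel() {
    grep -Fxq -- "$1"
}

# Only query the system when rpm is available.
if command -v rpm >/dev/null 2>&1; then
    if rpm -q --qf '%{VERSION}-%{RELEASE}.%{ARCH}\n' kernel-devel 2>/dev/null \
            | has_matching_devel "$(uname -r)"; then
        echo "kernel-devel matches the running kernel $(uname -r)"
    else
        echo "kernel-devel does not match $(uname -r); install the matching packages first"
    fi
fi
```

Matching whole lines (grep -Fx) avoids treating 3.10.0-693.el7 as a match for 3.10.0-693.11.1.el7 when several kernel-devel versions are installed side by side.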

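3. Disable the nouveau driver that the system installs by default by editing /etc/modprobe.d/blacklist.conf:

# Write the blacklist entries (this overwrites any existing blacklist.conf)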
echo -e "blacklist nouveau\noptions nouveau modeset=0" > /etc/modprobe.d/blacklist.conf

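# Back up the current initramfs image
mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak

# Rebuild the initramfs image
dracut /boot/initramfs-$(uname -r).img $(uname -r)

# Reboot
reboot

# Check whether nouveau is loaded; empty output means it was disabled successfully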
lsmod | grep nouveau
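For use inside a script, the same check can return an exit status instead of relying on reading the output by eye. This is a minimal sketch; nouveau_loaded is a hypothetical helper name that reads lsmod output on standard input.

```shell
#!/bin/sh
# Sketch: report whether nouveau appears in a module listing.
# nouveau_loaded is a hypothetical helper; stdin is `lsmod` output.
nouveau_loaded() {
    # lsmod prints the module name in the first column.
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# Only inspect the live system when lsmod is available.
if command -v lsmod >/dev/null 2>&1; then
    if lsmod | nouveau_loaded; then
        echo "nouveau is still loaded; recheck the blacklist and the initramfs"
    else
        echo "nouveau is not loaded"
    fi
fi
```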

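4. Install DKMS

DKMS stands for Dynamic Kernel Module Support. It manages drivers that live outside the kernel tree and automatically rebuilds their modules when the kernel version changes.

# Download the package
wget http://rpmfind.net/linux/fedora-secondary/releases/25/Everything/aarch64/os/Packages/d/dkms-2.2.0.3-34.git.9e0394d.fc25.noarch.rpm

# Install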
rpm -ivh dkms-2.2.0.3-34.git.9e0394d.fc25.noarch.rpm
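As an alternative to downloading a single rpm by hand, DKMS can usually be installed from the EPEL repository. The sketch below is an assumption, not the method used in this post: ensure_dkms is a hypothetical helper name, and it presumes that the epel-release package is reachable from the configured yum repositories.

```shell
#!/bin/sh
# Sketch: install dkms from EPEL only when it is missing.
# ensure_dkms is a hypothetical helper; using EPEL is an assumption,
# not the method used in this post.
ensure_dkms() {
    if command -v dkms >/dev/null 2>&1; then
        echo "dkms already installed"
        return 0
    fi
    yum install -y epel-release && yum install -y dkms
}

# Call it explicitly as root when needed:
# ensure_dkms
```

The function is defined but not invoked, so sourcing the file changes nothing until ensure_dkms is called.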

5. Run the driver installer (if the kernel versions match, --kernel-source-path and -k are not needed):

./NVIDIA-Linux-x86_64-384.98.run --kernel-source-path=/usr/src/kernels/3.10.0-693.11.1.el7.x86_64/ -k $(uname -r) --dkms -s

6. If step 5 finishes without errors, the driver is installed. Reboot, then run nvidia-smi to view the GPU information.
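To confirm the installed driver version from a script, nvidia-smi can print selected fields as CSV. This is a minimal sketch; driver_version_of is a hypothetical helper name, and it assumes the fields are separated by a comma followed by a space.

```shell
#!/bin/sh
# Sketch: extract the driver version from one CSV line produced by
#   nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader
# driver_version_of is a hypothetical helper; it reads that line on stdin.
driver_version_of() {
    # The second field is driver_version; only the first GPU line is used.
    awk -F', ' 'NR == 1 { print $2 }'
}

# Query the real GPU only when nvidia-smi is available.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader \
        | driver_version_of
fi
```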

7. Problems encountered:

ERROR: Unable to find the kernel source tree for the currently running kernel.  Please make sure you have installed the kernel source files for your kernel and that they are properly configured; on Red Hat Linux systems, for example, be sure you have the 'kernel-source' or 'kernel-devel' RPM installed.  If you know the correct kernel source files are installed, you may specify the kernel source path with the '--kernel-source-path' command line option.

# Fix
Specify the --kernel-source-path option, for example:
./NVIDIA-Linux-x86_64-384.98.run --kernel-source-path=/usr/src/kernels/3.10.0-693.11.1.el7.x86_64/

ERROR: Unable to load the kernel module 'nvidia.ko'.  This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources, with a version of gcc that differs from the one used to build the target kernel, or if a driver such as rivafb, nvidiafb, or nouveau is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA graphics device(s), or no NVIDIA GPU installed in this system is supported by this NVIDIA Linux graphics driver release.

# Fix
Specify the -k option with $(uname -r), for example:
./NVIDIA-Linux-x86_64-384.98.run --kernel-source-path=/usr/src/kernels/3.10.0-693.11.1.el7.x86_64/ -k $(uname -r)

ERROR: The Nouveau kernel driver is currently in use by your system.  This driver is incompatible with the NVIDIA driver, and must be disabled before proceeding.  Please consult the NVIDIA driver README and your Linux distribution's documentation for details on how to correctly disable the Nouveau kernel driver.

# Fix
Disable nouveau; see step 3.

ERROR: Failed to find dkms on the system!

ERROR: Failed to install the kernel module through DKMS. No kernel module was installed; please try installing again without DKMS, or check the DKMS logs for more information.

# Fix
Install DKMS; see step 4.
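The error-to-fix pairs listed in this step can be collected into a small lookup. This is a minimal sketch under stated assumptions: suggest_fix is a hypothetical helper name, the hints simply restate the fixes from this post, and the sketch assumes that the opening phrase of each message sits on a single line of the installer log (normally /var/log/nvidia-installer.log).

```shell
#!/bin/sh
# Sketch: map known installer error messages to the fix described in this post.
# suggest_fix is a hypothetical helper; stdin is installer output or log text.
suggest_fix() {
    while IFS= read -r line; do
        case "$line" in
            *"Unable to find the kernel source tree"*)
                echo "pass --kernel-source-path" ;;
            *"Unable to load the kernel module"*)
                echo "pass -k with the running kernel release" ;;
            *"Nouveau kernel driver is currently in use"*)
                echo "disable nouveau (step 3)" ;;
            *"Failed to find dkms"*|*"through DKMS"*)
                echo "install DKMS (step 4)" ;;
        esac
    done
}

# Scan the installer log only when it exists and is readable.
if [ -r /var/log/nvidia-installer.log ]; then
    suggest_fix < /var/log/nvidia-installer.log
fi
```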

III. Installing CUDA 8.0

1. Download CUDA from the official site (https://developer.nvidia.com/cuda-80-ga2-download-archive). Any of the three installer types works; I chose the rpm package.

2. Install

rpm -i cuda-repo-rhel7-8-0-local-ga2-8.0.61-1.x86_64.rpm
yum clean all
yum install cuda

3. Configure the environment variables

vi ~/.bash_profile

# Add the following line
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-8.0/lib64

# Apply the change
source ~/.bash_profile
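When this step is automated, the export line should be added only once, even if the script runs repeatedly. The following is a minimal sketch; append_once is a hypothetical helper name and is not part of the original procedure.

```shell
#!/bin/sh
# Sketch: append a line to a file only if that exact line is not there yet.
# append_once is a hypothetical helper. $1 = file, $2 = line.
append_once() {
    grep -Fxq -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Example call (not run automatically). Single quotes keep the variable
# unexpanded, so the profile stores the same text shown in this step:
# append_once ~/.bash_profile 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-8.0/lib64'
```

If the file does not exist yet, grep fails quietly and the redirection creates it.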

4. Verify the installation

# Go to the CUDA samples directory
cd /usr/local/cuda-8.0/samples/

# Build
make

# Run the sample program
cd /usr/local/cuda-8.0/samples/1_Utilities/deviceQuery
./deviceQuery

If CUDA is installed and configured correctly, deviceQuery prints the GPU's properties and finishes with Result = PASS.
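The deviceQuery result can also be checked by a script rather than by reading the output. This is a minimal sketch; device_query_passed is a hypothetical helper name, and it assumes the sample reports success with the text Result = PASS.

```shell
#!/bin/sh
# Sketch: succeed if deviceQuery output (on stdin) reports a passing result.
# device_query_passed is a hypothetical helper.
device_query_passed() {
    grep -Fq 'Result = PASS'
}

# Run the real binary only when it has been built in the current directory.
if [ -x ./deviceQuery ]; then
    if ./deviceQuery | device_query_passed; then
        echo "CUDA check passed"
    else
        echo "CUDA check failed; recheck the driver and LD_LIBRARY_PATH"
    fi
fi
```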

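IV. Installing cuDNN 5.1

    cuDNN (the CUDA Deep Neural Network library) is a GPU acceleration library designed for deep learning frameworks. Compared with plain CUDA, it provides performance-tuned implementations of common neural network operations such as convolution, pooling, normalization, and activation layers. See the official site for details.

1. Download the appropriate cuDNN version from the official site (https://developer.nvidia.com/cudnn); you must register an account before downloading.

    Note: choose the build that matches your CUDA version.

2. Extract the archive and copy the files into the system directories: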

tar xzvf cudnn-8.0-linux-x64-v5.1.tgz
cp cuda/include/cudnn.h /usr/local/cuda/include
cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
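A quick way to confirm which cuDNN version was copied is to read the version macros from the installed header. This is a minimal sketch; cudnn_version is a hypothetical helper name, and it assumes cudnn.h defines CUDNN_MAJOR, CUDNN_MINOR, and CUDNN_PATCHLEVEL as plain integer macros.

```shell
#!/bin/sh
# Sketch: print MAJOR.MINOR.PATCHLEVEL from the version macros in cudnn.h.
# cudnn_version is a hypothetical helper. $1 = path to cudnn.h
cudnn_version() {
    awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
         $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
         $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
         END { if (major == "") exit 1; print major "." minor "." patch }' "$1"
}

# Inspect the installed header only when it is present.
if [ -r /usr/local/cuda/include/cudnn.h ]; then
    echo "cuDNN $(cudnn_version /usr/local/cuda/include/cudnn.h)"
fi
```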

3. Verify the installation

    Download the Code Samples from the official site and extract them:

# Go to the sample directory
cd /usr/src/cudnn_samples_v5/mnistCUDNN

# Build the sample program
make clean && make

# Run
./mnistCUDNN 

    If the installation succeeded, the program prints a fair amount of output (too long to paste here), ending with Test passed!

posted on 2017-12-12 14:19  海韵听涛