Installing CUDA on CentOS

0 Revision History

No.  Change           Date
1    Initial version  2021/1/20

1 Abstract

This article describes how to install CUDA on CentOS 8.1.

2 Environment

(1) Operating system

[root@ussuritest004 ~]# cat /etc/centos-release
CentOS Linux release 8.1.1911 (Core)
[root@ussuritest004 ~]#

(2) CUDA version

The runfile used here is:
cuda_10.2.89_440.33.01_linux.run

3 Procedure

(1) Preparation

3.1.1 Check that the machine has a CUDA-capable GPU

[root@ussuritest004 software]# lspci | grep -i nvidia
af:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
[root@ussuritest004 software]#
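
For scripted checks, the same lspci filter can be wrapped in a small helper. This is a minimal sketch and not part of the original procedure: the function name nvidia_gpus is made up for illustration, and it assumes the model name appears in square brackets, as it does in the output above.

```shell
# Hypothetical helper: reads `lspci` output on stdin and prints
# "<pci-address> <model>" for every NVIDIA device it finds.
nvidia_gpus() {
    # Same case-insensitive filter as the manual check, then keep only
    # the PCI address and the bracketed model name. Lines without a
    # bracketed name are printed unchanged.
    grep -i 'nvidia' | sed -E 's/^([^ ]+) .*\[([^]]+)\].*$/\1 \2/'
}
```

Run it as `lspci | nvidia_gpus`. It prints nothing when no NVIDIA device is present.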

3.1.2 Download

Omitted for now.

(2) Runfile installation

3.2.1 Install base dependencies

[root@ussuritest004 yum.repos.d]# yum install gcc

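The following command is optional. Its package names (libxi-dev and the others) are Debian/Ubuntu names, so yum will not find them on CentOS as written: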

[root@ussuritest004 yum.repos.d]# yum install libglu1-mesa libxi-dev libxmu-dev libglu1-mesa-dev freeglut3-dev

3.2.2 Disable the Nouveau driver

3.2.2.1 Check whether the nouveau driver is loaded

[root@ussuritest004 log]#  lsmod | grep nouveau
nouveau              2215936  1
mxm_wmi                16384  1 nouveau
video                  45056  1 nouveau
wmi                    32768  2 mxm_wmi,nouveau
i2c_algo_bit           16384  2 ast,nouveau
drm_kms_helper        217088  2 ast,nouveau
ttm                   110592  2 ast,nouveau
drm                   524288  7 drm_kms_helper,ast,ttm,nouveau
[root@ussuritest004 log]#

Any output means the driver is loaded.
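
Note that `grep nouveau` also matches modules that merely list nouveau as a user (video, wmi, and so on). A stricter check compares the first column of the lsmod output only. The sketch below is illustrative; the function name nouveau_loaded is an assumption, not something from the original steps.

```shell
# Hypothetical helper: reads `lsmod` output on stdin and succeeds
# (exit status 0) only if a module named exactly "nouveau" is listed.
nouveau_loaded() {
    # lsmod prints the module name in the first column; the
    # "Used by" column is ignored on purpose.
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}
```

Example: `if lsmod | nouveau_loaded; then echo "nouveau is loaded"; fi`.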

3.2.2.2 Disable the nouveau driver

3.2.2.2.1 Add a blacklist entry

To disable the Nouveau driver, create the file /usr/lib/modprobe.d/blacklist-nouveau.conf with the following content:

blacklist nouveau
options nouveau modeset=0

[root@ussuritest004 ~]# ll /usr/lib/modprobe.d/blacklist-nouveau.conf
ls: cannot access '/usr/lib/modprobe.d/blacklist-nouveau.conf': No such file or directory
[root@ussuritest004 ~]# vim /usr/lib/modprobe.d/blacklist-nouveau.conf
[root@ussuritest004 ~]#

[root@ussuritest004 ~]# cat /usr/lib/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
[root@ussuritest004 ~]#
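
The file can also be written without an interactive editor, which is easier to repeat on several machines. A minimal sketch, assuming root privileges for the default path; the function name write_nouveau_blacklist and its optional path argument are illustrative.

```shell
# Hypothetical helper: writes the two blacklist lines to the given
# file, or to the path used in this article when no argument is given.
# Any existing file at that path is overwritten.
write_nouveau_blacklist() {
    printf '%s\n' \
        'blacklist nouveau' \
        'options nouveau modeset=0' \
        > "${1:-/usr/lib/modprobe.d/blacklist-nouveau.conf}"
}
```

Calling `write_nouveau_blacklist` as root produces the same file content as the cat output above.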

3.2.2.2.2 Regenerate the kernel initramfs

Back up the current image first:

[root@ussuritest004 boot]# uname -r
4.18.0-147.el8.x86_64
[root@ussuritest004 boot]# cp /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak.orig
[root@ussuritest004 boot]# ll
total 167724
-rw-------. 1 root root  3838259 Dec  5  2019 System.map-4.18.0-147.el8.x86_64
-rw-r--r--. 1 root root   184613 Dec  5  2019 config-4.18.0-147.el8.x86_64
drwxr-xr-x. 3 root root     4096 Jan 19 11:44 efi
drwx------. 4 root root     4096 Jan 19 15:02 grub2
-rw-------. 1 root root 71694380 Jan 19 11:49 initramfs-0-rescue-c7dcb861dc20453f8e275d6036842581.img
-rw-------. 1 root root 30310567 Jan 19 11:50 initramfs-4.18.0-147.el8.x86_64.img
-rw-------  1 root root 30310567 Jan 20 13:50 initramfs-4.18.0-147.el8.x86_64.img.bak.orig
-rw-------. 1 root root 19141009 Jan 19 11:57 initramfs-4.18.0-147.el8.x86_64kdump.img
drwxr-xr-x. 3 root root     4096 Jan 19 11:47 loader
drwx------. 2 root root    16384 Jan 19 11:28 lost+found
-rwxr-xr-x. 1 root root  8106744 Jan 19 11:48 vmlinuz-0-rescue-c7dcb861dc20453f8e275d6036842581
-rwxr-xr-x. 1 root root  8106744 Dec  5  2019 vmlinuz-4.18.0-147.el8.x86_64
[root@ussuritest004 boot]#

Then regenerate it:

[root@ussuritest004 boot]# dracut  /boot/initramfs-$(uname -r).img --force
[root@ussuritest004 boot]# ll
total 166988
-rw-------. 1 root root  3838259 Dec  5  2019 System.map-4.18.0-147.el8.x86_64
-rw-r--r--. 1 root root   184613 Dec  5  2019 config-4.18.0-147.el8.x86_64
drwxr-xr-x. 3 root root     4096 Jan 19 11:44 efi
drwx------. 4 root root     4096 Jan 19 15:02 grub2
-rw-------. 1 root root 71694380 Jan 19 11:49 initramfs-0-rescue-c7dcb861dc20453f8e275d6036842581.img
-rw-------. 1 root root 29560525 Jan 20 13:53 initramfs-4.18.0-147.el8.x86_64.img
-rw-------  1 root root 30310567 Jan 20 13:50 initramfs-4.18.0-147.el8.x86_64.img.bak.orig
-rw-------. 1 root root 19141009 Jan 19 11:57 initramfs-4.18.0-147.el8.x86_64kdump.img
drwxr-xr-x. 3 root root     4096 Jan 19 11:47 loader
drwx------. 2 root root    16384 Jan 19 11:28 lost+found
-rwxr-xr-x. 1 root root  8106744 Jan 19 11:48 vmlinuz-0-rescue-c7dcb861dc20453f8e275d6036842581
-rwxr-xr-x. 1 root root  8106744 Dec  5  2019 vmlinuz-4.18.0-147.el8.x86_64
[root@ussuritest004 boot]#
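
One caveat with the plain cp above: if the procedure is repeated after dracut has already run, the backup is overwritten with the regenerated image. The sketch below guards against that. The function name backup_initramfs is hypothetical, and the .bak.orig suffix simply follows the naming used in this article.

```shell
# Hypothetical helper: copy an initramfs image to <image>.bak.orig,
# but only if that backup does not exist yet.
backup_initramfs() {
    # $1: path of the image to back up
    if [ -e "$1.bak.orig" ]; then
        echo "backup already exists: $1.bak.orig" >&2
        return 1
    fi
    # -p preserves the mode and timestamps of the original image
    cp -p "$1" "$1.bak.orig"
}
```

Run `backup_initramfs "/boot/initramfs-$(uname -r).img"` before the dracut command. On a second run it leaves the first backup untouched and returns a non-zero status.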

3.2.2.3 Switch the default target to text mode

[root@ussuritest004 boot]# systemctl set-default multi-user.target
Removed /etc/systemd/system/default.target.
Created symlink /etc/systemd/system/default.target → /usr/lib/systemd/system/multi-user.target.
[root@ussuritest004 boot]#

Reboot the machine after making this change.
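
After the reboot it is worth confirming that both changes took effect before starting the installer. The following is a sketch under stated assumptions: the function name post_reboot_check is made up, and it takes the output of `systemctl get-default` as an argument and the output of `lsmod` on stdin so that it can be tried on saved output as well.

```shell
# Hypothetical helper: reports what is still wrong after the reboot.
# $1: output of `systemctl get-default`; stdin: output of `lsmod`.
post_reboot_check() {
    rc=0
    if [ "$1" != "multi-user.target" ]; then
        echo "default target is $1, expected multi-user.target"
        rc=1
    fi
    # Match the module name column only, not the "Used by" column.
    if awk '$1 == "nouveau" { found = 1 } END { exit !found }'; then
        echo "nouveau is still loaded"
        rc=1
    fi
    return "$rc"
}
```

Example: `lsmod | post_reboot_check "$(systemctl get-default)"`. No output and exit status 0 mean both checks passed.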

3.2.3 Install cuda_10.2.89_440.33.01_linux.run

3.2.3.1 Step by step

[root@ussuritest004 software]# sh cuda_10.2.89_440.33.01_linux.run

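The command takes a while to start.
Type accept.

Select Install.

The installation failed.
Error log (the final ERROR lines report that make was not found in PATH):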

[root@ussuritest004 log]# cat nvidia-installer.log
nvidia-installer log file '/var/log/nvidia-installer.log'
creation time: Wed Jan 20 14:59:43 2021
installer version: 440.33.01

PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin

nvidia-installer command line:
    ./nvidia-installer
    --ui=none
    --no-questions
    --accept-license
    --disable-nouveau
    --no-cc-version-check
    --install-libglvnd

Using built-in stream user interface
-> Detected 48 CPUs online; setting concurrency level to 32.
-> Installing NVIDIA driver version 440.33.01.
WARNING: One or more modprobe configuration files to disable Nouveau are already present at: /usr/lib/modprobe.d/nvidia-installer-disable-nouveau.conf, /etc/modprobe.d/nvidia-installer-disable-nouveau.conf.  Please be sure you have rebooted your system since these files were written.  If you have rebooted, then Nouveau may be enabled for other reasons, such as being included in the system initial ramdisk or in your X configuration file.  Please consult the NVIDIA driver README and your Linux distribution's documentation for details on how to correctly disable the Nouveau kernel driver.
-> For some distributions, Nouveau can be disabled by adding a file in the modprobe configuration directory.  Would you like nvidia-installer to attempt to create this modprobe file for you? (Answer: Yes)
-> One or more modprobe configuration files to disable Nouveau have been written.  For some distributions, this may be sufficient to disable Nouveau; other distributions may require modification of the initial ramdisk.  Please reboot your system and attempt NVIDIA driver installation again.  Note if you later wish to reenable Nouveau, you will need to delete these files: /usr/lib/modprobe.d/nvidia-installer-disable-nouveau.conf, /etc/modprobe.d/nvidia-installer-disable-nouveau.conf
ERROR: Unable to find the development tool `make` in your path; please make sure that you have the package 'make' installed.  If make is installed on your system, then please check that `make` is in your PATH.
ERROR: Installation has failed.  Please see the file '/var/log/nvidia-installer.log' for details.  You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
[root@ussuritest004 log]#
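
The failure above comes from a missing build tool, so a quick check before launching the installer can save a round trip. This is a minimal sketch; the function name missing_build_tools is hypothetical, and the list of tools to check is left to the caller.

```shell
# Hypothetical helper: prints every command from its argument list
# that cannot be found in PATH, one per line.
missing_build_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

On this machine `missing_build_tools gcc make` would have printed make, and installing it with `yum install make` before rerunning the installer addresses the reported error. Building the kernel module usually also needs kernel headers that match the running kernel (on CentOS typically the kernel-devel package). That is an assumption here, since the log does not get that far.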
