Installing CUDA on CentOS

0 Revision History

No.  Change           Date
1    Initial version  2021/1/20

1 Summary

This article describes how to install CUDA on CentOS 8.1.

2 Environment

(1) Operating system

[root@ussuritest004 ~]# cat /etc/centos-release
CentOS Linux release 8.1.1911 (Core)
[root@ussuritest004 ~]#

(2) CUDA version

I am using the runfile installer (CUDA 10.2.89 with bundled driver 440.33.01):
cuda_10.2.89_440.33.01_linux.run

3 Procedure

(1) Preparation

3.1.1 Check whether the machine has a CUDA-capable GPU

[root@ussuritest004 software]# lspci | grep -i nvidia
af:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
[root@ussuritest004 software]#
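The same check can be scripted. The following is a minimal sketch, assuming pciutils is installed; the function name has_nvidia_gpu is made up for illustration. It only confirms that an NVIDIA device is present on the PCI bus, so whether a given model supports CUDA still has to be checked against NVIDIA's list (the Tesla T4 shown above does).

```shell
# has_nvidia_gpu: succeed if the lspci listing on stdin mentions an NVIDIA device.
# (Hypothetical helper name; it only wraps a case-insensitive grep.)
has_nvidia_gpu() {
    grep -qi 'nvidia'
}

# Usage on the target machine:
#   lspci | has_nvidia_gpu && echo "NVIDIA device found" || echo "no NVIDIA device found"
```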

3.1.2 Download

Omitted for now.

(2) Runfile installation

3.2.1 Install basic dependencies

[root@ussuritest004 yum.repos.d]# yum install gcc

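The following step is optional. Note that these are Debian/Ubuntu package names, so yum will not find them on CentOS; the RHEL/CentOS counterparts are named along the lines of mesa-libGLU-devel, libXi-devel, libXmu-devel and freeglut-devel.

[root@ussuritest004 yum.repos.d]# yum install libglu1-mesa libxi-dev libxmu-dev libglu1-mesa-dev freeglut3-dev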
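The bundled driver installer compiles a kernel module, so it needs more than gcc: the installer log later in this article fails because make is missing. A small pre-flight check along the following lines could catch that before running the runfile. This is a sketch; missing_tools is a hypothetical helper name, and the list of tools to check is an assumption based on that log. Kernel headers matching the running kernel are typically needed as well, which this check does not cover.

```shell
# missing_tools: print the name of every given command that is not on PATH.
# (Hypothetical helper; prints nothing when all tools are available.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Usage before starting the installer:
#   missing_tools gcc make
# Any name printed should be installed first, for example: yum install make
```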

3.2.2 Disable the Nouveau driver

3.2.2.1 Check whether the nouveau driver is loaded

[root@ussuritest004 log]#  lsmod | grep nouveau
nouveau              2215936  1
mxm_wmi                16384  1 nouveau
video                  45056  1 nouveau
wmi                    32768  2 mxm_wmi,nouveau
i2c_algo_bit           16384  2 ast,nouveau
drm_kms_helper        217088  2 ast,nouveau
ttm                   110592  2 ast,nouveau
drm                   524288  7 drm_kms_helper,ast,ttm,nouveau
[root@ussuritest004 log]#

Any output means the nouveau module is loaded.
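A plain grep also matches modules that merely list nouveau as a user (mxm_wmi, video and so on in the output above). If the check is going into a script, a stricter variant can compare only the first column. This is a sketch; nouveau_loaded is a made-up function name.

```shell
# nouveau_loaded: succeed only if the lsmod listing on stdin has a line whose
# first column (the module name) is exactly "nouveau".
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# Usage:
#   lsmod | nouveau_loaded && echo "nouveau is loaded" || echo "nouveau is not loaded"
```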

3.2.2.2 Disable the nouveau driver

3.2.2.2.1 Add a blacklist entry

To disable the Nouveau driver, create the file "/usr/lib/modprobe.d/blacklist-nouveau.conf" with the following content:

blacklist nouveau
options nouveau modeset=0

The file did not exist on this machine, so it was created with vim:

[root@ussuritest004 ~]# ll /usr/lib/modprobe.d/blacklist-nouveau.conf
ls: cannot access '/usr/lib/modprobe.d/blacklist-nouveau.conf': No such file or directory
[root@ussuritest004 ~]# vim /usr/lib/modprobe.d/blacklist-nouveau.conf
[root@ussuritest004 ~]#

[root@ussuritest004 ~]# cat /usr/lib/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
[root@ussuritest004 ~]#
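For unattended setups the file can be written without an editor. The sketch below writes the same two lines shown above; write_nouveau_blacklist is a hypothetical function name, and the optional path argument exists only so the function can be tried against a scratch file first.

```shell
# write_nouveau_blacklist: write the two blacklist lines to the given file.
# Defaults to the path used in this article; must run as root for that path.
write_nouveau_blacklist() {
    target="${1:-/usr/lib/modprobe.d/blacklist-nouveau.conf}"
    cat > "$target" <<'EOF'
blacklist nouveau
options nouveau modeset=0
EOF
}

# Usage (as root):
#   write_nouveau_blacklist
```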

3.2.2.2.2 Regenerate the kernel initramfs

First, back up the current image:

[root@ussuritest004 boot]# uname -r
4.18.0-147.el8.x86_64
[root@ussuritest004 boot]# cp /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak.orig
[root@ussuritest004 boot]# ll
total 167724
-rw-------. 1 root root  3838259 Dec  5  2019 System.map-4.18.0-147.el8.x86_64
-rw-r--r--. 1 root root   184613 Dec  5  2019 config-4.18.0-147.el8.x86_64
drwxr-xr-x. 3 root root     4096 Jan 19 11:44 efi
drwx------. 4 root root     4096 Jan 19 15:02 grub2
-rw-------. 1 root root 71694380 Jan 19 11:49 initramfs-0-rescue-c7dcb861dc20453f8e275d6036842581.img
-rw-------. 1 root root 30310567 Jan 19 11:50 initramfs-4.18.0-147.el8.x86_64.img
-rw-------  1 root root 30310567 Jan 20 13:50 initramfs-4.18.0-147.el8.x86_64.img.bak.orig
-rw-------. 1 root root 19141009 Jan 19 11:57 initramfs-4.18.0-147.el8.x86_64kdump.img
drwxr-xr-x. 3 root root     4096 Jan 19 11:47 loader
drwx------. 2 root root    16384 Jan 19 11:28 lost+found
-rwxr-xr-x. 1 root root  8106744 Jan 19 11:48 vmlinuz-0-rescue-c7dcb861dc20453f8e275d6036842581
-rwxr-xr-x. 1 root root  8106744 Dec  5  2019 vmlinuz-4.18.0-147.el8.x86_64
[root@ussuritest004 boot]#

Then regenerate it:

[root@ussuritest004 boot]# dracut  /boot/initramfs-$(uname -r).img --force
[root@ussuritest004 boot]# ll
total 166988
-rw-------. 1 root root  3838259 Dec  5  2019 System.map-4.18.0-147.el8.x86_64
-rw-r--r--. 1 root root   184613 Dec  5  2019 config-4.18.0-147.el8.x86_64
drwxr-xr-x. 3 root root     4096 Jan 19 11:44 efi
drwx------. 4 root root     4096 Jan 19 15:02 grub2
-rw-------. 1 root root 71694380 Jan 19 11:49 initramfs-0-rescue-c7dcb861dc20453f8e275d6036842581.img
-rw-------. 1 root root 29560525 Jan 20 13:53 initramfs-4.18.0-147.el8.x86_64.img
-rw-------  1 root root 30310567 Jan 20 13:50 initramfs-4.18.0-147.el8.x86_64.img.bak.orig
-rw-------. 1 root root 19141009 Jan 19 11:57 initramfs-4.18.0-147.el8.x86_64kdump.img
drwxr-xr-x. 3 root root     4096 Jan 19 11:47 loader
drwx------. 2 root root    16384 Jan 19 11:28 lost+found
-rwxr-xr-x. 1 root root  8106744 Jan 19 11:48 vmlinuz-0-rescue-c7dcb861dc20453f8e275d6036842581
-rwxr-xr-x. 1 root root  8106744 Dec  5  2019 vmlinuz-4.18.0-147.el8.x86_64
[root@ussuritest004 boot]#
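One thing to watch when repeating these two steps: running the cp again after dracut would overwrite the .bak.orig copy with the already regenerated image. A guarded backup along these lines avoids that. This is a sketch; backup_once is a hypothetical helper name, and the .bak.orig suffix simply mirrors the one used above.

```shell
# backup_once: copy FILE to FILE.bak.orig unless that backup already exists,
# so the first (original) copy is never overwritten by a later run.
backup_once() {
    src="$1"
    dst="$1.bak.orig"
    if [ -e "$dst" ]; then
        echo "backup already exists, keeping it: $dst" >&2
        return 0
    fi
    cp -p "$src" "$dst"
}

# Usage (as root):
#   img="/boot/initramfs-$(uname -r).img"
#   backup_once "$img" && dracut --force "$img"
```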

3.2.2.3 Switch the default target to text mode

[root@ussuritest004 boot]# systemctl set-default multi-user.target
Removed /etc/systemd/system/default.target.
Created symlink /etc/systemd/system/default.target → /usr/lib/systemd/system/multi-user.target.
[root@ussuritest004 boot]#

Reboot the machine after making these changes.
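After the reboot it is worth confirming that both changes took effect before starting the installer. The following sketch takes the output of systemctl get-default and lsmod as arguments; check_preinstall is a made-up name, and the two conditions it tests are simply the ones set up in the steps above.

```shell
# check_preinstall: $1 is the output of `systemctl get-default`,
# $2 is the output of `lsmod`. Prints what is wrong and returns non-zero
# if the default target is not multi-user.target or nouveau is still loaded.
check_preinstall() {
    status=0
    if [ "$1" != "multi-user.target" ]; then
        echo "default target is $1, expected multi-user.target"
        status=1
    fi
    if printf '%s\n' "$2" | awk '$1 == "nouveau" { f = 1 } END { exit !f }'; then
        echo "nouveau is still loaded"
        status=1
    fi
    return "$status"
}

# Usage after rebooting:
#   check_preinstall "$(systemctl get-default)" "$(lsmod)" && echo "ready to install"
```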

3.2.3 Install cuda_10.2.89_440.33.01_linux.run

3.2.3.1 Step by step

[root@ussuritest004 software]# sh cuda_10.2.89_440.33.01_linux.run

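The command takes a while to start.
Type accept

Select Install

The installation failed. The installer log shows why: make is not on the PATH, and the installer also warns that Nouveau may still be active.
Installer log: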

[root@ussuritest004 log]# cat nvidia-installer.log
nvidia-installer log file '/var/log/nvidia-installer.log'
creation time: Wed Jan 20 14:59:43 2021
installer version: 440.33.01

PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin

nvidia-installer command line:
    ./nvidia-installer
    --ui=none
    --no-questions
    --accept-license
    --disable-nouveau
    --no-cc-version-check
    --install-libglvnd

Using built-in stream user interface
-> Detected 48 CPUs online; setting concurrency level to 32.
-> Installing NVIDIA driver version 440.33.01.
WARNING: One or more modprobe configuration files to disable Nouveau are already present at: /usr/lib/modprobe.d/nvidia-installer-disable-nouveau.conf, /etc/modprobe.d/nvidia-installer-disable-nouveau.conf.  Please be sure you have rebooted your system since these files were written.  If you have rebooted, then Nouveau may be enabled for other reasons, such as being included in the system initial ramdisk or in your X configuration file.  Please consult the NVIDIA driver README and your Linux distribution's documentation for details on how to correctly disable the Nouveau kernel driver.
-> For some distributions, Nouveau can be disabled by adding a file in the modprobe configuration directory.  Would you like nvidia-installer to attempt to create this modprobe file for you? (Answer: Yes)
-> One or more modprobe configuration files to disable Nouveau have been written.  For some distributions, this may be sufficient to disable Nouveau; other distributions may require modification of the initial ramdisk.  Please reboot your system and attempt NVIDIA driver installation again.  Note if you later wish to reenable Nouveau, you will need to delete these files: /usr/lib/modprobe.d/nvidia-installer-disable-nouveau.conf, /etc/modprobe.d/nvidia-installer-disable-nouveau.conf
ERROR: Unable to find the development tool `make` in your path; please make sure that you have the package 'make' installed.  If make is installed on your system, then please check that `make` is in your PATH.
ERROR: Installation has failed.  Please see the file '/var/log/nvidia-installer.log' for details.  You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
[root@ussuritest004 log]#
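The log is long, and the lines that matter are the ones starting with ERROR:. A small filter such as the following pulls them out. This is a sketch; installer_errors is a hypothetical name, and the default path is the one the installer reports above.

```shell
# installer_errors: print only the ERROR lines from an nvidia-installer log.
# Defaults to the log path reported by the installer in this article.
installer_errors() {
    grep '^ERROR:' "${1:-/var/log/nvidia-installer.log}"
}

# Usage:
#   installer_errors
```

For the log above this leaves two lines, the first of which names the actual cause. Installing make (yum install make), confirming after a reboot that nouveau is no longer loaded, and then rerunning the runfile would be the natural next step, although this article does not go that far.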
