Ubuntu 18.04: NVIDIA graphics driver fails to load after boot

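1. At boot, press Esc to bring up the GRUB menu, open "Advanced options for Ubuntu", select the kernel version you want, and press Enter.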

Note: remember this version number and its position in the submenu.

2. Next, edit /etc/default/grub as follows (the GRUB_DEFAULT line is the change):

# If you change this file, run 'update-grub' afterwards to update
# /boot/grub/grub.cfg.
# For full documentation of the options in this file, see:
#   info -f grub -n 'Simple configuration'

# GRUB_DEFAULT=0
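GRUB_DEFAULT="1>2"  # modified setting: top-level entry 1 ("Advanced options for Ubuntu"), then entry 2 in its submenu; both counted from 0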
GRUB_TIMEOUT_STYLE=hidden
GRUB_TIMEOUT=0
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
GRUB_CMDLINE_LINUX=""

# Uncomment to enable BadRAM filtering, modify to suit your needs
# This works with Linux (no patch required) and with any kernel that obtains
# the memory map information from GRUB (GNU Mach, kernel of FreeBSD ...)
#GRUB_BADRAM="0x01234567,0xfefefefe,0x89abcdef,0xefefefef"

# Uncomment to disable graphical terminal (grub-pc only)
#GRUB_TERMINAL=console

# The resolution used on graphical terminal
# note that you can use only modes which your graphic card supports via VBE
# you can see them in real GRUB with the command `vbeinfo'
#GRUB_GFXMODE=640x480

# Uncomment if you don't want GRUB to pass "root=UUID=xxx" parameter to Linux
#GRUB_DISABLE_LINUX_UUID=true

# Uncomment to disable generation of recovery mode menu entries
#GRUB_DISABLE_RECOVERY="true"

# Uncomment to get a beep at grub start
#GRUB_INIT_TUNE="480 440 1"


3. Run sudo update-grub to regenerate /boot/grub/grub.cfg. Then reboot and check the running kernel version with uname -r.
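If you prefer to make the GRUB change from a terminal, you can script the edit. Below is a minimal sketch. It assumes GNU sed (as on Ubuntu) and a single active GRUB_DEFAULT line. The function name set_grub_default is hypothetical. The value "1>2" is only the one used above: the first number is the top-level menu entry and the second is the kernel inside the "Advanced options for Ubuntu" submenu, both counted from 0.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite the active GRUB_DEFAULT line of a grub
# defaults file. Commented lines such as "# GRUB_DEFAULT=0" are left alone.
set_grub_default() {
  local file="$1" value="$2"
  # "|" is the sed delimiter so values like "1>2" need no escaping
  # (a value containing "|" or "&" would still need escaping).
  sed -i "s|^GRUB_DEFAULT=.*|GRUB_DEFAULT=\"$value\"|" "$file"
}

# Usage, as root (back up the file first):
#   cp /etc/default/grub /etc/default/grub.bak
#   set_grub_default /etc/default/grub '1>2'
#   update-grub     # regenerates /boot/grub/grub.cfg
#   reboot
# After rebooting, `uname -r` should print the kernel you selected.
```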

Reference: https://www.toutiao.com/i7023555532728353294/

Running darknet afterwards triggered a new problem:

(yolov4) waq@waq-MS-7885:~/Downloads/ai/Vitis-AI-1.3.2/yolo_dploy/darknet-master$ ./darknet detector train  cfg/voc.data cfg/yolov4.cfg  yolov4.weights -map
CUDA status Error: file: ./src/dark_cuda.c : () : line: 38 : build time: Nov 22 2021 - 20:42:38 

 CUDA Error: unknown error
Darknet error location: ./src/dark_cuda.c, check_error, line #69
CUDA Error: unknown error: Bad file descriptor
(yolov4) waq@waq-MS-7885:~/Downloads/ai/Vitis-AI-1.3.2/yolo_dploy/darknet-master$ 

After some digging, it turned out to be a CUDA problem. Sigh, time to reinstall CUDA once more!!!
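1. Download the installer from the official site (I used the runfile). Before installing, uninstall the previously installed CUDA 10.1 (check the exact version with nvcc --version).
The default install location is under /usr/local/. To uninstall the old version, go to /usr/local/cuda-10.1/bin and run sudo ./cuda-uninstaller. Once it reports success, you can simply delete the folder.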
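To find which cuda-X.Y directory holds the uninstaller, you can read the release number from nvcc --version. Here is a small sketch. The helper name cuda_release is hypothetical. It assumes nvcc prints a line like "Cuda compilation tools, release 10.1, V10.1.243" and that the toolkit lives at the default /usr/local/cuda-X.Y.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the "X.Y" release from `nvcc --version` output.
cuda_release() {
  printf '%s\n' "$1" | sed -n 's/.*release \([0-9]\{1,\}\.[0-9]\{1,\}\).*/\1/p'
}

# Usage:
#   ver=$(cuda_release "$(nvcc --version)")
#   [ -n "$ver" ] && sudo "/usr/local/cuda-$ver/bin/cuda-uninstaller"
#   # then, once the uninstaller reports success:
#   #   sudo rm -rf "/usr/local/cuda-$ver"
```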
2. Install the newly downloaded runfile.

When selecting components, do not select the driver (clear its X) and keep everything else selected. After installation, a summary is printed:

===========
= Summary =
===========

Driver:   Not Selected
Toolkit:  Installed in /usr/local/cuda-10.1/
Samples:  Installed in /home/waq/, but missing recommended libraries

Please make sure that
 -   PATH includes /usr/local/cuda-10.1/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-10.1/lib64, or, add /usr/local/cuda-10.1/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-10.1/bin

Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-10.1/doc/pdf for detailed information on setting up CUDA.
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 418.00 is required for CUDA 10.1 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
    sudo <CudaInstaller>.run --silent --driver

Logfile is /var/log/cuda-installer.log
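The warning above means a driver of at least 418.00 must already be installed for the CUDA 10.1 toolkit to work. The sketch below checks the installed driver. It compares the version reported by nvidia-smi with that minimum using GNU sort -V. The function name version_ge is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if dotted version $1 >= $2.
# sort -V (GNU coreutils) orders version strings field by field numerically,
# so the smaller of the two sorts first.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage:
#   drv=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   if version_ge "$drv" 418.00; then
#     echo "driver $drv is new enough for CUDA 10.1"
#   else
#     echo "driver $drv is too old (or nvidia-smi failed)"
#   fi
```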

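3. After installation, add the environment variables. In the home directory, press Ctrl+H to show hidden files, find .bashrc, and add the paths to it (or edit it from a terminal with vim ~/.bashrc).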
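Per the installer summary, PATH must include /usr/local/cuda-10.1/bin and LD_LIBRARY_PATH must include /usr/local/cuda-10.1/lib64. Below is a minimal sketch of the lines to append to ~/.bashrc. It assumes the default install location. CUDA_DIR is just a helper variable name chosen here. Replace 10.1 with the version you actually installed.

```shell
# Append to ~/.bashrc. Assumes the default runfile install location;
# change cuda-10.1 to match the version you installed.
CUDA_DIR=/usr/local/cuda-10.1
# Prepend the CUDA directories; ${VAR:+:$VAR} avoids a stray ":" when VAR is empty.
export PATH="$CUDA_DIR/bin${PATH:+:$PATH}"
export LD_LIBRARY_PATH="$CUDA_DIR/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

Reload with source ~/.bashrc (or open a new terminal). nvcc --version should then report the new release.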
4. After that, testing with the official samples kept failing:

  
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 999
-> unknown error
Result = FAIL
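For reference, deviceQuery is built and run from the samples tree, and its last line is "Result = PASS" or "Result = FAIL". The sketch below uses hypothetical helper names. It assumes the samples were installed to ~/NVIDIA_CUDA-10.1_Samples; the summary above only says /home/waq/, so that directory name is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if deviceQuery output reports PASS.
devicequery_ok() {
  printf '%s\n' "$1" | grep -q '^Result = PASS'
}

# Hypothetical helper: build and run the sample. The samples path is an
# assumption; adjust it to wherever the installer put them.
run_devicequery() {
  local dir="$HOME/NVIDIA_CUDA-10.1_Samples/1_Utilities/deviceQuery"
  (cd "$dir" && make >/dev/null && ./deviceQuery)
}

# Usage:
#   out=$(run_devicequery 2>&1)
#   devicequery_ok "$out" || echo "CUDA runtime cannot use the GPU; check the driver"
```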

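5. So I had no choice but to reinstall the driver yet again...
Reference: https://www.jianshu.com/p/8594771c7d5e
The DKMS module build failed during installation: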
Loading new nvidia-495.44 DKMS files...
Building for 4.15.0-162-generic 4.15.0-163-generic
Building for architecture x86_64
Building initial module for 4.15.0-162-generic
Error! Bad return status for module build on kernel: 4.15.0-162-generic (x86_64)
Consult /var/lib/dkms/nvidia/495.44/build/make.log for more information.
Setting up nvidia-compute-utils-495 (495.44-0ubuntu0.18.04.1) ...
Warning: The home dir /nonexistent you specified can't be accessed: No such file or directory
Adding system user `nvidia-persistenced' (UID 121) ...
Adding new group `nvidia-persistenced' (GID 127) ...
Adding new user `nvidia-persistenced' (UID 121) with group `nvidia-persistenced' ...
pam_tally2: /var/log/tallylog is either world writable or not a normal file
pam_tally2: Authentication error
useradd: failed to reset the tallylog entry of user "nvidia-persistenced"
Not creating home directory `/nonexistent'.
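(Sigh. The installation probably failed because I had changed the gcc version for a program I ran earlier.) Next, change the gcc version again...
References: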
https://blog.csdn.net/JerryZhang__/article/details/108865176
https://forum.xanmod.org/thread-3635.html
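Before switching compilers, it can help to check which gcc built the running kernel. DKMS compiles the nvidia module against that kernel using the default gcc, so a version mismatch is one possible cause of the "Bad return status for module build" error above. The actual compiler error is in the make.log it points to. The sketch below uses hypothetical helper names. It assumes /proc/version contains "gcc version X.Y.Z", as on the Ubuntu 18.04 4.15 kernels; newer kernels word this line differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract "X.Y.Z" following "gcc version " in the text.
gcc_version_of() {
  printf '%s\n' "$1" | sed -n 's/.*gcc version \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: compare the major gcc version of the running kernel
# with the major version of the default `gcc`.
check_kernel_gcc() {
  local kernel_gcc current_gcc
  kernel_gcc=$(gcc_version_of "$(cat /proc/version)")
  current_gcc=$(gcc -dumpversion)
  echo "kernel built with gcc ${kernel_gcc:-unknown}, default gcc is ${current_gcc:-unknown}"
  [ "${kernel_gcc%%.*}" = "${current_gcc%%.*}" ]
}

# Usage:
#   check_kernel_gcc || echo "gcc major versions differ"
#   tail -n 40 /var/lib/dkms/nvidia/495.44/build/make.log   # the actual build error
```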

sudo apt-get update
sudo apt-get install gcc-8
sudo apt-get install g++-8
cd /usr/bin
sudo rm gcc g++
sudo ln -s gcc-8 gcc
sudo ln -s g++-8 g++
https://blog.csdn.net/weixin_44128857/article/details/108554751

3. With the gcc version changed, install CUDA, add the environment variables again, and finally run the tests.

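Note that the CUDA versions must match. The version in my screenshot is different, so change the version in the environment variables to the one you actually installed.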


4. Install cuDNN
Go to https://developer.nvidia.com/cudnn and download the matching package.
Note: it must match your CUDA version!!!

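After the download finishes, extract it. Run the following commands from the directory that contains the extracted cuda folder: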

sudo cp cuda/include/cudnn.h /usr/local/cuda/include/ 
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/ 
sudo chmod a+r /usr/local/cuda/include/cudnn.h 
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*

Check the cuDNN version in a terminal:

cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
Output:

#define CUDNN_MAJOR 7
#define CUDNN_MINOR 5
#define CUDNN_PATCHLEVEL 0
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)

#include "driver_types.h"
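The three defines can also be combined into a single version string. Here is a sketch with a hypothetical helper name. From cuDNN 8 onward these defines live in cudnn_version.h rather than cudnn.h, so the function takes the header path as an argument.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "MAJOR.MINOR.PATCH" from a cuDNN header.
cudnn_version() {
  local header="$1" major minor patch
  major=$(sed -n 's/^#define CUDNN_MAJOR[[:space:]]\{1,\}\([0-9]\{1,\}\).*/\1/p' "$header")
  minor=$(sed -n 's/^#define CUDNN_MINOR[[:space:]]\{1,\}\([0-9]\{1,\}\).*/\1/p' "$header")
  patch=$(sed -n 's/^#define CUDNN_PATCHLEVEL[[:space:]]\{1,\}\([0-9]\{1,\}\).*/\1/p' "$header")
  [ -n "$major" ] || return 1   # the defines were not found in this header
  printf '%s.%s.%s\n' "$major" "$minor" "$patch"
}

# Usage:
#   cudnn_version /usr/local/cuda/include/cudnn.h           # cuDNN 7.x
#   cudnn_version /usr/local/cuda/include/cudnn_version.h   # cuDNN 8 and later
```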

posted @ 2021-12-10 14:52  冰峰漫步