
Fixing two problems: nvidia-smi driver version mismatch and CUDA unavailable

Fix: reinstall the NVIDIA driver and CUDA, then start nvidia-fabricmanager.

Fixing "CUDA initialization: Unexpected error from cudaGetDeviceCount()"

 

$ python mcw.py
/home/mcw/mambaforge/envs/ailme/lib/python3.11/site-packages/torch/cuda/__init__.py:118: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 802: system not yet initialized (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
torch.cuda.is_available(): False
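
The number in the warning is a CUDA runtime error code. In the CUDA runtime API, 802 is `cudaErrorSystemNotReady`, which NVIDIA describes as the system not yet being ready to start CUDA work, typically because a required driver daemon is not running. Below is a minimal sketch for pulling the code out of a captured log line. The function name `extract_cuda_error` and the `HINTS` table are illustrative assumptions, not part of PyTorch or CUDA.

```python
import re

# Illustrative hint table (an assumption for this sketch, not an official mapping).
# 802 is cudaErrorSystemNotReady in the CUDA runtime API.
HINTS = {
    802: "cudaErrorSystemNotReady: check that required driver daemons "
         "(e.g. nvidia-fabricmanager on NVSwitch systems) are running",
}


def extract_cuda_error(message):
    """Return (code, text) parsed from 'Error <code>: <text>', or None."""
    match = re.search(r"Error (\d+): ([^(]+)", message)
    if match is None:
        return None
    return int(match.group(1)), match.group(2).strip()


# The warning text from the session above, with the file path prefix omitted.
warning = (
    "UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). "
    "Did you run some cuda functions before calling NumCudaDevices() that might "
    "have already set an error? Error 802: system not yet initialized "
    "(Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)"
)

parsed = extract_cuda_error(warning)
if parsed is not None:
    code, text = parsed
    print(code, text)
    print(HINTS.get(code, "no hint recorded for this code"))
```

For this post, the hint is consistent with the eventual fix: the driver was present, but nvidia-fabricmanager was not running at a matching version.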

 

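# Run the following as root.
# Add NVIDIA's CUDA apt repository keyring (Ubuntu 22.04, x86_64)
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
dpkg -i cuda-keyring_1.1-1_all.deb
# Refresh the package index so the pinned versions below can be found
# (this step is not in the original notes; it does not upgrade any package)
apt-get update

# Pin the common packages to driver version 525.125.06
apt-get install libnvidia-common-525=525.125.06-0ubuntu1
apt-get install nvidia-kernel-common-525=525.125.06-0ubuntu1

# Install the rest of the 525.125.06 driver stack, every package at the same version
apt-get install --no-install-recommends \
    cuda-drivers-525=525.125.06-1 nvidia-driver-525=525.125.06-0ubuntu1 nvidia-dkms-525=525.125.06-0ubuntu1 \
    nvidia-kernel-source-525=525.125.06-0ubuntu1 libnvidia-gl-525=525.125.06-0ubuntu1 libnvidia-compute-525=525.125.06-0ubuntu1 \
    libnvidia-decode-525=525.125.06-0ubuntu1 libnvidia-extra-525=525.125.06-0ubuntu1 nvidia-compute-utils-525=525.125.06-0ubuntu1 \
    libnvidia-encode-525=525.125.06-0ubuntu1 nvidia-utils-525=525.125.06-0ubuntu1 xserver-xorg-video-nvidia-525=525.125.06-0ubuntu1 \
    libnvidia-cfg1-525=525.125.06-0ubuntu1 libnvidia-fbc1-525=525.125.06-0ubuntu1
# Confirm that the driver loads and reports the expected version
nvidia-smi

# Install the CUDA 11.8 toolkit from the runfile installer
wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
sh cuda_11.8.0_520.61.05_linux.run
ls /usr/local/cuda

# Install fabric manager at the same version as the driver, then enable and start it
apt-get install nvidia-fabricmanager-525=525.125.06-1
systemctl enable nvidia-fabricmanager
systemctl start nvidia-fabricmanager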
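
Before retrying PyTorch, it can help to confirm that the fabric manager service actually came up. The sketch below wraps `systemctl is-active`, which prints `active` for a running unit. The helper name `service_is_active` and its injectable `run` parameter are assumptions made for this example so that the function can be exercised on machines without systemd.

```python
import subprocess


def service_is_active(name, run=subprocess.run):
    """Return True if `systemctl is-active <name>` reports 'active'.

    `run` is injectable so the function can be exercised without systemd.
    """
    try:
        result = run(
            ["systemctl", "is-active", name],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        # systemctl is not installed on this machine
        return False
    return result.stdout.strip() == "active"


if __name__ == "__main__":
    print("nvidia-fabricmanager active:", service_is_active("nvidia-fabricmanager"))
```

If this prints `False`, the service log (for example via `systemctl status nvidia-fabricmanager`) is the next place to look.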

 

$ python
Python 3.11.6 | packaged by conda-forge | (main, Oct  3 2023, 10:40:35) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
>>> 

 

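Note: on an A100-80G server, do not casually upgrade packages with apt-get or enable Ubuntu automatic system updates.

The cause of this problem was that the nvidia-fabricmanager package had been updated for some reason, for example by an automatic system update or during an apt-get update / apt-get upgrade run. This package only works when its version matches the driver version exactly.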
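
The version requirement can be checked mechanically. The sketch below compares a driver version string with a fabric manager package version after stripping the Debian revision suffix. The function names are hypothetical. On a real machine the two inputs could come from `nvidia-smi --query-gpu=driver_version --format=csv,noheader` and `dpkg-query -W -f='${Version}' nvidia-fabricmanager-525`; here they are passed in as plain strings.

```python
def upstream_version(package_version):
    """Strip the Debian revision: '525.125.06-0ubuntu1' -> '525.125.06'."""
    # Drop an epoch prefix such as '1:' if present, then the '-revision' suffix.
    without_epoch = package_version.split(":", 1)[-1]
    return without_epoch.rsplit("-", 1)[0]


def versions_match(driver_version, fabricmanager_package_version):
    """Return True if the driver and fabric manager share an upstream version."""
    return driver_version.strip() == upstream_version(
        fabricmanager_package_version.strip()
    )


# Matching pair taken from the install commands in this post
print(versions_match("525.125.06", "525.125.06-1"))
# Hypothetical mismatch after an unintended fabric manager upgrade
print(versions_match("525.125.06", "535.54.03-1"))
```

One possible way to keep the pair from drifting apart again is to hold the packages with `apt-mark hold`, although the original notes do not say whether this was done.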

 

 

References:

https://blog.csdn.net/k_wenry/article/details/138350564
https://bbs.huaweicloud.com/blogs/401682
Uninstalling existing packages: https://blog.csdn.net/qq_41076797/article/details/124909408

Fix for "CUDA initialization: Unexpected error from cudaGetDeviceCount()": https://www.cnblogs.com/huadongw/p/16504137.html
https://developer.nvidia.com/cuda-11-8-0-download-archive?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=runfile_local
https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/
https://mirrors.cloud.tencent.com/nvidia-cuda/ubuntu2204/x86_64/

 

Downgrading the system kernel: https://blog.csdn.net/qq_62368277/article/details/134273919

 

Posted by 马昌伟
Author's blog: https://www.cnblogs.com/machangwei-8/