Ubuntu: upgrading a CUDA installation made from a .run file, and installing cuDNN

CUDA download

https://developer.nvidia.com/cuda-downloads

cuDNN download

https://developer.nvidia.com/rdp/cudnn-archive

Stop the Docker service

sudo systemctl stop docker.socket
sudo systemctl stop docker
sudo systemctl stop docker.service

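Uninstall the old version using the removal commands from NVIDIA's official documentation. These apt commands only remove packages that were installed through the package manager; for a toolkit installed from a .run file, NVIDIA's installation guide instead documents running cuda-uninstaller from /usr/local/cuda-X.Y/bin.

Ubuntu and Debian

  • To remove the CUDA Toolkit: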
sudo apt-get --purge remove "*cuda*" "*cublas*" "*cufft*" "*cufile*" "*curand*" \
 "*cusolver*" "*cusparse*" "*gds-tools*" "*npp*" "*nvjpeg*" "nsight*" "*nvvm*"
  • To remove the NVIDIA drivers:
sudo apt-get remove --purge "*nvidia-driver*" "libxnvctrl*"
  • To clean up after the uninstall:
sudo apt-get autoremove --purge -V

Install

chmod a+x cuda_12.1.1_530.30.02_linux.run
sudo ./cuda_12.1.1_530.30.02_linux.run

During installation, type accept when prompted, and

Environment variables

vim ~/.bashrc
Append the following at the end:
export CUDA_HOME=/usr/local/cuda-12.1
export LD_LIBRARY_PATH=/usr/local/cuda-12.1/lib64:$LD_LIBRARY_PATH
export PATH=/usr/local/cuda-12.1/bin:$PATH

Then run
source ~/.bashrc
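
To confirm that the three exports took effect in the current shell, a small check along these lines may help. This is only a sketch: check_cuda_env is a hypothetical helper name, and the default path assumes the /usr/local/cuda-12.1 prefix used in this post.

```python
import os


def check_cuda_env(env, cuda_root="/usr/local/cuda-12.1"):
    """Return a list of problems; an empty list means the three exports look right.

    check_cuda_env is a hypothetical helper; cuda_root is assumed to be the
    install prefix used in this post.
    """
    problems = []
    if env.get("CUDA_HOME") != cuda_root:
        problems.append(f"CUDA_HOME is {env.get('CUDA_HOME')!r}, expected {cuda_root!r}")
    # bash separates PATH-style entries with ':'
    if f"{cuda_root}/bin" not in env.get("PATH", "").split(":"):
        problems.append(f"{cuda_root}/bin is missing from PATH")
    if f"{cuda_root}/lib64" not in env.get("LD_LIBRARY_PATH", "").split(":"):
        problems.append(f"{cuda_root}/lib64 is missing from LD_LIBRARY_PATH")
    return problems


if __name__ == "__main__":
    issues = check_cuda_env(os.environ)
    print("\n".join(issues) if issues else "CUDA environment variables look consistent")
```

Run it from a shell that has already sourced ~/.bashrc; a shell opened before the edit will still carry the old values.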

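Check that the /usr/local/cuda symlink points to the right directory; if it does not, recreate the symlink yourself (if you chose to overwrite it during installation, it should already be correct).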

ll /usr/local/cuda
lrwxrwxrwx 1 root root 21 Sep 12 09:41 /usr/local/cuda -> /usr/local/cuda-12.1//
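
The same symlink check can be scripted, which is handy when several machines need upgrading. A minimal sketch, assuming the paths shown above; symlink_points_to is a hypothetical helper name.

```python
import os


def symlink_points_to(link, expected_dir):
    """True if `link` is a symlink that resolves to `expected_dir`.

    realpath normalises trailing slashes, so a target such as
    /usr/local/cuda-12.1// compares equal to /usr/local/cuda-12.1.
    """
    return os.path.islink(link) and os.path.realpath(link) == os.path.realpath(expected_dir)


if __name__ == "__main__":
    # Paths assumed from this post; adjust them for another CUDA version.
    print(symlink_points_to("/usr/local/cuda", "/usr/local/cuda-12.1"))
```

If this prints False, the link still targets an older toolkit directory (or is not a symlink at all) and needs to be recreated.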

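Install cuDNN

After the command finishes, check whether its output asks you to run any follow-up commands. The local-repo .deb only registers a local apt repository, so the prompt typically asks you to copy the repository keyring, and the cuDNN packages themselves (for example libcudnn8) still have to be installed with apt-get afterwards.

sudo dpkg -i cudnn-local-repo-ubuntu2204-8.9.7.29_1.0-1_amd64.deb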

Check whether the installation succeeded

dpkg -l | grep cudnn
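
When reading that output, it helps to separate fully installed packages (status ii) from anything else, and to notice whether only the local-repo package is present. A rough sketch that parses text in the dpkg -l column layout; installed_cudnn_packages is a hypothetical helper, and the sample lines in the usage note are illustrative rather than captured from a real machine.

```python
def installed_cudnn_packages(dpkg_output):
    """Return (name, version) pairs for fully installed packages whose name contains 'cudnn'.

    Expects text in the column layout printed by `dpkg -l`:
    status, name, version, architecture, description.
    """
    packages = []
    for line in dpkg_output.splitlines():
        fields = line.split()
        # 'ii' marks a package that is both selected for install and installed.
        if len(fields) >= 3 and fields[0] == "ii" and "cudnn" in fields[1]:
            packages.append((fields[1], fields[2]))
    return packages
```

Feed it the text printed by dpkg -l. If the only entry returned is the cudnn-local-repo package, the library packages have probably not been installed yet.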

Verify that cuDNN works in a PyTorch environment

python

import torch
print(torch.backends.cudnn.enabled)
print(torch.cuda.is_available())

print(torch.backends.cudnn.version())
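
torch.backends.cudnn.version() returns a single integer such as 8907 rather than a dotted version. The sketch below decodes it, assuming the encoding from cuDNN's version header (major*1000 + minor*100 + patch for 8.x, major*10000 + minor*100 + patch for 9.x); decode_cudnn_version is a hypothetical helper name. Note that many prebuilt PyTorch wheels ship their own cuDNN, so this number may describe the bundled library rather than the system-wide one.

```python
def decode_cudnn_version(version_int):
    """Split the integer cuDNN version into (major, minor, patch).

    Assumes cuDNN 9 and later encode the version as major*10000 + minor*100 + patch,
    while earlier releases use major*1000 + minor*100 + patch.
    """
    if version_int >= 10000:
        return version_int // 10000, (version_int % 10000) // 100, version_int % 100
    return version_int // 1000, (version_int % 1000) // 100, version_int % 100


if __name__ == "__main__":
    print(decode_cudnn_version(8907))  # (8, 9, 7)
```

In the interpreter session above, this would be called as decode_cudnn_version(torch.backends.cudnn.version()).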

Verify that cuDNN works in a TensorFlow environment

python

import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

from tensorflow.python.platform import build_info as tf_build_info
print(tf_build_info.build_info)
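
The build info printed above is a dictionary, and only a few entries matter for this check. A small sketch that pulls them out, assuming the key names is_cuda_build, cuda_version and cudnn_version that TensorFlow 2.x GPU builds report; summarize_gpu_build is a hypothetical helper name.

```python
def summarize_gpu_build(info):
    """Summarise the CUDA-related entries of a TensorFlow build-info mapping.

    The key names are assumed from TensorFlow 2.x GPU builds; missing keys
    fall back to 'unknown' instead of raising.
    """
    if not info.get("is_cuda_build", False):
        return "not a CUDA build"
    cuda = info.get("cuda_version", "unknown")
    cudnn = info.get("cudnn_version", "unknown")
    return f"CUDA {cuda}, cuDNN {cudnn}"


# Usage inside the session above (requires TensorFlow):
#   print(summarize_gpu_build(tf.sysconfig.get_build_info()))
```

These values describe what TensorFlow was built against, so they are worth comparing with the versions just installed on the system.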

Start the Docker service

sudo systemctl start docker.socket
sudo systemctl start docker
sudo systemctl start docker.service
posted @ 2024-09-13 14:13  河在谈