Ubuntu: upgrading a CUDA installation made with the .run file, and installing cuDNN
CUDA download
https://developer.nvidia.com/cuda-downloads
cuDNN download
https://developer.nvidia.com/rdp/cudnn-archive
Stop the Docker service
sudo systemctl stop docker.socket
sudo systemctl stop docker
sudo systemctl stop docker.service
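Uninstall, following the removal steps from NVIDIA's official documentation. The apt commands below remove packages installed through the package manager; a toolkit installed from the .run file is removed with its own uninstaller (sudo /usr/local/cuda-X.Y/bin/cuda-uninstaller), and a driver installed from a .run file with sudo /usr/bin/nvidia-uninstall.
Ubuntu and Debian
- To remove the CUDA Toolkit: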
sudo apt-get --purge remove "*cuda*" "*cublas*" "*cufft*" "*cufile*" "*curand*" \
"*cusolver*" "*cusparse*" "*gds-tools*" "*npp*" "*nvjpeg*" "nsight*" "*nvvm*"
- To remove the NVIDIA driver:
sudo apt-get remove --purge "*nvidia-driver*" "libxnvctrl*"
- To clean up after the uninstall:
sudo apt-get autoremove --purge -V
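Install (the installer writes to /usr/local, so run it with sudo)
chmod a+x cuda_12.1.1_530.30.02_linux.run
sudo ./cuda_12.1.1_530.30.02_linux.run
During installation, type accept when prompted, and ...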
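The runfile name encodes both the toolkit version and the bundled driver version, which is where the cuda-12.1 directory used in the following steps comes from. A minimal sketch of that naming convention, assuming the pattern cuda_<toolkit>_<driver>_linux.run seen in the file name above; parse_runfile_name and install_prefix are hypothetical helper names, not part of any NVIDIA tool:

```python
import re
from typing import NamedTuple


class RunfileInfo(NamedTuple):
    cuda_version: str    # toolkit version, e.g. "12.1.1"
    driver_version: str  # bundled driver version, e.g. "530.30.02"


# Assumed pattern, taken from the file name used in these notes:
# cuda_<toolkit>_<driver>_linux.run
_RUNFILE_RE = re.compile(r"^cuda_(\d+\.\d+(?:\.\d+)*)_(\d+(?:\.\d+)*)_linux\.run$")


def parse_runfile_name(filename: str) -> RunfileInfo:
    """Split a CUDA runfile name into toolkit and driver versions."""
    match = _RUNFILE_RE.match(filename)
    if match is None:
        raise ValueError(f"unrecognized runfile name: {filename!r}")
    return RunfileInfo(cuda_version=match.group(1), driver_version=match.group(2))


def install_prefix(info: RunfileInfo) -> str:
    """Default install directory: only major.minor appears in the path."""
    major, minor = info.cuda_version.split(".")[:2]
    return f"/usr/local/cuda-{major}.{minor}"
```

Calling install_prefix(parse_runfile_name("cuda_12.1.1_530.30.02_linux.run")) gives /usr/local/cuda-12.1, the directory the environment variables should point at.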
Environment variables
vim ~/.bashrc
Append at the end:
export CUDA_HOME=/usr/local/cuda-12.1
export LD_LIBRARY_PATH=/usr/local/cuda-12.1/lib64:$LD_LIBRARY_PATH
export PATH=/usr/local/cuda-12.1/bin:$PATH
Then run:
source ~/.bashrc
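After sourcing the file, the three exports above can be sanity-checked from Python. This is only a sketch: check_cuda_env is a hypothetical helper, and the default /usr/local/cuda-12.1 is the path assumed throughout these notes.

```python
from typing import List, Mapping


def check_cuda_env(env: Mapping[str, str],
                   cuda_home: str = "/usr/local/cuda-12.1") -> List[str]:
    """Return a list of problems; an empty list means the exports look right."""
    problems = []

    # CUDA_HOME must name the toolkit directory (a trailing slash is tolerated).
    if env.get("CUDA_HOME", "").rstrip("/") != cuda_home:
        problems.append(f"CUDA_HOME is not {cuda_home}")

    # PATH and LD_LIBRARY_PATH are colon-separated lists on Linux.
    path_entries = [p.rstrip("/") for p in env.get("PATH", "").split(":") if p]
    if f"{cuda_home}/bin" not in path_entries:
        problems.append(f"{cuda_home}/bin is missing from PATH")

    lib_entries = [p.rstrip("/") for p in env.get("LD_LIBRARY_PATH", "").split(":") if p]
    if f"{cuda_home}/lib64" not in lib_entries:
        problems.append(f"{cuda_home}/lib64 is missing from LD_LIBRARY_PATH")

    return problems
```

In a fresh shell, check_cuda_env(os.environ) (after import os) should return an empty list if the .bashrc edits took effect.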
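Check that the symlink points to the right place; if it does not, recreate it yourself (if you chose to overwrite it during installation, it should already be fine).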
ll /usr/local/cuda
lrwxrwxrwx 1 root root 21 Sep 12 09:41 /usr/local/cuda -> /usr/local/cuda-12.1//
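The same check can be scripted. A small sketch, assuming the link and target paths shown above; symlink_points_to and relink_command are hypothetical helper names, and the relink command is only built as a string, not executed:

```python
import os


def symlink_points_to(link: str, expected: str) -> bool:
    """True if `link` is a symlink resolving to the same directory as `expected`."""
    return os.path.islink(link) and os.path.realpath(link) == os.path.realpath(expected)


def relink_command(link: str, expected: str) -> str:
    """Shell command that would recreate the link (returned, not run)."""
    # -s: symbolic, -f: replace an existing link, -n: do not follow the old link
    return f"sudo ln -sfn {expected} {link}"
```

If symlink_points_to("/usr/local/cuda", "/usr/local/cuda-12.1") is False, relink_command with the same arguments gives the command to run by hand.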
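Install cuDNN
sudo dpkg -i cudnn-local-repo-ubuntu2204-8.9.7.29_1.0-1_amd64.deb
After it finishes, check whether the output asks you to run any follow-up commands. This .deb only registers a local apt repository, so dpkg typically prints a command for copying the repository keyring, and the cuDNN libraries themselves are then installed with apt, for example:
sudo cp /var/cudnn-local-repo-*/cudnn-local-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get install libcudnn8 libcudnn8-dev
Check that the installation succeeded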
dpkg -l | grep cudnn
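The grep matches any line mentioning cudnn, including packages that were removed but left configuration behind (status rc). A rough sketch that keeps only fully installed packages, assuming the usual dpkg -l column layout of status, name, version, architecture, description; installed_cudnn_packages is a hypothetical helper:

```python
from typing import Dict


def installed_cudnn_packages(dpkg_output: str) -> Dict[str, str]:
    """Map package name -> version for fully installed ('ii') cuDNN packages."""
    packages = {}
    for line in dpkg_output.splitlines():
        fields = line.split()
        # Data rows look like: <status> <name> <version> <arch> <description...>
        if len(fields) >= 3 and fields[0] == "ii" and "cudnn" in fields[1].lower():
            packages[fields[1]] = fields[2]
    return packages
```

The input can come from subprocess.run(["dpkg", "-l"], capture_output=True, text=True, check=True).stdout. If only the cudnn-local-repo package shows up, the repository was registered but the libraries themselves were not installed.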
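Verify that cuDNN works from a PyTorch environment (PyTorch builds installed with pip usually bundle their own cuDNN, so the version printed may be the bundled one rather than the system package)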
python
import torch
print(torch.backends.cudnn.enabled)
print(torch.cuda.is_available())
print(torch.backends.cudnn.version())
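torch.backends.cudnn.version() prints a single integer rather than a dotted version, for example 8907 for cuDNN 8.9.7. A minimal sketch for decoding it, assuming the cuDNN 8.x and earlier encoding of major*1000 + minor*100 + patch; cuDNN 9 and later use a different encoding, which this hypothetical helper rejects instead of guessing:

```python
from typing import Tuple


def decode_cudnn_version(value: int) -> Tuple[int, int, int]:
    """Decode a cuDNN <= 8.x version integer into (major, minor, patch)."""
    # Values outside this range do not follow the major*1000 scheme assumed here.
    if not 1000 <= value < 10000:
        raise ValueError(f"not a cuDNN 8.x-style version number: {value}")
    major, rest = divmod(value, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch
```

With the 8.9.7.29 package from these notes, decode_cudnn_version(8907) gives (8, 9, 7).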
Verify that cuDNN works from a TensorFlow environment
python
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
# Public API (TF 2.3+) for the build info; includes the CUDA and cuDNN versions TensorFlow was built against
print(tf.sysconfig.get_build_info())
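The build info only says which cuDNN TensorFlow was compiled against, so it is worth comparing with what dpkg reports. A rough sketch, assuming the build info is a dict with a cudnn_version key (as in recent Linux GPU builds of TensorFlow 2.x; treat the key name as an assumption) and that the installed version string starts with the major version; cudnn_matches_build is a hypothetical helper:

```python
from typing import Mapping


def cudnn_matches_build(build_info: Mapping[str, object], installed_version: str) -> bool:
    """Compare the cuDNN major version TensorFlow was built with to the installed one."""
    built = str(build_info.get("cudnn_version", ""))
    if not built:
        # CPU-only builds, or builds that do not report a cuDNN version.
        return False
    # Only the major version is compared, e.g. "8" against "8.9.7.29-1+cuda12.2".
    return installed_version.split(".")[0] == built.split(".")[0]
```

The second argument would be the libcudnn8 version string from the dpkg -l output, for example "8.9.7.29-1+cuda12.2".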
Start the Docker service
sudo systemctl start docker.socket
sudo systemctl start docker
sudo systemctl start docker.service