[Software/Hardware Environment and Tool Installation] NVIDIA driver and CUDA version relationship, and installing CUDA

Preface

 

1. Relationship between NVIDIA driver versions and CUDA versions

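For the CUDA 12.x series (starting with 12.0.x), the minimum supported Linux driver version is 525.60.13;

For the CUDA 11.x series (starting with 11.0.1), the minimum supported Linux driver version is 450.80.02;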
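
The two minimums above can be turned into a quick check. The sketch below is only an illustration: the table holds just the two values quoted in this post, and the helper names `parse_version` and `driver_supports` are hypothetical. The driver string would typically be read from `nvidia-smi`.

```python
# Minimum Linux driver versions quoted in this post, keyed by CUDA major version.
MIN_DRIVER = {12: "525.60.13", 11: "450.80.02"}


def parse_version(text):
    """Turn a dotted version string such as '525.60.13' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def driver_supports(driver_version, cuda_major):
    """Return True if driver_version meets the minimum listed for cuda_major."""
    if cuda_major not in MIN_DRIVER:
        raise ValueError(f"no minimum driver recorded for CUDA {cuda_major}")
    return parse_version(driver_version) >= parse_version(MIN_DRIVER[cuda_major])
```

For example, `driver_supports("515.65.01", 11)` is true while `driver_supports("515.65.01", 12)` is false, because 515 is below the 525 series required by CUDA 12.x.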

2. Installing CUDA

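While setting up the edgeai-torchvision environment, the installation kept failing. After digging into the source code, I found the main cause: when torchvision is built from source, the CUDA version is read from the nvcc under CUDA_HOME, so the CUDA version used by PyTorch in the virtual environment must match the system CUDA version. The system had CUDA 11.7, while the PyTorch installed in the virtual environment was built for CUDA 11.3, so the system toolkit had to be reinstalled as CUDA 11.3. The two relevant checks are quoted below.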

edgeai-torchvision/torchvision/extension.py

def _check_cuda_version():
    """
    Make sure that CUDA versions match between the pytorch install and torchvision install
    """
    if not _HAS_OPS:
        return -1
    import torch
    _version = torch.ops.torchvision._cuda_version()
    if _version != -1 and torch.version.cuda is not None:
        tv_version = str(_version)
        if int(tv_version) < 10000:
            tv_major = int(tv_version[0])
            tv_minor = int(tv_version[2])
        else:
            tv_major = int(tv_version[0:2])
            tv_minor = int(tv_version[3])
        t_version = torch.version.cuda
        t_version = t_version.split('.')
        t_major = int(t_version[0])
        t_minor = int(t_version[1])
        if t_major != tv_major or t_minor != tv_minor:
            raise RuntimeError("Detected that PyTorch and torchvision were compiled with different CUDA versions. "
                               "PyTorch has CUDA Version={}.{} and torchvision has CUDA Version={}.{}. "
                               "Please reinstall the torchvision that matches your PyTorch install."
                               .format(t_major, t_minor, tv_major, tv_minor))
    return _version
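
The string slicing above decodes the integer reported by torchvision's compiled ops. Assuming that integer follows CUDA's `CUDA_VERSION` convention (major * 1000 + minor * 10, so 11030 means 11.3), the same decoding can be sketched arithmetically. `decode_cuda_version` and `matches_torch` are hypothetical helpers, not part of torchvision.

```python
def decode_cuda_version(version):
    """Split a CUDA_VERSION-style integer (major * 1000 + minor * 10) into (major, minor)."""
    if version < 0:
        raise ValueError("a negative value means the ops were built without CUDA")
    major = version // 1000
    minor = (version % 1000) // 10
    return major, minor


def matches_torch(version, torch_cuda):
    """Compare the decoded version with a torch.version.cuda-style string such as '11.3'."""
    t_major, t_minor = (int(part) for part in torch_cuda.split(".")[:2])
    return decode_cuda_version(version) == (t_major, t_minor)
```

With these helpers, `decode_cuda_version(11030)` gives `(11, 3)`, and `matches_torch(11070, "11.3")` is false, which is the situation that triggers the RuntimeError in the function above.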
 
/home/xxx/miniconda3/envs/edgeaitv/lib/python3.8/site-packages/torch/utils/cpp_extension.py
def _check_cuda_version(self):
        if CUDA_HOME:
            nvcc = os.path.join(CUDA_HOME, 'bin', 'nvcc')
            cuda_version_str = subprocess.check_output([nvcc, '--version']).strip().decode(*SUBPROCESS_DECODE_ARGS)
            cuda_version = re.search(r'release (\d+[.]\d+)', cuda_version_str)
            if cuda_version is not None:
                cuda_str_version = cuda_version.group(1)
                cuda_ver = packaging.version.parse(cuda_str_version)
                torch_cuda_version = packaging.version.parse(torch.version.cuda)
                if cuda_ver != torch_cuda_version:
                    # major/minor attributes are only available in setuptools>=49.6.0
                    if getattr(cuda_ver, "major", float("nan")) != getattr(torch_cuda_version, "major", float("nan")):
                        raise RuntimeError(CUDA_MISMATCH_MESSAGE.format(cuda_str_version, torch.version.cuda))
                    warnings.warn(CUDA_MISMATCH_WARN.format(cuda_str_version, torch.version.cuda))

        else:
            raise RuntimeError(CUDA_NOT_FOUND_MESSAGE)
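
The check above can be reproduced outside a build to see which toolkit a source build would pick up. The sketch below applies the same kind of regular expression to the text printed by `nvcc --version` and compares the result with a `torch.version.cuda`-style string. It takes both as plain strings so it runs without PyTorch, the names `nvcc_release` and `compare_with_torch` are hypothetical, and it compares integer tuples instead of `packaging` versions.

```python
import re


def nvcc_release(nvcc_output):
    """Extract (major, minor) from `nvcc --version` output, or None if absent."""
    match = re.search(r"release (\d+)[.](\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def compare_with_torch(nvcc_output, torch_cuda):
    """Return 'ok', 'minor-mismatch' (a warning) or 'major-mismatch' (an error)."""
    release = nvcc_release(nvcc_output)
    if release is None:
        raise ValueError("could not find a release number in the nvcc output")
    torch_major, torch_minor = (int(part) for part in torch_cuda.split(".")[:2])
    if release == (torch_major, torch_minor):
        return "ok"
    if release[0] != torch_major:
        return "major-mismatch"
    return "minor-mismatch"


# Sample text in the format nvcc prints; the build number is illustrative.
SAMPLE = "Cuda compilation tools, release 11.7, V11.7.64"
```

For the setup described in this post (system toolkit 11.7, PyTorch built for 11.3), `compare_with_torch(SAMPLE, "11.3")` returns `'minor-mismatch'`: the build-time check only warns in that case, and it is the torchvision check quoted earlier that raises when the versions differ at the minor level.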

 

wget https://developer.download.nvidia.com/compute/cuda/11.3.1/local_installers/cuda_11.3.1_465.19.01_linux.run
sudo chmod 755 cuda_11.3.1_465.19.01_linux.run
sudo sh cuda_11.3.1_465.19.01_linux.run
 
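1) After running the commands above, wait about a minute. The installer displays the license agreement and asks "Do you accept the above EULA?". Type accept at the prompt and press Enter.
2) The installer then asks which components to install. Important: if an NVIDIA driver is already installed, be sure to deselect Driver (press the space bar to clear the X). [X] means the component will be installed; [ ] means it will not. After making the selection, move to Install and press Enter.
3) When the installation finishes, a directory for the CUDA version, for example cuda-11.3, is created under /usr/local, and the installer prints a summary like the following: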
Driver:   Not Selected
Toolkit:  Installed in /usr/local/cuda-11.3/
Samples:  Installed in /home/uisee/, but missing recommended libraries

Please make sure that
 -   PATH includes /usr/local/cuda-11.3/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-11.3/lib64, or, add /usr/local/cuda-11.3/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-11.3/bin
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 465.00 is required for CUDA 11.3 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
    sudo <CudaInstaller>.run --silent --driver

Logfile is /var/log/cuda-installer.log
 
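4) Configure the environment variables;
Edit the file with vim ~/.bashrc and append the lines below to the end, then run source ~/.bashrc so the change takes effect. The CUDA directories are prepended so that this toolkit's nvcc is found first and any existing LD_LIBRARY_PATH entries are preserved.
# cuda
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}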
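5) Managing multiple CUDA versions;
Only the symlink needs to change: point /usr/local/cuda at the directory of the required CUDA version (here CUDA 11.3);
cd /usr/local
sudo rm -f cuda  # remove the existing symlink (no -r, so a real directory is never deleted recursively)
sudo ln -s /usr/local/cuda-11.3 /usr/local/cuda  # link cuda to cuda-11.3
After the symlink is recreated, run nvcc -V to check which CUDA version is currently active on the system;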
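
The same switch can be scripted. The sketch below is a minimal illustration, assuming the toolkits live side by side as `cuda-<version>` directories under one root; `switch_cuda` is a hypothetical helper. It creates the new link under a temporary name and renames it over the old one, so `cuda` never disappears mid-switch, and it refuses to touch `cuda` if that is a real directory. Under `/usr/local` it would need root privileges.

```python
import os


def switch_cuda(root, version):
    """Point <root>/cuda at <root>/cuda-<version> and return the resolved target."""
    target = os.path.join(root, f"cuda-{version}")
    link = os.path.join(root, "cuda")
    if not os.path.isdir(target):
        raise FileNotFoundError(f"{target} is not an installed toolkit directory")
    if os.path.exists(link) and not os.path.islink(link):
        raise RuntimeError(f"{link} is a real directory, refusing to replace it")
    tmp_link = link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)       # clear a leftover from an interrupted run
    os.symlink(target, tmp_link)  # create the new link under a temporary name
    os.replace(tmp_link, link)    # rename it over the old link in one step
    return os.path.realpath(link)
```

A call such as `switch_cuda("/usr/local", "11.3")` mirrors the commands above; running `nvcc -V` afterwards confirms the active version.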
 
 


posted on 2022-03-24 22:26  鹅要长大