Installing CUDA and cuDNN on Docker (Ubuntu 18.04)

In practice, this means installing nvidia-docker2 and then pulling an Ubuntu 18.04 image that already has CUDA and cuDNN installed.

Environment: Ubuntu 18.04, NVIDIA driver, build-essential

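  • Install the NVIDIA driver on the host (see the official NVIDIA documentation)

    • Installing from the .run file

      Driver download location (pick the version you need):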

      BASE_URL=https://us.download.nvidia.com/tesla
      DRIVER_VERSION=450.80.02  # choose the driver version that suits your GPU
      curl -fSsl -O $BASE_URL/$DRIVER_VERSION/NVIDIA-Linux-x86_64-$DRIVER_VERSION.run
      

      Wait for the download to finish, then run the installer:

      sudo sh NVIDIA-Linux-x86_64-$DRIVER_VERSION.run
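
The download step above can be wrapped in a small helper so that the version only has to be typed once. This is a minimal sketch, not NVIDIA's official procedure: the function name `nvidia_driver_url` is hypothetical, and the URL layout is simply the one used in the commands above.

```shell
#!/bin/sh
# Hypothetical helper: build the installer URL for a given driver version,
# following the URL layout used in the commands above.
# $1 = driver version, $2 = optional base URL (defaults to the one in this post).
nvidia_driver_url() {
    version="$1"
    base_url="${2:-https://us.download.nvidia.com/tesla}"
    printf '%s/%s/NVIDIA-Linux-x86_64-%s.run\n' "$base_url" "$version" "$version"
}

# Print the URL for the version used in this post.
nvidia_driver_url 450.80.02
```

A possible use is `curl -fSL -O "$(nvidia_driver_url "$DRIVER_VERSION")"`, followed by the `sudo sh` step once the file is on disk.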

    • Installing with apt

      Install the Linux kernel headers:

      sudo apt-get install linux-headers-$(uname -r)

      Make sure packages from the CUDA network repository take priority over those from the Canonical repository:

      distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
      sudo wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/cuda-$distribution.pin
      sudo mv cuda-$distribution.pin /etc/apt/preferences.d/cuda-repository-pin-600
      

      Install the public GPG key:

      sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/7fa2af80.pub
      

      Add the CUDA network repository, then install the driver:

      echo "deb http://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64 /" | sudo tee /etc/apt/sources.list.d/cuda.list
      
      sudo apt-get update
      sudo apt-get -y install cuda-drivers
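
The `distribution=...` line in the apt steps is easy to misread, so here is a sketch of what it computes. The function name `cuda_distribution` is hypothetical; it takes the `ID` and `VERSION_ID` values from /etc/os-release as arguments instead of sourcing the file, purely so the transformation can be tried on any machine.

```shell
#!/bin/sh
# Hypothetical helper mirroring the `distribution=...` line above:
# join ID and VERSION_ID, then strip the dots with sed.
cuda_distribution() {
    printf '%s%s\n' "$1" "$2" | sed -e 's/\.//g'
}

cuda_distribution ubuntu 18.04   # prints ubuntu1804
```

The CUDA repository paths use this dot-free form, while the nvidia-docker repository later in this post keeps the dotted form (no `sed` step), so the two `distribution` values are not interchangeable.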
      
  • Installing CUDA and cuDNN in Docker

    • Setting up NVIDIA Container Toolkit

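      Install nvidia-docker2 to take over from plain Docker; the existing Docker installation does not need to be removed.
      Most problems during this step are network problems. Removing -s from the curl commands reveals the cause. In most cases DNS cannot resolve the host name, or the connection to NVIDIA is refused. A proxy usually avoids these errors; without one, try changing the DNS server. If a browser can open https://nvidia.github.io/nvidia-docker, the commands should generally work.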

      distribution=$(. /etc/os-release;echo $ID$VERSION_ID) 
      curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - 
      curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
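
When the curl commands above fail, a quick name-resolution check can separate DNS problems from other network errors. The following is a rough sketch under stated assumptions: the helper names `url_host`, `host_resolves`, and `diagnose_repo` are hypothetical, and `getent` is assumed to be available (it ships with glibc on Ubuntu).

```shell
#!/bin/sh
# Hypothetical helper: extract the host part of an http(s) URL.
url_host() {
    printf '%s\n' "$1" | sed -e 's|^[a-zA-Z]*://||' -e 's|[/:].*$||'
}

# Hypothetical helper: succeed if the system resolver can resolve the host name.
# Assumes getent is installed; a missing getent counts as "does not resolve".
host_resolves() {
    getent hosts "$1" >/dev/null 2>&1
}

# Hypothetical helper: report whether the host of a repository URL resolves.
diagnose_repo() {
    host=$(url_host "$1")
    if host_resolves "$host"; then
        echo "$host resolves; rerun curl without -s to see any remaining error"
    else
        echo "$host does not resolve; check DNS or proxy settings"
    fi
}

# Example: diagnose_repo https://nvidia.github.io/nvidia-docker/gpgkey
```

This only tests name resolution. A refused or filtered connection would still need the verbose curl output to diagnose.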
      

      To get access to experimental features such as CUDA on WSL or the new MIG capability on A100, you may want to add the experimental branch to the repository listing:

      curl -s -L https://nvidia.github.io/nvidia-container-runtime/experimental/$distribution/nvidia-container-runtime.list | sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
      

      Install the nvidia-docker2 package (and dependencies) after updating the package listing:

      sudo apt-get update
      sudo apt-get install -y nvidia-docker2
      

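      When prompted, press y to replace the configuration file; this switches the existing Docker setup over to nvidia-docker2, which can use the GPU.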

      Restart the Docker daemon to complete the installation after setting the default runtime:

      sudo systemctl restart docker
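
After the restart, it can be worth confirming that the configuration switch took effect. The sketch below assumes that nvidia-docker2 registers its runtime under the name "nvidia" in /etc/docker/daemon.json; the helper name `has_nvidia_runtime` is hypothetical, and the check is a plain text match rather than a real JSON parse.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the Docker daemon config mentions an "nvidia" entry.
# $1 = optional path to the config file (assumed default: /etc/docker/daemon.json).
has_nvidia_runtime() {
    config="${1:-/etc/docker/daemon.json}"
    [ -r "$config" ] && grep -q '"nvidia"' "$config"
}

if has_nvidia_runtime; then
    echo "nvidia runtime is registered"
else
    echo "nvidia runtime not found in daemon.json"
fi
```

If the runtime is missing, the configuration prompt during installation was probably answered with the default (keep the old file), and reinstalling the package is one way to get the prompt again.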

      On Docker Hub, find and pull the image with the CUDA and cuDNN versions you need.

      For example, I need CUDA 10.0, cuDNN 7, and Ubuntu 18.04:

      docker pull nvidia/cuda:10.0-cudnn7-devel-ubuntu18.04
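
The image name in the pull command follows a visible pattern, which the sketch below assembles from its parts. The helper name `cuda_image` is hypothetical and the pattern is inferred only from the example above, so not every combination will exist on Docker Hub; check the tag list before pulling.

```shell
#!/bin/sh
# Hypothetical helper: assemble an image name in the pattern of the example above.
# $1 = CUDA version, $2 = cuDNN major version, $3 = image flavor, $4 = OS tag.
cuda_image() {
    cuda="$1"; cudnn="$2"; flavor="$3"; os="$4"
    printf 'nvidia/cuda:%s-cudnn%s-%s-%s\n' "$cuda" "$cudnn" "$flavor" "$os"
}

cuda_image 10.0 7 devel ubuntu18.04   # prints nvidia/cuda:10.0-cudnn7-devel-ubuntu18.04
```

To get a shell inside the pulled image with GPU access, something like `sudo docker run --rm -it --gpus all "$(cuda_image 10.0 7 devel ubuntu18.04)" bash` should work once nvidia-docker2 is set up.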

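      After entering the container, run nvidia-smi; if the output looks like the table below, the setup works.

      If you are unsure which version you need and only want to test the setup, run the command below.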

      At this point, a working setup can be tested by running a base CUDA container:

      sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi

      This should result in a console output shown below:

      +-----------------------------------------------------------------------------+
      | NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
      |-------------------------------+----------------------+----------------------+
      | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
      | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
      |                               |                      |               MIG M. |
      |===============================+======================+======================|
      |   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
      | N/A   34C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
      |                               |                      |                  N/A |
      +-------------------------------+----------------------+----------------------+
      
      +-----------------------------------------------------------------------------+
      | Processes:                                                                  |
      |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
      |        ID   ID                                                   Usage      |
      |=============================================================================|
      |  No running processes found                                                 |
      +-----------------------------------------------------------------------------+
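
For scripted checks, the two version numbers in the header of this output can be pulled out with sed. This is a sketch that assumes the header keeps the "Label: value" layout shown above; the helper name `smi_field` is hypothetical, and the sample line is copied from the output above.

```shell
#!/bin/sh
# Hypothetical helper: read nvidia-smi output on stdin and print the numeric
# value that follows "<label>:" (for example "Driver Version" or "CUDA Version").
smi_field() {
    label="$1"
    sed -n "s/.*${label}: *\([0-9.][0-9.]*\).*/\1/p"
}

# Header line copied from the sample output above.
header='| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |'
printf '%s\n' "$header" | smi_field 'Driver Version'   # prints 450.51.06
printf '%s\n' "$header" | smi_field 'CUDA Version'     # prints 11.0
```

On a real system the input would come from `nvidia-smi | smi_field 'CUDA Version'`. Note that this header value is the highest CUDA version the driver supports, which can differ from the toolkit version installed in the image.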
      
      
posted @ 2021-02-20 14:35  blackTree