Error: OSError: libnccl.so.2: cannot open shared object file: No such file or directory
Cause: NCCL is not installed. This machine runs CentOS 7.
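To check whether the dynamic loader can find the library at all, independent of whichever framework raised the error, a minimal sketch like the one below may help. The helper name can_load is made up for illustration; it only wraps ctypes.CDLL from the Python standard library, which raises OSError when the shared object cannot be opened.

```python
import ctypes


def can_load(lib_name: str) -> bool:
    """Return True if the dynamic loader can open lib_name, else False.

    can_load is a hypothetical helper name used only for this sketch.
    """
    try:
        ctypes.CDLL(lib_name)  # raises OSError if the library cannot be opened
        return True
    except OSError:
        return False


if __name__ == "__main__":
    # Prints False while NCCL is missing from the loader's search path.
    print("libnccl.so.2 loadable:", can_load("libnccl.so.2"))
```

Running the same script again after the installation is a quick way to confirm that the library is now visible to the loader.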
Download NCCL from: https://developer.nvidia.com/nccl/nccl-legacy-downloads
You need to log in first.
The download is nccl-repo-rhel7-2.7.8-ga-cuda10.1-1-1.x86_64.rpm
Installation guide: https://docs.nvidia.com/deeplearning/nccl/install-guide/index.html#down
1) Install the repository package:
rpm -ivh nccl-repo-rhel7-2.7.8-ga-cuda10.1-1-1.x86_64.rpm
2) Then add the CUDA repository:
yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo
Output:
Loaded plugins: product-id
adding repo from: https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo
grabbing file https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo to /etc/yum.repos.d/cuda-rhel7.repo
cuda-rhel7.repo | 232 B 00:00:00
repo saved to /etc/yum.repos.d/cuda-rhel7.repo
3) Update the packages:
sudo yum update
4) Install NCCL:
yum install libnccl-2.7.8-1+cuda10.1 libnccl-devel-2.7.8-1+cuda10.1 libnccl-static-2.7.8-1+cuda10.1
After that, the error no longer appears.
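The three package names in step 4 share one version suffix made of the NCCL version, a build number, and the CUDA version. For a different NCCL/CUDA combination, the sketch below builds the same kind of command line. The function name nccl_packages and the default build number "1" are assumptions for illustration, and the pattern is simply inferred from the command used above, so check that the resulting packages actually exist in the repository before installing.

```python
from typing import List


def nccl_packages(nccl_version: str, cuda_version: str, build: str = "1") -> List[str]:
    """Build yum package specs following the pattern used in step 4.

    nccl_packages is a hypothetical helper; the build number "1" is an
    assumption taken from the package names in this post.
    """
    suffix = f"{nccl_version}-{build}+cuda{cuda_version}"
    return [f"{name}-{suffix}" for name in ("libnccl", "libnccl-devel", "libnccl-static")]


if __name__ == "__main__":
    # Reproduces the install command from step 4 for NCCL 2.7.8 and CUDA 10.1.
    print("yum install " + " ".join(nccl_packages("2.7.8", "10.1")))
```

The CUDA part of the suffix should match the CUDA toolkit installed on the machine, which is 10.1 here.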