PaddlePaddle: paddle.utils.run_check() reports "PaddlePaddle meets some problem with 8 GPUs"
WARNING:root:PaddlePaddle meets some problem with 8 GPUs. This may be caused by:
1. There is not enough GPUs visible on your system
2. Some GPUs are occupied by other process now
3. NVIDIA-NCCL2 is not installed correctly on your system. Please follow instruction on https://github.com/NVIDIA/nccl-tests
to test your NCCL, or reinstall it following https://docs.nvidia.com/deeplearning/sdk/nccl-install-guide/index.html
WARNING:root:
Original Error is: (External) NCCL error(2), unhandled system error.
[Hint: 'ncclSystemError'. A call to the system failed.] (at /paddle/paddle/fluid/platform/device/gpu/nccl_helper.h:155)
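Solution:
Add the --shm-size 8g option when creating the container. By default Docker gives a container only 64 MB of /dev/shm, which is likely too small for the shared memory NCCL uses to communicate between GPUs on the same host: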
docker run --name paddle_docker_v2 --gpus all --shm-size 8g -it -v $PWD:/paddle paddlepaddle/paddle:2.3.1-gpu-cuda11.2-cudnn8 /bin/bash
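
To confirm that the option took effect, the size of /dev/shm can be checked from inside the container. The following is a minimal sketch, not part of PaddlePaddle: the helper names shm_total_gib and has_enough_shm are hypothetical, and the 8 GiB threshold simply mirrors the value used in the command above rather than a documented minimum.

```python
import os
import shutil

GIB = 1024 ** 3  # bytes per GiB


def shm_total_gib(path="/dev/shm"):
    """Return the total size of the filesystem mounted at `path`, in GiB."""
    return shutil.disk_usage(path).total / GIB


def has_enough_shm(path="/dev/shm", min_gib=8.0):
    """Return True if the filesystem at `path` has at least `min_gib` GiB in total."""
    return shm_total_gib(path) >= min_gib


if __name__ == "__main__":
    shm_path = "/dev/shm"
    if os.path.isdir(shm_path):
        print(f"{shm_path}: {shm_total_gib(shm_path):.2f} GiB total")
        if not has_enough_shm(shm_path):
            # Assumption: 8 GiB is the value from the post, not a hard requirement.
            print("Smaller than 8 GiB; consider recreating the container with --shm-size 8g")
    else:
        print(f"{shm_path} not found on this system")
```

If the reported size is still around 0.06 GiB, the container was most likely created without --shm-size and needs to be recreated; once the size looks right, run paddle.utils.run_check() again.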