PaddlePaddle使用paddle.utils.run_check()检测出现PaddlePaddle meets some problem with 8 GPUs
WARNING:root:PaddlePaddle meets some problem with 8 GPUs. This may be caused by:
1. There is not enough GPUs visible on your system
2. Some GPUs are occupied by other process now
3. NVIDIA-NCCL2 is not installed correctly on your system. Please follow instruction on https://github.com/NVIDIA/nccl-tests
to test your NCCL, or reinstall it following https://docs.nvidia.com/deeplearning/sdk/nccl-install-guide/index.html
WARNING:root:
Original Error is: (External) NCCL error(2), unhandled system error.
[Hint: 'ncclSystemError'. A call to the system failed.] (at /paddle/paddle/fluid/platform/device/gpu/nccl_helper.h:155)
解决办法:
创建容器时加上--shm-size 8g参数
docker run --name paddle_docker_v2 --gpus all --shm-size 8g -it -v $PWD:/paddle paddlepaddle/paddle:2.3.1-gpu-cuda11.2-cudnn8 /bin/bash
【推荐】国内首个AI IDE,深度理解中文开发场景,立即下载体验Trae
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· 分享一个免费、快速、无限量使用的满血 DeepSeek R1 模型,支持深度思考和联网搜索!
· 基于 Docker 搭建 FRP 内网穿透开源项目(很简单哒)
· ollama系列1:轻松3步本地部署deepseek,普通电脑可用
· 按钮权限的设计及实现
· 【杂谈】分布式事务——高大上的无用知识?