torchrun distributed training error

NotImplementedError: Using RTX 3090 or 4000 series doesn't support faster communication broadband via P2P or IB. Please set NCCL_P2P_DISABLE="1" and NCCL_IB_DISABLE="1" or use accelerate launch` which will do this automatically.

Solution: run the following commands one line at a time, in the same shell session you use to launch torchrun:

export NCCL_P2P_DISABLE=1
export NCCL_IB_DISABLE=1
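
If you would rather not export the variables for the whole session, a small wrapper can set them for a single command only. This is a minimal sketch, not part of the original fix: the function name `run_without_p2p_ib` is hypothetical, and the `sh -c` line only demonstrates that a child process sees both values.

```shell
# Hypothetical helper: run any command with both NCCL variables set
# for that command only, leaving the rest of the shell session untouched.
run_without_p2p_ib() {
    env NCCL_P2P_DISABLE=1 NCCL_IB_DISABLE=1 "$@"
}

# Check that a child process actually sees the variables.
run_without_p2p_ib sh -c 'echo "P2P=$NCCL_P2P_DISABLE IB=$NCCL_IB_DISABLE"'
# prints "P2P=1 IB=1"
```

A launch would then look like `run_without_p2p_ib torchrun --nproc_per_node=2 train.py`, where the process count and `train.py` are placeholders for your own setup.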

https://blog.csdn.net/ph12345687/article/details/141870275

posted @ 2024-09-20 15:11  Chenyi_li