RuntimeError: Address already in use when training on multiple GPUs with torch
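Reference: https://www.it610.com/article/1279180977062559744.htm
The cause is that the TCP port used for distributed communication is already occupied, usually by another training job on the same machine (the launcher's default is 29500). One fix is to specify a port explicitly when launching the program. Any port number works as long as it is not in use: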
--master_port 29501
For example:
nohup python3 -m torch.distributed.launch --nproc_per_node 4 --master_port 29501 main.py >> train.log &
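
Rather than guessing a port number, you can ask the operating system for one that is currently free. The following is a minimal sketch using only the Python standard library. The helper names `find_free_port` and `port_in_use` are made up for illustration and are not part of torch. Note that a port reported as free can still be taken by another process before the training job binds it, so treat this as a convenience rather than a guarantee.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Return a TCP port that is unused right now (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Binding to port 0 lets the OS choose any available port.
        s.bind((host, 0))
        return s.getsockname()[1]


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        # connect_ex returns 0 when the connection succeeds,
        # which means another process holds the port.
        return s.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # Print a free port so it can be passed to --master_port.
    print(find_free_port())
```

If this were saved as `find_port.py` (a hypothetical filename), the printed value could be passed on the command line, e.g. `--master_port $(python3 find_port.py)`.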

posted @ 2022-10-14 12:03  morein2008