FAQ
- 1. Distributed training fails with: "This error indicates that your module has parameters that were not used in producing loss"
A: 1. Use the code below to print all parameters that require gradients, and confirm that each of them is actually used in computing the loss. 2. If something is wrong, fix the model; if the parameters are all correct but some legitimately do not contribute to the loss, pass the keyword argument `find_unused_parameters=True`
to `torch.nn.parallel.DistributedDataParallel`.
    # Collect every parameter that will receive gradients
    enabled = {}
    for name, param in model_without_ddp.named_parameters():
        if param.requires_grad:
            enabled[name] = param.shape
    print(f"Parameters to be updated: {enabled.keys()}")
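To see how this error arises in the first place, here is a minimal single-process sketch (the `Net` model is a hypothetical example, not from the source): a submodule is registered but never used in `forward()`, so its parameters get no gradient, which is exactly what DDP complains about.

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.used = nn.Linear(4, 2)
        self.extra = nn.Linear(4, 2)  # registered but never used in forward()

    def forward(self, x):
        return self.used(x)  # self.extra does not contribute to the loss

model = Net()
loss = model(torch.randn(3, 4)).sum()
loss.backward()

# Parameters whose .grad is still None after backward() are the ones
# DistributedDataParallel would report as unused.
unused = [n for n, p in model.named_parameters()
          if p.requires_grad and p.grad is None]
print(unused)  # the parameters of the `extra` branch

# Fix option 1: remove/use the dead branch.
# Fix option 2: wrap with
#   torch.nn.parallel.DistributedDataParallel(model, find_unused_parameters=True)
```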