FAQ

  • 1. Distributed training fails with the error "This error indicates that your module has parameters that were not used in producing loss"
    A: 1. Use the code below to print every parameter that requires a gradient, and confirm the list matches what you expect. 2. If something is wrong, fix the model; if the list is correct, pass the keyword argument find_unused_parameters=True to torch.nn.parallel.DistributedDataParallel (see the sketch after the snippet).
    # Collect every parameter that will receive gradients, so the list can be
    # checked against the parameters actually used in the forward pass.
    enabled = {}
    for name, param in model_without_ddp.named_parameters():
        if param.requires_grad:
            enabled[name] = param.shape
    print(f"Parameters to be updated: {list(enabled.keys())}")