Fixing a PyTorch backpropagation error: RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.

Fixing a PyTorch backpropagation error:

Error: RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.
Root-cause investigation:
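Below is a minimal sketch that reproduces this error by backpropagating twice through the graph built by a single forward pass. It assumes a recent PyTorch version. Newer releases word the message a little differently ("...the graph a second time (or directly access saved tensors after they have already been freed)..."), but the cause is the same.

```python
import torch

# A leaf tensor that requires gradients.
x = torch.tensor(0.0, requires_grad=True)

# Forward pass: creates one SigmoidBackward node that saves its output.
y = torch.sigmoid(x)

y.backward()   # first backward succeeds; the saved buffers are freed afterwards
print(x.grad)  # tensor(0.2500), since sigmoid'(0) = 0.25

message = ""
try:
    y.backward()  # second backward through the same, already freed graph
except RuntimeError as err:
    message = str(err)
print(message)
```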

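This error can mean that part of the graph is being backpropagated through a second time. Do not just add retain_graph=True; that does not address the actual cause.
After reading through the code, I suspected that some learnable Tensor was being traversed twice during backpropagation, so I tried splitting it into two separate Tensors.
That did not fix it. I then added detach() wherever a second backward pass might occur and no gradient was needed, but the error persisted.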
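Here is a small sketch of why retain_graph=True only hides the symptom. The flag keeps the saved buffers alive, so the second backward stops failing, but it does not rebuild the graph. Both passes walk the same old graph, and their gradients add up in `.grad`. In a situation like the one in this post, it would also keep reusing a forward value that was computed only once.

```python
import torch

x = torch.tensor(0.0, requires_grad=True)
y = torch.sigmoid(x)  # dy/dx at x = 0 is 0.25

# Keeping the saved buffers means the second call no longer raises...
y.backward(retain_graph=True)
y.backward()

# ...but both calls traversed the same stale graph and accumulated into x.grad.
print(x.grad)  # tensor(0.5000)
```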
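detach() is only safe where no gradient is needed. The sketch below shows what would happen if the cached sigmoid value itself were detached. The error would go away, but the parameter behind it would stop receiving gradients. `x_lmbda_` mirrors the parameter name in the excerpt further down, and `w` is a hypothetical second learnable tensor added for illustration.

```python
import torch
import torch.nn as nn

# Stand-in for the learnable scalar used in this post.
x_lmbda_ = nn.Parameter(torch.zeros(()))

# Computed once and detached: this cuts the path back to x_lmbda_.
cached = torch.sigmoid(10 ** x_lmbda_ - 6).detach()

# Hypothetical extra parameter so that backward() has something to reach.
w = torch.tensor(2.0, requires_grad=True)
loss = cached * w
loss.backward()

print(w.grad)         # equals the cached sigmoid value
print(x_lmbda_.grad)  # None: the gated parameter never learns
```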
Next I enabled torch.autograd.detect_anomaly(True) during training to inspect how gradients propagate. It flagged a sigmoid function whose backward step failed, with this warning:
Warning: Error detected in SigmoidBackward. No forward pass information available. Enable detect anomaly during forward pass for more information. (print_stack at /pytorch/torch/csrc/autograd/python_anomaly_mode.cpp:42)
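The warning's hint ("Enable detect anomaly during forward pass") means anomaly mode must already be on when the graph is built. Here is a minimal sketch using the `torch.autograd.detect_anomaly()` context manager around both the forward and the backward pass. `torch.autograd.set_detect_anomaly(True)` is the global switch for the same mode. With anomaly mode active during the forward pass, the printed warning should include a traceback pointing at the line that created the SigmoidBackward node.

```python
import torch

x = torch.tensor(0.0, requires_grad=True)

caught = ""
# Anomaly mode covers the forward pass too, so the forward stack is recorded.
with torch.autograd.detect_anomaly():
    y = torch.sigmoid(x)  # this line's traceback is stored with the node
    y.backward()
    try:
        y.backward()      # fails; the warning now shows where sigmoid was called
    except RuntimeError as err:
        caught = str(err)

print(caught)
```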
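Because the model contains only one sigmoid, this pinpointed where the error came from. Comparing my code with the original model's, I found the likely cause. The original model exposes the value through a @property, which recomputes it every time it is accessed, whereas I assigned it directly, once. When the sigmoid is computed once in __init__, its part of the graph is built only once. Every training iteration after the first then backpropagates through the same node, whose saved buffers were already freed by the first backward. Switching to the @property fixed the problem.
The relevant code (excerpt):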

        self.x_lmbda_ = nn.Parameter(torch.zeros([]), requires_grad=True)
        # self.x_lmbda = torch.sigmoid(10 ** self.x_lmbda_ - 6).to(device)  # wrong: computed once in __init__ and reused by every iteration
    @property
    def x_lmbda(self):  # The one that controls GCN-->GATConv
        return torch.sigmoid(10 ** self.x_lmbda_ - 6)  # correct: a fresh graph node is built on every access
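The following self-contained sketch contrasts the two patterns. `CachedGate`, `PropertyGate` and `train_steps` are hypothetical names, and the model is reduced to a single gate multiplying a constant input. Only the `x_lmbda_`/`x_lmbda` expressions come from the excerpt above. The cached version should fail on the second training step, because its sigmoid node was created once in `__init__` and the first backward freed its saved buffers. The property version builds a fresh node on every forward pass.

```python
import torch
import torch.nn as nn


class CachedGate(nn.Module):
    """Buggy pattern: the gate is computed once in __init__ and reused."""

    def __init__(self):
        super().__init__()
        self.x_lmbda_ = nn.Parameter(torch.zeros(()))
        # The graph for this sigmoid is built exactly once, here.
        self.x_lmbda = torch.sigmoid(10 ** self.x_lmbda_ - 6)

    def forward(self, h):
        return self.x_lmbda * h


class PropertyGate(nn.Module):
    """Fixed pattern: the gate is recomputed on every access."""

    def __init__(self):
        super().__init__()
        self.x_lmbda_ = nn.Parameter(torch.zeros(()))

    @property
    def x_lmbda(self):
        # A new sigmoid node is created each time forward() reads this.
        return torch.sigmoid(10 ** self.x_lmbda_ - 6)

    def forward(self, h):
        return self.x_lmbda * h


def train_steps(model, steps=3):
    """Run a few SGD steps and return how many completed before an error."""
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    h = torch.ones(4)
    for step in range(steps):
        opt.zero_grad()
        loss = model(h).sum()
        try:
            loss.backward()
        except RuntimeError as err:
            print(f"{type(model).__name__}: step {step} failed: {err}")
            return step
        opt.step()
    return steps


print(train_steps(CachedGate()))    # 1: the second step hits the freed graph
print(train_steps(PropertyGate()))  # 3: every step builds a fresh graph
```

The general rule this illustrates: any tensor derived from a parameter should be recomputed inside forward(), either directly or through a @property, rather than stored once in __init__.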

Time spent: 5 hours


posted @ 2024-01-05 17:33 David_Dong
