pytorch编程踩过的坑

1、raise notImplementedError

不是什么大错，先检查一下代码的缩进是否正确。

2、RuntimeError: Tensor: invalid storage offset at /pytorch/aten/src/THC/generic/THCTensor.c:759

哎，这个错就很气人了，在pytorch 中执行loss.backward()出现的问题，正如https://discuss.pytorch.org/t/error-when-using-spectral-norm-on-multiple-gpus/25072 所说的，是reshape函数导致的问题，把它改成view函数就可以了。

3、训练和验证的配置完全一样，而训练的时候正常，测试的时候报如下错误：

RuntimeError: cuda runtime error (2) : out of memory at /data/users/soumith/miniconda2/conda-bld/pytorch-0.1.9_1487346124464/work/torch/lib/THC/generic/THCStorage.cu:66

这是因为验证的时候，没有设置变量属性为 volatile ，在pytorch 0.4.0之后，pytorch取消了volatile这个变量，改为如下写法：

x = Variable(torch.Tensor([34, 54]), requires_grad=True)
with torch.no_grad():
        y = x * 2

4、 Input type (CUDAFloatTensor) and weight type (CPUFloatTensor) should be the same

最初遇到这个问题的时候，网上大家的回答都是参数是在GPU ，而模型没有放在GPU 上面，添加model=model.cuda()语句就能解决。我这个错误出现的比较奇怪，是运行到网络中间才出现这个错误。原因是我的网络结构中存在下面的定义：

layer=[]
for dilation in dilations:
    layer.append(ResidualBlock(channels,dilation))
self.transformlayers=layer

其中 self.transformlayers=layer这句话出了问题，这个定义的网络，出来的结果如下

虽然这样定义在cpu上计算没有问题，但是如果要在GPU上面运算的话，在model=model.cuda()的作用下，网络其他部分都被部署到gpu上面，而 transformlayers 里面的结构却还在cpu上面。正确的定义是：

 self.transformlayers=nn.Sequential(*layer)

这样输出网络可以看到

transformlayers里面是有结构信息的，model=model.cuda()，就能将整个网络部署到GPU上面去了。

posted @ 2018-11-19 16:41 鸵鸟要抬头 Views(7687) Comments(0) Edit 收藏举报

会员力量，点亮园子希望

刷新页面返回顶部

鸵鸟要抬头

pytorch编程踩过的坑

公告