Problem log
Converting PyTorch to ONNX
An error came up while converting a fairly simple model, one with no complicated operators at all.
RuntimeError: tuple appears in op that does not forward tuples (VisitNode at /pytorch/torch/csrc/jit/passes/lower_tuples.cpp:109)
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f27f31b3fe1 in /home/train/.local/lib/python3.5/site-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7f27f31b3dfa in /home/train/.local/lib/python3.5/site-packages/torch/lib/libc10.so)
frame #2: <unknown function> + 0x6da2e1 (0x7f27def7f2e1 in /home/train/.local/lib/python3.5/site-packages/torch/lib/libtorch.so.1)
frame #3: <unknown function> + 0x6da534 (0x7f27def7f534 in /home/train/.local/lib/python3.5/site-packages/torch/lib/libtorch.so.1)
A search turned up reports of a similar error.
Removing the DataParallel wrapper fixes it.
model = mobilenetv2()
model = torch.nn.DataParallel(model).cuda()
becomes
model = mobilenetv2().cuda()
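For reference, a minimal sketch of the full export after dropping DataParallel — the input shape and output filename here are my assumptions, not from the original report:

import torch

model = mobilenetv2().cuda()
model.eval()
# assumed MobileNetV2 input shape; adjust for your model
dummy_input = torch.randn(1, 3, 224, 224).cuda()
torch.onnx.export(model, dummy_input, "mobilenetv2.onnx")

If a checkpoint was saved from a DataParallel-wrapped model, the bare network is available as the wrapper's .module attribute and can be exported the same way.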
matplotlib plotting issue
https://stackoverflow.com/questions/33676608/pandas-type-error-trying-to-plot
plt.scatter(df.Time, y=df.Value, marker='o') raises an error when df.Time is a pandas datetime column.
fig = plt.figure(figsize=(x_size,y_size))
ax = fig.add_subplot(111)
ax.scatter(df.Time, y=df.Value, marker='o')
becomes
fig = plt.figure(figsize=(x_size,y_size))
ax = fig.add_subplot(111)
ax.plot_date(x=df.Time, y=df.Value, marker='o')
or
fig = plt.figure(figsize=(x_size,y_size))
ax = fig.add_subplot(111)
ax.scatter(list(df.Time.values), list(df.Value.values), marker='o')
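A self-contained sketch of the second workaround, using a made-up DataFrame just for illustration:

import pandas as pd
import matplotlib.pyplot as plt

# toy data: a datetime column plus values
df = pd.DataFrame({
    'Time': pd.date_range('2021-01-01', periods=5, freq='D'),
    'Value': [1, 3, 2, 5, 4],
})
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111)
# converting to plain Python lists sidesteps the datetime type error
ax.scatter(list(df.Time.values), list(df.Value.values), marker='o')
plt.show()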
At least one stride in the given numpy array is negative, and tensors with negative strides are not currently supported
ValueError: At least one stride in the given numpy array is negative, and tensors with negative strides are not currently supported. (You can probably work around this by making a copy of your array with array.copy().)
Converting an ndarray to a tensor requires a non-negatively strided (effectively contiguous) memory layout; views produced by reversed slicing such as a[::-1] have negative strides. Making a copy with array.copy() resolves it.
https://www.cnblogs.com/devilmaycry812839668/p/13761613.html
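A minimal sketch of how the error arises and how copy() resolves it; the array contents are arbitrary:

import numpy as np
import torch

a = np.arange(6).reshape(2, 3)
b = a[::-1]                      # reversed view: the stride along axis 0 is negative
# torch.from_numpy(b)            # would raise the ValueError above
t = torch.from_numpy(b.copy())   # copy() materializes a positively-strided array

np.ascontiguousarray(b) would work just as well as b.copy() here.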
Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
The error is raised at the optimizer.step() call.
Cause: the parameters in the checkpoint were saved from GPU memory, so the model must be moved to the GPU before the weight file and optimizer state are loaded.
That is:
checkpoint = torch.load('checkpoints/latest2.pt')
yolov3net.load_state_dict(checkpoint['model'])
optimizer = torch.optim.Adam(yolov3net.parameters())
optimizer.load_state_dict(checkpoint['optimizer'])
becomes
yolov3net = yolov3net.cuda()
checkpoint = torch.load('checkpoints/latest2.pt')
yolov3net.load_state_dict(checkpoint['model'])
optimizer = torch.optim.Adam(yolov3net.parameters())
optimizer.load_state_dict(checkpoint['optimizer'])
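Relatedly, torch.load accepts a map_location argument that controls which device the checkpoint tensors land on. Note this does not replace the .cuda() call above, since the model's own parameters must still live on the same device:

# make the checkpoint's target device explicit
checkpoint = torch.load('checkpoints/latest2.pt', map_location='cuda:0')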
model.train() / model.eval()
While training on a very small dataset, the model performed well during training, so I saved it. But at inference time under model.eval(), its behavior differed drastically.
Strangely, even after I made my custom layer's forward logic completely identical between training and non-training modes, inference under model.eval() still diverged sharply from the training-phase results.
The cause is that a BatchNormalization layer's forward behavior differs between training and inference: in eval mode it normalizes with the running (whole-population) mean and variance accumulated during training rather than the current batch's statistics.
When the dataset is especially small, the model overfits easily, which makes the gap between train mode and eval mode large.
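A small sketch demonstrating the mode-dependent behavior of BatchNorm; the shapes are arbitrary:

import torch
import torch.nn as nn

bn = nn.BatchNorm2d(3)
x = torch.randn(2, 3, 4, 4)

bn.train()
out_train = bn(x)   # normalizes with this batch's mean/var and updates the running stats

bn.eval()
out_eval = bn(x)    # normalizes with the accumulated running mean/var instead
print(torch.allclose(out_train, out_eval))  # typically False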
Author: sdu20112013