YOLO series code debugging notes
Environment: Windows 10, Python 3.8.5, torch 1.13.0+cu116, torchvision 0.14.0+cu116
Project: https://github.com/abeardear/pytorch-YOLO-v1
bug1:
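# resnet = models.resnet50(pretrained=True)
from torchvision.models import ResNet50_Weights
resnet = models.resnet50(weights=ResNet50_Weights.DEFAULT)
Because of the torchvision version, loading the pretrained model with the argument "pretrained=True" raises an error (torchvision deprecated "pretrained" in favor of "weights" starting with 0.13). Change it to "weights=ResNet50_Weights.DEFAULT" or another entry of the same weights enum.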
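bug2: process pool error
Tracing the error message back, the problem is on the line for i, (images, target) in enumerate(train_loader): and the fix is simply to put the whole iteration loop under the main guard (if __name__ == '__main__':):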
RuntimeError:
    An attempt has been made to start a new process before the
    current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.
Fix:
"""
num_iter = 0
# vis = Visualizer()
best_test_loss = np.inf
for epoch in range(num_epochs):
"""
# After the change
if __name__ == '__main__':
    num_iter = 0
    # vis = Visualizer()
    best_test_loss = np.inf
    for epoch in range(num_epochs):
        ...  # rest of the training loop, indented one more level
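Why this helps, as far as I understand it: on Windows, worker processes are started by spawning a fresh interpreter that re-imports the main script, so any top-level code that starts workers would run again inside each child. Below is a minimal, self-contained sketch of the same structure. It uses the standard library's multiprocessing.Pool as a stand-in for the DataLoader workers, and load_sample and train are hypothetical names, not functions from the project.

```python
import multiprocessing as mp


def load_sample(index):
    # Stand-in for the per-item work a data-loading worker would do.
    return index * 2


def train(num_epochs, num_workers):
    """Run a toy 'training' loop; worker processes are only started in here."""
    totals = []
    for epoch in range(num_epochs):
        if num_workers > 0:
            with mp.Pool(num_workers) as pool:
                batch = pool.map(load_sample, range(8))
        else:
            # Single-process path: no child processes are created.
            batch = [load_sample(i) for i in range(8)]
        totals.append(sum(batch))
    return totals


if __name__ == '__main__':
    # Everything that may start child processes lives under this guard,
    # so re-importing this file in a child does not start the loop again.
    mp.freeze_support()  # only matters for frozen executables
    print(train(num_epochs=2, num_workers=2))
```

Setting num_workers=0 on the DataLoader should also avoid the error, since no worker processes are created, at the cost of slower data loading.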
bug3: OpenCV error containing "(-215:Assertion failed) dims <= 2 && step[0] > 0 in function 'cv::Mat::locateROI'"
This is a dimension error. If the tensor/array has a shape such as [1, 3, 1080, 1920], remove the extra leading dimension with torch.squeeze(tensor, dim=0) before the OpenCV call, and restore it afterwards with torch.unsqueeze(tensor, dim=0).
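The squeeze/unsqueeze round trip can be sketched as follows. This is only an illustration under the assumption that the leading dimension has size 1 (torch.squeeze leaves a dimension of any other size untouched). apply_per_image and image_fn are hypothetical names, and image_fn stands in for whatever per-image step failed; a real OpenCV call may additionally need an H x W x C NumPy array rather than a C x H x W tensor.

```python
import torch


def apply_per_image(batch, image_fn):
    """Drop a leading batch dimension of size 1, apply image_fn, then restore it."""
    if batch.dim() != 4 or batch.shape[0] != 1:
        raise ValueError(f"expected shape [1, C, H, W], got {list(batch.shape)}")
    image = torch.squeeze(batch, dim=0)    # [1, C, H, W] -> [C, H, W]
    image = image_fn(image)                # hypothetical per-image processing
    return torch.unsqueeze(image, dim=0)   # [C, H, W] -> [1, C, H, W]


if __name__ == "__main__":
    batch = torch.zeros(1, 3, 8, 16)
    out = apply_per_image(batch, lambda img: img + 1)
    print(tuple(out.shape))
```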
bug4: OpenCV error:
File "D:/PythonCVWorkspace/pytorch-YOLO-v1/yolodataset.py", line 138, in BGR2HSV
    return cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
cv2.error: OpenCV(4.2.0) c:\projects\opencv-python\opencv\modules\imgproc\src\color.simd_helpers.hpp:92: error: (-2:Unspecified error) in function '__cdecl cv::impl::`anonymous-namespace'::CvtHelper<struct cv::impl::`anonymous namespace'::Set<3,4,-1>,struct cv::impl::A0x3b52564f::Set<3,-1,-1>,struct cv::impl::A0x3b52564f::Set<0,5,-1>,2>::CvtHelper(const class cv::_InputArray &,const class cv::_OutputArray &,int)'
> Invalid number of channels in input image:
>     'VScn::contains(scn)'
> where
>     'scn' is 1
The last line says the input image has a single channel, whereas COLOR_BGR2HSV expects a 3-channel BGR image.
bug5: out of GPU memory
    return torch.batch_norm(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 4.00 GiB total capacity; 2.48 GiB already allocated; 0 bytes free; 2.55 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
The graphics card is simply too weak: my GeForce GTX has only 4 GB of memory. Lowering batch_size to 4 got past this error.
bug6: numeric type error
    total_loss += loss.data[0]
IndexError: invalid index of a 0-dim tensor. Use `tensor.item()` in Python or `tensor.item<T>()` in C++ to convert a 0-dim tensor to a number
Also:
    total_loss += loss.data.item
TypeError: unsupported operand type(s) for +=: 'float' and 'builtin_function_or_method'
Fix:
loss.data is a 0-dim (scalar) Tensor, for example tensor(61.8650, device='cuda:0') with type <class 'torch.Tensor'>, so it cannot be indexed with [0].
loss.data.item is only the method itself; it has to be called, as loss.data.item(), to get the Python number it holds.
total_loss += loss.data.item()
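A small sketch of the accumulation pattern, with made-up loss values in place of the real criterion output. accumulate_loss is a hypothetical helper name. Calling loss.item() directly, without going through .data, returns the same Python float.

```python
import torch


def accumulate_loss(losses):
    """Sum 0-dim loss tensors into a plain Python float."""
    total_loss = 0.0
    for loss in losses:
        # .item() converts a 0-dim tensor to a Python number.
        total_loss += loss.item()
    return total_loss


if __name__ == "__main__":
    # Made-up loss values standing in for criterion(pred, target).
    fake_losses = [torch.tensor(1.5), torch.tensor(2.5)]
    print(accumulate_loss(fake_losses))
```

Accumulating floats rather than tensors also means the running total does not keep any tensor alive between iterations.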
warning 7:UserWarning: indexing with dtype torch.uint8 is now deprecated, please use a dtype torch.bool instead.
Add code to suppress it:
import warnings
warnings.filterwarnings("ignore", category=UserWarning)
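The filter above hides every UserWarning, including ones that might matter later. A narrower variant, sketched below, ignores only warnings whose text starts with the message quoted in this note. The pattern is an assumption based on that quoted text, and silence_uint8_index_warning is a hypothetical helper name.

```python
import warnings

# Assumption: the warning text starts with the message quoted in this note.
UINT8_INDEX_PATTERN = r"indexing with dtype torch\.uint8 is now deprecated"


def silence_uint8_index_warning():
    """Ignore only the uint8-indexing UserWarning; other warnings stay visible."""
    warnings.filterwarnings(
        "ignore",
        message=UINT8_INDEX_PATTERN,  # regex matched against the start of the message
        category=UserWarning,
    )


# Call once near the top of the training script.
silence_uint8_index_warning()
```

The warning itself suggests the longer-term fix: convert the masks used for indexing to torch.bool instead of torch.uint8.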
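With these changes the source code ran end to end on a small self-made dataset of 150 images. What remains unresolved are the two OpenCV errors in data augmentation (random brightness change and random color-space change) and the GPU running out of memory again later on.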