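YOLOv5-Lite training errors, fully resolved: 【Check the documentation of torch.load to learn more about types accepted by default with weights_only】 and 【AttributeError: module 'numpy' has no attribute 'int'.】
Training YOLOv5-Lite as downloaded from the official GitHub repository fails with the errors shown below.
This environment has the latest PyTorch and NumPy installed, while YOLOv5-Lite has not been updated in a long time. Code written against older versions of these libraries no longer runs, which produces a series of errors that have to be fixed one by one. [Verified]
1. Error: 【Check the documentation of torch.load to learn more about types accepted by default with weights_only】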
github: skipping check (offline)
YOLOv5 🚀 v1.5-16-g9d649a6 torch 2.6.0+cu124 CPU
Namespace(weights='weights/v5Lite-e.pt', cfg='models/v5Lite-e.yaml', data='data/mydata.yaml', hyp='data/hyp.scratch.yaml', epochs=300, batch_size=16, img_size=[320, 320], rect=False, resume=False, nosave=False, notest=False, noautoanchor=False, evolve=False, bucket='', cache_images=False, image_weights=False, device='cpu', multi_scale=False, single_cls=False, adam=False, sync_bn=False, local_rank=-1, workers=8, project='runs/train', entity=None, name='exp', exist_ok=False, quad=False, linear_lr=False, label_smoothing=0.0, upload_dataset=False, bbox_interval=-1, save_period=-1, artifact_alias='latest', world_size=1, global_rank=-1, save_dir='runs/train/exp9', total_batch_size=16)
tensorboard: Start with 'tensorboard --logdir runs/train', view at http://localhost:6006/
hyperparameters: lr0=0.001, lrf=0.2, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=0.2, mixup=0.0
Traceback (most recent call last):
File "/home/abc/Code/Python/Yolov5/YOLOv5-Lite/train.py", line 544, in <module>
train(hyp, opt, device, tb_writer)
File "/home/abc/Code/Python/Yolov5/YOLOv5-Lite/train.py", line 71, in train
run_id = torch.load(weights).get('wandb_id') if weights.endswith('.pt') and os.path.isfile(weights) else None
^^^^^^^^^^^^^^^^^^^
File "/home/abc/.local/lib/python3.12/site-packages/torch/serialization.py", line 1470, in load
raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray._reconstruct was not an allowed global by default. Please use `torch.serialization.add_safe_globals([_reconstruct])` or the `torch.serialization.safe_globals([_reconstruct])` context manager to allowlist this global if you trust this class/function.
Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
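This happens because PyTorch 2.6 and later changed the default value of the weights_only argument of torch.load() from False to True. The YOLOv5-Lite checkpoint pickles more than plain tensors (the error above names numpy.core.multiarray._reconstruct), so weights_only has to be set back to False explicitly wherever weights are loaded. As the error message itself warns, only do this for checkpoints from a trusted source.
Open train.py, go to line 71, and add weights_only=False to the torch.load call: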
run_id = torch.load(weights, weights_only=False).get('wandb_id') if weights.endswith('.pt') and os.path.isfile(weights) else None
Go to line 88 and add weights_only=False there as well:
ckpt = torch.load(weights, weights_only=False)
Open datasets.py in the utils folder, go to line 385, and make the same change:
cache, exists = torch.load(cache_path, weights_only=False), True # load
Open general.py in the utils folder, go to line 514, and change it to:
x = torch.load(f, map_location=torch.device('cpu'), weights_only=False)
Open models/experimental.py (needed when converting a .pt model to ONNX), go to line 118, and change it to:
ckpt = torch.load(w, map_location=map_location, weights_only=False) # load
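The five edits above cover the torch.load calls hit in this run. To check whether any other call in the repository still relies on the old default, a small scan along the following lines may help. This is only a sketch: the names find_unpatched_torch_load and scan_repo are made up for illustration, and the scan only recognizes calls written literally as torch.load(...).

```python
import ast
from pathlib import Path


def find_unpatched_torch_load(source: str) -> list:
    """Return line numbers of torch.load(...) calls that pass no weights_only keyword."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Match only the literal form `torch.load(...)`.
        is_torch_load = (
            isinstance(func, ast.Attribute)
            and func.attr == "load"
            and isinstance(func.value, ast.Name)
            and func.value.id == "torch"
        )
        if is_torch_load and not any(kw.arg == "weights_only" for kw in node.keywords):
            hits.append(node.lineno)
    return sorted(hits)


def scan_repo(root: str = ".") -> dict:
    """Map each .py file under root to the lines that still need weights_only."""
    report = {}
    for path in Path(root).rglob("*.py"):
        try:
            lines = find_unpatched_torch_load(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError, ValueError):
            continue  # skip files that cannot be read or parsed
        if lines:
            report[str(path)] = lines
    return report
```

Calling scan_repo() from the YOLOv5-Lite root directory should return an empty dict once every call has been patched.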
2. Error: 【AttributeError: module 'numpy' has no attribute 'int'. use 'np.int32' or 'np.int64'】
Transferred 378/386 items from weights/v5Lite-e.pt
Scaled weight_decay = 0.0005
Optimizer groups: 66 .bias, 66 conv.weight, 63 other
train: Scanning '../work/train/labels' images and labels... 20 found, 0 missing, 0 empty, 0 corrupted: 100%|████████████████████████████████████████████| 20/20 [00:00<00:00, 68.39it/s]
train: New cache created: ../work/train/labels.cache
Traceback (most recent call last):
File "/home/abc/Code/Python/Yolov5/YOLOv5-Lite/train.py", line 544, in <module>
train(hyp, opt, device, tb_writer)
File "/home/abc/Code/Python/Yolov5/YOLOv5-Lite/train.py", line 190, in train
dataloader, dataset = create_dataloader(train_path, imgsz, batch_size, gs, opt,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/abc/Code/Python/Yolov5/YOLOv5-Lite/utils/datasets.py", line 63, in create_dataloader
dataset = LoadImagesAndLabels(path, imgsz, batch_size,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/abc/Code/Python/Yolov5/YOLOv5-Lite/utils/datasets.py", line 411, in __init__
bi = np.floor(np.arange(n) / batch_size).astype(np.int) # batch index
^^^^^^
File "/home/abc/.local/lib/python3.12/site-packages/numpy/__init__.py", line 397, in __getattr__
raise AttributeError(__former_attrs__[attr], name=None)
AttributeError: module 'numpy' has no attribute 'int'.
`np.int` was a deprecated alias for the builtin `int`. To avoid this error in existing code, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
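As the message makes clear, np.int was only an alias for the builtin int. NumPy deprecated it in 1.20 and removed it in a later release (1.24), so it has to be replaced with np.int64, np.int32, or plain int.
Open datasets.py in the utils folder, search for every occurrence of np.int, and replace it with np.int32. Match whole words only, so that any existing np.int32 or np.int64 is left untouched.
After rerunning, there is still a problem: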
"...YOLOv5-Lite/utils/general.py", line 222, in labels_to_class_weights
classes = labels[:, 0].astype(np.int) # labels = [class xywh]
^^^^^^
Open general.py again and make the same replacement.
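One pitfall with this replacement: a plain substring search for np.int also matches np.int32 and np.int64, which would turn them into np.int3232 and np.int3264. A whole-word regular expression avoids that. The sketch below shows the idea on a string; the helper name replace_np_int is hypothetical, and it assumes NumPy is imported under the usual alias np.

```python
import re

# \b on both sides: matches `np.int` but not `np.int32`, `np.int64`, `np.int_` or `jnp.int`.
NP_INT_ALIAS = re.compile(r"\bnp\.int\b")


def replace_np_int(source: str, replacement: str = "np.int32") -> str:
    """Replace the removed alias np.int, leaving sized integer types untouched."""
    return NP_INT_ALIAS.sub(replacement, source)


line = "bi = np.floor(np.arange(n) / batch_size).astype(np.int)  # batch index"
print(replace_np_int(line))
```

Most editors offer the same behavior through a "match whole word" or regex search option.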
3. Loss function error: RuntimeError: result type Float can't be cast to the desired output type long int
loss, loss_items = compute_loss(pred, targets.to(device)) # loss scaled by batch_size
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/abc/Code/Python/Yolov5/YOLOv5-Lite/utils/loss.py", line 117, in __call__
tcls, tbox, indices, anchors = self.build_targets(p, targets) # targets
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/abc/Code/Python/Yolov5/YOLOv5-Lite/utils/loss.py", line 211, in build_targets
indices.append((b, a, gj.clamp_(0, gain[3] - 1), gi.clamp_(0, gain[2] - 1))) # image, anchor, grid indices
^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: result type Float can't be cast to the desired output type long int
To fix it, edit loss.py in the utils folder.
(Line 178) Search for anchors = self.anchors[i]
and replace it with:
anchors, shape = self.anchors[i], p[i].shape
(Line 211) Search for indices.append((b, a, gj.clamp_(0, gain[3] - 1), gi.clamp_(0, gain[2] - 1))) # image, anchor, grid indices
and replace it with:
indices.append((b, a, gj.clamp_(0, shape[2] - 1), gi.clamp_(0, shape[3] - 1))) # image, anchor, grid
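Why this works, as far as I can tell: gj and gi are integer (long) tensors, while gain[3] - 1 is a floating-point value, so the in-place clamp_ would have to produce a Float result inside a long tensor. Taking the bounds from p[i].shape gives plain Python ints instead. The snippet below reproduces this in isolation. It assumes gain is a float tensor, as in upstream YOLOv5's build_targets, and the shape tuple is a hypothetical stand-in for p[i].shape.

```python
import torch

gj = torch.tensor([-2, 5, 57])   # long grid-row indices, some out of range
gain = torch.ones(7)             # float tensor (assumed, as in upstream YOLOv5)
gain[2:6] = torch.tensor([40.0, 40.0, 40.0, 40.0])

# Old code path: the upper bound is a float tensor element.
try:
    gj.clone().clamp_(0, gain[3] - 1)
    float_bound_error = None     # older PyTorch versions may accept this silently
except RuntimeError as err:
    float_bound_error = str(err)

# Fixed code path: the upper bound is a plain int taken from the feature-map shape.
shape = (1, 3, 40, 40, 85)       # hypothetical stand-in for p[i].shape
fixed = gj.clone().clamp_(0, shape[2] - 1)

print(float_bound_error)
print(fixed)
```

The index swap in the fix (gain[3] becomes shape[2], gain[2] becomes shape[3]) matches upstream YOLOv5, where gain[2:6] is filled from the shape in the order [3, 2, 3, 2].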
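Retrain
With these changes in place, training runs without errors. Remember to delete the labels.cache files under the training and validation paths before every training run.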
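Deleting the cache files can be scripted. A minimal sketch, assuming the datasets live under one root folder (for example ../work, as in the log above); the helper name remove_label_caches is made up for illustration.

```python
from pathlib import Path


def remove_label_caches(dataset_root) -> list:
    """Delete every labels.cache file below dataset_root and return the removed paths."""
    removed = []
    # Collect the matches first so files are not deleted while the tree is being walked.
    for cache in list(Path(dataset_root).rglob("labels.cache")):
        cache.unlink()
        removed.append(cache)
    return removed
```

For example, remove_label_caches("../work") would clear both the train and the validation cache if both folders sit under ../work.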