基于yolo进行目标检测的实验和研究

根据我经验来看，需要我们进行检测的不是自然场景下的任意物体，而是特定场景下一类物体。典型的就是钢管识别，这些照片一般都是在厂区里面拍的、是对着钢管拍的，拍摄的目的是识别出钢管的数量。这里就为YOLO一类目标检测技术提供了空间，通过基于自定义数据集的迁移学习，能够实现一些效果，这里将相关东西整理出来。

基础环境：

标注工具分为本机标注和网络智能标注。LabelImg能够提供不错的操作体验，在windows上我只找到一个老版本的，能够生成voc的数据格式；在linux上的最新版本可以直接生成yolo的格式。

easydate能够提供只能标注，简单来说就是你只需要标注几张图片（不同数据数量不同），它可以先训练一个小模型，辅助你继续进行标注。

这个的价值肯定是不言而喻的。在标注这块，我还编码实现了基于传统算法的标签生成工具。具体来说，就是基于传统算法，生成初略的数据集，将这里的结果作为标签导入模型训练中去。

多措并举，能够有效提高数据标注速度。当然，标注仍然是一个痛苦的过程。

在训练这块，我使用了基于Kaggle的yolo官方notebook（ https://www.kaggle.com/code/jsxyhelu2019/yolov5 ），进行一些删改后使用。这样能够有效使用免费的GPU，而且相对来说也比较方便。

实验一：筷子识别

第一个例子使用了网络上找到的数据集“筷子识别”，他最大的优势就是提供了完整的数据集。

1、图片的采集，可以看到这里的数据是比较集中的、质量是比较高的。数量大概210左右。

2、labelimg标注。原作者提供了标注好的voc结果，可以直接使用，也可以拿过来自己体会一下。

3、智能标注扩充。生成3张图像，使用easydl进行智能进行扩充。注意这里如果无标注数据太少，智能标注会开启失败。

4、修正结果

5、数据格式转换。需要将voc格式转换成为yolo格式。注意这里的label和前面标注的label是一样的。

import os
import xml.etree.ElementTree as ET

classes = ["label"]

# 将x1, y1, x2, y2转换成yolov5所需要的x, y, w, h格式
def xyxy2xywh(size, box):
    dw = 1. / size[0]
    dh = 1. / size[1]
    x = (box[0] + box[2]) / 2 * dw
    y = (box[1] + box[3]) / 2 * dh
    w = (box[2] - box[0]) * dw
    h = (box[3] - box[1]) * dh
    return (x, y, w, h)         # 返回的都是标准化后的值


def voc2yolo(path):
    # 可以打印看看该路径是否正确
    print(len(os.listdir(path)))
    # 遍历每一个xml文件
    for file in os.listdir(path):
        # xml文件的完整路径
        label_file = path + file
        # 最终要改成的txt格式文件,这里我是放在voc2007/labels/下面
        out_file = open(path.replace('Annotations', 'labels') + file.replace('xml', 'txt'), 'w')
        # print(label_file)

        # 开始解析xml文件
        tree = ET.parse(label_file)
        root = tree.getroot()
        size = root.find('size')            # 图片的shape值
        w = int(size.find('width').text)
        h = int(size.find('height').text)

        for obj in root.iter('object'):
            difficult = obj.find('difficult').text
            cls = obj.find('name').text
            if cls not in classes or int(difficult) == 1:
                continue
            # 将名称转换为id下标    
            cls_id = classes.index(cls)
            # 获取整个bounding box框
            bndbox = obj.find('bndbox')
            # xml给出的是x1, y1, x2, y2
            box = [float(bndbox.find('xmin').text), float(bndbox.find('ymin').text), float(bndbox.find('xmax').text),
                float(bndbox.find('ymax').text)]

            # 将x1, y1, x2, y2转换成yolov5所需要的x, y, w, h格式
            bbox = xyxy2xywh((w, h), box)
            # 写入目标文件中，格式为 id x y w h
            out_file.write(str(cls_id) + " " + " ".join(str(x) for x in bbox) + '\n')

if __name__ == '__main__':
    # 这里要改成自己数据集路径的格式
    path = 'E:/DatasetId_1624813_1657795900/Annotations/'
    voc2yolo(path)

我还特地验证一下效果。

6、kaggle训练

数据上传注意需要都是zip格式的。

编写的一些内容：

path: datasets/kuaizi # dataset root dir train: train val: val test: # Classes nc: 1 names: ['label']

使用现有资源，进行训练.

注意这里的iamges和labels是放在一起的。

相关结果查看。

model下载，可以作为infer来使用。最好是能够测试，直接使用infer测试也是不错.

模型训练完成后，将runs/exp/weights下的模型（best.pt）复制在yolov5文件夹下。如下图所示：

实验二：钢管识别

有了前面的经验，可以更加放手做一些工作。

钢管数据的特点就是需要自己标注，所以这里我使用了很多trick来完成这个目标。

1、首先，单模型使用小标签（P）方便显示

2、使用现有工具，能够完成一些东西。那么数据需要重新来做，并且把里面重复的东西确实的去掉。

我现在给出的是这个代码可以生产coco json 的结果

3、使用easyDL进行标注管理。也需要做好多轮标注的准备

数据标注量很大，无法保证数据集的高可用，必须寻找到有效的迭代方法。比如这样图像的标注，实在是太费事了

开启easydl智能标注，需要60张左右；

4、最终easyDL给出了不错的错误观测界面，这个对于我数据分析来说是有用的

这个训练花了4个小时，这个也是需要注意一下的时间。最后就是部署这块

5、easydl提供了一些模式，但是需要和设备进行绑定。对于现在的我来说了，绑定不符合SMD的预期模式，所以先不看。我需要的还是YOLO原生，然后尽可能达到同样的MAP等

这里的mAP到底是否可用？我查了一些资料证明还是可行的

那这么看来目前这个值还是很高的。在这条道路上继续努力，主要是解决yolov5的结果由opencv调用的问题，并且基于GOCW实现csharp对opencv的调用，那么最终实现了新的标注工具，应该说在现有的基础上极大地提高了识别效果：

具体过程可以看附录。

6、方面验证是可行的，但是效果不尽如人意，其根本原因是我数据标注的不够。相关的技术需要研究出来去购买数据，这是另一个维度的问题。此外在软件使用过程中，如果能够将数据标注的过程反馈出来，那就是更上一个层次。

实验三：毛发识别

实际上，毛发识别存在尺度、方向等多个问题，并且需要解决的是3对象，难度是更大的。首先还是以现有的方法来进行处理，然后再思考其他。

说实话，我对使用深度学习解决毛发识别问题不是有完全的信息，因为这不是一个典型的问题，比如对象不是一个矩形框，而应该是自定义四边形。

如果使用举行进行标注，可以发现重叠非常多：

平行思考，如果基于AI去做语义分割的话，意义也不是很大，传统算法已经能够做很好的分割。

毛发识别算法的关键在于去除粘连，特别在根据现有的硬件进行精度测量，这样才能够获得高效的东西。

实验小结：

讨论一下传统方法和AI方法之间的关系：

1、AI能够解决很多传统方法无法解决、解决不好的问题，但AI不是万能的、仍然有很多问题目前无法解决；

2、使用传统方法为AI生产数据集、全流程地参与到AI生产中，可能是未来出路；

3、AI只不过是增加了一种解决问题的、不同维度的方法，它出来了很多模型和工具，但是解决问题的思路仍然关键决定因素。

感谢阅读至此，希望有所帮助。

附录如何生成新的标注工具：

最后我选择了50张质量比较好的数据，其中有一些也进行了一些处理，保留质量比较好的部分。【被删除的部分中，也有一部分经过一些处理后是可以再用的】，现在我想得到的是一个基本的结果。

节点就是：使用现有模型，生成一个新的标注工具。

观察这里的训练过程：

wandb: Currently logged in as: jsxyhelu. Use `wandb login --relogin` to force relogintrain: weights=yolov5s.pt, cfg=, data=piple.yaml, hyp=data/hyps/hyp.scratch-low.yaml, epochs=100, batch_size=16, imgsz=640, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, noplots=False, evolve=None, bucket=, cache=ram, image_weights=False, device=, multi_scale=False, single_cls=False, optimizer=SGD, sync_bn=False, workers=8, project=runs/train, name=exp, exist_ok=False, quad=False, cos_lr=False, label_smoothing=0.0, patience=100, freeze=[0], save_period=-1, seed=0, local_rank=-1, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latestgithub: up to date with https://github.com/ultralytics/yolov5

✅

YOLOv5 🚀 v6.1-394-gd7bc5d7 Python-3.7.12 torch-1.11.0 CUDA:0 (Tesla P100-PCIE-16GB, 16281MiB)

hyperparameters: lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0ClearML: run 'pip install clearml' to automatically track, visualize and remotely train YOLOv5 🚀 runs in ClearMLTensorBoard: Start with 'tensorboard --logdir runs/train', view at http://localhost:6006/wandb: wandb version 0.13.1 is available! To upgrade, please run:wandb: $ pip install wandb --upgradewandb: Tracking run with wandb version 0.12.18wandb: Run data is saved locally in /kaggle/working/yolov5/wandb/run-20220815_064342-2yzwwnr7wandb: Run `wandb offline` to turn off syncing.wandb: Syncing run gallant-river-17wandb: ⭐️ View project at https://wandb.ai/jsxyhelu/yolov5wandb: 🚀 View run at https://wandb.ai/jsxyhelu/yolov5/runs/2yzwwnr7Downloading https://ultralytics.com/assets/Arial.ttf

to /root/.config/Ultralytics/Arial.ttf...

100%|████████████████████████████████████████| 755k/755k [00:00<00:00, 27.6MB/s]

YOLOv5 temporarily requires wandb version 0.12.10 or below. Some features may not work as expected.

Downloading

https://github.com/ultralytics/yolov5/releases/download/v6.1/yolov5s.pt

to yolov5s.pt...

100%|██████████████████████████████████████| 14.1M/14.1M [00:00<00:00, 46.5MB/s]

Overriding model.yaml nc=80 with nc=1

from n params module arguments

0 -1 1 3520 models.common.Conv [3, 32, 6, 2, 2]

1 -1 1 18560 models.common.Conv [32, 64, 3, 2]

2 -1 1 18816 models.common.C3 [64, 64, 1]

3 -1 1 73984 models.common.Conv [64, 128, 3, 2]

4 -1 2 115712 models.common.C3 [128, 128, 2]

5 -1 1 295424 models.common.Conv [128, 256, 3, 2]

6 -1 3 625152 models.common.C3 [256, 256, 3]

7 -1 1 1180672 models.common.Conv [256, 512, 3, 2]

8 -1 1 1182720 models.common.C3 [512, 512, 1]

9 -1 1 656896 models.common.SPPF [512, 512, 5]

10 -1 1 131584 models.common.Conv [512, 256, 1, 1]

11 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']

12 [-1, 6] 1 0 models.common.Concat [1]

13 -1 1 361984 models.common.C3 [512, 256, 1, False]

14 -1 1 33024 models.common.Conv [256, 128, 1, 1]

15 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']

16 [-1, 4] 1 0 models.common.Concat [1]

17 -1 1 90880 models.common.C3 [256, 128, 1, False]

18 -1 1 147712 models.common.Conv [128, 128, 3, 2]

19 [-1, 14] 1 0 models.common.Concat [1]

20 -1 1 296448 models.common.C3 [256, 256, 1, False]

21 -1 1 590336 models.common.Conv [256, 256, 3, 2]

22 [-1, 10] 1 0 models.common.Concat [1]

23 -1 1 1182720 models.common.C3 [512, 512, 1, False]

24 [17, 20, 23] 1 16182 models.yolo.Detect [1, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]

Model summary: 270 layers, 7022326 parameters, 7022326 gradients, 15.9 GFLOPs

Transferred 343/349 items from yolov5s.pt

AMP: checks passed ✅optimizer: SGD(lr=0.01) with parameter groups 57 weight(decay=0.0), 60 weight(decay=0.0005), 60 biasalbumentations: Blur(always_apply=False, p=0.01, blur_limit=(3, 7)), MedianBlur(always_apply=False, p=0.01, blur_limit=(3, 7)), ToGray(always_apply=False, p=0.01), CLAHE(always_apply=False, p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))train:

Scanning '/kaggle/working/yolov5/datasets/piple/train' images and labels./opt/conda/lib/python3.7/site-packages/PIL/TiffImagePlugin.py:845: UserWarning: Corrupt EXIF data. Expecting to read 4 bytes but only got 0.

warnings.warn(str(msg))

train: Scanning '/kaggle/working/yolov5/datasets/piple/train' images and labels.train: New cache created: /kaggle/working/yolov5/datasets/piple/train.cachetrain: Caching images (0.0GB ram): 100%|██████████| 45/45 [00:00<00:00, 69.86it/val: Scanning '/kaggle/working/yolov5/datasets/piple/val' images and labels...5 val: New cache created: /kaggle/working/yolov5/datasets/piple/val.cacheval:

Caching images (0.0GB ram): 100%|██████████| 5/5 [00:00<00:00, 7.00it/s]

Plotting labels to runs/train/exp/labels.jpg...

AutoAnchor:

4.06 anchors/target, 1.000 Best Possible Recall (BPR). Current anchors are a good fit to dataset ✅

Image sizes 640 train, 640 val

Using 2 dataloader workers

Logging results to

runs/train/exp

Starting training for 100 epochs...

Epoch gpu_mem box obj cls labels img_size

0/99 3.46G 0.1338 1.312 0 1.079e+04 640: 100%|███

/opt/conda/lib/python3.7/site-packages/torch/optim/lr_scheduler.py:136: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at

https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate

", UserWarning)

Class Images Labels P R mAP@.5 mAP@WARNING: NMS time limit 0.450s exceeded

Class Images Labels P R mAP@.5 mAP@

all 5 1750 0.00111 0.000571 0.000689 0.000138

Epoch gpu_mem box obj cls labels img_size

1/99 3.46G 0.1336 1.211 0 9460 640: 100%|███

Class Images Labels P R mAP@.5 mAP@WARNING: NMS time limit 0.450s exceeded

Class Images Labels P R mAP@.5 mAP@

all 5 1750 0.0025 0.00171 0.00162 0.000242

Epoch gpu_mem box obj cls labels img_size

2/99 3.46G 0.1334 1.156 0 9768 640: 100%|███

Class Images Labels P R mAP@.5 mAP@WARNING: NMS time limit 0.450s exceeded

Class Images Labels P R mAP@.5 mAP@

all 5 1750 0.0111 0.00571 0.00616 0.00106

Epoch gpu_mem box obj cls labels img_size

3/99 3.46G 0.1326 1.078 0 8882 640: 100%|███

Class Images Labels P R mAP@.5 mAP@WARNING: NMS time limit 0.450s exceeded

Class Images Labels P R mAP@.5 mAP@

all 5 1750 0.0167 0.00571 0.00901 0.00217

Epoch gpu_mem box obj cls labels img_size

4/99 3.46G 0.1301 0.8823 0 7590 640: 100%|███

……………………

99/99 3.46G 0.0931 1.02 0 8095 640: 100%|███

Class Images Labels P R mAP@.5 mAP@

all 5 1750 0.761 0.464 0.546 0.265

100 epochs completed in 0.318 hours.

Optimizer stripped from runs/train/exp/weights/last.pt, 14.5MB

Optimizer stripped from runs/train/exp/weights/best.pt, 14.5MB

Validating runs/train/exp/weights/best.pt...

Fusing layers...

Model summary: 213 layers, 7012822 parameters, 0 gradients, 15.8 GFLOPs

Class Images Labels P R mAP@.5 mAP@

all 5 1750 0.773 0.468 0.557 0.275

Results saved to

runs/train/expwandb: Waiting for W&B process to finish... (success).wandb: wandb: wandb: Run history:wandb: metrics/mAP_0.5 ▁▁▂▂▁▁▁▂▂▃▃▄▄▄▄▄▅▅▇▆▆▆▆▇▇▇▇▇██▇████▇████wandb: metrics/mAP_0.5:0.95 ▁▁▁▁▁▁▁▂▂▂▂▂▃▂▃▃▃▃▅▄▅▅▅▅▅▆▆▆▇▇▆▇█▇▇▇██▇█wandb: metrics/precision ▁▁▂▂▁▁▂▃▃▄▄▄▅▄▅▅▅▅▇▆▆▆▆▇▇▇▇▇██▇█████████wandb: metrics/recall ▁▁▂▂▁▁▂▃▃▄▄▄▅▅▅▅▅▆▇▆▇▆▇▇▇▇▇▇████████████wandb: train/box_loss ██▇▇▆▆▅▅▅▅▅▄▄▄▄▄▄▃▃▄▅▃▄▄▃▃▃▃▂▂▃▂▂▂▂▁▁▂▂▂wandb: train/cls_loss ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁wandb: train/obj_loss █▆▂▁▃▂▃▃▅▃▃▄▄▄▄▄▄▄▄▄▅▄▄▅▅▄▆▄▄▅▄▄▄▄▄▄▄▄▄▅wandb: val/box_loss ██▇▇▇▆▆▆▅▅▄▄▄▄▃▃▃▃▄▄▃▃▃▃▃▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁wandb: val/cls_loss ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁wandb: val/obj_loss ▅▂▃▁▄██▂▇▂▅▄▂▂▂▁▂▂▃▅▂▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂wandb: x/lr0 ██▇▇▆▆▅▅▄▄▃▃▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁wandb: x/lr1 ▁▂▃▃▄▄▅▆▆▆▇▇████▇▇▇▇▆▆▆▅▅▅▅▄▄▄▄▃▃▃▂▂▂▂▁▁wandb: x/lr2 ▁▂▃▃▄▄▅▆▆▆▇▇████▇▇▇▇▆▆▆▅▅▅▅▄▄▄▄▃▃▃▂▂▂▂▁▁wandb: wandb: Run summary:wandb: best/epoch 82wandb: best/mAP_0.5 0.55454wandb: best/mAP_0.5:0.95 0.2751wandb: best/precision 0.77001wandb: best/recall 0.46571wandb: metrics/mAP_0.5 0.55652wandb: metrics/mAP_0.5:0.95 0.27521wandb: metrics/precision 0.77288wandb: metrics/recall 0.468wandb: train/box_loss 0.0931wandb: train/cls_loss 0.0wandb: train/obj_loss 1.01991wandb: val/box_loss 0.09157wandb: val/cls_loss 0.0wandb: val/obj_loss 0.52895wandb: x/lr0 0.0003wandb: x/lr1 0.0003wandb: x/lr2 0.0003wandb: wandb: Synced gallant-river-17: https://wandb.ai/jsxyhelu/yolov5/runs/2yzwwnr7wandb: Synced 5 W&B file(s), 109 media file(s), 1 artifact file(s) and 0 other file(s)wandb: Find logs at: ./wandb/run-20220815_064342-2yzwwnr7/logs

Exception ignored in: <function _MultiProcessingDataLoaderIter.__del__ at 0x7f147b33e710>

Traceback (most recent call last):

File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1358, in __del__

File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1283, in _shutdown_workers

AttributeError: 'NoneType' object has no attribute 'python_exit_status'

Exception ignored in: <function _MultiProcessingDataLoaderIter.__del__ at 0x7f147b33e710>

Traceback (most recent call last):

File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1358, in __del__

File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1283, in _shutdown_workers

AttributeError: 'NoneType' object has no attribute 'python_exit_status'

已经推测结果：

detect:

weights=['/kaggle/working/yolov5/runs/train/exp/weights/best.pt'], source=data/images, data=data/coco128.yaml, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs/detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False

YOLOv5 🚀 v6.1-394-gd7bc5d7 Python-3.7.12 torch-1.11.0 CUDA:0 (Tesla P100-PCIE-16GB, 16281MiB)

Fusing layers...

Model summary: 213 layers, 7012822 parameters, 0 gradients, 15.8 GFLOPs

image 1/3 /kaggle/working/yolov5/data/images/111.jpg: 640x480 180 ps, Done. (0.010s)

image 2/3 /kaggle/working/yolov5/data/images/bus.jpg: 640x480 16 ps, Done. (0.006s)

image 3/3 /kaggle/working/yolov5/data/images/zidane.jpg: 384x640 6 ps, Done. (0.009s)

Speed: 0.4ms pre-process, 8.7ms inference, 1.0ms NMS per image at shape (1, 3, 640, 640)

Results saved to

runs/detect/exp2

[40]:

生成一个新的标注工具****，在现有工作基础上，研究实现opencv调用yolov5，实现本地cpu infer的实际，取得突破性进展。

最新的yolov5在网络结构上针对infer的具体实际进行了较大的优化，有利于调用的具体实施。

#include <iostream>
#include "opencv2/dnn.hpp"
#include "opencv2/imgproc.hpp"
#include "opencv2/highgui.hpp"
#include<math.h>
using namespace std;
using namespace cv;
using namespace dnn;
#define YOLO_P6 false //是否使用P6模型
struct Output
{
    int id;             //结果类别id
    float confidence;   //结果置信度
    cv::Rect box;       //矩形框
};
bool readModel(cv::dnn::Net& net, std::string& netPath, bool isCuda);
bool Detect(cv::Mat& SrcImg, cv::dnn::Net& net, std::vector<Output>& output);
void drawPred(cv::Mat& img, std::vector<Output> result, std::vector<cv::Scalar> color);
#if(defined YOLO_P6 && YOLO_P6==true)
    const float netAnchors[4][6] = { { 19,27, 44,40, 38,94 },{ 96,68, 86,152, 180,137 },{ 140,301, 303,264, 238,542 },{ 436,615, 739,380, 925,792 } };
    const int netWidth = 1280;  //ONNX图片输入宽度
    const int netHeight = 1280; //ONNX图片输入高度
    const int strideSize = 4;  //stride size
#else
    const float netAnchors[3][6] = { { 10,13, 16,30, 33,23 },{ 30,61, 62,45, 59,119 },{ 116,90, 156,198, 373,326 } };
    const int netWidth = 640;   //ONNX图片输入宽度
    const int netHeight = 640;  //ONNX图片输入高度
    const int strideSize = 3;   //stride size
#endif // YOLO_P6
const float netStride[4] = { 8, 16.0,32,64 };
float boxThreshold = 0.25;
float classThreshold = 0.25;
float nmsThreshold = 0.45;
float nmsScoreThreshold = boxThreshold * classThreshold;
std::vector<std::string> className = { "p" };
bool readModel(Net& net, string& netPath, bool isCuda = false) {
    try {
        net = readNet(netPath);
    }
    catch (const std::exception&) {
        return false;
    }
    //cuda
    if (isCuda) {
        net.setPreferableBackend(cv::dnn::DNN_BACKEND_CUDA);
        net.setPreferableTarget(cv::dnn::DNN_TARGET_CUDA_FP16);
    }
    //cpu
    else {
        net.setPreferableBackend(cv::dnn::DNN_BACKEND_DEFAULT);
        net.setPreferableTarget(cv::dnn::DNN_TARGET_CPU);
    }
    return true;
}
bool Detect(Mat& SrcImg, Net& net, vector<Output>& output) {
    Mat blob;
    int col = SrcImg.cols;
    int row = SrcImg.rows;
    int maxLen = MAX(col, row);
    Mat netInputImg = SrcImg.clone();
    if (maxLen > 1.2 * col || maxLen > 1.2 * row) {
        Mat resizeImg = Mat::zeros(maxLen, maxLen, CV_8UC3);
        SrcImg.copyTo(resizeImg(Rect(0, 0, col, row)));
        netInputImg = resizeImg;
    }
    blobFromImage(netInputImg, blob, 1 / 255.0, cv::Size(netWidth, netHeight), cv::Scalar(0, 0, 0), true, false);
    //如果在其他设置没有问题的情况下但是结果偏差很大，可以尝试下用下面两句语句
    //blobFromImage(netInputImg, blob, 1 / 255.0, cv::Size(netWidth, netHeight), cv::Scalar(104, 117, 123), true, false);
    //blobFromImage(netInputImg, blob, 1 / 255.0, cv::Size(netWidth, netHeight), cv::Scalar(114, 114,114), true, false);
    net.setInput(blob);
    std::vector<cv::Mat> netOutputImg;
    net.forward(netOutputImg, net.getUnconnectedOutLayersNames());
    std::vector<int> classIds;//结果id数组
    std::vector<float> confidences;//结果每个id对应置信度数组
    std::vector<cv::Rect> boxes;//每个id矩形框
    float ratio_h = (float)netInputImg.rows / netHeight;
    float ratio_w = (float)netInputImg.cols / netWidth;
    int net_width = className.size() + 5;  //输出的网络宽度是类别数+5
    float* pdata = (float*)netOutputImg[0].data;
    for (int stride = 0; stride < strideSize; stride++) {    //stride
        int grid_x = (int)(netWidth / netStride[stride]);
        int grid_y = (int)(netHeight / netStride[stride]);
        for (int anchor = 0; anchor < 3; anchor++) {    //anchors
            const float anchor_w = netAnchors[stride][anchor * 2];
            const float anchor_h = netAnchors[stride][anchor * 2 + 1];
            for (int i = 0; i < grid_y; i++) {
                for (int j = 0; j < grid_x; j++) {
                    float box_score = pdata[4]; ;//获取每一行的box框中含有某个物体的概率
                    if (box_score >= boxThreshold) {
                        cv::Mat scores(1, className.size(), CV_32FC1, pdata + 5);
                        Point classIdPoint;
                        double max_class_socre;
                        minMaxLoc(scores, 0, &max_class_socre, 0, &classIdPoint);
                        max_class_socre = (float)max_class_socre;
                        if (max_class_socre >= classThreshold) {
                            //rect [x,y,w,h]
                            float x = pdata[0];  //x
                            float y = pdata[1];  //y
                            float w = pdata[2];  //w
                            float h = pdata[3];  //h
                            int left = (x - 0.5 * w) * ratio_w;
                            int top = (y - 0.5 * h) * ratio_h;
                            classIds.push_back(classIdPoint.x);
                            confidences.push_back(max_class_socre * box_score);
                            boxes.push_back(Rect(left, top, int(w * ratio_w), int(h * ratio_h)));
                        }
                    }
                    pdata += net_width;//下一行
                }
            }
        }
    }
    //执行非最大抑制以消除具有较低置信度的冗余重叠框（NMS）
    vector<int> nms_result;
    NMSBoxes(boxes, confidences, nmsScoreThreshold, nmsThreshold, nms_result);
    for (int i = 0; i < nms_result.size(); i++) {
        int idx = nms_result[i];
        Output result;
        result.id = classIds[idx];
        result.confidence = confidences[idx];
        result.box = boxes[idx];
        output.push_back(result);
    }
    if (output.size())
        return true;
    else
        return false;
}
void drawPred(Mat& img, vector<Output> result, vector<Scalar> color) {
    for (int i = 0; i < result.size(); i++) {
        int left, top;
        left = result[i].box.x;
        top = result[i].box.y;
        int color_num = i;
        rectangle(img, result[i].box,Scalar(0,255,0), 1, 8);
        string label = className[result[i].id] + ":" + to_string(result[i].confidence);
        int baseLine;
        Size labelSize = getTextSize(label, FONT_HERSHEY_SIMPLEX, 0.5, 1, &baseLine);
        top = max(top, labelSize.height);
        //rectangle(frame, Point(left, top - int(1.5 * labelSize.height)), Point(left + int(1.5 * labelSize.width), top + baseLine), Scalar(0, 255, 0), FILLED);
        //putText(img, label, Point(left, top), FONT_HERSHEY_SIMPLEX, 1, Scalar(0, 255, 0), 2);
    }
    imshow("drawPred", img);
    waitKey();
}
int main()
{
    string img_path = "E:/sandbox/ppp.jpg";
    string model_path = "e:/template/best.onnx";
    Net net;
    if (readModel(net, model_path, false)) {
        cout << "read net ok!" << endl;
    }
    else {
        return -1;
    }
    //生成随机颜色
    vector<Scalar> color;
    srand(time(0));
    for (int i = 0; i < 80; i++) {
        int b = rand() % 256;
        int g = rand() % 256;
        int r = rand() % 256;
        color.push_back(Scalar(b, g, r));
    }
    vector<Output> result;
    Mat img = imread(img_path);
    if (Detect(img, net, result)) {
        drawPred(img, result, color);
    }
    else {
        cout << "Detect Failed!" << endl;
    }
    system("pause");
    return 0;
}

posted on 2022-08-20 15:37 jsxyhelu 阅读(3332) 评论(0) 收藏举报

刷新页面返回顶部

GreenOpen专注LLM和CV

基于yolo进行目标检测的实验和研究

基于yolo进行目标检测的实验和研究

导航

公告