Ultralytics YOLOv3 explained --->> Network part

Network part
The network is defined in /models/yolov3.yaml, as follows:

# parameters
nc: 80  # number of classes
depth_multiple: 1.0  # model depth multiple
width_multiple: 1.0  # layer channel multiple

# anchors
anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32

# darknet53 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [32, 3, 1]],  # 0                   ## 0 
   [-1, 1, Conv, [64, 3, 2]],  # 1-P1/2              ## 1 
   [-1, 1, Bottleneck, [64]],                        ## 2
   [-1, 1, Conv, [128, 3, 2]],  # 3-P2/4             ## 3 
   [-1, 2, Bottleneck, [128]],                       ## 4 
   [-1, 1, Conv, [256, 3, 2]],  # 5-P3/8             ## 5 
   [-1, 8, Bottleneck, [256]],                       ## 6 
   [-1, 1, Conv, [512, 3, 2]],  # 7-P4/16            ## 7 
   [-1, 8, Bottleneck, [512]],                       ## 8 
   [-1, 1, Conv, [1024, 3, 2]],  # 9-P5/32           ## 9 
   [-1, 4, Bottleneck, [1024]],  # 10                ## 10 
  ]

# YOLOv3 head
head:
  [[-1, 1, Bottleneck, [1024, False]],                 ## 11
   [-1, 1, Conv, [512, [1, 1]]],                       ## 12
   [-1, 1, Conv, [1024, 3, 1]],                        ## 13
   [-1, 1, Conv, [512, 1, 1]],                         ## 14
   [-1, 1, Conv, [1024, 3, 1]],  # 15 (P5/32-large)    ## 15

   [-2, 1, Conv, [256, 1, 1]],                         ## 16
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],         ## 17
   [[-1, 8], 1, Concat, [1]],  # cat backbone P4       ## 18
   [-1, 1, Bottleneck, [512, False]],                  ## 19
   [-1, 1, Bottleneck, [512, False]],                  ## 20
   [-1, 1, Conv, [256, 1, 1]],                         ## 21
   [-1, 1, Conv, [512, 3, 1]],  # 22 (P4/16-medium)    ## 22

   [-2, 1, Conv, [128, 1, 1]],                         ## 23
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],         ## 24
   [[-1, 6], 1, Concat, [1]],  # cat backbone P3       ## 25
   [-1, 1, Bottleneck, [256, False]],                  ## 26
   [-1, 2, Bottleneck, [256, False]],  # 27 (P3/8-small)  ## 27

   [[27, 22, 15], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)  ## 28
  ]
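The anchors section holds three rows of (w, h) pairs measured in input-image pixels, one row per output level, and the P3/8, P4/16, P5/32 comments give each level's stride. The sketch below shows what that means for a given input size, assuming the grid side is the image size divided by the stride and that anchors are expressed in grid-cell units by dividing by the stride (my understanding of what models/yolo.py does; describe_levels is a hypothetical helper, not part of the repo):

```python
# Anchors copied from yolov3.yaml: (w, h) pairs in input-image pixels, one row per level.
ANCHORS = [
    [10, 13, 16, 30, 33, 23],       # P3/8
    [30, 61, 62, 45, 59, 119],      # P4/16
    [116, 90, 156, 198, 373, 326],  # P5/32
]
STRIDES = [8, 16, 32]  # the "/8", "/16", "/32" in the P3/P4/P5 comments


def describe_levels(img_size=640):
    """Hypothetical helper: (grid side, anchors in grid-cell units) for each level."""
    out = []
    for row, s in zip(ANCHORS, STRIDES):
        pairs = list(zip(row[0::2], row[1::2]))     # [(w, h), (w, h), (w, h)]
        grid = img_size // s                        # number of cells per side
        scaled = [(w / s, h / s) for w, h in pairs] # anchor size measured in cells
        out.append((grid, scaled))
    return out


for grid, scaled in describe_levels():
    print(grid, scaled)
```

Small anchors go with the fine 80×80 grid (small objects) and large anchors with the coarse 20×20 grid (large objects).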

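It looks baffling at first, but read patiently alongside the code it becomes quite clear.
The key is the comment # [from, number, module, args].
from is where a layer takes its input: -1 means the previous layer, -2 the layer before that, and a non-negative number means that specific layer.
The layer indices are the numbers in my ## annotations on the right, counted from 0.
number is how many times the module is repeated: 8, Bottleneck means 8 stacked Bottleneck blocks, similar to the residual blocks in ResNet. parse_model scales this count by depth_multiple, which is 1.0 here, so the counts stay as written.
args are the module's arguments, minus the input channel count, which parse_model fills in itself: for Conv, [256, 3, 2] means 256 output channels, a 3x3 kernel and stride 2; for Bottleneck, [512, False] means 512 output channels with the shortcut (residual) connection turned off.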
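Resolving the from field into absolute layer indices is a one-liner. The sketch below (resolve_from is a hypothetical name) uses the same x % i trick that parse_model applies when building its save list:

```python
def resolve_from(f, i):
    """Return the absolute layer indices that layer i reads from.

    f is the 'from' field in yolov3.yaml: an int or a list of ints.
    Negative values count back from layer i (-1 = previous layer, -2 = the one
    before it); non-negative values are already absolute. Only valid for i >= 1,
    because layer 0's -1 refers to the input image, not to another layer.
    """
    if i < 1:
        raise ValueError("layer 0 reads the network input, not another layer")
    fs = [f] if isinstance(f, int) else f
    # Python's modulo maps -2 % 16 to 14 and leaves 8 % 18 as 8
    return [x % i for x in fs]


print(resolve_from(-2, 16))            # [14]: layer 16 reads layer 14
print(resolve_from([-1, 8], 18))       # [17, 8]: Concat of layer 17 and backbone layer 8
print(resolve_from([27, 22, 15], 28))  # [27, 22, 15]: Detect reads the three head outputs
```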
The code that parses yolov3.yaml (parse_model in models/yolo.py) is as follows:

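# Excerpt from models/yolo.py; relies on torch.nn as nn, logger, make_divisible and the
# module classes (Conv, Bottleneck, Concat, Detect, ...) defined elsewhere in the repository.
def parse_model(d, ch):  # model_dict, input_channels(3)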
    logger.info('\n%3s%18s%3s%10s  %-40s%-30s' % ('', 'from', 'n', 'params', 'module', 'arguments'))
    anchors, nc, gd, gw = d['anchors'], d['nc'], d['depth_multiple'], d['width_multiple']
    na = (len(anchors[0]) // 2) if isinstance(anchors, list) else anchors  # number of anchors
    no = na * (nc + 5)  # number of outputs = anchors * (classes + 5)

    layers, save, c2 = [], [], ch[-1]  # layers, savelist, ch out
    for i, (f, n, m, args) in enumerate(d['backbone'] + d['head']):  # from, number, module, args
        m = eval(m) if isinstance(m, str) else m  # eval strings
        for j, a in enumerate(args):
            try:
                args[j] = eval(a) if isinstance(a, str) else a  # eval strings
            except:
                pass

        n = max(round(n * gd), 1) if n > 1 else n  # depth gain
        if m in [Conv, GhostConv, Bottleneck, GhostBottleneck, SPP, DWConv, MixConv2d, Focus, CrossConv, BottleneckCSP,
                 C3, C3TR]:
            c1, c2 = ch[f], args[0]
            if c2 != no:  # if not output
                c2 = make_divisible(c2 * gw, 8)

            args = [c1, c2, *args[1:]]
            if m in [BottleneckCSP, C3, C3TR]:
                args.insert(2, n)  # number of repeats
                n = 1
        elif m is nn.BatchNorm2d:
            args = [ch[f]]
        elif m is Concat:
            c2 = sum([ch[x] for x in f])
        elif m is Detect:
            args.append([ch[x] for x in f])
            if isinstance(args[1], int):  # number of anchors
                args[1] = [list(range(args[1] * 2))] * len(f)
        elif m is Contract:
            c2 = ch[f] * args[0] ** 2
        elif m is Expand:
            c2 = ch[f] // args[0] ** 2
        else:
            c2 = ch[f]
        m_ = nn.Sequential(*[m(*args) for _ in range(n)]) if n > 1 else m(*args)  # module
        t = str(m)[8:-2].replace('__main__.', '')  # module type
        np = sum([x.numel() for x in m_.parameters()])  # number params
        m_.i, m_.f, m_.type, m_.np = i, f, t, np  # attach index, 'from' index, type, number params
        logger.info('%3s%18s%3s%10.0f  %-40s%-30s' % (i, f, n, np, t, args))  # print
        # x % i turns a relative 'from' index into an absolute layer index, e.g. -2 % 16 = 14
        save.extend(x % i for x in ([f] if isinstance(f, int) else f) if x != -1)  # append to savelist
        layers.append(m_)
        if i == 0:
            ch = []
        ch.append(c2)
    return nn.Sequential(*layers), sorted(save)
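The two scaling steps in parse_model, the depth gain on n and the width gain on c2, can be tried in isolation. This is a sketch: make_divisible is written here as "round up to a multiple of divisor", which I believe matches the repository helper, and scaled_repeats / scaled_channels are hypothetical names. With yolov3.yaml's 1.0 multipliers nothing changes; the 0.33 and 0.50 values below are hypothetical inputs (the multipliers YOLOv5s uses). Note that parse_model skips the width gain when c2 equals no, so the detection output size is never rescaled.

```python
import math


def make_divisible(x, divisor):
    # round x up to the nearest multiple of divisor
    return math.ceil(x / divisor) * divisor


def scaled_repeats(n, gd):
    # depth gain: only modules repeated more than once are scaled, never below 1
    return max(round(n * gd), 1) if n > 1 else n


def scaled_channels(c2, gw, divisor=8):
    # width gain: scale the channel count, then round up to a multiple of 8
    return make_divisible(c2 * gw, divisor)


print([scaled_repeats(n, 1.0) for n in (1, 2, 8, 4)])     # [1, 2, 8, 4]
print([scaled_repeats(n, 0.33) for n in (1, 2, 8, 4)])    # [1, 1, 3, 1]
print([scaled_channels(c, 0.5) for c in (32, 64, 1024)])  # [16, 32, 512]
```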

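save holds the indices of the layers whose output feature maps must be kept, because a later layer reads them through a from entry other than -1 (the -2 branches, the Concat inputs and the Detect inputs).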
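To see which indices end up in save, and where the channel counts in the table further down come from, the sketch below replays parse_model's channel bookkeeping on a hand-condensed copy of the layer list. LAYERS and trace_channels are hypothetical names, width_multiple is assumed to be 1.0, and only the four module types used in yolov3.yaml are handled:

```python
# Condensed yolov3.yaml: (from, number, module, args[0] = out channels or None).
LAYERS = [
    (-1, 1, "Conv", 32), (-1, 1, "Conv", 64), (-1, 1, "Bottleneck", 64),
    (-1, 1, "Conv", 128), (-1, 2, "Bottleneck", 128), (-1, 1, "Conv", 256),
    (-1, 8, "Bottleneck", 256), (-1, 1, "Conv", 512), (-1, 8, "Bottleneck", 512),
    (-1, 1, "Conv", 1024), (-1, 4, "Bottleneck", 1024),                      # 0-10 backbone
    (-1, 1, "Bottleneck", 1024), (-1, 1, "Conv", 512), (-1, 1, "Conv", 1024),
    (-1, 1, "Conv", 512), (-1, 1, "Conv", 1024),                             # 11-15
    (-2, 1, "Conv", 256), (-1, 1, "Upsample", None), ([-1, 8], 1, "Concat", None),
    (-1, 1, "Bottleneck", 512), (-1, 1, "Bottleneck", 512),
    (-1, 1, "Conv", 256), (-1, 1, "Conv", 512),                              # 16-22
    (-2, 1, "Conv", 128), (-1, 1, "Upsample", None), ([-1, 6], 1, "Concat", None),
    (-1, 1, "Bottleneck", 256), (-1, 2, "Bottleneck", 256),                  # 23-27
    ([27, 22, 15], 1, "Detect", None),                                       # 28
]


def trace_channels(layers, in_ch=3):
    """Return per-layer output channels and the sorted save list (number is unused here)."""
    ch, save = [in_ch], []
    c2 = ch[-1]
    for i, (f, _n, m, c_out) in enumerate(layers):
        if m in ("Conv", "Bottleneck"):
            c2 = c_out                      # args[0] is the output channel count
        elif m == "Concat":
            c2 = sum(ch[x] for x in f)      # concatenating along dim 1 adds channels
        elif m == "Detect":
            pass                            # parse_model leaves c2 unchanged here; never read again
        else:
            c2 = ch[f]                      # nn.Upsample keeps the channel count
        save.extend(x % i for x in ([f] if isinstance(f, int) else f) if x != -1)
        if i == 0:
            ch = []                         # drop the input placeholder, as parse_model does
        ch.append(c2)
    return ch, sorted(save)


ch, save = trace_channels(LAYERS)
print(save)                           # [6, 8, 14, 15, 21, 22, 27]
print(ch[18], ch[25])                 # 768 384
print([ch[x] for x in (27, 22, 15)])  # [256, 512, 1024], the Detect input channels
```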
[Figure: YOLOv3 network diagram, source: https://blog.csdn.net/dz4543/article/details/90049377]

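The figure above gives a rough picture of the YOLOv3 network, except that its input is 256×256. I list the data flow for a 640×640 input in the table further down: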

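The authors also make the 3 prediction branches fully convolutional. The last conv layer of each branch has 255 kernels, which for the 80 COCO classes works out as 3*(80+4+1) = 255: 3 is the number of bounding boxes (anchors) per grid cell, 4 the box's four coordinates, and 1 the objectness score.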
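The same arithmetic as a tiny function; it is the no = na * (nc + 5) line in parse_model written out term by term (head_channels is a hypothetical name). The 20-class call is only an illustration of how the number changes with the dataset:

```python
def head_channels(num_classes, num_anchors=3):
    """Output channels of each prediction conv: anchors * (4 box coords + 1 objectness + classes)."""
    return num_anchors * (4 + 1 + num_classes)


print(head_channels(80))  # 255 for COCO
print(head_channels(20))  # 75 for a 20-class dataset such as Pascal VOC
```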

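        # Detect.forward (excerpt): self.nl = 3 output levels, self.na = 3 anchors per cell,
        # self.no = nc + 5 = 85 values per anchor; self.m[i] is a 1x1 conv to na * no = 255 channels
        for i in range(self.nl):
            x[i] = self.m[i](x[i])  # conv
            bs, _, ny, nx = x[i].shape  # x(bs,255,20,20) to x(bs,3,20,20,85)
            x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()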

[bs,255,80,80] ---> [bs,3,85,80,80] ---> [bs,3,80,80,85]

[bs,255,40,40] ---> [bs,3,85,40,40] ---> [bs,3,40,40,85]

[bs,255,20,20] ---> [bs,3,85,20,20] ---> [bs,3,20,20,85]
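The view/permute step can be checked on dummy tensors. view splits the 255 channels into (anchor a, value k), so channel a*85 + k of the conv output ends up at position [a, y, x, k] after the permute. A standalone sketch (reshape_head is a hypothetical name; it only repeats the two tensor calls from Detect.forward):

```python
import torch


def reshape_head(x: torch.Tensor, na: int = 3) -> torch.Tensor:
    """[bs, na*no, ny, nx] -> [bs, na, ny, nx, no], the same view/permute as Detect.forward."""
    bs, c, ny, nx = x.shape
    no = c // na                                   # 255 // 3 = 85 values per anchor
    # split channels into (anchor, value), then move the value axis last
    return x.view(bs, na, no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()


for size in (80, 40, 20):                          # P3, P4, P5 grids for a 640 input
    y = reshape_head(torch.zeros(2, 255, size, size))
    print(tuple(y.shape))                          # (2, 3, size, size, 85)
```

Keeping the 85 values in the last dimension makes it easy to slice box coordinates, objectness and class scores per anchor and per cell.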

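stage     idx  from        module           in_ch      out_ch  k  s  out_shape (640 input)
input     -    -           -                -          3       -  -  [3,640,640]
backbone  0    -1          Conv             3          32      3  1  [32,640,640]
          1    -1          Conv             32         64      3  2  [64,320,320]
          2    -1          Bottleneck(×1)   64         64      -  -  [64,320,320]
          3    -1          Conv             64         128     3  2  [128,160,160]
          4    -1          Bottleneck(×2)   128        128     -  -  [128,160,160]
          5    -1          Conv             128        256     3  2  [256,80,80]
          6    -1          Bottleneck(×8)   256        256     -  -  [256,80,80]
          7    -1          Conv             256        512     3  2  [512,40,40]
          8    -1          Bottleneck(×8)   512        512     -  -  [512,40,40]
          9    -1          Conv             512        1024    3  2  [1024,20,20]
          10   -1          Bottleneck(×4)   1024       1024    -  -  [1024,20,20]
head      11   -1          Bottleneck(×1)   1024       1024    -  -  [1024,20,20]
          12   -1          Conv             1024       512     1  1  [512,20,20]
          13   -1          Conv             512        1024    3  1  [1024,20,20]
          14   -1          Conv             1024       512     1  1  [512,20,20]
          15   -1          Conv             512        1024    3  1  [1024,20,20]
head      16   -2          Conv             512        256     1  1  [256,20,20]
          17   -1          nn.Upsample      256        256     -  -  [256,40,40]
          18   [-1,8]      Concat           256+512    768     -  -  [768,40,40]
          19   -1          Bottleneck(×1)   768        512     -  -  [512,40,40]
          20   -1          Bottleneck(×1)   512        512     -  -  [512,40,40]
          21   -1          Conv             512        256     1  1  [256,40,40]
          22   -1          Conv             256        512     3  1  [512,40,40]
head      23   -2          Conv             256        128     1  1  [128,40,40]
          24   -1          nn.Upsample      128        128     -  -  [128,80,80]
          25   [-1,6]      Concat           128+256    384     -  -  [384,80,80]
          26   -1          Bottleneck(×1)   384        256     -  -  [256,80,80]
          27   -1          Bottleneck(×2)   256        256     -  -  [256,80,80]
Detect    28   27          nn.Conv2d        256        255     1  1  [255,80,80]
               22          nn.Conv2d        512        255     1  1  [255,40,40]
               15          nn.Conv2d        1024       255     1  1  [255,20,20]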
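The 80/40/20 grid sizes in the out_shape column come from the five stride-2 Convs in the backbone. A sketch of that arithmetic, assuming every Conv pads by k // 2 (what I understand autopad to do for odd kernels); conv_out and backbone_sizes are hypothetical names:

```python
def conv_out(size, k, s):
    """Output side length of a conv with 'same'-style padding p = k // 2 (odd k)."""
    p = k // 2
    return (size + 2 * p - k) // s + 1


def backbone_sizes(img):
    """Side length after layer 0 and after each stride-2 Conv (layers 1, 3, 5, 7, 9)."""
    size = conv_out(img, 3, 1)          # layer 0: 3x3, stride 1 keeps the size
    sizes = [size]
    for _ in range(5):                  # five 3x3 stride-2 downsampling Convs
        size = conv_out(size, 3, 2)
        sizes.append(size)
    return sizes                        # the Bottlenecks in between keep the size


print(backbone_sizes(640))  # [640, 320, 160, 80, 40, 20]
print(backbone_sizes(256))  # [256, 128, 64, 32, 16, 8]
```

The last three entries are the P3/P4/P5 grids, so the 256×256 input from the earlier figure gives 32×32, 16×16 and 8×8 output grids instead of 80/40/20.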

Another network diagram found online, for a quick look:
[Figure: alternative YOLOv3 network diagram]

posted @ 2021-09-18 17:19  无左无右