Ultralytics YOLOv3 in Detail --->> The Network
The Network
The network is defined in /models/yolov3.yaml, as follows:
# parameters
nc: 80  # number of classes
depth_multiple: 1.0  # model depth multiple
width_multiple: 1.0  # layer channel multiple
# anchors
anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32
# darknet53 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [32, 3, 1]],  # 0  ## 0
   [-1, 1, Conv, [64, 3, 2]],  # 1-P1/2  ## 1
   [-1, 1, Bottleneck, [64]],  ## 2
   [-1, 1, Conv, [128, 3, 2]],  # 3-P2/4  ## 3
   [-1, 2, Bottleneck, [128]],  ## 4
   [-1, 1, Conv, [256, 3, 2]],  # 5-P3/8  ## 5
   [-1, 8, Bottleneck, [256]],  ## 6
   [-1, 1, Conv, [512, 3, 2]],  # 7-P4/16  ## 7
   [-1, 8, Bottleneck, [512]],  ## 8
   [-1, 1, Conv, [1024, 3, 2]],  # 9-P5/32  ## 9
   [-1, 4, Bottleneck, [1024]],  # 10  ## 10
  ]
# YOLOv3 head
head:
  [[-1, 1, Bottleneck, [1024, False]],  ## 11
   [-1, 1, Conv, [512, [1, 1]]],  ## 12
   [-1, 1, Conv, [1024, 3, 1]],  ## 13
   [-1, 1, Conv, [512, 1, 1]],  ## 14
   [-1, 1, Conv, [1024, 3, 1]],  # 15 (P5/32-large)  ## 15
   [-2, 1, Conv, [256, 1, 1]],  ## 16
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],  ## 17
   [[-1, 8], 1, Concat, [1]],  # cat backbone P4  ## 18
   [-1, 1, Bottleneck, [512, False]],  ## 19
   [-1, 1, Bottleneck, [512, False]],  ## 20
   [-1, 1, Conv, [256, 1, 1]],  ## 21
   [-1, 1, Conv, [512, 3, 1]],  # 22 (P4/16-medium)  ## 22
   [-2, 1, Conv, [128, 1, 1]],  ## 23
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],  ## 24
   [[-1, 6], 1, Concat, [1]],  # cat backbone P3  ## 25
   [-1, 1, Bottleneck, [256, False]],  ## 26
   [-1, 2, Bottleneck, [256, False]],  # 27 (P3/8-small)  ## 27
   [[27, 22, 15], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)  ## 28
  ]
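At first glance this is baffling, but if you patiently read it side by side with the code it turns out to be quite clear.
The thing to focus on is the comment # [from, number, module, args].
from says where the layer takes its input: -1 is the previous layer, -2 is the layer before that, and a non-negative number is the absolute index of a specific layer.
The layer index is the number in the ## comments I appended to each line, counted from 0.
number is how many times the module is repeated: 8, Bottleneck means 8 Bottleneck blocks in a row, much like the stacked residual blocks in ResNet.
args are the arguments passed to the module.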
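To make the from convention concrete, here is a minimal sketch that turns a from value into absolute layer indices. The helper name resolve_from is my own (hypothetical, not something from the repository); it only mirrors the rule described above: negative values count back from the current layer, non-negative values are absolute indices.

```python
def resolve_from(f, i):
    """Return the absolute indices of the layers that feed layer i.

    f is the 'from' field of the YAML entry: an int or a list of ints.
    resolve_from is a hypothetical helper written for this post.
    """
    sources = [f] if isinstance(f, int) else list(f)
    # Negative entries are relative to the current layer index i
    return [i + s if s < 0 else s for s in sources]


print(resolve_from(-1, 12))            # [11]
print(resolve_from(-2, 16))            # [14]
print(resolve_from([-1, 8], 18))       # [17, 8]
print(resolve_from([27, 22, 15], 28))  # [27, 22, 15]
```

So layer 16 does not continue from layer 15 but branches off layer 14, and layer 18 concatenates the upsampled map from layer 17 with the backbone output of layer 8.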
The code that parses yolov3.yaml is as follows:
def parse_model(d, ch):  # model_dict, input_channels(3)
    logger.info('\n%3s%18s%3s%10s %-40s%-30s' % ('', 'from', 'n', 'params', 'module', 'arguments'))
    anchors, nc, gd, gw = d['anchors'], d['nc'], d['depth_multiple'], d['width_multiple']
    na = (len(anchors[0]) // 2) if isinstance(anchors, list) else anchors  # number of anchors
    no = na * (nc + 5)  # number of outputs = anchors * (classes + 5)
    layers, save, c2 = [], [], ch[-1]  # layers, savelist, ch out
    # tmp_1 = d['backbone'] + d['head']
    for i, (f, n, m, args) in enumerate(d['backbone'] + d['head']):  # from, number, module, args
        m = eval(m) if isinstance(m, str) else m  # eval strings
        for j, a in enumerate(args):
            try:
                args[j] = eval(a) if isinstance(a, str) else a  # eval strings
            except:
                pass
        n = max(round(n * gd), 1) if n > 1 else n  # depth gain
        if m in [Conv, GhostConv, Bottleneck, GhostBottleneck, SPP, DWConv, MixConv2d, Focus, CrossConv, BottleneckCSP,
                 C3, C3TR]:
            c1, c2 = ch[f], args[0]
            if c2 != no:  # if not output
                c2 = make_divisible(c2 * gw, 8)
            args = [c1, c2, *args[1:]]
            if m in [BottleneckCSP, C3, C3TR]:
                args.insert(2, n)  # number of repeats
                n = 1
        elif m is nn.BatchNorm2d:
            args = [ch[f]]
        elif m is Concat:
            c2 = sum([ch[x] for x in f])
        elif m is Detect:
            args.append([ch[x] for x in f])
            if isinstance(args[1], int):  # number of anchors
                args[1] = [list(range(args[1] * 2))] * len(f)
        elif m is Contract:
            c2 = ch[f] * args[0] ** 2
        elif m is Expand:
            c2 = ch[f] // args[0] ** 2
        else:
            c2 = ch[f]
        m_ = nn.Sequential(*[m(*args) for _ in range(n)]) if n > 1 else m(*args)  # module
        t = str(m)[8:-2].replace('__main__.', '')  # module type
        np = sum([x.numel() for x in m_.parameters()])  # number params
        m_.i, m_.f, m_.type, m_.np = i, f, t, np  # attach index, 'from' index, type, number params
        logger.info('%3s%18s%3s%10.0f %-40s%-30s' % (i, f, n, np, t, args))  # print
        save.extend(x % i for x in ([f] if isinstance(f, int) else f) if x != -1)  # append to savelist
        # if len(save) != 0:
        #     ii = i
        #     tmp = -2 % i  # -2 % 16 =14
        #     aa = 0
        layers.append(m_)
        if i == 0:
            ch = []
        ch.append(c2)
    return nn.Sequential(*layers), sorted(save)
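The depth gain and width gain lines in parse_model are easy to check in isolation. The sketch below re-implements just those two lines. It assumes that make_divisible rounds up to the nearest multiple of the divisor (its body is not shown in this post, so treat that as an assumption); scale_layer is a hypothetical helper name, and 0.33 / 0.5 are made-up multiples used only for illustration. The `c2 != no` check is left out for brevity.

```python
import math


def make_divisible(x, divisor):
    # Assumed behaviour: round x up to the nearest multiple of divisor
    return math.ceil(x / divisor) * divisor


def scale_layer(n, c2, gd, gw):
    """Apply depth multiple gd to repeat count n and width multiple gw to channels c2."""
    n = max(round(n * gd), 1) if n > 1 else n  # depth gain, never below 1
    c2 = make_divisible(c2 * gw, 8)            # width gain, kept a multiple of 8
    return n, c2


# With the multiples in yolov3.yaml (both 1.0) nothing changes
print(scale_layer(8, 256, 1.0, 1.0))   # (8, 256)
# With hypothetical smaller multiples the block gets shallower and thinner
print(scale_layer(8, 256, 0.33, 0.5))  # (3, 128)
```

Since depth_multiple and width_multiple are both 1.0 in yolov3.yaml, the repeat counts and channel numbers written in the file are the ones that end up in the model.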
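save stores the indices of the layers whose feature maps have to be kept, because a later layer reads them through its from field.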
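The save bookkeeping can be replayed by hand. The sketch below applies the same `x % i` expression from parse_model to the five layers of yolov3.yaml whose from is not a plain -1 (all other layers contribute nothing). The function name build_save and the dictionary layout are my own, for illustration only.

```python
def build_save(froms):
    """froms maps layer index -> 'from' field (int or list of ints)."""
    save = []
    for i, f in froms.items():
        # Same expression as in parse_model: -1 is skipped, negative
        # values are wrapped to absolute indices by the modulo
        save.extend(x % i for x in ([f] if isinstance(f, int) else f) if x != -1)
    return sorted(save)


# Layers of yolov3.yaml whose 'from' is something other than -1
froms = {16: -2, 18: [-1, 8], 23: -2, 25: [-1, 6], 28: [27, 22, 15]}
print(build_save(froms))  # [6, 8, 14, 15, 21, 22, 27]
```

These are exactly the layers that get reused: 6 and 8 feed the two Concat layers, 14 and 21 are the branch points for the upsampling paths, and 15, 22, 27 go into Detect.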
https://blog.csdn.net/dz4543/article/details/90049377
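The figure from the post linked above gives a rough picture of the YOLOv3 network, except that its input size is 256. I list the data flow for a 640 input in the table further below.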
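The three prediction branches are fully convolutional as well. The last convolution of each branch has 255 kernels, which comes from the 80 classes of the COCO dataset: 3*(80+4+1)=255, where 3 is the number of bounding boxes per grid cell, 4 is the four box coordinates, and 1 is the objectness score.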
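The same arithmetic as a small sketch, using the anchors from yolov3.yaml. The function name detect_out_channels is hypothetical; the number of anchors per scale is derived the same way parse_model does it (`len(anchors[0]) // 2`).

```python
ANCHORS = [
    [10, 13, 16, 30, 33, 23],       # P3/8
    [30, 61, 62, 45, 59, 119],      # P4/16
    [116, 90, 156, 198, 373, 326],  # P5/32
]


def detect_out_channels(nc, anchors=ANCHORS):
    na = len(anchors[0]) // 2  # anchors per scale: 3 (width, height) pairs
    per_anchor = nc + 4 + 1    # class scores + 4 box coordinates + objectness
    return na * per_anchor


print(detect_out_channels(80))  # 255 (COCO)
print(detect_out_channels(20))  # 75
```

Changing nc in the YAML therefore changes the number of kernels in the last convolution of every branch.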
for i in range(self.nl):
    x[i] = self.m[i](x[i])  # conv
    bs, _, ny, nx = x[i].shape  # x(bs,255,20,20) to x(bs,3,20,20,85)
    x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()
[bs,255,80,80] ---> [bs,3,85,80,80] ---> [bs,3,80,80,85]
[bs,255,40,40] ---> [bs,3,85,40,40] ---> [bs,3,40,40,85]
[bs,255,20,20] ---> [bs,3,85,20,20] ---> [bs,3,20,20,85]
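To see what the view and permute do without needing PyTorch, the sketch below reproduces the shape bookkeeping in plain Python and shows how one of the 255 channels maps to an (anchor, field) pair. The helper names are hypothetical. The field layout in the comments (0-3 box, 4 objectness, 5+ class scores) matches the 4 + 1 + 80 split described above, but the exact ordering is my assumption here.

```python
NA, NO = 3, 85  # anchors per scale, outputs per anchor (80 + 4 + 1)


def head_shapes(bs, ny, nx, na=NA, no=NO):
    """Shapes after the conv, after view(), and after permute(0, 1, 3, 4, 2)."""
    conv_out = (bs, na * no, ny, nx)
    viewed = (bs, na, no, ny, nx)
    permuted = tuple(viewed[d] for d in (0, 1, 3, 4, 2))
    return conv_out, viewed, permuted


def split_channel(c, no=NO):
    """view() splits channel c into (anchor index, field index) = divmod(c, no)."""
    return divmod(c, no)


print(head_shapes(1, 80, 80))  # ((1, 255, 80, 80), (1, 3, 85, 80, 80), (1, 3, 80, 80, 85))
print(split_channel(0))        # (0, 0)  -> anchor 0, first box coordinate
print(split_channel(89))       # (1, 4)  -> anchor 1, objectness (assumed layout)
print(split_channel(254))      # (2, 84) -> anchor 2, last class score
```

After the permute, the 85 values belonging to one anchor at one grid cell sit next to each other in the last dimension, which is what makes the later decoding convenient.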
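| stage | layer | module | in_num | out_num | k | s | out_shape |
|---|---|---|---|---|---|---|---|
| input | | | | 3 | | | [3,640,640] |
| backbone | 0 | Conv | 3 | 32 | 3 | 1 | [32,640,640] |
| | 1 | Conv | 32 | 64 | 3 | 2 | [64,320,320] |
| | 2 | Bottleneck(×1) | 64 | 64 | - | - | [64,320,320] |
| | 3 | Conv | 64 | 128 | 3 | 2 | [128,160,160] |
| | 4 | Bottleneck(×2) | 128 | 128 | - | - | [128,160,160] |
| | 5 | Conv | 128 | 256 | 3 | 2 | [256,80,80] |
| | 6 | Bottleneck(×8) | 256 | 256 | - | - | [256,80,80] |
| | 7 | Conv | 256 | 512 | 3 | 2 | [512,40,40] |
| | 8 | Bottleneck(×8) | 512 | 512 | - | - | [512,40,40] |
| | 9 | Conv | 512 | 1024 | 3 | 2 | [1024,20,20] |
| | 10 | Bottleneck(×4) | 1024 | 1024 | - | - | [1024,20,20] |
| head | 11 | Bottleneck(×1) | 1024 | 1024 | - | - | [1024,20,20] |
| | 12 | Conv | 1024 | 512 | 1 | 1 | [512,20,20] |
| | 13 | Conv | 512 | 1024 | 3 | 1 | [1024,20,20] |
| | 14 | Conv | 1024 | 512 | 1 | 1 | [512,20,20] |
| | 15 | Conv | 512 | 1024 | 3 | 1 | [1024,20,20] |
| head | 16 | [-2]Conv | 512 | 256 | 1 | 1 | [256,20,20] |
| | 17 | nn.Upsample | 256 | 256 | - | - | [256,40,40] |
| | 18 | [-1,8]Concat | [256,40,40] + [512,40,40] | 768 | - | - | [768,40,40] |
| | 19 | Bottleneck(×1) | 768 | 512 | - | - | [512,40,40] |
| | 20 | Bottleneck(×1) | 512 | 512 | - | - | [512,40,40] |
| | 21 | Conv | 512 | 256 | 1 | 1 | [256,40,40] |
| | 22 | Conv | 256 | 512 | 3 | 1 | [512,40,40] |
| head | 23 | [-2]Conv | 256 | 128 | 1 | 1 | [128,40,40] |
| | 24 | nn.Upsample | 128 | 128 | - | - | [128,80,80] |
| | 25 | [-1,6]Concat | [128,80,80] + [256,80,80] | 384 | - | - | [384,80,80] |
| | 26 | Bottleneck(×1) | 384 | 256 | - | - | [256,80,80] |
| | 27 | Bottleneck(×2) | 256 | 256 | - | - | [256,80,80] |
| Detect | 28 | [27]Conv | 256 | 255 | 1 | 1 | [255,80,80] |
| | | [22]Conv | 512 | 255 | 1 | 1 | [255,40,40] |
| | | [15]Conv | 1024 | 255 | 1 | 1 | [255,20,20] |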
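
The channel counts and feature map sizes for a 640 input can be double-checked with a short script. This is only a sketch under my own assumptions: LAYERS is a hand-condensed copy of yolov3.yaml (from, module, output channels, stride or upsample scale), trace_shapes is a hypothetical helper, and the spatial size is simply divided by the stride, which holds here because every size involved is even and the convolutions are padded.

```python
# (from, module, out_channels, stride or scale), condensed by hand from yolov3.yaml
LAYERS = [
    (-1, "Conv", 32, 1), (-1, "Conv", 64, 2), (-1, "Bottleneck", 64, 1),
    (-1, "Conv", 128, 2), (-1, "Bottleneck", 128, 1),
    (-1, "Conv", 256, 2), (-1, "Bottleneck", 256, 1),
    (-1, "Conv", 512, 2), (-1, "Bottleneck", 512, 1),
    (-1, "Conv", 1024, 2), (-1, "Bottleneck", 1024, 1),      # 0-10: backbone
    (-1, "Bottleneck", 1024, 1), (-1, "Conv", 512, 1),
    (-1, "Conv", 1024, 1), (-1, "Conv", 512, 1), (-1, "Conv", 1024, 1),  # 11-15
    (-2, "Conv", 256, 1), (-1, "Upsample", None, 2), ([-1, 8], "Concat", None, 1),
    (-1, "Bottleneck", 512, 1), (-1, "Bottleneck", 512, 1),
    (-1, "Conv", 256, 1), (-1, "Conv", 512, 1),              # 16-22
    (-2, "Conv", 128, 1), (-1, "Upsample", None, 2), ([-1, 6], "Concat", None, 1),
    (-1, "Bottleneck", 256, 1), (-1, "Bottleneck", 256, 1),  # 23-27
]


def trace_shapes(layers, size=640, ch_in=3):
    """Return a list of (channels, height_or_width) for every layer output."""
    shapes = []
    for i, (f, kind, c2, s) in enumerate(layers):
        if i == 0:
            inputs = [(ch_in, size)]  # the first layer reads the image
        else:
            srcs = [f] if isinstance(f, int) else f
            # Negative 'from' values are relative to the current layer
            inputs = [shapes[i + x] if x < 0 else shapes[x] for x in srcs]
        c1, hw = inputs[0]
        if kind == "Concat":
            out = (sum(c for c, _ in inputs), hw)  # channels add up
        elif kind == "Upsample":
            out = (c1, hw * s)                     # spatial size grows
        else:                                      # Conv or Bottleneck
            out = (c2, hw // s)
        shapes.append(out)
    return shapes


shapes = trace_shapes(LAYERS)
for i in (27, 22, 15):  # the three inputs of Detect
    print(i, shapes[i])
```

The three printed entries are the P3, P4 and P5 feature maps that Detect receives, at 80, 40 and 20 pixels per side respectively.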
There is another network diagram floating around online, worth a casual look: