pytorch_SSD300

SSD network architecture diagram

The original paper uses VGG16 as the base network. Other backbones, such as ResNet50, can be used instead.

  • '300': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'C', 512, 512, 512, 'M',512, 512, 512],

VGG network structure

  • input image = (300, 300, 3)
  • conv1_1 (3, 64) + ReLU feature map = (300, 300, 64)
  • conv1_2 (64, 64) + ReLU feature map = (300, 300, 64)
  • MaxPool_1 feature map = (150, 150, 64)
  • conv2_1 (64, 128) + ReLU feature map = (150, 150, 128)
  • conv2_2 (128, 128) + ReLU feature map = (150, 150, 128)
  • MaxPool_2 feature map = (75, 75, 128)
  • conv3_1 (128, 256) + ReLU feature map = (75, 75, 256)
  • conv3_2 (256, 256) + ReLU feature map = (75, 75, 256)
  • conv3_3 (256, 256) + ReLU feature map = (75, 75, 256)
  • MaxPool_3 feature map = (38, 38, 256) ceil_mode=True
  • conv4_1 (256, 512) + ReLU feature map = (38, 38, 512)
  • conv4_2 (512, 512) + ReLU feature map = (38, 38, 512)
  • conv4_3 (512, 512) + ReLU feature map = (38, 38, 512) ------>>>>>>> Detect
  • MaxPool_4 feature map = (19, 19, 512)
  • conv5_1 (512, 512) + ReLU feature map = (19, 19, 512)
  • conv5_2 (512, 512) + ReLU feature map = (19, 19, 512)
  • conv5_3 (512, 512) + ReLU feature map = (19, 19, 512)
  • MaxPool_5 feature map = (19, 19, 512) spatial size unchanged
  • conv6 (512, 1024) + ReLU feature map = (19, 19, 1024)
  • conv7 (1024, 1024) + ReLU feature map = (19, 19, 1024) ------>>>>>>> Detect
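
The feature map sizes in the list can be reproduced by walking the '300' cfg. The sketch below is a minimal, dependency-free trace and not the actual network code. It assumes every integer entry is a 3x3 convolution with padding 1 (which keeps the spatial size), 'M' is a 2x2 max pool with stride 2 that rounds down, and 'C' is the same pool with ceil_mode=True. The name vgg_feature_sizes is made up for illustration.

```python
import math

# The '300' cfg: integers are conv output channels,
# 'M' is a max pool (round down), 'C' is a max pool with ceil_mode=True.
VGG_CFG_300 = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'C',
               512, 512, 512, 'M', 512, 512, 512]


def vgg_feature_sizes(cfg, size=300, channels=3):
    """Return (height, width, channels) after every entry of cfg."""
    shapes = []
    for v in cfg:
        if v == 'M':
            size = size // 2            # 2x2 pool, stride 2, round down
        elif v == 'C':
            size = math.ceil(size / 2)  # 2x2 pool, stride 2, round up
        else:
            channels = v                # assumed 3x3 conv, padding 1: size unchanged
        shapes.append((size, size, channels))
    return shapes


shapes = vgg_feature_sizes(VGG_CFG_300)
for entry, shape in zip(VGG_CFG_300, shapes):
    print(entry, shape)
```

The 'C' entry is what turns 75 into 38 instead of 37, which is why MaxPool_3 needs ceil_mode=True.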

Extras network structure

  • cfg --> '300': [256, 'S', 512, 128, 'S', 256, 128, 256, 128, 256]

  • conv8_1 (1024, 256) feature map = (19, 19, 256)

  • conv8_2 (256, 512) feature map = (10, 10, 512) ------>>>>>>> Detect

  • conv9_1 (512, 128) feature map = (10, 10, 128)

  • conv9_2 (128, 256) feature map = (5, 5, 256) ------>>>>>>> Detect

  • conv10_1 (256, 128) feature map = (5, 5, 128)

  • conv10_2 (128, 256) feature map = (3, 3, 256) ------>>>>>>> Detect

  • conv11_1 (256, 128) feature map = (3, 3, 128)

  • conv11_2 (128, 256) feature map = (1, 1, 256) ------>>>>>>> Detect
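
The extras cfg can be traced the same way. This is a hedged sketch under assumptions the post does not spell out: kernel sizes alternate between 1x1 and 3x3, an 'S' entry means the next convolution uses stride 2 with padding 1, and the remaining 3x3 convolutions use no padding. This mirrors the convention in common PyTorch SSD implementations. The helper names conv_out and extras_feature_sizes are hypothetical.

```python
EXTRAS_CFG_300 = [256, 'S', 512, 128, 'S', 256, 128, 256, 128, 256]


def conv_out(size, kernel, stride=1, padding=0):
    """Standard convolution output-size formula (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def extras_feature_sizes(cfg, in_channels=1024, size=19):
    """Return (height, width, channels) after each extra conv layer."""
    shapes = []
    use_3x3 = False          # assumed: kernels alternate 1x1, 3x3, 1x1, ...
    prev = in_channels
    for k, v in enumerate(cfg):
        if prev != 'S':      # the entry right after 'S' was already consumed
            kernel = 3 if use_3x3 else 1
            if v == 'S':     # assumed: stride-2, padding-1 conv to cfg[k + 1] channels
                out_channels = cfg[k + 1]
                size = conv_out(size, kernel, stride=2, padding=1)
            else:
                out_channels = v
                size = conv_out(size, kernel)
            shapes.append((size, size, out_channels))
            use_3x3 = not use_3x3
        prev = v
    return shapes


extra_shapes = extras_feature_sizes(EXTRAS_CFG_300)
names = ['conv8_1', 'conv8_2', 'conv9_1', 'conv9_2',
         'conv10_1', 'conv10_2', 'conv11_1', 'conv11_2']
for name, shape in zip(names, extra_shapes):
    print(name, shape)
```

Under these assumptions the trace ends at a 1x1 map. Together with conv4_3 and conv7, this gives six detection feature maps of sizes 38, 19, 10, 5, 3 and 1.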

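Computing the prior boxes

At this point the sizes of the two square boxes are fixed. The other two rectangular boxes are chosen as follows.

How the center points are chosen: the centers are computed in normalized coordinates, so they must finally be multiplied by the input image size (300) to get positions in the image.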
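
As a rough illustration of the steps above, the sketch below generates prior boxes in normalized (cx, cy, w, h) form. The numeric settings (STEPS, MIN_SIZES, MAX_SIZES, ASPECT_RATIOS) are not given in this post. They are assumed here to be the values commonly used for SSD300 on VOC, and make_prior_boxes is a hypothetical name.

```python
import math

IMAGE_SIZE = 300
FEATURE_MAPS = [38, 19, 10, 5, 3, 1]
# Assumed SSD300/VOC settings, not taken from this post:
STEPS = [8, 16, 32, 64, 100, 300]
MIN_SIZES = [30, 60, 111, 162, 213, 264]
MAX_SIZES = [60, 111, 162, 213, 264, 315]
ASPECT_RATIOS = [[2], [2, 3], [2, 3], [2, 3], [2], [2]]


def make_prior_boxes():
    """Return a list of (cx, cy, w, h) tuples in normalized coordinates."""
    boxes = []
    for k, f in enumerate(FEATURE_MAPS):
        f_k = IMAGE_SIZE / STEPS[k]                # effective grid size
        s_k = MIN_SIZES[k] / IMAGE_SIZE            # small square
        s_k_big = math.sqrt(s_k * MAX_SIZES[k] / IMAGE_SIZE)  # large square
        for i in range(f):
            for j in range(f):
                cx = (j + 0.5) / f_k               # center of cell (i, j)
                cy = (i + 0.5) / f_k
                boxes.append((cx, cy, s_k, s_k))
                boxes.append((cx, cy, s_k_big, s_k_big))
                for ar in ASPECT_RATIOS[k]:        # two rectangles per ratio
                    r = math.sqrt(ar)
                    boxes.append((cx, cy, s_k * r, s_k / r))
                    boxes.append((cx, cy, s_k / r, s_k * r))
    return boxes


priors = make_prior_boxes()
print(len(priors))
# Multiply by the image size to get pixel coordinates.
cx, cy, w, h = priors[0]
print(cx * IMAGE_SIZE, cy * IMAGE_SIZE, w * IMAGE_SIZE, h * IMAGE_SIZE)
```

With these assumed settings, layers with a single aspect ratio get four boxes per cell (two squares and two rectangles) and the others get six, for 8732 priors in total.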

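Loss function: classification loss and localization loss

Smooth L1 loss

  • In the localization loss formula, g and l both correspond to already-encoded offsets in the code. g is the offset between a matched ground-truth box and its default box (loc_t in the code), and l is the offset between the predicted box and the default box (loc_p in the code).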
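
To make the roles of loc_t and loc_p concrete, here is a minimal sketch of the encoding and of the Smooth L1 loss on a single box. Boxes are (cx, cy, w, h). The variances (0.1, 0.2) are an assumption borrowed from common SSD implementations, and encode, smooth_l1 and loc_loss are illustrative names, not necessarily the functions in the author's code.

```python
import math


def encode(gt, prior, variances=(0.1, 0.2)):
    """Encode a ground-truth box as offsets relative to a default box."""
    g_cx = (gt[0] - prior[0]) / (variances[0] * prior[2])
    g_cy = (gt[1] - prior[1]) / (variances[0] * prior[3])
    g_w = math.log(gt[2] / prior[2]) / variances[1]
    g_h = math.log(gt[3] / prior[3]) / variances[1]
    return (g_cx, g_cy, g_w, g_h)


def smooth_l1(x):
    """0.5 * x^2 when |x| < 1, otherwise |x| - 0.5."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1 else ax - 0.5


def loc_loss(loc_p, loc_t):
    """Sum of Smooth L1 over the four encoded coordinates."""
    return sum(smooth_l1(p - t) for p, t in zip(loc_p, loc_t))


prior = (0.5, 0.5, 0.2, 0.2)
gt = (0.52, 0.5, 0.2, 0.4)
loc_t = encode(gt, prior)          # target offsets (g)
loc_p = (0.5, 0.0, 0.0, 3.0)       # made-up network output (l)
print(loc_t)
print(loc_loss(loc_p, loc_t))
```

The loss compares two sets of offsets, not raw box coordinates, so the network output never has to be decoded during training.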

import matplotlib.pyplot as plt
import numpy as np

# Sample x on (0, 2]; start at 0.01 because log(0) is undefined
x = np.linspace(0.01, 2, 100)
y = np.log10(x)
y2 = x - 1
# y = np.log(x)

# Move the axes so they cross at the origin; ax is the current Axes object
ax = plt.gca()                         # get the current axes
ax.spines['right'].set_color('none')   # hide the right spine
ax.spines['top'].set_color('none')     # hide the top spine
ax.xaxis.set_ticks_position('bottom')
# Place the x-axis at y = 0 and the y-axis at x = 0
ax.spines['bottom'].set_position(('data', 0))
ax.spines['left'].set_position(('data', 0))
plt.plot(x, y)
# plt.plot(x, y2)
plt.title('log')
plt.show()

Hard Negative Mining
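
The post gives no details for this step, so the following is only a sketch of the general idea as described in the SSD paper: keep every positive default box, and keep only the negatives with the highest confidence loss, at most three per positive. The function name and the toy loss values are made up for illustration.

```python
def hard_negative_mining(conf_losses, is_positive, neg_pos_ratio=3):
    """Return a keep-mask: all positives plus the highest-loss negatives."""
    num_pos = sum(is_positive)
    # Never ask for more negatives than actually exist.
    num_neg = min(neg_pos_ratio * num_pos, len(is_positive) - num_pos)
    neg_indices = [i for i, pos in enumerate(is_positive) if not pos]
    # Hardest negatives first: largest confidence loss.
    neg_indices.sort(key=lambda i: conf_losses[i], reverse=True)
    hardest = set(neg_indices[:num_neg])
    return [pos or (i in hardest) for i, pos in enumerate(is_positive)]


# Toy example: one positive box and seven negatives.
losses = [0.2, 2.5, 0.1, 0.9, 1.7, 0.05, 0.3, 0.6]
positive = [True, False, False, False, False, False, False, False]
mask = hard_negative_mining(losses, positive)
print(mask)
```

Only the boxes selected by the mask would contribute to the classification loss, which keeps the large number of easy background boxes from dominating training.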

Training process

  • Trained on the VOC dataset: 16,551 images, 10,000 iterations, roughly 20 epochs
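
The relation between iterations and epochs is simple arithmetic: epochs = iterations × batch_size / number of images. The post does not state the batch size used for this run. The sketch below assumes 32, which is consistent with the "roughly 20 epochs" figure, and also shows 64 for comparison. The function name epochs_seen is hypothetical.

```python
def epochs_seen(iterations, batch_size, num_images):
    """Number of passes over the dataset after the given iterations."""
    return iterations * batch_size / num_images


NUM_IMAGES = 16551
ITERATIONS = 10000
for batch_size in (32, 64):  # 32 is an assumption, not stated in the post
    print(batch_size, round(epochs_seen(ITERATIONS, batch_size, NUM_IMAGES), 1))
```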

Test accuracy: mAP on VOC

Accuracy improved somewhat after setting batch_size=64.


posted on 2020-07-17 16:05  wangxiaobei2019
