@InProceedings{
author = {Qijie Zhao and Tao Sheng and Yongtao Wang and Zhi Tang and Ying Chen and Ling Cai and Haibin Ling},
title = {M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network},
booktitle = {The AAAI Conference on Artificial Intelligence},
year = {2019}
}
Problem
Two strategies for handling scale variation:
- Use an image pyramid at testing time, which increases memory use and computational complexity.
- Use a feature pyramid at both training and testing phases, which is more efficient.
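To make the cost of the first strategy concrete, the sketch below counts pixels (a rough proxy for memory and compute in a convolutional detector) when one image is resized to several test scales. The base size and the scale factors are illustrative assumptions, not values from the paper.

```python
def image_pyramid_pixels(base_size, scales):
    """Return the pixel count of each resized square image in the pyramid."""
    return [round(base_size * s) ** 2 for s in scales]


base = 320                      # assumed input side length
scales = [0.5, 1.0, 1.5, 2.0]   # assumed test-time scale factors

pixels = image_pyramid_pixels(base, scales)
single = base ** 2

# Total pixels processed, relative to running the detector once at scale 1.0.
relative_cost = sum(pixels) / single
print(pixels, relative_cost)
```

Under these assumed scales the detector processes several times more pixels than a single forward pass, which is the overhead a feature pyramid avoids by reusing one pass.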
Limitation
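Existing detectors simply construct the feature pyramid from the inherent multi-scale, pyramidal architecture of the backbone, which is actually designed for the object classification task.
Features extracted by a classification network are not representative enough for the detection task.
Each level of the pyramid is built mainly from single-level layers of the backbone, so its semantic information is insufficient.
High-level features: classification, complex appearances. Low-level features: location regression, simple appearances.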
Solution
Propose a new way of constructing the feature pyramid for feature extraction: the Multi-Level Feature Pyramid Network (MLFPN).
Every feature map in the pyramid is built from layers at multiple levels. 6 scales and 8 levels.
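A minimal shape-bookkeeping sketch of what "6 scales and 8 levels" implies: a grid of feature maps indexed by level and scale, regrouped so that each scale collects one map from every level. The 128 output channels per map and the side lengths for a 320x320 input are assumptions made for illustration; only the counts 6 and 8 come from the notes.

```python
NUM_LEVELS = 8                        # number of levels (from the notes)
NUM_SCALES = 6                        # number of scales (from the notes)
TUM_CHANNELS = 128                    # assumed channels per output map
SCALE_SIDES = [40, 20, 10, 5, 3, 1]   # assumed side lengths for a 320x320 input


def level_outputs():
    """Shapes (channels, height, width) indexed as [level][scale]."""
    return [[(TUM_CHANNELS, s, s) for s in SCALE_SIDES]
            for _ in range(NUM_LEVELS)]


def group_by_scale(outputs):
    """Concatenate, along channels, the same-scale maps from every level."""
    pyramid = []
    for scale_idx in range(NUM_SCALES):
        same_scale = [level[scale_idx] for level in outputs]
        channels = sum(c for c, _, _ in same_scale)
        _, h, w = same_scale[0]
        pyramid.append((channels, h, w))
    return pyramid


pyramid_shapes = group_by_scale(level_outputs())
for shape in pyramid_shapes:
    print(shape)
```

With these assumed sizes, every scale of the final pyramid carries channels from all 8 levels, which is the sense in which each pyramid feature is "multi-level".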
FFMv1 enriches semantic information into base features by fusing feature maps of the backbone.
Each TUM generates a group of multi-scale features, and then the alternating joint TUMs and FFMv2s extract multi-level multiscale features.
SFAM aggregates the features into the multi-level feature pyramid through a scale-wise feature concatenation operation and an adaptive attention mechanism. It gathers the multi-level multi-scale features generated by the TUMs into a single multi-level feature pyramid.
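The attention step can be illustrated with a squeeze-and-excitation style channel reweighting, sketched here in plain Python on a toy tensor. Treating the "adaptive attention mechanism" as an SE-style block is an assumption of this sketch, and the sizes and random weights are hypothetical stand-ins for learned parameters.

```python
import math
import random


def global_avg_pool(fmap):
    """fmap is a list of channels, each a 2D list. Returns one mean per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]


def linear(weights, vec):
    """Matrix-vector product; weights is a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def channel_attention(fmap, w1, w2):
    """SE-style reweighting: squeeze, two FC layers, sigmoid, rescale."""
    z = global_avg_pool(fmap)                                  # squeeze
    hidden = [max(0.0, h) for h in linear(w1, z)]              # FC + ReLU
    gates = [1.0 / (1.0 + math.exp(-g)) for g in linear(w2, hidden)]  # FC + sigmoid
    scaled = [[[gate * x for x in row] for row in ch]
              for ch, gate in zip(fmap, gates)]                # rescale channels
    return scaled, gates


random.seed(0)
C, R, SIDE = 4, 2, 3   # toy sizes: channels, reduced width, spatial side
fmap = [[[random.random() for _ in range(SIDE)] for _ in range(SIDE)]
        for _ in range(C)]
w1 = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(R)]  # hypothetical weights
w2 = [[random.uniform(-1, 1) for _ in range(R)] for _ in range(C)]  # hypothetical weights

scaled, gates = channel_attention(fmap, w1, w2)
```

Each channel of the concatenated feature map is multiplied by one gate in (0, 1), so the network can emphasise the levels that are most useful at that scale.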
At the detection stage, each pyramid feature is followed by one location regression conv and one classification conv. Each pixel of a feature map has 6 anchors with 3 ratios. Soft-NMS is applied last.
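As an illustration of the final post-processing step, here is a generic Gaussian Soft-NMS sketch in plain Python. The notes only say that soft-NMS is applied; the Gaussian decay variant, the `sigma` and `score_thresh` values, and the toy boxes are assumptions, and this is not the M2Det implementation.

```python
import math


def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS. Returns (index, decayed score) pairs in selection order."""
    remaining = [[i, s] for i, s in enumerate(scores) if s >= score_thresh]
    kept = []
    while remaining:
        # Pick the highest-scoring remaining box.
        best = max(remaining, key=lambda item: item[1])
        remaining.remove(best)
        kept.append((best[0], best[1]))
        # Decay the scores of the others according to their overlap with it.
        for item in remaining:
            overlap = iou(boxes[best[0]], boxes[item[0]])
            item[1] *= math.exp(-(overlap ** 2) / sigma)
        # Drop boxes whose score has decayed below the threshold.
        remaining = [item for item in remaining if item[1] >= score_thresh]
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]  # toy boxes
scores = [0.9, 0.8, 0.7]
kept = soft_nms(boxes, scores)
```

Unlike hard NMS, the second box, which overlaps the first heavily, is not discarded; it is kept with a reduced score and ends up ranked below the non-overlapping third box.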
Discussion
- The detection accuracy improvement of M2Det comes mainly from the proposed MLFPN.
(1) The pyramid is built from deeper features, which are more representative.
(2) The feature map at each scale used for final detection is built from multiple levels, which is better for handling appearance-complexity variation across object instances.
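It learns very effective features to handle scale variation and appearance-complexity variation across object instances.
It is necessary to use multi-level features to detect objects with similar size.
- Although the structure is elaborate, especially the TUM, the speed is not low. The TUM is lightweight.
- Increasing the number of TUMs and increasing the TUM channels both bring improvement, but increasing the number of TUMs is more effective.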