Deep Learning Topics - Object Detection in Computer Vision

Yao Weifeng (姚伟峰)
[yaoweifeng0301@126.com]
http://www.cnblogs.com/Matrix_Yao/

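Quantitative change leads to qualitative change, and qualitative change in turn leads to new quantitative change.
The best designs are all arts of compromise.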

What Problem Does Object Detection Solve

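[Figure]
Object detection finds all instances of the objects of interest in an image (detecting instances of semantic objects of certain classes) and encloses each of them in a bounding box.
As the figure above shows, an object detection scenario has three characteristics:

  • Multiple objects
  • Where? -> localization
  • What? -> classification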

Evaluation Criteria for Object Detection

Accuracy

mAP (mean Average Precision)

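For each class: compute the Average Precision (AP), which summarizes that class's precision-recall curve.

Over all classes: take the mean of the per-class AP values to get mAP.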
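
The two steps of mAP, per class and then over all classes, can be sketched as follows. This is a minimal sketch, not the post's own formula: it assumes the all-point interpolated AP (area under the precision envelope), and it assumes each detection has already been marked as a true or false positive by some IoU matching step that is not shown. The names `average_precision`, `mean_average_precision`, `is_tp` and `num_gt` are made up for illustration.

```python
import numpy as np

def average_precision(scores, is_tp, num_gt):
    """AP for one class: area under the interpolated precision-recall curve."""
    order = np.argsort(-np.asarray(scores, dtype=float))  # highest score first
    tp = np.asarray(is_tp, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1.0 - tp)
    recall = cum_tp / num_gt
    precision = cum_tp / (cum_tp + cum_fp)
    # Pad the curve, then make precision monotonically non-increasing.
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]
    # Sum the rectangle areas at the points where recall changes.
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

def mean_average_precision(per_class):
    """mAP: mean of per-class AP; per_class holds (scores, is_tp, num_gt) tuples."""
    return float(np.mean([average_precision(*args) for args in per_class]))
```

Other AP variants (for example 11-point interpolation) differ only in how the precision-recall curve is summarized; the averaging over classes stays the same.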

 

Speed

Throughput

  • fps: frames per second

Latency

  • how long it takes to process one image
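
Throughput and latency are related but not interchangeable, especially once images are processed in batches. The sketch below measures both for an arbitrary detector callable. `benchmark` and `fake_detect` are hypothetical names, and `fake_detect` is only a stand-in that burns a little CPU per image.

```python
import time

def benchmark(detect, images, batch_size=1):
    """Return (mean latency per call in seconds, throughput in images per second)."""
    latencies = []
    start = time.perf_counter()
    for i in range(0, len(images), batch_size):
        batch = images[i:i + batch_size]
        t0 = time.perf_counter()
        detect(batch)
        latencies.append(time.perf_counter() - t0)
    total = time.perf_counter() - start
    return sum(latencies) / len(latencies), len(images) / total

def fake_detect(batch):
    # Hypothetical stand-in for a real detector: some work per image.
    return [sum(range(20000)) for _ in batch]

latency, fps = benchmark(fake_detect, list(range(8)), batch_size=4)
print(f"latency per call: {latency * 1e3:.2f} ms, throughput: {fps:.1f} fps")
```

With `batch_size=1`, fps is roughly the inverse of latency; with larger batches, throughput usually goes up while the latency of each call goes up too.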

How Object Detection Is Done

Two-Shot Solution: Region Proposal and then Classification

Viola-Jones Detector

  • Year
    • 2001
  • Author
    • Paul Viola
    • Michael Jones
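  • Main Idea
    • Haar-like features are a highly expressive descriptor for faces.
    • Haar-like features can share computation through the integral image, which greatly speeds up feature extraction.
    • The cascade idea of filtering stage by stage greatly reduces classification time.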
  • How
    • Framework [Figure]
    • Scale invariance comes from running the same sliding window over the image at multiple scales
  • Pros & Cons
    • Pros
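      • Fast. As the martial-arts saying goes, only speed is unbeatable!
      • For the first time, face detection left the lab and became usable in industrial applications
    • Cons
      • Only works for rigid objects, such as faces and cars
      • The features do not generalize well. What is needed is either a universal feature or a universal method for generating features.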
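
The integral-image trick from the Main Idea list can be sketched as follows: once the summed-area table is built, the sum over any rectangle costs four lookups, so a Haar-like feature costs a handful of lookups regardless of its size. This is a minimal sketch; the function names and the particular two-rectangle layout (left half minus right half) are illustrative assumptions, not the detector's actual feature set.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero row and column prepended."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of img[top:top+height, left:left+width] using four lookups."""
    bottom, right = top + height, left + width
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

def haar_two_rect_horizontal(ii, top, left, height, width):
    """Two-rectangle Haar-like feature: left half minus right half."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return left_sum - right_sum

img = np.arange(16).reshape(4, 4)
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 2, 2), haar_two_rect_horizontal(ii, 0, 0, 4, 4))
```

The table is computed once per image (or per scale), and every sliding window and every feature then reuses it.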

Deformable Parts Models(DPM)

  • Year
    • 2009
  • Author
    • Ross Girshick

  • Main Idea
    • Use the Pictorial Structures framework to handle the detection of non-rigid objects.
  • How
  • Pros & Cons
    • Pros
      • Handles non-rigid objects, with higher accuracy.
    • Cons
      • Slower.
      • Still does not solve the universal-feature problem.

RCNN

  • Year
    • 2014.10
  • Author
    • Ross Girshick
  • Main Idea

  • How
  • Pros & Cons
    • Pros
      • Introduces a CNN to solve the universal-feature problem
    • Cons
      • Multi-stage: fine-tune CNN + SVM training + bounding box regression
      • Disk-hungry: the extracted features must first be stored on disk
      • Slow at inference: features are extracted redundantly for every ROI; VGG-16 needs 47 s per image with a GPU
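
The bounding box regression stage named in the cons list learns offsets from a proposal to its ground-truth box. Below is a minimal sketch of the commonly used parameterization, assuming boxes are given as (center x, center y, width, height); `encode_box` and `decode_box` are illustrative names, not code from RCNN itself.

```python
import math

def encode_box(proposal, ground_truth):
    """Regression targets (tx, ty, tw, th) that map a proposal onto its ground truth."""
    px, py, pw, ph = proposal
    gx, gy, gw, gh = ground_truth
    # Center shifts are scaled by the proposal size; sizes are compared in log space.
    return ((gx - px) / pw, (gy - py) / ph, math.log(gw / pw), math.log(gh / ph))

def decode_box(proposal, deltas):
    """Apply predicted (tx, ty, tw, th) to a proposal to get the refined box."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = deltas
    return (px + tx * pw, py + ty * ph, pw * math.exp(tw), ph * math.exp(th))

proposal = (50.0, 50.0, 20.0, 40.0)
ground_truth = (55.0, 48.0, 30.0, 36.0)
targets = encode_box(proposal, ground_truth)
print(targets, decode_box(proposal, targets))
```

Normalizing by the proposal size keeps the targets on a similar scale for small and large boxes, which makes them easier to regress.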

SPP-Net (Spatial Pyramid Pooling)

  • Year
    • 2015.4
  • Author
    • He Kaiming

  • Main Idea
  • How
  • Pros & Cons
    • Pros
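      • Faster: introduces SPP to share feature extraction across ROIs
    • Cons
      • Does not solve RCNN's other two problems
      • When fine-tuning, SPP-Net freezes the convolutional layers and fine-tunes only the fully connected layers, yet a new task also calls for fine-tuning the convolutional layers. (Features extracted by a classification model emphasize high-level semantics, whereas object detection needs the object's location information in addition to semantic information.)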
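
The reason SPP lets ROIs share one convolutional pass is that it turns a feature-map window of any size into a fixed-length vector. A minimal sketch of that pooling step follows, assuming max pooling and pyramid levels of 1x1, 2x2 and 4x4 (the level choice and the name `spatial_pyramid_pool` are assumptions for illustration).

```python
import numpy as np

def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a (C, H, W) map into a fixed-length vector, whatever H and W are."""
    c, h, w = feature_map.shape
    pooled = []
    for n in levels:
        for i in range(n):
            # Floor for the start and ceil for the end keeps every bin non-empty.
            r0, r1 = (i * h) // n, -((-(i + 1) * h) // n)
            for j in range(n):
                c0, c1 = (j * w) // n, -((-(j + 1) * w) // n)
                pooled.append(feature_map[:, r0:r1, c0:c1].max(axis=(1, 2)))
    return np.concatenate(pooled)

rng = np.random.default_rng(0)
print(spatial_pyramid_pool(rng.random((3, 13, 13))).shape)
print(spatial_pyramid_pool(rng.random((3, 10, 7))).shape)
```

Both calls return a vector of length C * (1 + 4 + 16), so the fully connected layers that follow can accept windows of different sizes.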

Fast RCNN

  • Year
    • 2015.9
  • Author
    • Ross Girshick
  • Main Idea

  • How
    • Inherits SPP's ROI pooling idea, with a single scale
  • Pros & Cons
    • Pros
      • Jointly optimizes classification and regression
    • Cons
      • The speed bottleneck is now selective search. [Why? People say it is because it runs on the CPU. :( ]
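
Joint optimization of classification and regression amounts to training on one combined loss per ROI. The sketch below assumes a softmax cross-entropy classification term plus a smooth L1 box term that is switched off for background; treating class 0 as background and the weight `lam` are assumptions of this sketch, and the function names are illustrative.

```python
import numpy as np

def smooth_l1(x):
    """Quadratic near zero, linear elsewhere, so outliers are penalized less than with L2."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * ax ** 2, ax - 0.5)

def multitask_loss(class_logits, true_class, box_pred, box_target, lam=1.0):
    """L = L_cls + lam * [true_class >= 1] * L_loc for a single ROI."""
    # Numerically stable log-softmax for the classification term.
    shifted = class_logits - np.max(class_logits)
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    loss_cls = -log_probs[true_class]
    # Class 0 is background (an assumption of this sketch): no box loss for it.
    loss_loc = np.sum(smooth_l1(box_pred - box_target)) if true_class >= 1 else 0.0
    return float(loss_cls + lam * loss_loc)

logits = np.zeros(3)
print(multitask_loss(logits, 1, np.array([0.5, 0.0, 0.0, 2.0]), np.zeros(4)))
```

Because both terms share the same backbone features, one backward pass updates the network for both tasks, which removes the separate SVM and regressor training stages.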

Faster RCNN - A Milestone

  • Year
    • 2016.6
  • Author
    • He Kaiming
  • Main Idea
    • Introduces the concept of anchors
  • How
  • Pros & Cons
    • Pros
      • The end-to-end solution makes it faster
      • Fewer region proposals make it faster (~2k -> ~300)
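
Anchors are a fixed set of reference boxes tiled over the feature map; the network then scores and refines them instead of relying on an external proposal method. The sketch below generates such a grid. The stride of 16 and the 3 scales x 3 aspect ratios are assumed values chosen for illustration, and `make_anchors` is a hypothetical helper name.

```python
import numpy as np

def make_anchors(feat_h, feat_w, stride=16, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (feat_h * feat_w * k, 4) anchors as (x1, y1, x2, y2) in image pixels."""
    shapes = []
    for s in scales:
        for r in ratios:
            # Keep the area at s * s while setting height / width = r.
            shapes.append((s / np.sqrt(r), s * np.sqrt(r)))
    shapes = np.array(shapes)                                  # (k, 2) as (w, h)
    # Anchor centers: the middle of each feature-map cell, mapped back to the image.
    cx = (np.arange(feat_w) + 0.5) * stride
    cy = (np.arange(feat_h) + 0.5) * stride
    cx, cy = np.meshgrid(cx, cy)
    centers = np.stack([cx.ravel(), cy.ravel()], axis=1)       # (H*W, 2)
    half = shapes / 2.0
    x1y1 = centers[:, None, :] - half[None, :, :]
    x2y2 = centers[:, None, :] + half[None, :, :]
    return np.concatenate([x1y1, x2y2], axis=2).reshape(-1, 4)

anchors = make_anchors(2, 3)
print(anchors.shape)
```

Every feature-map location gets the same k box shapes, so the anchors are translation-invariant and cost nothing to compute at inference time.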

R-FCN

  • Year
    • 2016.1
  • Author
    • He Kaiming (MSRA)
    • Dai Jifeng (MSRA)
  • Main Idea
    • Shares computation not only at the image level but also across ROIs
    • Proposes position-sensitive ROI pooling to achieve this (it both shares computation and stays translation-variant)
  • How
  • Pros & Cons
    • Faster, at the cost of accuracy

Mask R-CNN - New Baseline

  • Year
    • 2017.3
  • Author
    • He Kaiming (FAIR)
    • Ross Girshick (FAIR)
  • Main Idea
    • Combine detection + segmentation into one
    • Uses the Faster RCNN skeleton
    • Introduces the ROIAlign layer to mitigate the misalignment caused by quantization in Faster RCNN
    • Parallelizes bounding-box (BB) regression and classification for speed
    • Uses a binary sigmoid rather than softmax for segmentation

  • How
  • Pros & Cons
    • Faster
    • More Accurate
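
The ROIAlign item in the Main Idea list comes down to never rounding ROI coordinates and reading the feature map at fractional positions by bilinear interpolation. The sketch below is simplified on purpose: it takes a single sample at the center of each output bin, whereas a full implementation would typically aggregate several samples per bin. The function names and the (y1, x1, y2, x2) ROI layout are assumptions.

```python
import numpy as np

def bilinear_sample(feature, y, x):
    """Sample a 2-D feature map at a fractional (y, x) location."""
    h, w = feature.shape
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = (1 - dx) * feature[y0, x0] + dx * feature[y0, x1]
    bottom = (1 - dx) * feature[y1, x0] + dx * feature[y1, x1]
    return float((1 - dy) * top + dy * bottom)

def roi_align(feature, roi, out_size=2):
    """Simplified ROIAlign: one bilinear sample at the center of each output bin."""
    y1, x1, y2, x2 = roi  # fractional feature-map coordinates, never rounded
    bin_h, bin_w = (y2 - y1) / out_size, (x2 - x1) / out_size
    out = np.empty((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            out[i, j] = bilinear_sample(feature, y1 + (i + 0.5) * bin_h, x1 + (j + 0.5) * bin_w)
    return out

feature = np.arange(16.0).reshape(4, 4)
print(roi_align(feature, (0.5, 0.5, 2.5, 2.5)))
```

ROI pooling would instead round the ROI and bin edges to integers, which shifts the pooled features by up to a cell; that shift matters little for classification but hurts pixel-level masks.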

Single Shot Solution

Classification and regression are done together in a single pass.

YOLO (You Only Look Once)

  • Year
    • 2016
  • Author
    • Joseph Redmon
    • Ali Farhadi
    • Ross Girshick
  • Main Idea

  • How
    • Resize the image to 448x448
    • Divide the image into a 7x7 grid
    • For each grid cell, predict the positions of two BBs and the class
    • Use NMS to remove redundant boxes
  • Pros & Cons
    • Pros
      • FPS: 45 !!!
    • Cons
      • Low recall: at most 49 objects can be detected
      • Small objects are hard to handle
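
The last step of the How list, NMS, can be sketched as a greedy loop over boxes sorted by score. This is a minimal sketch assuming boxes in (x1, y1, x2, y2) form and an IoU threshold of 0.5; the threshold and the function names are illustrative assumptions.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one (x1, y1, x2, y2) box and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best box, drop boxes that overlap it too much, repeat."""
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        order = rest[iou(boxes[best], boxes[rest]) <= iou_threshold]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))
```

In the example the second box overlaps the first too strongly and is suppressed, while the third box, which is far away, survives.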

SSD (Single-Shot Multi-Box Detector)

  • Year
    • 2016
  • Author
    • Wei Liu
  • Main Idea
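    • Regresses BB positions from local convolutional patches rather than from whole-image information (as YOLO does)
    • Reintroduces the concept of anchor boxes
    • Multi-scale, which is a boon for small objects
  • How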
  • Pros & Cons
    • N/A

YOLO-v2

  • Year
    • 2016.12
  • Author
    • Joseph Redmon
    • Ali Farhadi
  • Main Idea
    • Puts all prior best practices together
  • How
    • Batch Normalization - from GoogleNet
    • High-resolution classifier: fine-tune the classification network on 448x448 images
    • Convolutional with Anchor Boxes - from SSD
    • Dimension clusters for choosing bounding-box priors - the k-means idea
    • Hierarchical classification
  • Pros & Cons
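
The Dimension Clusters item in the How list replaces hand-picked anchor shapes with shapes learned from the training boxes. The sketch below runs k-means on (width, height) pairs with the distance d = 1 - IoU, so that large boxes do not dominate the way they would under Euclidean distance. The function names, the iteration count, and the seeded random initialization are assumptions of this sketch.

```python
import numpy as np

def wh_iou(wh, centroids):
    """IoU between (N, 2) box sizes and (K, 2) centroids, all aligned at one corner."""
    inter = (np.minimum(wh[:, None, 0], centroids[None, :, 0])
             * np.minimum(wh[:, None, 1], centroids[None, :, 1]))
    union = (wh[:, None, 0] * wh[:, None, 1]
             + centroids[None, :, 0] * centroids[None, :, 1] - inter)
    return inter / union

def dimension_clusters(wh, k, iters=50, seed=0):
    """k-means on box (width, height) with distance d = 1 - IoU."""
    rng = np.random.default_rng(seed)
    centroids = wh[rng.choice(len(wh), size=k, replace=False)].astype(float)
    for _ in range(iters):
        assign = np.argmax(wh_iou(wh, centroids), axis=1)  # min distance = max IoU
        for c in range(k):
            members = wh[assign == c]
            if len(members) > 0:  # an empty cluster keeps its previous centroid
                centroids[c] = members.mean(axis=0)
    return centroids

wh = np.array([[10, 10], [11, 9], [9, 11], [100, 50], [110, 55], [90, 45]], dtype=float)
print(dimension_clusters(wh, k=2))
```

The resulting centroids serve as the anchor box shapes, so the network starts from priors that already match the dataset.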

Posted on 2018-04-27 06:22 by Yao Weifeng (姚伟峰)