Deep Learning Topics - Object Detection in Computer Vision

Yao Weifeng (姚伟峰)
[yaoweifeng0301@126.com]
http://www.cnblogs.com/Matrix_Yao/

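Quantitative change leads to qualitative change, and qualitative change in turn leads to new quantitative change.
The best designs are all exercises in the art of compromise.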

What Problem Does Object Detection Solve

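Object detection finds all instances of the objects of interest in an image (detecting instances of semantic objects of certain classes) and encloses each of them in a bounding box.
An object detection scene has three characteristics:

Multiple objects
Where is it? -> localization
What is it? -> classification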

Evaluation Criteria for Object Detection

Accuracy

mAP (mean Average Precision)

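For each class: AP (Average Precision) is the area under that class's precision-recall curve.

Over all classes: mAP is the mean of the per-class AP values.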
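
A minimal sketch of how AP and mAP can be computed. It assumes that detections have already been matched to ground truth (for example by an IoU threshold), so each detection arrives as a score plus a true/false-positive flag. The function names and the all-point interpolation are choices made for this illustration, one common convention among several.

```python
import numpy as np

def average_precision(scores, is_true_positive, num_gt):
    """AP for one class: area under the precision-recall curve."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    tp = np.asarray(is_true_positive, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1.0 - tp)
    recall = cum_tp / num_gt
    precision = cum_tp / (cum_tp + cum_fp)
    # Add sentinel points, then make precision non-increasing in recall.
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum the rectangles at the points where recall changes.
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

def mean_average_precision(per_class_ap):
    """mAP: plain mean of the per-class AP values (dict: class name -> AP)."""
    return float(np.mean(list(per_class_ap.values())))
```

For example, three detections scored 0.9, 0.8, 0.7 where the second one is a false positive, with two ground-truth objects, give an AP of 5/6.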

Speed

Throughput

fps: frames per second

Latency

how long it takes to process one image
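
The two speed metrics can be measured with a small timing loop. This is only a sketch: `process_image` stands for any detector call and is a hypothetical placeholder, not a real API.

```python
import time

def measure_speed(process_image, images):
    """Return throughput (fps) and mean per-image latency in seconds."""
    latencies = []
    start = time.perf_counter()
    for img in images:
        t0 = time.perf_counter()
        process_image(img)  # hypothetical detector call
        latencies.append(time.perf_counter() - t0)
    total = time.perf_counter() - start
    return {
        "fps": len(images) / total,
        "mean_latency_s": sum(latencies) / len(latencies),
    }
```

In this sequential loop fps is roughly 1 / latency. Once images are batched or pipelined the two diverge, which is why both are listed as separate metrics.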

How Object Detection Is Done

Two-Shot Solution: Region Proposal and then Classification

Viola-Jones Detector

Year
- 2001
Author
- Paul Viola
- Michael Jones

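Main Idea
- Haar-like features are a highly expressive descriptor for faces.
- Haar-like features can share computation through the integral image, which greatly speeds up feature extraction.
- The cascade, which filters out candidates stage by stage, greatly reduces classification time.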
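
A rough sketch of the integral-image trick: once the integral image is built, the sum over any rectangle costs four lookups, so a Haar-like feature costs a handful of additions regardless of its size. The function names and the zero-padded layout are assumptions of this illustration.

```python
import numpy as np

def integral_image(img):
    """ii[y, x] = sum of img[:y, :x]; padded with a zero row and column."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of the pixels inside a rectangle, using four lookups."""
    bottom, right = top + height, left + width
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

def haar_two_rect_horizontal(ii, top, left, height, width):
    """Two-rectangle Haar-like feature: left half minus right half (width assumed even)."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return left_sum - right_sum
```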

How
- Framework
- Scale invariance comes from running the same sliding window over the image at multiple scales
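
The multi-scale sliding window can be sketched as a generator of candidate boxes. The 24-pixel window, the stride and the scale step below are illustrative assumptions, and the classifier itself is left out.

```python
def sliding_windows(img_h, img_w, win=24, stride=4, scale_step=1.25):
    """Yield (x, y, size) boxes in original-image coordinates.

    The same win x win window is slid over progressively smaller versions
    of the image, which is equivalent to larger boxes in the original.
    """
    scale = 1.0
    while img_h / scale >= win and img_w / scale >= win:
        h, w = int(img_h / scale), int(img_w / scale)
        for y in range(0, h - win + 1, stride):
            for x in range(0, w - win + 1, stride):
                yield int(x * scale), int(y * scale), int(win * scale)
        scale *= scale_step
```

Every yielded box would be passed to the cascade, which is why rejecting most windows in the first few stages matters so much for speed.
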
Pros & Cons
- Pros
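  - In all the martial arts, speed alone is unbeatable!
  - Face detection left the lab for the first time and became usable in industry
- Cons
  - Only suitable for rigid objects, such as faces and cars
  - The features do not generalize well. What is needed is either a general-purpose feature or a general-purpose way of generating features.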

Deformable Parts Models(DPM)

Year
- 2009
Author
- Ross Girshick

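Main Idea
Use the Pictorial Structures framework to handle detection of non-rigid objects.

Pros & Cons
- Pros
  - Handles non-rigid objects, with higher accuracy.
- Cons
  - Slower.
  - Still does not solve the general-purpose feature problem.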

RCNN

Year
- 2014.10
Author
- Ross Girshick
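Main Idea
- Run a CNN on every region proposal (from selective search) to extract features, then classify with SVMs and refine the box with bounding box regression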

Pros & Cons
- Pros
  - Introduces CNN features, which solves the general-purpose feature problem
- Cons
  - Multi-stage training: fine-tune the CNN + train SVMs + train bounding box regression
  - Disk hungry: the features must first be written to disk
  - Slow at inference: features are extracted redundantly for every ROI; VGG-16 needs 47 s per image with a GPU

SPP-Net (Spatial Pyramid Pooling)

Year
- 2015.4
Author
- He Kaiming

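Main Idea
- Compute the convolutional feature map once per image, then use spatial pyramid pooling to turn each ROI of arbitrary size into a fixed-length vector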

Pros & Cons
- Pros
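  - Faster: introduces SPP to share feature extraction across ROIs
- Cons
  - Does not solve the other two problems of RCNN
  - When fine-tuning, SPP-Net freezes the convolutional layers and fine-tunes only the fully connected layers, yet for a new task the convolutional layers need fine-tuning too. (Features from a classification model emphasize high-level semantics, while object detection needs the object's location information in addition to semantic information.)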
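
The core of SPP, pooling a region of any size into a fixed-length vector, might look like the sketch below. The pyramid levels (1, 2, 4) and the max-pooling bin layout are assumptions picked for illustration.

```python
import numpy as np

def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """feature_map: (C, H, W) of any H, W.

    Returns a vector of length C * sum(n * n for n in levels).
    """
    c, h, w = feature_map.shape
    pooled = []
    for n in levels:
        # Bin edges cover the whole map even when H or W is not divisible by n.
        ys = np.linspace(0, h, n + 1)
        xs = np.linspace(0, w, n + 1)
        for i in range(n):
            for j in range(n):
                y0, y1 = int(np.floor(ys[i])), int(np.ceil(ys[i + 1]))
                x0, x1 = int(np.floor(xs[j])), int(np.ceil(xs[j + 1]))
                pooled.append(feature_map[:, y0:y1, x0:x1].max(axis=(1, 2)))
    return np.concatenate(pooled)
```

Because the output length no longer depends on H and W, the fully connected layers can be fed from a crop of the shared feature map instead of a re-run of the convolutional layers per ROI.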

Fast RCNN

Year
- 2015.9
Author
- Ross Girshick
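Main Idea
- Train classification and bounding box regression together in a single network, on convolutional features shared across ROIs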

How
- Inherits SPP-Net's ROI pooling idea, with a single scale
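
Single-scale ROI pooling can be sketched as SPP with just one pyramid level. The 7x7 output grid and the stride of 16 are assumptions for illustration (they depend on the backbone), and the rounding of the ROI to whole feature-map cells is deliberate, since that is how the quantized version behaves.

```python
import numpy as np

def roi_max_pool(feature_map, roi, out_size=7, stride=16):
    """feature_map: (C, H, W); roi: (x1, y1, x2, y2) in image pixels."""
    c, h, w = feature_map.shape
    # Quantize the ROI to whole feature-map cells (coordinates are rounded).
    x1, y1, x2, y2 = [int(round(v / stride)) for v in roi]
    x1, y1 = min(max(x1, 0), w - 1), min(max(y1, 0), h - 1)
    x2, y2 = min(max(x2, x1 + 1), w), min(max(y2, y1 + 1), h)
    region = feature_map[:, y1:y2, x1:x2]
    rh, rw = region.shape[1:]
    out = np.zeros((c, out_size, out_size), dtype=feature_map.dtype)
    for i in range(out_size):
        for j in range(out_size):
            # floor for the start, ceil for the end, so no bin is empty.
            ya, yb = (i * rh) // out_size, -(-((i + 1) * rh) // out_size)
            xa, xb = (j * rw) // out_size, -(-((j + 1) * rw) // out_size)
            out[:, i, j] = region[:, ya:yb, xa:xb].max(axis=(1, 2))
    return out
```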

Pros & Cons
- Pros
  - Jointly optimizes classification & regression
- Cons
  - The speed bottleneck is now selective search. [WHY? People say it's because it runs on the CPU, :( ]

Faster RCNN - A Milestone

Year
- 2016.1
Author
- He Kaiming
Main Idea
- Introduces the concept of anchors
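
Anchors are a fixed set of reference boxes tiled over every feature-map position; the network then only predicts offsets relative to them. A minimal generator, assuming a stride of 16 with three scales and three aspect ratios (values chosen for illustration, not stated in this post):

```python
import numpy as np

def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (feat_h * feat_w * A, 4) boxes as (x1, y1, x2, y2), A = scales x ratios."""
    base = []
    for s in scales:
        for r in ratios:  # r = height / width, area kept at s * s
            w, h = s / np.sqrt(r), s * np.sqrt(r)
            base.append([-w / 2, -h / 2, w / 2, h / 2])
    base = np.array(base)                                  # (A, 4), centred on 0
    cx = (np.arange(feat_w) + 0.5) * stride                # cell centres in pixels
    cy = (np.arange(feat_h) + 0.5) * stride
    cx, cy = np.meshgrid(cx, cy)                           # each (feat_h, feat_w)
    centers = np.stack([cx, cy, cx, cy], axis=-1).reshape(-1, 1, 4)
    return (centers + base[None]).reshape(-1, 4)
```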

Pros & Cons
- Pros
  - The end-to-end solution makes it faster
  - Fewer region proposals make it faster (~2k -> ~300)

R-FCN

Year
- 2016.6
Author
- Dai Jifeng (MSRA)
- He Kaiming (MSRA)
Main Idea
- Share computation not only at the image level but also across ROIs
- Propose position-sensitive ROI pooling to achieve this (it both shares computation and keeps the translation variance that detection needs)
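
A sketch of position-sensitive ROI pooling: each of the k x k bins of an ROI reads only from its own group of score maps, and the bins then vote by averaging, so nothing learnable runs per ROI. The channel layout (k, k, C) and the use of feature-map coordinates are assumptions of this illustration.

```python
import numpy as np

def ps_roi_pool(score_maps, roi, k=3):
    """score_maps: (k*k*C, H, W); roi: (x1, y1, x2, y2) in feature-map cells.

    Returns per-class scores of shape (C,).
    """
    num_classes = score_maps.shape[0] // (k * k)
    # Assumed layout: channel groups ordered by bin row, bin column, class.
    maps = score_maps.reshape(k, k, num_classes, *score_maps.shape[1:])
    x1, y1, x2, y2 = roi
    ys = np.linspace(y1, y2, k + 1)
    xs = np.linspace(x1, x2, k + 1)
    bins = np.zeros((k, k, num_classes))
    for i in range(k):
        for j in range(k):
            ya, yb = int(np.floor(ys[i])), int(np.ceil(ys[i + 1]))
            xa, xb = int(np.floor(xs[j])), int(np.ceil(xs[j + 1]))
            # Bin (i, j) looks only at its own group of score maps.
            bins[i, j] = maps[i, j, :, ya:yb, xa:xb].mean(axis=(1, 2))
    return bins.mean(axis=(0, 1))  # voting: average over the k x k bins
```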

Pros & Cons
- Faster, at the cost of accuracy

Mask R-CNN - New Baseline

Year
- 2017.3
Author
- He Kaiming (FAIR)
- Ross Girshick (FAIR)
Main Idea
- Combine detection + segmentation into one model
- Use Faster RCNN as the skeleton
- Introduce the ROIAlign layer to mitigate the misalignment caused by quantization in Faster RCNN
- Run BB regression and classification in parallel to speed things up
- Use a binary sigmoid rather than softmax for segmentation
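
The difference ROIAlign makes can be sketched with bilinear sampling: instead of rounding ROI coordinates to whole cells, values are interpolated at fractional positions. This simplified version takes a single sample at each bin centre, which is an assumption made to keep the example short.

```python
import numpy as np

def bilinear_sample(feature, y, x):
    """feature: (H, W); sample at a fractional (y, x) location."""
    h, w = feature.shape
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    return ((1 - dy) * (1 - dx) * feature[y0, x0]
            + (1 - dy) * dx * feature[y0, x1]
            + dy * (1 - dx) * feature[y1, x0]
            + dy * dx * feature[y1, x1])

def roi_align(feature, roi, out_size=2):
    """roi: (x1, y1, x2, y2) in fractional feature-map coordinates, never rounded."""
    x1, y1, x2, y2 = roi
    bin_h, bin_w = (y2 - y1) / out_size, (x2 - x1) / out_size
    out = np.empty((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            # One sample at the centre of each bin.
            out[i, j] = bilinear_sample(feature,
                                        y1 + (i + 0.5) * bin_h,
                                        x1 + (j + 0.5) * bin_w)
    return out
```

For pixel-level masks a shift of a fraction of a feature-map cell already matters, which is why the misalignment shows up more in segmentation than in box detection.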

Pros & Cons
- Faster
- More Accurate

Single Shot Solution

Classification and regression are done together in one pot.

YOLO (You Only Look Once)

Year
- 2016
Author
- Joseph Redmon
- Ali Farhadi
- Ross Girshick
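Main Idea
- Treat detection as a single regression from the whole image straight to bounding boxes and classes, with no separate region proposal step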

How
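- Resize the image to 448x448
- Divide the image into a 7x7 grid
- For each grid cell, predict the positions of two BBs and the class
- Remove redundant boxes with NMS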
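
Two of the steps above can be sketched directly: finding the grid cell responsible for an object, and greedy NMS. The helper names and the 0.5 IoU threshold are assumptions of this illustration, and the network itself is omitted.

```python
import numpy as np

def responsible_cell(cx, cy, img_size=448, grid=7):
    """(row, col) of the grid cell containing an object centre given in pixels."""
    cell = img_size / grid  # 64 pixels per cell for 448 / 7
    return min(int(cy // cell), grid - 1), min(int(cx // cell), grid - 1)

def iou_one_to_many(box, boxes):
    """IoU between one (x1, y1, x2, y2) box and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best box, drop boxes overlapping it, repeat."""
    boxes = np.asarray(boxes, dtype=float)
    order = np.argsort(-np.asarray(scores, dtype=float))
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        order = rest[iou_one_to_many(boxes[best], boxes[rest]) <= iou_threshold]
    return keep
```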

Pros & Cons
- Pros
  - FPS: 45 !!!
- Cons
  - Low recall: at most 49 objects can be detected
  - Small objects are hard to handle

SSD (Single-Shot Multi-Box Detector)

Year
- 2016
Author
- Wei Liu
Main Idea
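- Regress BB positions from local convolutional patches rather than from whole-image information (as YOLO does)
- Re-introduce the anchor box concept
- Multi-scale, which is a boon for small objects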
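
To give a feel for what multi-scale means here, the sketch below counts the default (anchor) boxes over several feature maps. The feature-map sizes and boxes per cell are the SSD300 configuration as I recall it from the SSD paper, so treat them as an assumption; they are not given in this post.

```python
def count_default_boxes(feature_sizes=(38, 19, 10, 5, 3, 1),
                        boxes_per_cell=(4, 6, 6, 6, 4, 4)):
    """Return (boxes per layer, total) for square feature maps."""
    per_layer = [s * s * b for s, b in zip(feature_sizes, boxes_per_cell)]
    return per_layer, sum(per_layer)

per_layer, total = count_default_boxes()
print(per_layer, total)
```

Most of the boxes sit on the finest map, where each cell covers only a few pixels of the input, and that is where small objects get their chance. Compare this with the single 7x7 grid of YOLO.
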
How

Pros & Cons
- N/A

YOLO-v2

Year
- 2016.12
Author
- Joseph Redmon
- Ali Farhadi
Main Idea
- Put all prior best practices together
How
- Batch Normalization - from GoogLeNet
- High Resolution Classifier: fine-tune the classifier network on 448x448 images
- Convolutional with Anchor Boxes - from SSD
- Dimension Clusters for choosing the bounding box priors - k-means idea
- Hierarchical classification
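
The dimension-clusters step can be sketched as k-means over box (width, height) pairs with 1 - IoU as the distance, so that large boxes do not dominate the error. The function names, k=5 and the random initialization are assumptions of this illustration.

```python
import numpy as np

def wh_iou(wh, centroids):
    """IoU between (N, 2) boxes and (K, 2) centroids, all aligned at one corner."""
    inter = (np.minimum(wh[:, None, 0], centroids[None, :, 0])
             * np.minimum(wh[:, None, 1], centroids[None, :, 1]))
    union = (wh[:, None, 0] * wh[:, None, 1]
             + centroids[None, :, 0] * centroids[None, :, 1] - inter)
    return inter / union

def dimension_clusters(wh, k=5, iters=100, seed=0):
    """k-means on box sizes with distance d = 1 - IoU; returns (k, 2) priors."""
    wh = np.asarray(wh, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = wh[rng.choice(len(wh), size=k, replace=False)]
    for _ in range(iters):
        # Smallest (1 - IoU) is the same as largest IoU.
        assign = np.argmax(wh_iou(wh, centroids), axis=1)
        new = np.array([wh[assign == c].mean(axis=0) if np.any(assign == c)
                        else centroids[c] for c in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids
```

The resulting centroids replace hand-picked anchor shapes with shapes that match the training set.
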
Pros & Cons

posted on 2018-04-27 06:22 by Yao Weifeng
