CornerNet: Detecting Objects as Paired Keypoints

We propose CornerNet, a new approach to object detection in which we detect an object bounding box as a pair of keypoints, the top-left corner and the bottom-right corner, using a single convolutional neural network.

Drawbacks of anchor boxes

1. A very large set of anchor boxes is needed, which leads to a huge imbalance between positive and negative anchor boxes.
2. Anchor boxes introduce many hyperparameters and design choices: how many boxes, what sizes, and what aspect ratios.


We detect an object as a pair of keypoints: the top-left corner and the bottom-right corner of the bounding box. We use a single convolutional network to predict a heatmap for the top-left corners of all instances of the same object category, a heatmap for all bottom-right corners, and an embedding vector for each detected corner. The embeddings serve to group a pair of corners that belong to the same object.

The method has two parts: keypoint detection and keypoint grouping.

Three main problems:

  1. How to detect the keypoints?
  2. How to group the keypoints?
  3. A corner of a bounding box often lies outside the object. How to improve the performance in that case?

Detecting Corners

**Backbone**: An hourglass network, or another network designed for human pose estimation. The paper uses an hourglass network.

Output: Two sets of heatmaps, one for top-left corners and one for bottom-right corners. Each set of heatmaps has C channels, where C is the number of categories.

Loss: Instead of equally penalizing negative locations, we reduce the penalty given to negative locations within a radius of the positive location. We determine the radius from the size of the object, ensuring that a pair of points within the radius would generate a bounding box with at least 0.7 IoU with the ground-truth annotation.
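
To make the penalty reduction concrete, here is a minimal sketch of how a ground-truth heatmap channel could be built for a single corner. It assumes the unnormalized 2D Gaussian described in the paper, with sigma set to one third of the radius, and a down-weighting of negatives by (1 - target)^beta with beta = 4. The function names, the square window, and the pure-Python lists are illustrative choices, not the authors' code.

```python
import math

def corner_heatmap_target(height, width, cx, cy, radius):
    """Ground-truth heatmap channel for one corner at column cx, row cy."""
    # sigma = radius / 3 as in the paper; max() only guards against radius = 0.
    sigma = max(radius, 1) / 3.0
    target = [[0.0] * width for _ in range(height)]
    # Simplification: fill a square window of side 2 * radius + 1, clipped to the map.
    for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
        for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
            d2 = (x - cx) ** 2 + (y - cy) ** 2
            target[y][x] = math.exp(-d2 / (2.0 * sigma ** 2))
    return target

def negative_weight(target_value, beta=4):
    """Factor applied to the loss of a negative location (assumed beta = 4)."""
    return (1.0 - target_value) ** beta

target = corner_heatmap_target(height=7, width=7, cx=3, cy=3, radius=2)
```

The positive location gets a target of exactly 1, locations close to it get values just below 1 and therefore a small `negative_weight`, and locations outside the window keep a target of 0 and the full penalty.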

Offset prediction: A location \(\left ( x,y \right )\) in the image is mapped to the location \(\left ( \left \lfloor \frac{x}{n} \right \rfloor,\left \lfloor \frac{y}{n} \right \rfloor \right )\) in the heatmaps, where \(n\) is the downsampling factor. Rounding down loses precision, so we predict location offsets to slightly adjust the corner locations before remapping them to the input resolution.
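
The effect of the offset can be checked with a few lines of arithmetic. The sketch below computes the heatmap cell and the fractional offset that would serve as the regression target, then maps them back to image coordinates. The function names and the value n = 4 are assumptions made for illustration.

```python
import math

def corner_to_heatmap(x, y, n):
    """Map an image location to its heatmap cell and the fractional offset lost by flooring."""
    hx, hy = math.floor(x / n), math.floor(y / n)
    return (hx, hy), (x / n - hx, y / n - hy)

def heatmap_to_image(cell, offset, n):
    """Remap a heatmap cell, adjusted by an offset, to the input resolution."""
    (hx, hy), (ox, oy) = cell, offset
    return ((hx + ox) * n, (hy + oy) * n)

cell, offset = corner_to_heatmap(x=101, y=58, n=4)
with_offset = heatmap_to_image(cell, offset, n=4)         # (101.0, 58.0)
without_offset = heatmap_to_image(cell, (0.0, 0.0), n=4)  # (100.0, 56.0)
```

Without the offset the recovered corner is off by up to n - 1 pixels in each direction, which matters most for small boxes.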

Grouping Corners

Multiple objects may appear in an image, and thus multiple top-left and bottom-right corners may be detected. We need to determine if a pair of the top-left corner and bottom-right corner is from the same bounding box.

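The network predicts an embedding vector for each detected corner.

If a top-left corner and a bottom-right corner belong to the same bounding box, the distance between their embeddings should be small; otherwise it should be large.

This is trained with two losses: a "pull" loss that draws the embeddings of corners from the same object together, and a "push" loss that separates the embeddings of corners from different objects.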

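As a rough sketch of the two grouping losses, the code below assumes one-dimensional embeddings and a margin of 1, as in the paper: the pull term is the squared distance of each corner embedding to the mean of its pair, and the push term is a hinge on the distance between the means of different objects. The function name and the list-based inputs are illustrative, not the authors' implementation.

```python
def pull_push_loss(tl_embeds, br_embeds, delta=1.0):
    """Return (pull, push) for N objects, given the 1-D embeddings of their
    top-left corners (tl_embeds) and bottom-right corners (br_embeds)."""
    n = len(tl_embeds)
    # Mean embedding of each object's two corners.
    means = [(t + b) / 2.0 for t, b in zip(tl_embeds, br_embeds)]
    # Pull: bring both corners of an object toward their shared mean.
    pull = sum((t - m) ** 2 + (b - m) ** 2
               for t, b, m in zip(tl_embeds, br_embeds, means)) / n
    # Push: penalize pairs of different objects whose means are closer than delta.
    push = 0.0
    if n > 1:
        for k in range(n):
            for j in range(n):
                if j != k:
                    push += max(0.0, delta - abs(means[k] - means[j]))
        push /= n * (n - 1)
    return pull, push

# Two well-separated objects: only the pull term is non-zero.
far = pull_push_loss([0.0, 3.0], [1.0, 4.0])    # (0.5, 0.0)
# Two objects with identical corner embeddings but nearby means: only push is non-zero.
near = pull_push_loss([0.0, 0.5], [0.0, 0.5])   # (0.0, 0.5)
```

Only the distances between embeddings matter here, not their actual values, which is why a single scalar per corner is enough in this sketch.
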
Corner Pooling

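There is often no local visual evidence for the presence of corners, so we propose corner pooling to better localize the corners by encoding explicit prior knowledge. For example, top-left corner pooling takes, at each location, the maximum of all features below it in one feature map and the maximum of all features to its right in a second feature map, and adds the two maxima.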
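
A minimal sketch of top-left corner pooling on single-channel feature maps, written with plain nested lists. It assumes two input maps, here called `f_t` (pooled downward) and `f_l` (pooled to the right), and uses a running maximum so each map is scanned once. The names and the list representation are illustrative.

```python
def top_left_corner_pool(f_t, f_l):
    """Top-left corner pooling for two H x W feature maps given as nested lists."""
    h, w = len(f_t), len(f_t[0])
    t = [row[:] for row in f_t]
    l = [row[:] for row in f_l]
    # Scan bottom to top: t[i][j] becomes the max of f_t over rows i..h-1 in column j.
    for i in range(h - 2, -1, -1):
        for j in range(w):
            t[i][j] = max(t[i][j], t[i + 1][j])
    # Scan right to left: l[i][j] becomes the max of f_l over columns j..w-1 in row i.
    for i in range(h):
        for j in range(w - 2, -1, -1):
            l[i][j] = max(l[i][j], l[i][j + 1])
    # Add the two pooled maps element-wise.
    return [[t[i][j] + l[i][j] for j in range(w)] for i in range(h)]

f_t = [[0, 0, 0],
       [2, 0, 0],
       [0, 0, 0]]
f_l = [[0, 0, 3],
       [0, 0, 0],
       [0, 0, 0]]
pooled = top_left_corner_pool(f_t, f_l)
```

In this toy input the strongest response lands at the top-left cell, even though neither input map has any activation there: the evidence comes from a feature below it and a feature to its right. Bottom-right corner pooling would be the mirror image, scanning upward and to the left.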

Finally CornerNet


Effectiveness of corner pooling

Effectiveness of Reducing penalty to negative locations

posted on 2018-08-21 15:23 xiongzihua