Image Processing 4: Non-Maximum Suppression (NMS)



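Non-Maximum Suppression (NMS), as the name suggests, suppresses elements that are not maxima; it can be understood as a local-maximum search. "Local" refers to a neighborhood, which has two adjustable parameters: its dimensionality and its size. This post does not discuss the general NMS algorithm (see the paper "Efficient Non-Maximum Suppression" for NMS implementations on 1-D and 2-D data); it covers NMS as used in object detection to extract the highest-scoring windows. In pedestrian detection, for example, features are extracted from each sliding window and passed to a classifier, so every window receives a score. Sliding windows, however, produce many windows that contain, or largely overlap, other windows. NMS is then used to keep the window with the highest score in each neighborhood (the one most likely to be a pedestrian) and to suppress the lower-scoring windows.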
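To make the "local-maximum search over a neighborhood" idea concrete, here is a minimal brute-force sketch for 1-D data. It is not the algorithm from the cited paper; the function name `nms_1d`, the `radius` parameter, and the sample signal are assumptions made for illustration.

```python
import numpy as np

def nms_1d(values, radius):
    """Return the indices whose value is the unique maximum of the
    (2 * radius + 1)-wide neighborhood centered on them (brute force)."""
    values = np.asarray(values, dtype=float)
    peaks = []
    for i in range(len(values)):
        lo = max(0, i - radius)
        hi = min(len(values), i + radius + 1)
        window = values[lo:hi]
        # keep i only if it is the single largest element in its neighborhood
        is_max = values[i] == window.max()
        is_unique = np.count_nonzero(window == values[i]) == 1
        if is_max and is_unique:
            peaks.append(i)
    return peaks

signal = [1, 3, 2, 5, 4, 4, 6, 1]      # hypothetical 1-D scores
peaks_r1 = nms_1d(signal, radius=1)    # small neighborhood: more peaks survive
peaks_r2 = nms_1d(signal, radius=2)    # larger neighborhood: fewer peaks survive
```

Enlarging the neighborhood suppresses more elements, which is the role the overlap threshold plays for bounding boxes in the rest of this post.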


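Given a list of bounding boxes B and their confidence scores S, the procedure is as follows. Select the detection box M with the highest score, remove it from B, and add it to the final detection set D. Then remove from B every remaining box whose IoU with M exceeds the threshold Nt. Repeat until B is empty.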

A commonly used threshold on the overlap ratio (intersection over union, IoU) is 0.3 to 0.5.
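The greedy procedure described above can be sketched as follows. This is an illustrative version, not the tutorial code shown later: the names `iou` and `greedy_nms`, the continuous-coordinate area convention (no "+ 1"), and the sample boxes are all assumptions.

```python
import numpy as np

def iou(box, others):
    """IoU between one box and an (N, 4) array of boxes, all as (x1, y1, x2, y2)."""
    xx1 = np.maximum(box[0], others[:, 0])
    yy1 = np.maximum(box[1], others[:, 1])
    xx2 = np.minimum(box[2], others[:, 2])
    yy2 = np.minimum(box[3], others[:, 3])
    inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_others = (others[:, 2] - others[:, 0]) * (others[:, 3] - others[:, 1])
    return inter / (area_box + area_others - inter)

def greedy_nms(boxes, scores, nt):
    """Return the indices of the boxes kept (the set D), best score first."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    remaining = np.argsort(scores)[::-1]    # indices of B, highest score first
    keep = []
    while remaining.size > 0:
        m = remaining[0]                     # M: highest-scoring box left in B
        keep.append(int(m))
        rest = remaining[1:]
        overlaps = iou(boxes[m], boxes[rest])
        remaining = rest[overlaps <= nt]     # drop boxes whose IoU with M exceeds Nt
    return keep

# hypothetical detections: the first two overlap heavily, the third is far away
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
keep = greedy_nms(boxes, scores, nt=0.5)
```

With these inputs the second box (IoU of about 0.68 with the first) is suppressed, and the first and third boxes are kept.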

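NMS works as follows. Suppose there are six rectangular boxes, sorted by the class probability assigned by the classifier; say that, in ascending order of the probability of being a vehicle, they are A, B, C, D, E, and F.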






For the implementation there is an excellent tutorial, "Non-Maximum Suppression for Object Detection in Python", which presents Felzenszwalb's NMS code and Malisiewicz's faster version.




import numpy as np
import cv2
from time import time

#  Felzenszwalb et al.
def non_max_suppression_slow(boxes, overlapThresh):
    # if there are no boxes, return an empty list
    if len(boxes) == 0:
        return []

    # initialize the list of picked indexes
    pick = []

    # grab the coordinates of the bounding boxes
    x1 = boxes[:,0]
    y1 = boxes[:,1]
    x2 = boxes[:,2]
    y2 = boxes[:,3]

    # compute the area of the bounding boxes and sort the bounding
    # boxes by the bottom-right y-coordinate of the bounding box
    area = (x2 - x1 + 1) * (y2 - y1 + 1)
    idxs = np.argsort(y2)

    # keep looping while some indexes still remain in the indexes
    # list
    while len(idxs) > 0:
        # grab the last index in the indexes list, add the index
        # value to the list of picked indexes, then initialize
        # the suppression list (i.e. indexes that will be deleted)
        # using the last index
        last = len(idxs) - 1
        i = idxs[last]
        pick.append(i)
        suppress = [last]

        # loop over all indexes in the indexes list
        for pos in range(0, last):
            # grab the current index
            j = idxs[pos]

            # find the largest (x, y) coordinates for the start of
            # the bounding box and the smallest (x, y) coordinates
            # for the end of the bounding box
            xx1 = max(x1[i], x1[j])
            yy1 = max(y1[i], y1[j])
            xx2 = min(x2[i], x2[j])
            yy2 = min(y2[i], y2[j])

            # compute the width and height of the bounding box
            w = max(0, xx2 - xx1 + 1)
            h = max(0, yy2 - yy1 + 1)

            # compute the ratio of overlap between the computed
            # bounding box and the bounding box in the area list
            overlap = float(w * h) / area[j]

            # if there is sufficient overlap, suppress the
            # current bounding box
            if overlap > overlapThresh:
                suppress.append(pos)

        # delete all indexes from the index list that are in the
        # suppression list
        idxs = np.delete(idxs, suppress)

    # return only the bounding boxes that were picked
    return boxes[pick]

# Malisiewicz et al.
def non_max_suppression_fast(boxes, overlapThresh):
    # if there are no boxes, return an empty list
    if len(boxes) == 0:
        return []

    # if the bounding boxes are integers, convert them to floats --
    # this is important since we'll be doing a bunch of divisions
    if boxes.dtype.kind == "i":
        boxes = boxes.astype("float")

    # initialize the list of picked indexes
    pick = []

    # grab the coordinates of the bounding boxes
    x1 = boxes[:, 0]
    y1 = boxes[:, 1]
    x2 = boxes[:, 2]
    y2 = boxes[:, 3]

    # compute the area of the bounding boxes and sort the bounding
    # boxes by the bottom-right y-coordinate of the bounding box
    area = (x2 - x1 + 1) * (y2 - y1 + 1)
    idxs = np.argsort(y2)

    # keep looping while some indexes still remain in the indexes
    # list
    while len(idxs) > 0:
        # grab the last index in the indexes list and add the
        # index value to the list of picked indexes
        last = len(idxs) - 1
        i = idxs[last]
        pick.append(i)

        # find the largest (x, y) coordinates for the start of
        # the bounding box and the smallest (x, y) coordinates
        # for the end of the bounding box
        xx1 = np.maximum(x1[i], x1[idxs[:last]])
        yy1 = np.maximum(y1[i], y1[idxs[:last]])
        xx2 = np.minimum(x2[i], x2[idxs[:last]])
        yy2 = np.minimum(y2[i], y2[idxs[:last]])

        # compute the width and height of the bounding box
        w = np.maximum(0, xx2 - xx1 + 1)
        h = np.maximum(0, yy2 - yy1 + 1)

        # compute the ratio of overlap
        overlap = (w * h) / area[idxs[:last]]

        # delete the picked index and all indexes whose overlap
        # exceeds the threshold from the index list
        idxs = np.delete(idxs, np.concatenate(([last],
                                               np.where(overlap > overlapThresh)[0])))

    # return only the bounding boxes that were picked using the
    # integer data type
    return boxes[pick].astype("int")

# construct a list containing the images that will be examined
# along with their respective bounding boxes
images = [
    ("images/audrey.jpg", np.array([
    (12, 84, 140, 212),
    (24, 84, 152, 212),
    (36, 84, 164, 212),
    (12, 96, 140, 224),
    (24, 96, 152, 224),
    (24, 108, 152, 236)])),
    ("images/bksomels.jpg", np.array([
    (114, 60, 178, 124),
    (120, 60, 184, 124),
    (114, 66, 178, 130)])),
    ("images/gpripe.jpg", np.array([
    (12, 30, 76, 94),
    (12, 36, 76, 100),
    (72, 36, 200, 164),
    (84, 48, 212, 176)]))]

# loop over the images
for (imagePath, boundingBoxes) in images:
    # load the image and clone it (one copy per result to display)
    print("[x] %d initial bounding boxes" % (len(boundingBoxes)))
    image = cv2.imread(imagePath)
    orig = image.copy()
    orig1 = image.copy()

    # loop over the bounding boxes for each image and draw them
    for (startX, startY, endX, endY) in boundingBoxes:
        cv2.rectangle(orig, (startX, startY), (endX, endY), (0, 0, 255), 2)

    # perform slow non-maximum suppression on the bounding boxes
    start_time = time()
    pick = non_max_suppression_slow(boundingBoxes, 0.3)
    print("[x] after applying slow non-maximum, %d bounding boxes, cost time %ss" % (len(pick), time() - start_time))

    # loop over the picked bounding boxes and draw them
    for (startX, startY, endX, endY) in pick:
        cv2.rectangle(image, (startX, startY), (endX, endY), (0, 255, 0), 2)

    # perform fast non-maximum suppression on the bounding boxes
    start_time = time()
    pick = non_max_suppression_fast(boundingBoxes, 0.3)
    print("[x] after applying fast non-maximum, %d bounding boxes, cost time %ss" % (len(pick), time() - start_time))

    # loop over the picked bounding boxes and draw them
    for (startX, startY, endX, endY) in pick:
        cv2.rectangle(orig1, (startX, startY), (endX, endY), (0, 255, 0), 2)

    # display the images and wait for a key press
    cv2.imshow("Original", orig)
    cv2.imshow("After slow NMS", image)
    cv2.imshow("After fast NMS", orig1)
    cv2.waitKey(0)
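One detail of the tutorial code above is worth noting: its `overlap` value is the intersection divided by the area of the other box, not the IoU used in the description of the algorithm. The sketch below compares the two measures; the helper name `overlap_measures` and the nested-box example are assumptions added for illustration.

```python
def overlap_measures(a, b):
    """Return (intersection / area(b), IoU) for boxes given as (x1, y1, x2, y2),
    using the same inclusive-pixel "+ 1" convention as the code above."""
    w = max(0, min(a[2], b[2]) - max(a[0], b[0]) + 1)
    h = max(0, min(a[3], b[3]) - max(a[1], b[1]) + 1)
    inter = float(w * h)
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / area_b, inter / (area_a + area_b - inter)

# two equally sized boxes taken from the first test image above
ratio, iou_value = overlap_measures((12, 84, 140, 212), (24, 84, 152, 212))

# hypothetical case: a small box nested inside a much larger one
nested_ratio, nested_iou = overlap_measures((0, 0, 99, 99), (0, 0, 9, 9))
```

For similarly sized boxes the two measures are close, but for a small box nested inside a large one the overlap ratio is 1.0 while the IoU is only 0.01, so the same threshold can behave quite differently under the two definitions.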



NMS loss





See the "Multi-task for NMS" section of the paper "Rotated Region Based CNN for Ship Detection" (IEEE conference paper, 2017).


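A major problem with the NMS algorithm above is that when two ground-truth objects really do overlap heavily, NMS removes the box with the lower confidence (its confidence is set to 0).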



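Paper: "Improving Object Detection With One Line of Code" (Soft-NMS). Instead of discarding overlapping boxes, it decays their confidence with one of two functions: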




  1. Linear function




  2. Gaussian function
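Read off the Python implementation given in this post, the two re-weighting rules can be written as follows. The symbols $s_i$ (the score of box $b_i$) and $\mathrm{iou}(M, b_i)$ are notation assumed for this sketch; $M$, $N_t$, and $\sigma$ correspond to the highest-scoring box, the threshold `Nt`, and the `sigma` parameter in the code.

```latex
% 1. Linear re-weighting
s_i \leftarrow
\begin{cases}
s_i, & \mathrm{iou}(M, b_i) \le N_t \\
s_i \bigl(1 - \mathrm{iou}(M, b_i)\bigr), & \mathrm{iou}(M, b_i) > N_t
\end{cases}

% 2. Gaussian re-weighting
s_i \leftarrow s_i \exp\!\left(-\frac{\mathrm{iou}(M, b_i)^2}{\sigma}\right)
```

Original NMS is the special case in which the score is set to 0 whenever $\mathrm{iou}(M, b_i) > N_t$ and left unchanged otherwise.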




Soft-NMS Python implementation:

# union area of the max-score box and the current detection box
ua = float((tx2 - tx1 + 1) * (ty2 - ty1 + 1) + area - iw * ih)
ov = iw * ih / ua  # IoU between max box and detection box

if method == 1:  # linear
    if ov > Nt:
        weight = 1 - ov
    else:
        weight = 1
elif method == 2:  # gaussian
    weight = np.exp(-(ov * ov) / sigma)
else:  # original NMS
    if ov > Nt:
        weight = 0
    else:
        weight = 1
# re-scoring: update the confidence of the detection box
boxes[pos, 4] = weight * boxes[pos, 4]
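The fragment above only shows the re-scoring step. A self-contained NumPy sketch of the full loop might look like the following. It is an illustration, not the reference implementation: the function name `soft_nms`, the string-valued `method` argument, the default values of `nt`, `sigma`, and `score_thresh`, and the sample boxes are all assumptions.

```python
import numpy as np

def soft_nms(boxes, scores, nt=0.3, sigma=0.5, score_thresh=0.001, method="gaussian"):
    """Return (kept indices in pick order, re-scored confidences)."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.array(scores, dtype=float)   # copy, because scores are decayed
    areas = (boxes[:, 2] - boxes[:, 0] + 1) * (boxes[:, 3] - boxes[:, 1] + 1)
    remaining = np.arange(len(boxes))
    keep = []
    while remaining.size > 0:
        # move the highest-scoring remaining box M into the result
        best = remaining[np.argmax(scores[remaining])]
        keep.append(int(best))
        remaining = remaining[remaining != best]
        if remaining.size == 0:
            break
        # IoU between M and every remaining box
        iw = np.maximum(0.0, np.minimum(boxes[best, 2], boxes[remaining, 2])
                        - np.maximum(boxes[best, 0], boxes[remaining, 0]) + 1)
        ih = np.maximum(0.0, np.minimum(boxes[best, 3], boxes[remaining, 3])
                        - np.maximum(boxes[best, 1], boxes[remaining, 1]) + 1)
        inter = iw * ih
        ov = inter / (areas[best] + areas[remaining] - inter)
        if method == "linear":
            weight = np.where(ov > nt, 1.0 - ov, 1.0)
        elif method == "gaussian":
            weight = np.exp(-(ov * ov) / sigma)
        else:  # original (hard) NMS
            weight = np.where(ov > nt, 0.0, 1.0)
        # re-score, then discard boxes whose confidence fell below the threshold
        scores[remaining] = scores[remaining] * weight
        remaining = remaining[scores[remaining] > score_thresh]
    return keep, scores

# hypothetical detections: boxes 0 and 1 overlap heavily (IoU = 2/3)
boxes = [(0, 0, 99, 99), (20, 0, 119, 99), (300, 300, 399, 399)]
scores = [0.9, 0.8, 0.7]
hard_keep, _ = soft_nms(boxes, scores, method="hard")
soft_keep, soft_scores = soft_nms(boxes, scores, method="gaussian")
```

With hard NMS the second box is removed outright. With the Gaussian rule it survives with a reduced score, so a genuinely overlapping second object can still be reported.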


Caffe C++ implementation: makefile/frcnn


| Method | Training data | Testing data | mAP | mAP@0.5 | mAP@0.75 | mAP@S | mAP@M | mAP@L | Recall |
|---|---|---|---|---|---|---|---|---|---|
| Baseline D-R-FCN | coco trainval | coco test-dev | 35.7 | 56.8 | 38.3 | 15.2 | 38.8 | 51.5 | - |
| D-R-FCN, ResNet-v1-101, NMS | coco trainval | coco test-dev | 37.4 | 59.6 | 40.2 | 17.8 | 40.6 | 51.4 | 48.3 |
| D-R-FCN, ResNet-v1-101, SNMS | coco trainval | coco test-dev | 38.4 | 60.1 | 41.6 | 18.5 | 41.6 | 52.5 | 53.8 |
| D-R-FCN, ResNet-v1-101, MST, NMS | coco trainval | coco test-dev | 39.8 | 62.4 | 43.3 | 22.6 | 42.3 | 52.2 | 52.9 |
| D-R-FCN, ResNet-v1-101, MST, SNMS | coco trainval | coco test-dev | 40.9 | 62.8 | 45.0 | 23.3 | 43.6 | 53.3 | 60.4 |





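Applying Soft-NMS at test time in R-FCN and Faster R-CNN models yields roughly a 1% improvement in mAP@[0.5:0.95] on the MS-COCO dataset (see here for details). In theory, applying it to proposal selection during training should also help. In my own experiments it did improve results for object types that tend to overlap (the objects need not actually overlap at the pixel level; the rectangular bounding boxes of tilted objects can overlap substantially).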
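The remark about tilted objects can be checked numerically. The sketch below builds two hypothetical thin rectangles rotated by 45 degrees that do not touch each other and computes the IoU of their axis-aligned bounding boxes; all names and dimensions are assumptions chosen for illustration.

```python
import numpy as np

def rotated_rect_corners(cx, cy, length, width, angle_deg):
    """Corner points of a rectangle centered at (cx, cy), rotated by angle_deg."""
    t = np.deg2rad(angle_deg)
    rot = np.array([[np.cos(t), -np.sin(t)],
                    [np.sin(t), np.cos(t)]])
    half = np.array([[-length, -width], [length, -width],
                     [length, width], [-length, width]]) / 2.0
    return half @ rot.T + np.array([cx, cy])

def axis_aligned_box(corners):
    """Smallest (x1, y1, x2, y2) box containing all corner points."""
    return np.array([corners[:, 0].min(), corners[:, 1].min(),
                     corners[:, 0].max(), corners[:, 1].max()])

def box_iou(a, b):
    """IoU of two axis-aligned boxes in continuous coordinates."""
    w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = w * h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# two 100 x 10 objects tilted by 45 degrees; the second is shifted by 20 units
# perpendicular to the long axis, leaving a 10-unit gap between the objects
shift = 20.0 / np.sqrt(2.0)
box_a = axis_aligned_box(rotated_rect_corners(0.0, 0.0, 100.0, 10.0, 45.0))
box_b = axis_aligned_box(rotated_rect_corners(-shift, shift, 100.0, 10.0, 45.0))
tilted_iou = box_iou(box_a, box_b)
```

Although the two objects are separated by a gap, the IoU of their bounding boxes is about 0.50, which is enough for hard NMS with a typical threshold of 0.3 to 0.5 to discard one of them.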

