#track#: Faster R-CNN in MXNet (2): the proposal Op and ground-truth handling

I suddenly realized I no longer remembered how the R-CNN part actually operates, so I spent some time reading the code. The unofficial MXNet implementation is quite compact, nice. The focus here is the end2end mode, and a few new findings came out of it:

proposal Op

The proposal Op's backward pass does no real work:

1. The substantive work happens in the previous step, where the per-location (pixel-wise) loss is constructed (more precisely, in the ground-truth assignment mechanism invoked inside that step's loss). As one would expect, assigning a ground truth to every location is a key step;

2. The IoU-related operations mentioned earlier take place inside the proposal Op, but they are pure aggregation and give the backward pass nothing to act on. A new question does arise, though: the aggregation logic should stay reasonably consistent with the mechanism of the previous step. (A sketch of the zero-gradient backward follows this list.)
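This is easy to verify in the implementation: the Custom Op's backward just writes zero gradients into every input slot. A minimal sketch of that pattern, following the usual mx.operator.CustomOp structure (method bodies abbreviated; the actual proposal.py differs in detail):

import mxnet as mx

class ProposalOperator(mx.operator.CustomOp):
    def forward(self, is_train, req, in_data, out_data, aux):
        # the substantive, forward-only work: decode anchors with the predicted
        # deltas, clip to the image, run NMS, keep the top-N proposals
        ...

    def backward(self, req, out_grad, in_data, in_grad, aux):
        # no gradient flows back through the proposal Op
        for i in range(len(in_grad)):
            self.assign(in_grad[i], req[i], 0)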

ground truth

First, let's put the code on display:

group = mx.symbol.Custom(rois=rois, gt_boxes=gt_boxes_reshape, op_type='proposal_target',
                             num_classes=num_classes, batch_images=config.TRAIN.BATCH_IMAGES,
                             batch_rois=config.TRAIN.BATCH_ROIS, fg_fraction=config.TRAIN.FG_FRACTION)
rois = group[0]
label = group[1]
bbox_target = group[2]      # regression targets; dw/dh are in log space (see nonlinear_transform below)
bbox_weight = group[3]

# Fast R-CNN
pool5 = mx.symbol.ROIPooling(
    name='roi_pool5', data=relu5_3, rois=rois, pooled_size=(7, 7), spatial_scale=1.0 / config.RCNN_FEAT_STRIDE)
# group 6
flatten = mx.symbol.Flatten(data=pool5, name="flatten")
fc6 = mx.symbol.FullyConnected(data=flatten, num_hidden=4096, name="fc6")
relu6 = mx.symbol.Activation(data=fc6, act_type="relu", name="relu6")
drop6 = mx.symbol.Dropout(data=relu6, p=0.5, name="drop6")
# group 7
fc7 = mx.symbol.FullyConnected(data=drop6, num_hidden=4096, name="fc7")
relu7 = mx.symbol.Activation(data=fc7, act_type="relu", name="relu7")
drop7 = mx.symbol.Dropout(data=relu7, p=0.5, name="drop7")
# classification
cls_score = mx.symbol.FullyConnected(name='cls_score', data=drop7, num_hidden=num_classes)
cls_prob = mx.symbol.SoftmaxOutput(name='cls_prob', data=cls_score, label=label, normalization='batch')
# bounding box regression
bbox_pred = mx.symbol.FullyConnected(name='bbox_pred', data=drop7, num_hidden=num_classes * 4)
bbox_loss_ = bbox_weight * mx.symbol.smooth_l1(name='bbox_loss_', scalar=1.0, data=(bbox_pred - bbox_target))
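As an aside, mx.symbol.smooth_l1's scalar argument is the sigma of the Fast R-CNN robust loss, and in the complete symbol this per-element term is then typically wrapped in mx.sym.MakeLoss. A small numpy sketch of the elementwise function itself (not the symbol API):

import numpy as np

def smooth_l1(x, sigma=1.0):
    # mx.symbol.smooth_l1(scalar=sigma) computes, elementwise:
    #   0.5 * (sigma * x)^2     if |x| < 1 / sigma^2
    #   |x| - 0.5 / sigma^2     otherwise
    absx = np.abs(x)
    return np.where(absx < 1.0 / sigma ** 2,
                    0.5 * (sigma * x) ** 2,
                    absx - 0.5 / sigma ** 2)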

proposal_target Op

I hadn't noticed before that there is yet another operation after the proposal Op. It deals with the following question:

how to supply samples to the subsequent (second-round) regression and classifier?

 
Thinking it over, this question really is the crux: for the two stages to mesh tightly, the second stage must be fed the actual predictions output by the previous step, yet the truthfulness of their labels must also be ensured. Concretely, this appears to be done by selecting the pred_bbx whose overlap with the gt is suitable. A condensed sketch of the selection follows.
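Concretely, sample_rois picks foreground rois above an IoU threshold and background rois from a lower IoU band, then subsamples both to a fixed quota. A condensed sketch of that selection (function name and threshold defaults are assumptions; the real sample_rois also builds labels, targets and weights):

import numpy as np

def sample_rois_sketch(overlaps, fg_rois_per_image, rois_per_image,
                       fg_thresh=0.5, bg_thresh_hi=0.5, bg_thresh_lo=0.0):
    # overlaps[i]: max IoU of roi i against all gt boxes
    # foreground: rois that overlap some gt well enough to carry its label
    fg_indexes = np.where(overlaps >= fg_thresh)[0]
    # background: rois in the "near miss" band [bg_thresh_lo, bg_thresh_hi)
    bg_indexes = np.where((overlaps < bg_thresh_hi) & (overlaps >= bg_thresh_lo))[0]
    # subsample so each batch keeps a fixed fg/bg composition
    fg_take = min(fg_rois_per_image, fg_indexes.size)
    fg_indexes = np.random.choice(fg_indexes, size=fg_take, replace=False)
    bg_take = min(rois_per_image - fg_take, bg_indexes.size)
    bg_indexes = np.random.choice(bg_indexes, size=bg_take, replace=False)
    return np.append(fg_indexes, bg_indexes)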

bbox_pred

After ROIPooling, what is the input to the bbox_pred regression?

 # bounding box regression (see the code above)
 bbox_pred = mx.symbol.FullyConnected(name='bbox_pred', data=drop7, num_hidden=num_classes * 4)
 bbox_loss_ = bbox_weight * mx.symbol.smooth_l1(name='bbox_loss_', scalar=1.0, data=(bbox_pred - bbox_target))

Judging from the code, the regression input comes from after ROIPooling. If ROIPooling only outputs a feature map of a fixed size, that seems questionable (it is fine for classification): the regression predicts something global, so how can its input be a purely local quantity? We need to trace where the bbox_target fed into the loss comes from.
After tracing: the second regression again predicts only the error of the previous prediction (taking the drop7 features as input), along the path:
proposal_target -> sample_rois -> bbox_transform (nonlinear_transform)

import numpy as np

def nonlinear_transform(ex_rois, gt_rois):
    """compute regression targets (dx, dy, dw, dh) from ex_rois to gt_rois"""
    assert ex_rois.shape[0] == gt_rois.shape[0], 'inconsistent rois number'
    ex_widths = ex_rois[:, 2] - ex_rois[:, 0] + 1.0
    ex_heights = ex_rois[:, 3] - ex_rois[:, 1] + 1.0
    ex_ctr_x = ex_rois[:, 0] + 0.5 * (ex_widths - 1.0)
    ex_ctr_y = ex_rois[:, 1] + 0.5 * (ex_heights - 1.0)
    gt_widths = gt_rois[:, 2] - gt_rois[:, 0] + 1.0
    gt_heights = gt_rois[:, 3] - gt_rois[:, 1] + 1.0
    gt_ctr_x = gt_rois[:, 0] + 0.5 * (gt_widths - 1.0)
    gt_ctr_y = gt_rois[:, 1] + 0.5 * (gt_heights - 1.0)

    targets_dx = (gt_ctr_x - ex_ctr_x) / (ex_widths + 1e-14)
    targets_dy = (gt_ctr_y - ex_ctr_y) / (ex_heights + 1e-14)
    targets_dw = np.log(gt_widths / ex_widths)
    targets_dh = np.log(gt_heights / ex_heights)

    targets = np.vstack(
                    (targets_dx, targets_dy, targets_dw, targets_dh)).transpose()
    return targets
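For completeness: at test time the predicted deltas are mapped back to absolute boxes by the inverse of this transform (the repo has a decode counterpart for this, nonlinear_pred; the sketch below follows that logic):

import numpy as np

def nonlinear_pred_sketch(boxes, deltas):
    # inverse of nonlinear_transform: apply predicted (dx, dy, dw, dh)
    # back onto the reference boxes to recover absolute coordinates
    widths = boxes[:, 2] - boxes[:, 0] + 1.0
    heights = boxes[:, 3] - boxes[:, 1] + 1.0
    ctr_x = boxes[:, 0] + 0.5 * (widths - 1.0)
    ctr_y = boxes[:, 1] + 0.5 * (heights - 1.0)

    dx, dy, dw, dh = deltas[:, 0], deltas[:, 1], deltas[:, 2], deltas[:, 3]
    pred_ctr_x = dx * widths + ctr_x
    pred_ctr_y = dy * heights + ctr_y
    pred_w = np.exp(dw) * widths
    pred_h = np.exp(dh) * heights

    pred = np.zeros(boxes.shape, dtype=np.float32)
    pred[:, 0] = pred_ctr_x - 0.5 * (pred_w - 1.0)  # x1
    pred[:, 1] = pred_ctr_y - 0.5 * (pred_h - 1.0)  # y1
    pred[:, 2] = pred_ctr_x + 0.5 * (pred_w - 1.0)  # x2
    pred[:, 3] = pred_ctr_y + 0.5 * (pred_h - 1.0)  # y2
    return pred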

bbox_target

The preceding code introduced two variables, bbox_target and bbox_weight.
Time to look at the operation that assigns ground truth per location.
Path:
train_end2end -> AnchorLoader (builds the iterator)

# rcnn/core/loader.py
...
if config.TRAIN.END2END:
    self.data_name = ['data', 'im_info', 'gt_boxes']
else:
    self.data_name = ['data']
self.label_name = ['label', 'bbox_target', 'bbox_weight']  # here is the bbox_target we are after
...

########### gt_boxes comes straight from data loading; now look at label and bbox_target ###########

label = assign_anchor(feat_shape, label['gt_boxes'], data['im_info'],
                      self.feat_stride, self.anchor_scales,
                      self.anchor_ratios, self.allowed_border)

##############################################
########## tracing assign_anchor   ###############
##############################################
if gt_boxes.size > 0:
    ...
    if not config.TRAIN.RPN_CLOBBER_POSITIVES:
        # assign bg labels first so that positive labels can clobber them
        labels[max_overlaps < config.TRAIN.RPN_NEGATIVE_OVERLAP] = 0

    # fg label: for each gt, anchor with highest overlap
    labels[gt_argmax_overlaps] = 1

    # fg label: above threshold IoU
    labels[max_overlaps >= config.TRAIN.RPN_POSITIVE_OVERLAP] = 1

    if config.TRAIN.RPN_CLOBBER_POSITIVES:
        # assign bg labels last so that negative labels can clobber positives
        labels[max_overlaps < config.TRAIN.RPN_NEGATIVE_OVERLAP] = 0
else:
    labels[:] = 0
...


if gt_boxes.size > 0:
    bbox_targets[:] = bbox_transform(anchors, gt_boxes[argmax_overlaps, :4])
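For context, max_overlaps / argmax_overlaps / gt_argmax_overlaps all derive from one (num_anchors x num_gt) IoU matrix; a tiny self-contained numpy illustration (the matrix values are made up):

import numpy as np

# hypothetical IoU matrix; in assign_anchor it comes from bbox_overlaps(anchors, gt_boxes)
overlaps = np.array([[0.1, 0.6],
                     [0.8, 0.2],
                     [0.3, 0.4]])

argmax_overlaps = overlaps.argmax(axis=1)                      # best gt per anchor
max_overlaps = overlaps[np.arange(overlaps.shape[0]), argmax_overlaps]
gt_max_overlaps = overlaps.max(axis=0)                         # best IoU per gt
gt_argmax_overlaps = np.where(overlaps == gt_max_overlaps)[0]  # anchors that achieve it
print(gt_argmax_overlaps)  # [0 1]: anchor 0 is the top match for gt 1, anchor 1 for gt 0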

Note: below, v_anchor denotes an anchor after it has been refined by the network.

  1. label is now fairly clear: each ground truth is first mapped onto one v_anchor (the one with the highest overlap), and every v_anchor whose overlap reaches the threshold is then marked positive as well;
  2. bbox_target, however, leaves a doubt: every v_anchor is forcibly assigned a ground truth, so will the v_anchors that contain nothing hurt the convergence of the system? A plausible guess was that the empty anchors had already been absorbed when gt_boxes is supplied, but I found no evidence for that. Then bbox_weight came to mind, and that is where the relevant evidence turned up (see the RPN-side snippet right after this list).
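In fact the masking exists on the RPN side too, which resolves the doubt in item 2 directly: inside assign_anchor, the regression weights are non-zero only for anchors labelled foreground (quoted from the typical implementation; names may differ slightly):

# inside assign_anchor: only fg anchors (label == 1) get non-zero weights, so the
# bbox_target forcibly attached to empty anchors never contributes to the loss
bbox_weights = np.zeros((num_anchors, 4), dtype=np.float32)
bbox_weights[labels == 1, :] = np.array(config.TRAIN.RPN_BBOX_WEIGHTS)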

For the second stage, first recall where bbox_weight appears (see the code above):

 # bounding box regression 
 bbox_pred = mx.symbol.FullyConnected(name='bbox_pred', data=drop7, num_hidden=num_classes * 4)
 bbox_loss_ = bbox_weight * mx.symbol.smooth_l1(name='bbox_loss_', scalar=1.0, data=(bbox_pred - bbox_target))

Now let's see how it is produced
(path:
proposal_target -> sample_rois -> expand_bbox_regression_targets):

import numpy as np
from rcnn.config import config  # assumed import path, as in the MXNet rcnn example

def expand_bbox_regression_targets(bbox_targets_data, num_classes):
    """ 
    expand from 5 to 4 * num_classes; only the right class has non-zero bbox regression targets
    :param bbox_targets_data: [k * 5]
    :param num_classes: number of classes
    :return: bbox targets processed [k, 4 * num_classes]
    bbox_weights ! only foreground boxes have bbox regression computation!
    """
    classes = bbox_targets_data[:, 0]
    bbox_targets = np.zeros((classes.size, 4 * num_classes), dtype=np.float32)
    bbox_weights = np.zeros(bbox_targets.shape, dtype=np.float32)
    indexes = np.where(classes > 0)[0]  # strictly > 0: exclude background (class 0)
    for index in indexes:
        cls = classes[index]
        start = int(4 * cls)
        end = start + 4
        bbox_targets[index, start:end] = bbox_targets_data[index, 1:]  # why expand? bbox_pred's FullyConnected outputs 4 * num_classes values
        bbox_weights[index, start:end] = config.TRAIN.BBOX_WEIGHTS
    return bbox_targets, bbox_weights

The first line of the docstring says it plainly: only foreground boxes (label > 0) get non-zero regression weights.
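A tiny worked example (made-up numbers, assuming the function and config above are importable) makes the layout concrete: with num_classes = 3, a foreground roi of class 2 gets its four deltas written into columns 8..11, while a background roi stays fully masked:

import numpy as np

# row format: [class, dx, dy, dw, dh]
bbox_targets_data = np.array([[2, 0.1, -0.2, 0.05, 0.0],   # foreground, class 2
                              [0, 0.0,  0.0, 0.00, 0.0]])  # background
targets, weights = expand_bbox_regression_targets(bbox_targets_data, num_classes=3)
print(targets.shape)     # (2, 12)
print(targets[0, 8:12])  # [ 0.1  -0.2   0.05  0.  ]  <- slot of class 2
print(weights[1].any())  # False: the background row is fully masked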

Conclusion

A short summary.

  1. The network performs two rounds of bounding-box regression, and each round really only predicts an offset: the reference in the first round is the v_anchor (the anchor refined by the first prediction), in the second round the roi, and in both rounds the comparison target is gt_boxes. Also, in each round the shift of the center coordinates is itself part of what gets predicted; the anchor center is not carried through the first refinement unchanged (even though the anchors are generated pixel-wise).
  2. bbox_weight is used to mask out the penalty on boxes that correspond to background.
  3. The proposal Op only aggregates proposals and has no backward pass; proposal_target supplies the samples used to train the subsequent stage, including samples with label = 0.
posted @ 2017-04-28 09:39  rotxin