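TensorFlow Faster RCNN Source Code Walkthrough (TFFRCNN) (18): rpn_msr/anchor_target_layer_tf.py
This post is part of a series of code notes on the CharlesShang/TFFRCNN source on GitHub.
---------------Personal study notes---------------
----------------Author: Jiang (疆)--------------
------Original post on cnblogs (博客园)------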
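1. anchor_target_layer(rpn_cls_score, gt_boxes, gt_ishard, dontcare_areas, im_info, _feat_stride = [16,], anchor_scales = [4 ,8, 16, 32]) code logic
Call generate_anchors(...) (in generate_anchors.py) to produce the 9 base anchors (3 aspect ratios x 3 scales when anchor_scales = [8, 16, 32] is passed in; the default [4, 8, 16, 32] would give 12)--->
im_info = im_info[0] # takes the first image's entry to update im_info(?); it stores that image's height, width and scale factor (read as im_info[0], im_info[1], im_info[2] in the code). What exactly does blobs['im_info'] contain? How many images? Is it related to imdb.roidb???--->
Compute the shifts, i.e. the offset (on the scaled image) of every conv5_3 feature-map position relative to position (0,0), e.g. [0,16,0,16]. Why 4 columns instead of 2? Each anchor is stored as [x1, y1, x2, y2], so the same (dx, dy) must be added to both corners, and each shift row is therefore [dx, dy, dx, dy]--->
Use the shifts and the 9 base anchors to produce total_anchors anchors over all conv5_3 feature-map positions; this requires reshaping the base anchors and the shifts so that NumPy broadcasting adds every shift to every base anchor--->
Discard anchors that cross the image boundary instead of clipping them to the boundary (RPN training only, consistent with the paper); inds_inside holds the indices of the kept anchors, and anchors = all_anchors[inds_inside, :]--->
Produce the anchor labels (0 = negative, 1 = positive, -1 = dontcare). First fill labels with -1. Call bbox_overlaps(...) (utils/cython_bbox.so, a compiled Cython extension) to compute the overlaps between anchors and gt_boxes; for each anchor, if its max IoU over all gt_boxes is < 0.3, set its label to 0; for each gt box, set the label of the anchor(s) with the highest IoU to 1; and set the label of every anchor whose max IoU is >= 0.7 to 1. Call bbox_intersections(...) (utils/cython_bbox.so) to compute the intersections between dontcare_areas and anchors; if an anchor's summed intersection with the dontcare_areas exceeds 0.5, set its label to -1. Call bbox_overlaps(...) again to compute the overlaps between gt_hardboxes and anchors; if an anchor's max IoU with the gt_hardboxes is >= 0.7 (cfg.TRAIN.RPN_POSITIVE_OVERLAP in the code), set its label to -1, and in addition, for each gt_hardbox, set the label of the anchor with the highest IoU to -1--->
Subsample the positive and negative RPN anchors, at most 128 of each (RPN_BATCHSIZE = 256, RPN_FG_FRACTION = 0.5); if there are fewer than 128 positives, negatives make up the difference; the labels of all remaining anchors are set to -1--->
Call _compute_targets(...) to compute the regression targets bbox_targets of the anchors--->
Create bbox_inside_weights and bbox_outside_weights; with the default config, positive anchors get [1, 1, 1, 1] and all other anchors get [0, 0, 0, 0]. Their practical purpose is not clear to me (they appear to act as masks so that only positive anchors contribute to the box-regression loss)--->
Call _unmap(...) to expand the shape of data (labels, bbox_targets, bbox_inside_weights, bbox_outside_weights) from (len(inds_inside), ...) to (total_anchors, ...), filling the added positions with a fill value--->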
# total_anchors is the number of all anchors generated on the conv5_3 feature map (including anchors that cross the image boundary)
# inds_inside holds the indices of the anchors kept after out-of-bounds anchors are removed
labels = _unmap(labels, total_anchors, inds_inside, fill=-1)
bbox_targets = _unmap(bbox_targets, total_anchors, inds_inside, fill=0)
bbox_inside_weights = _unmap(bbox_inside_weights, total_anchors, inds_inside, fill=0)
bbox_outside_weights = _unmap(bbox_outside_weights, total_anchors, inds_inside, fill=0)
Reshape labels, bbox_targets, bbox_inside_weights and bbox_outside_weights to (1, height, width, A) or (1, height, width, A*4) and return them (as rpn_labels, rpn_bbox_targets, rpn_bbox_inside_weights, rpn_bbox_outside_weights)
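The shift-and-broadcast step and the inside-image filter described above can be sketched with plain NumPy. This is a minimal illustration, not the TFFRCNN code itself: the two base anchors, the 2 x 3 feature map and the 32 x 48 image size are made-up values chosen so the result is easy to check by hand.

```python
import numpy as np

feat_stride = 16
height, width = 2, 3            # hypothetical feature-map size
im_height, im_width = 32, 48    # hypothetical scaled image size

# Two made-up base anchors in (x1, y1, x2, y2) form.
# The real code obtains its base anchors from generate_anchors().
base_anchors = np.array([[0, 0, 15, 15],
                         [-8, -8, 23, 23]])

# One (dx, dy, dx, dy) row per feature-map cell: shape (K, 4).
shift_x = np.arange(width) * feat_stride
shift_y = np.arange(height) * feat_stride
shift_x, shift_y = np.meshgrid(shift_x, shift_y)
shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
                    shift_x.ravel(), shift_y.ravel())).transpose()

A = base_anchors.shape[0]
K = shifts.shape[0]

# Broadcasting (1, A, 4) + (K, 1, 4) -> (K, A, 4), then flatten to (K*A, 4).
all_anchors = (base_anchors.reshape((1, A, 4)) +
               shifts.reshape((K, 1, 4))).reshape((K * A, 4))

# Keep only anchors that lie completely inside the image.
inds_inside = np.where(
    (all_anchors[:, 0] >= 0) &
    (all_anchors[:, 1] >= 0) &
    (all_anchors[:, 2] < im_width) &
    (all_anchors[:, 3] < im_height)
)[0]

print(all_anchors.shape)   # (12, 4)
print(inds_inside)         # [ 0  2  4  6  8 10]
```

Row k * A + a of all_anchors belongs to feature-map cell k and base anchor a, which is why the final reshape to (1, height, width, A) lines up with the feature map. In this toy setup only the small base anchor ever fits inside the image, so every odd index is dropped.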
# What does blobs['im_info'] contain? How many images? Is it related to imdb.roidb???
# Called with anchor_scales = [8, 16, 32]
# Produces the anchor labels (-1, 0, 1) rpn_labels, the regression targets rpn_bbox_targets,
# rpn_bbox_inside_weights (per-anchor weight coefficients) and rpn_bbox_outside_weights (balances bg/fg anchors)
def anchor_target_layer(rpn_cls_score, gt_boxes, gt_ishard, dontcare_areas, im_info, _feat_stride = [16,], anchor_scales = [4 ,8, 16, 32]):
    """
    Assign anchors to ground-truth targets. Produces anchor classification
    labels and bounding-box regression targets.
    Parameters
    ----------
    rpn_cls_score: (1, H, W, Ax2) bg/fg scores of previous conv layer
    gt_boxes: (G, 5) vstack of [x1, y1, x2, y2, class]
    gt_ishard: (G, 1), 1 or 0 indicates difficult or not
    dontcare_areas: (D, 4), some areas may contain small objs but no labelling. D may be 0
    im_info: a list of [image_height, image_width, scale_ratios]
    _feat_stride: the downsampling ratio of feature map to the original input image
    anchor_scales: the scales to the basic_anchor (basic anchor is [16, 16])
    ----------
    Returns
    ----------
    rpn_labels : (HxWxA, 1), for each anchor, 0 denotes bg, 1 fg, -1 dontcare
    rpn_bbox_targets: (HxWxA, 4), distances of the anchors to the gt_boxes (may contain some transform)
                      that are the regression objectives
    rpn_bbox_inside_weights: (HxWxA, 4) weights of each boxes, mainly accepts hyper param in cfg
    rpn_bbox_outside_weights: (HxWxA, 4) used to balance the fg/bg,
                      because the numbers of bgs and fgs may differ significantly
    """
    # Generate the 9 base anchors at position (0,0) of the conv5_3 feature map
    _anchors = generate_anchors(scales=np.array(anchor_scales))
    _num_anchors = _anchors.shape[0]

    if DEBUG:
        print 'anchors:'
        print _anchors
        print 'anchor shapes:'
        print np.hstack((
            _anchors[:, 2::4] - _anchors[:, 0::4],
            _anchors[:, 3::4] - _anchors[:, 1::4],
        ))
        _counts = cfg.EPS
        _sums = np.zeros((1, 4))
        _squared_sums = np.zeros((1, 4))
        _fg_sum = 0
        _bg_sum = 0
        _count = 0

    # allow boxes to sit over the edge by a small amount
    _allowed_border = 0
    # map of shape (..., H, W)
    #height, width = rpn_cls_score.shape[1:3]

    # Take the information of the first image???
    im_info = im_info[0]

    # Algorithm:
    #
    # for each (H, W) location i
    #   generate 9 anchor boxes centered on cell i
    #   apply predicted bbox deltas at cell i to each of the 9 anchors
    # filter out-of-image anchors  !!! out-of-bounds anchors are removed when training the RPN
    # measure GT overlap

    assert rpn_cls_score.shape[0] == 1, \
        'Only single item batches are supported'

    # map of shape (..., H, W)
    # height and width of the conv5_3 feature map
    height, width = rpn_cls_score.shape[1:3]

    if DEBUG:
        print 'AnchorTargetLayer: height', height, 'width', width
        print ''
        print 'im_size: ({}, {})'.format(im_info[0], im_info[1])
        print 'scale: {}'.format(im_info[2])
        print 'height, width: ({}, {})'.format(height, width)
        print 'rpn: gt_boxes.shape', gt_boxes.shape
        print 'rpn: gt_boxes', gt_boxes

    # Offsets (on the scaled image) of every conv5 feature-map position relative to (0,0), e.g. [0,16,0,16]
    # 1. Generate proposals from bbox deltas and shifted anchors
    shift_x = np.arange(0, width) * _feat_stride
    shift_y = np.arange(0, height) * _feat_stride
    shift_x, shift_y = np.meshgrid(shift_x, shift_y) # in W H order
    # K is H x W
    shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
                        shift_x.ravel(), shift_y.ravel())).transpose()
    # add A anchors (1, A, 4) to
    # cell K shifts (K, 1, 4) to get
    # shift anchors (K, A, 4)
    # reshape to (K*A, 4) shifted anchors
    A = _num_anchors
    K = shifts.shape[0]
    # Generate all anchors at every position of the conv5_3 feature map
    all_anchors = (_anchors.reshape((1, A, 4)) +
                   shifts.reshape((1, K, 4)).transpose((1, 0, 2)))
    all_anchors = all_anchors.reshape((K * A, 4))
    # Number of generated anchors
    total_anchors = int(K * A)

    # RPN training stage: remove out-of-bounds anchors
    # only keep anchors inside the image
    inds_inside = np.where(
        (all_anchors[:, 0] >= -_allowed_border) &
        (all_anchors[:, 1] >= -_allowed_border) &
        (all_anchors[:, 2] < im_info[1] + _allowed_border) &  # width
        (all_anchors[:, 3] < im_info[0] + _allowed_border)    # height
    )[0]

    if DEBUG:
        print 'total_anchors', total_anchors
        print 'inds_inside', len(inds_inside)

    # keep only inside anchors
    anchors = all_anchors[inds_inside, :]
    if DEBUG:
        print 'anchors.shape', anchors.shape

    # label: 1 is positive, 0 is negative, -1 is dont care
    # Create the labels for the anchors
    labels = np.empty((len(inds_inside), ), dtype=np.float32)
    labels.fill(-1)

    # overlaps between the anchors and the gt boxes
    # overlaps (ex, gt), shape is A x G
    # gt_boxes: (G, 5) vstack of [x1, y1, x2, y2, class]
    # bbox_overlaps(...) is a compiled extension; overlaps.shape = (anchors.shape[0], gt_boxes.shape[0])
    overlaps = bbox_overlaps(
        np.ascontiguousarray(anchors, dtype=np.float),
        np.ascontiguousarray(gt_boxes, dtype=np.float))
    argmax_overlaps = overlaps.argmax(axis=1) # (A)  A in the original comments here means the number of anchors, not 9!
    # Max IoU of each anchor over all gt boxes
    max_overlaps = overlaps[np.arange(len(inds_inside)), argmax_overlaps]
    gt_argmax_overlaps = overlaps.argmax(axis=0) # G
    # Max IoU of each gt box over all anchors
    gt_max_overlaps = overlaps[gt_argmax_overlaps,
                               np.arange(overlaps.shape[1])]
    # Indices of the anchors that reach the max IoU of some gt box; elements of this array may repeat
    gt_argmax_overlaps = np.where(overlaps == gt_max_overlaps)[0]

    # Assign a label to each anchor
    # Meaning of the flag: if an anchor satisfies both the positive and negative conditions, set it to negative
    # Default TRAIN.RPN_CLOBBER_POSITIVES = False
    if not cfg.TRAIN.RPN_CLOBBER_POSITIVES:
        # assign bg labels first so that positive labels can clobber (overwrite) them
        # Default TRAIN.RPN_NEGATIVE_OVERLAP = 0.3
        labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0

    # fg label: for each gt, anchor with highest overlap
    labels[gt_argmax_overlaps] = 1
    # fg label: above threshold IOU
    # Default TRAIN.RPN_POSITIVE_OVERLAP = 0.7
    labels[max_overlaps >= cfg.TRAIN.RPN_POSITIVE_OVERLAP] = 1

    # Meaning of the flag: if an anchor satisfies both the positive and negative conditions, set it to negative
    # Default TRAIN.RPN_CLOBBER_POSITIVES = False
    if cfg.TRAIN.RPN_CLOBBER_POSITIVES:
        # assign bg labels last so that negative labels can clobber positives
        labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0

    # preclude (exclude) dontcare areas
    if dontcare_areas is not None and dontcare_areas.shape[0] > 0:
        # intersec shape is D x A
        intersecs = bbox_intersections(
            np.ascontiguousarray(dontcare_areas, dtype=np.float), # D x 4
            np.ascontiguousarray(anchors, dtype=np.float) # A x 4  A here means the number of anchors, not 9!
        )
        # Sum of each anchor's intersections with all dontcare_areas
        intersecs_ = intersecs.sum(axis=0) # A x 1
        # TRAIN.DONTCARE_AREA_INTERSECTION_HI = 0.5, threshold on the summed intersection with dontcare boxes
        # label -1 marks the anchor as a dontcare anchor
        labels[intersecs_ > cfg.TRAIN.DONTCARE_AREA_INTERSECTION_HI] = -1

    # preclude (exclude) hard samples that are highly occluded, truncated or difficult to see
    # Default TRAIN.PRECLUDE_HARD_SAMPLES = True
    if cfg.TRAIN.PRECLUDE_HARD_SAMPLES and gt_ishard is not None and gt_ishard.shape[0] > 0:
        # 1 or 0 indicates difficult or not
        assert gt_ishard.shape[0] == gt_boxes.shape[0]
        gt_ishard = gt_ishard.astype(int)
        gt_hardboxes = gt_boxes[gt_ishard == 1, :]
        if gt_hardboxes.shape[0] > 0:
            # H x A
            hard_overlaps = bbox_overlaps(
                np.ascontiguousarray(gt_hardboxes, dtype=np.float), # H x 4  H is the number of gt_hardboxes
                np.ascontiguousarray(anchors, dtype=np.float)) # A x 4  A here means the number of anchors, not 9!
            # Max IoU of each anchor over the gt_hardboxes
            hard_max_overlaps = hard_overlaps.max(axis=0) # (A)
            # TRAIN.RPN_POSITIVE_OVERLAP = 0.7
            labels[hard_max_overlaps >= cfg.TRAIN.RPN_POSITIVE_OVERLAP] = -1
            # Index of the anchor with the max IoU for each gt_hardbox
            max_intersec_label_inds = hard_overlaps.argmax(axis=1) # H x 1
            labels[max_intersec_label_inds] = -1

    # By default positives (label 1) and negatives (label 0) are sampled 1:1, controlled by TRAIN.RPN_FG_FRACTION
    # If there are fewer than 128 positive anchors, are they padded with negatives???
    # subsample positive labels if we have too many
    # Default TRAIN.RPN_FG_FRACTION = 0.5, TRAIN.RPN_BATCHSIZE = 256
    num_fg = int(cfg.TRAIN.RPN_FG_FRACTION * cfg.TRAIN.RPN_BATCHSIZE)
    fg_inds = np.where(labels == 1)[0]
    if len(fg_inds) > num_fg:
        disable_inds = npr.choice(
            fg_inds, size=(len(fg_inds) - num_fg), replace=False)
        labels[disable_inds] = -1

    # subsample negative labels if we have too many
    num_bg = cfg.TRAIN.RPN_BATCHSIZE - np.sum(labels == 1)
    bg_inds = np.where(labels == 0)[0]
    if len(bg_inds) > num_bg:
        disable_inds = npr.choice(
            bg_inds, size=(len(bg_inds) - num_bg), replace=False)
        labels[disable_inds] = -1
        #print "was %s inds, disabling %s, now %s inds" % (
            #len(bg_inds), len(disable_inds), np.sum(labels == 0))

    # len(inds_inside) is the number of anchors, i.e. the A mentioned above
    bbox_targets = np.zeros((len(inds_inside), 4), dtype=np.float32)
    # argmax_overlaps is, for each anchor, the index of the gt box with which it has the max IoU
    bbox_targets = _compute_targets(anchors, gt_boxes[argmax_overlaps, :])

    # What do bbox_inside_weights and bbox_outside_weights actually mean???
    bbox_inside_weights = np.zeros((len(inds_inside), 4), dtype=np.float32)
    # Default TRAIN.RPN_BBOX_INSIDE_WEIGHTS = (1.0, 1.0, 1.0, 1.0)
    # Only positive anchors get non-zero bbox_inside_weights and bbox_outside_weights; all others are 0
    bbox_inside_weights[labels == 1, :] = np.array(cfg.TRAIN.RPN_BBOX_INSIDE_WEIGHTS)

    bbox_outside_weights = np.zeros((len(inds_inside), 4), dtype=np.float32)
    # Default TRAIN.RPN_POSITIVE_WEIGHT = -1.0
    if cfg.TRAIN.RPN_POSITIVE_WEIGHT < 0:
        # uniform weighting of examples (given non-uniform sampling)
        num_examples = np.sum(labels >= 0) + 1
        # positive_weights = np.ones((1, 4)) * 1.0 / num_examples
        # negative_weights = np.ones((1, 4)) * 1.0 / num_examples
        positive_weights = np.ones((1, 4))
        negative_weights = np.zeros((1, 4))
    else:
        assert ((cfg.TRAIN.RPN_POSITIVE_WEIGHT > 0) &
                (cfg.TRAIN.RPN_POSITIVE_WEIGHT < 1))
        positive_weights = (cfg.TRAIN.RPN_POSITIVE_WEIGHT /
                            (np.sum(labels == 1)) + 1)
        negative_weights = ((1.0 - cfg.TRAIN.RPN_POSITIVE_WEIGHT) /
                            (np.sum(labels == 0)) + 1)
    # By default bbox_outside_weights is [1, 1, 1, 1] for positive anchors
    # and [0, 0, 0, 0] for negative anchors
    bbox_outside_weights[labels == 1, :] = positive_weights
    bbox_outside_weights[labels == 0, :] = negative_weights

    if DEBUG:
        _sums += bbox_targets[labels == 1, :].sum(axis=0)
        _squared_sums += (bbox_targets[labels == 1, :] ** 2).sum(axis=0)
        _counts += np.sum(labels == 1)
        means = _sums / _counts
        stds = np.sqrt(_squared_sums / _counts - means ** 2)
        print 'means:'
        print means
        print 'stdevs:'
        print stds

    # map up to original set of anchors
    # total_anchors is the number of all anchors generated on the conv5_3 feature map (including out-of-bounds anchors)
    # inds_inside holds the indices of the anchors kept after out-of-bounds anchors are removed
    labels = _unmap(labels, total_anchors, inds_inside, fill=-1)
    bbox_targets = _unmap(bbox_targets, total_anchors, inds_inside, fill=0)
    bbox_inside_weights = _unmap(bbox_inside_weights, total_anchors, inds_inside, fill=0)
    bbox_outside_weights = _unmap(bbox_outside_weights, total_anchors, inds_inside, fill=0)

    if DEBUG:
        print 'rpn: max max_overlap', np.max(max_overlaps)
        print 'rpn: num_positive', np.sum(labels == 1)
        print 'rpn: num_negative', np.sum(labels == 0)
        _fg_sum += np.sum(labels == 1)
        _bg_sum += np.sum(labels == 0)
        _count += 1
        print 'rpn: num_positive avg', _fg_sum / _count
        print 'rpn: num_negative avg', _bg_sum / _count

    # labels
    #pdb.set_trace()
    # Reshape labels, bbox_targets, bbox_inside_weights and bbox_outside_weights to (1, height, width, A or A*4) and return them as
    # labels (rpn_labels), bbox_targets (rpn_bbox_targets), bbox_inside_weights (rpn_bbox_inside_weights), bbox_outside_weights (rpn_bbox_outside_weights)
    # Here A = 9 (the number of base anchors), unlike the A in the comments above!!!
    labels = labels.reshape((1, height, width, A))
    rpn_labels = labels

    # bbox_targets
    bbox_targets = bbox_targets \
        .reshape((1, height, width, A * 4))
    rpn_bbox_targets = bbox_targets

    # bbox_inside_weights
    bbox_inside_weights = bbox_inside_weights \
        .reshape((1, height, width, A * 4))
    #assert bbox_inside_weights.shape[2] == height
    #assert bbox_inside_weights.shape[3] == width
    rpn_bbox_inside_weights = bbox_inside_weights

    # bbox_outside_weights
    bbox_outside_weights = bbox_outside_weights \
        .reshape((1, height, width, A * 4))
    #assert bbox_outside_weights.shape[2] == height
    #assert bbox_outside_weights.shape[3] == width
    rpn_bbox_outside_weights = bbox_outside_weights

    return rpn_labels, rpn_bbox_targets, rpn_bbox_inside_weights, rpn_bbox_outside_weights
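To make the labelling and subsampling rules concrete, here is a small NumPy-only sketch. iou_matrix, assign_labels and subsample are hypothetical helper names, not part of TFFRCNN. iou_matrix stands in for the compiled bbox_overlaps and assumes the same +1 pixel convention for box widths and heights; the dontcare and hard-sample handling is omitted, and np.random.default_rng is used in place of npr.choice.

```python
import numpy as np

def iou_matrix(boxes, gts):
    """Pairwise IoU, shape (num_boxes, num_gts); a stand-in for bbox_overlaps."""
    boxes = np.asarray(boxes, dtype=np.float64)
    gts = np.asarray(gts, dtype=np.float64)
    area_b = (boxes[:, 2] - boxes[:, 0] + 1) * (boxes[:, 3] - boxes[:, 1] + 1)
    area_g = (gts[:, 2] - gts[:, 0] + 1) * (gts[:, 3] - gts[:, 1] + 1)
    iw = (np.minimum(boxes[:, None, 2], gts[None, :, 2])
          - np.maximum(boxes[:, None, 0], gts[None, :, 0]) + 1)
    ih = (np.minimum(boxes[:, None, 3], gts[None, :, 3])
          - np.maximum(boxes[:, None, 1], gts[None, :, 1]) + 1)
    inter = np.clip(iw, 0, None) * np.clip(ih, 0, None)
    return inter / (area_b[:, None] + area_g[None, :] - inter)

def assign_labels(anchors, gt_boxes, neg_thresh=0.3, pos_thresh=0.7):
    """Label anchors 1 (fg), 0 (bg) or -1 (ignored), bg first so fg can overwrite."""
    overlaps = iou_matrix(anchors, gt_boxes[:, :4])
    labels = np.full(len(anchors), -1, dtype=np.float32)
    max_overlaps = overlaps.max(axis=1)      # best IoU per anchor
    gt_max_overlaps = overlaps.max(axis=0)   # best IoU per gt box
    labels[max_overlaps < neg_thresh] = 0
    labels[np.where(overlaps == gt_max_overlaps)[0]] = 1
    labels[max_overlaps >= pos_thresh] = 1
    return labels

def subsample(labels, batch_size=256, fg_fraction=0.5, rng=None):
    """Disable (set to -1) surplus positives, then surplus negatives."""
    rng = np.random.default_rng(0) if rng is None else rng
    labels = labels.copy()
    num_fg = int(fg_fraction * batch_size)
    fg_inds = np.where(labels == 1)[0]
    if len(fg_inds) > num_fg:
        drop = rng.choice(fg_inds, size=len(fg_inds) - num_fg, replace=False)
        labels[drop] = -1
    # Negatives fill whatever part of the batch the positives did not use.
    num_bg = batch_size - int(np.sum(labels == 1))
    bg_inds = np.where(labels == 0)[0]
    if len(bg_inds) > num_bg:
        drop = rng.choice(bg_inds, size=len(bg_inds) - num_bg, replace=False)
        labels[drop] = -1
    return labels

anchors = np.array([[0, 0, 9, 9], [0, 0, 9, 4], [20, 20, 29, 29]])
gt_boxes = np.array([[0, 0, 9, 9, 1]])
labels = assign_labels(anchors, gt_boxes)   # IoUs are 1.0, 0.5, 0.0 -> [1, -1, 0]
print(labels)
```

The middle anchor (IoU 0.5) falls between the two thresholds and is not the best match for any gt box, so it keeps the initial -1 and does not take part in training.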
# -*- coding:utf-8 -*-
# Author: WUJiang
# Test snippet
# gt_argmax_overlaps = np.where(overlaps == gt_max_overlaps)[0]

import numpy as np

a = np.array([
    [5, 2, 3, 4],
    [2, 7, 1, 6],
    [9, 1, 2, 5],
    [3, 1, 4, 7]
])
b = np.array([2, 2, 2, 7])
# The first array holds the indices along dimension 1 (rows), the second the indices along dimension 2 (columns)
# (array([0, 1, 2, 3], dtype=int64), array([1, 0, 2, 3], dtype=int64))
print(np.where(a == b))
# [0 1 2 3]
print(np.where(a == b)[0])
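The following sketch shows why the code recomputes gt_argmax_overlaps with np.where instead of keeping the result of argmax: argmax returns only the first best anchor per gt box, while np.where returns every anchor that ties a column maximum, so the result can be longer and can contain repeated indices. The 4 x 2 IoU matrix is made up for illustration.

```python
import numpy as np

# Hypothetical IoU matrix: 4 anchors (rows) x 2 gt boxes (columns).
overlaps = np.array([
    [0.10, 0.60],
    [0.50, 0.20],
    [0.50, 0.60],
    [0.00, 0.30],
])

# First best anchor for each gt box (ties are silently dropped).
gt_argmax = overlaps.argmax(axis=0)
gt_max = overlaps[gt_argmax, np.arange(overlaps.shape[1])]

# Every anchor that reaches the maximum of some gt column; row 2 ties both columns.
tied_rows = np.where(overlaps == gt_max)[0]

print(gt_argmax)   # [1 0]
print(tied_rows)   # [0 1 2 2]
```

Assigning labels[tied_rows] = 1 is unaffected by the repeated index, since writing 1 twice to the same position has the same effect as writing it once.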
# -*- coding:utf-8 -*-
# Author: WUJiang
# Test snippet

import numpy as np

count = 5
data = np.array([
    [1, 2, 3, 4],
    [2, 6, 8, 0]
])
c = (count, ) + data.shape[1:]
# (5, 4)
print(c)
2. _unmap(data, count, inds, fill=0)
Expands the shape of data (e.g. labels, bbox_targets, bbox_inside_weights, bbox_outside_weights) from (len(inds_inside), ...) to (total_anchors, ...), filling the added positions with the fill value; called by anchor_target_layer(...)
# Expand the shape of data from (len(inds_inside), ...) to (total_anchors, ...), filling the added positions with the fill value
# inds_inside (i.e. inds) holds the indices of the anchors kept after out-of-bounds anchors are removed;
# total_anchors (i.e. count) is the number of anchors generated on the conv5_3 feature map
def _unmap(data, count, inds, fill=0):
    """ Unmap a subset of item (data) back to the original set of items (of
    size count) """
    if len(data.shape) == 1:
        ret = np.empty((count, ), dtype=np.float32)
        ret.fill(fill)
        ret[inds] = data
    else:
        ret = np.empty((count, ) + data.shape[1:], dtype=np.float32)
        ret.fill(fill)
        ret[inds, :] = data
    return ret
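A tiny worked example of this scatter-back step may help. The unmap function below is a compact re-statement written for this note (np.full replaces the empty-then-fill pair), and the 6 total anchors with inside indices [1, 3, 4] are made-up values.

```python
import numpy as np

def unmap(data, count, inds, fill=0):
    """Scatter data (one row per kept anchor) back into an array of `count` rows."""
    ret = np.full((count,) + data.shape[1:], fill, dtype=np.float32)
    ret[inds] = data
    return ret

total_anchors = 6
inds_inside = np.array([1, 3, 4])   # anchors that survived the boundary filter

labels_inside = np.array([1, 0, -1], dtype=np.float32)
labels_all = unmap(labels_inside, total_anchors, inds_inside, fill=-1)
print(labels_all)          # [-1.  1. -1.  0. -1. -1.]

targets_inside = np.arange(12, dtype=np.float32).reshape(3, 4)
targets_all = unmap(targets_inside, total_anchors, inds_inside, fill=0)
print(targets_all.shape)   # (6, 4)
```

Out-of-bounds anchors therefore end up with label -1 and all-zero targets and weights, so they are ignored by the loss while the arrays keep the full K*A length needed for the final reshape.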
3. _compute_targets(ex_rois, gt_rois)
Computes the regression targets of the anchors from the given anchors and gt boxes[:, :4]; called by anchor_target_layer(...)
# Compute the anchor regression targets from the anchors and gt boxes[:, :4]
def _compute_targets(ex_rois, gt_rois):
    """Compute bounding-box regression targets for an image."""

    assert ex_rois.shape[0] == gt_rois.shape[0]
    assert ex_rois.shape[1] == 4
    assert gt_rois.shape[1] == 5

    return bbox_transform(ex_rois, gt_rois[:, :4]).astype(np.float32, copy=False)
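bbox_transform itself is defined elsewhere in the repository and is not shown in this post. The sketch below assumes it follows the usual Faster R-CNN parameterization (center offsets normalized by the anchor size, log ratios of width and height, +1 pixel widths); bbox_transform_sketch is a hypothetical name, and the real function may differ in detail.

```python
import numpy as np

def bbox_transform_sketch(ex_rois, gt_rois):
    """Regression targets (dx, dy, dw, dh) from anchors to their matched gt boxes."""
    ex_w = ex_rois[:, 2] - ex_rois[:, 0] + 1.0
    ex_h = ex_rois[:, 3] - ex_rois[:, 1] + 1.0
    ex_cx = ex_rois[:, 0] + 0.5 * ex_w
    ex_cy = ex_rois[:, 1] + 0.5 * ex_h

    gt_w = gt_rois[:, 2] - gt_rois[:, 0] + 1.0
    gt_h = gt_rois[:, 3] - gt_rois[:, 1] + 1.0
    gt_cx = gt_rois[:, 0] + 0.5 * gt_w
    gt_cy = gt_rois[:, 1] + 0.5 * gt_h

    dx = (gt_cx - ex_cx) / ex_w    # center shift in units of anchor width
    dy = (gt_cy - ex_cy) / ex_h    # center shift in units of anchor height
    dw = np.log(gt_w / ex_w)       # log scale change in width
    dh = np.log(gt_h / ex_h)       # log scale change in height
    return np.vstack((dx, dy, dw, dh)).transpose()

anchors = np.array([[0.0, 0.0, 9.0, 9.0],
                    [0.0, 0.0, 9.0, 9.0]])
matched_gt = np.array([[0.0, 0.0, 9.0, 9.0],     # identical box -> all zeros
                       [0.0, 0.0, 19.0, 9.0]])   # twice as wide -> dx = 0.5, dw = log(2)
targets = bbox_transform_sketch(anchors, matched_gt)
print(targets)
```

Note that targets are computed for every inside anchor against its best-matching gt box, including negatives; the inside and outside weights then zero out the contribution of non-positive anchors.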