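mmdetection: anchor generation schemes and their label assigners (1)
anchor_generator.py bundles many anchor generation schemes. While reading the mmdetection source, I am summarizing them here.
(This first pass covers Faster R-CNN, YOLOv3, and SSD; more will be added later.)
I. Anchor generation
The overall idea is to first generate the base anchors (base_anchor), then tile them over a grid (meshgrid) to obtain all other anchors.
1. Faster R-CNN
Faster R-CNN's anchor generation is the classic one; the other schemes differ from it only in details. anchor_generator.py shows how the anchors are produced without any for loop.
First come the base anchors. With (stride, stride) as the basic (w, h), each combination of scale and ratio yields one anchor centered at (center_offset * w, center_offset * h), which is the top-left corner (0, 0) of the first grid cell when center_offset is 0. For example, scales = [8, 16, 32] (multipliers applied to the base w and h) and ratios = [0.5, 1.0, 2.0] (h / w, per the docstring below) give 3 x 3 = 9 anchors.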
def gen_single_level_base_anchors(self,
                                  base_size,
                                  scales,
                                  ratios,
                                  center=None):
    """Generate base anchors of a single level.

    Args:
        base_size (int | float): Basic size of an anchor.
        scales (torch.Tensor): Scales of the anchor.
        ratios (torch.Tensor): The ratio between the height
            and width of anchors in a single level.
        center (tuple[float], optional): The center of the base anchor
            related to a single feature grid. Defaults to None.

    Returns:
        torch.Tensor: Anchors in a single-level feature maps.
    """
    w = base_size
    h = base_size
    if center is None:
        x_center = self.center_offset * w
        y_center = self.center_offset * h
    else:
        x_center, y_center = center

    h_ratios = torch.sqrt(ratios)
    w_ratios = 1 / h_ratios
    if self.scale_major:
        ws = (w * w_ratios[:, None] * scales[None, :]).view(-1)
        hs = (h * h_ratios[:, None] * scales[None, :]).view(-1)
    else:
        ws = (w * scales[:, None] * w_ratios[None, :]).view(-1)
        hs = (h * scales[:, None] * h_ratios[None, :]).view(-1)

    # use float anchor and the anchor's center is aligned with the
    # pixel center
    base_anchors = [
        x_center - 0.5 * ws, y_center - 0.5 * hs, x_center + 0.5 * ws,
        y_center + 0.5 * hs
    ]
    base_anchors = torch.stack(base_anchors, dim=-1)

    return base_anchors
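To make the broadcasting trick concrete, here is a minimal standalone sketch of the same computation. It is an illustration, not the mmdetection code: NumPy stands in for torch, the function name base_anchors is made up for this example, and only the scale_major=True ordering is reproduced.

```python
import numpy as np


def base_anchors(base_size, scales, ratios, center_offset=0.0):
    """Return (len(ratios) * len(scales), 4) anchors as (x1, y1, x2, y2).

    Hypothetical helper for illustration; mirrors the scale_major=True branch.
    """
    scales = np.asarray(scales, dtype=np.float64)
    ratios = np.asarray(ratios, dtype=np.float64)

    # ratio = h / w, so scaling h by sqrt(r) and w by 1 / sqrt(r)
    # changes the shape while keeping the area fixed.
    h_ratios = np.sqrt(ratios)
    w_ratios = 1.0 / h_ratios

    # Broadcasting (R, 1) * (1, S) -> (R, S) replaces the nested for loops.
    ws = (base_size * w_ratios[:, None] * scales[None, :]).reshape(-1)
    hs = (base_size * h_ratios[:, None] * scales[None, :]).reshape(-1)

    cx = cy = center_offset * base_size
    return np.stack(
        [cx - 0.5 * ws, cy - 0.5 * hs, cx + 0.5 * ws, cy + 0.5 * hs], axis=-1)


anchors = base_anchors(base_size=16, scales=[8, 16, 32], ratios=[0.5, 1.0, 2.0])
print(anchors.shape)  # (9, 4)
print(anchors[3])     # ratio 1.0, scale 8: a 128 x 128 box centered at (0, 0)
```

Rows are grouped by ratio first and by scale second, so row 3 is the first square anchor.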
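With the base anchors in hand, the anchors at every other position are just the base anchors shifted by the corresponding offset. So meshgrid first generates the offset of each position, and the base anchors are then added to those offsets.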
def _meshgrid(self, x, y, row_major=True):
    """Generate mesh grid of x and y.

    Args:
        x (torch.Tensor): Grids of x dimension.
        y (torch.Tensor): Grids of y dimension.
        row_major (bool, optional): Whether to return y grids first.
            Defaults to True.

    Returns:
        tuple[torch.Tensor]: The mesh grids of x and y.
    """
    xx = x.repeat(len(y))
    yy = y.view(-1, 1).repeat(1, len(x)).view(-1)
    if row_major:
        return xx, yy
    else:
        return yy, xx

def single_level_grid_anchors(self,
                              base_anchors,
                              featmap_size,
                              stride=(16, 16),
                              device='cuda'):
    """Generate grid anchors of a single level.

    Note:
        This function is usually called by method ``self.grid_anchors``.

    Args:
        base_anchors (torch.Tensor): The base anchors of a feature grid.
        featmap_size (tuple[int]): Size of the feature maps.
        stride (tuple[int], optional): Stride of the feature map in order
            (w, h). Defaults to (16, 16).
        device (str, optional): Device the tensor will be put on.
            Defaults to 'cuda'.

    Returns:
        torch.Tensor: Anchors in the overall feature maps.
    """
    feat_h, feat_w = featmap_size
    # convert Tensor to int, so that we can convert to ONNX correctly
    feat_h = int(feat_h)
    feat_w = int(feat_w)
    shift_x = torch.arange(0, feat_w, device=device) * stride[0]
    shift_y = torch.arange(0, feat_h, device=device) * stride[1]

    shift_xx, shift_yy = self._meshgrid(shift_x, shift_y)
    shifts = torch.stack([shift_xx, shift_yy, shift_xx, shift_yy], dim=-1)
    shifts = shifts.type_as(base_anchors)
    # first feat_w elements correspond to the first row of shifts
    # add A anchors (1, A, 4) to K shifts (K, 1, 4) to get
    # shifted anchors (K, A, 4), reshape to (K*A, 4)

    all_anchors = base_anchors[None, :, :] + shifts[:, None, :]
    all_anchors = all_anchors.view(-1, 4)
    # first A rows correspond to A anchors of (0, 0) in feature map,
    # then (0, 1), (0, 2), ...
    return all_anchors
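The shift-and-add step can be checked on a tiny feature map. The sketch below is an assumption-laden illustration rather than the library code: NumPy replaces torch, grid_anchors is a name invented here, and the two base anchors are arbitrary.

```python
import numpy as np


def grid_anchors(base, featmap_size, stride):
    """Shift (A, 4) base anchors to every cell of a (feat_h, feat_w) map.

    Hypothetical helper for illustration only.
    """
    feat_h, feat_w = featmap_size
    shift_x = np.arange(feat_w) * stride[0]
    shift_y = np.arange(feat_h) * stride[1]

    # Same layout as _meshgrid above: x varies fastest, y slowest.
    xx = np.tile(shift_x, feat_h)
    yy = np.repeat(shift_y, feat_w)
    shifts = np.stack([xx, yy, xx, yy], axis=-1).astype(base.dtype)  # (K, 4)

    # (1, A, 4) + (K, 1, 4) -> (K, A, 4) -> (K * A, 4)
    return (base[None, :, :] + shifts[:, None, :]).reshape(-1, 4)


# Two arbitrary base anchors (A = 2) on a 2 x 3 feature map (K = 6).
base = np.array([[-8.0, -8.0, 8.0, 8.0], [-16.0, -8.0, 16.0, 8.0]])
all_anchors = grid_anchors(base, featmap_size=(2, 3), stride=(16, 16))
print(all_anchors.shape)  # (12, 4)
```

The first A rows belong to cell (0, 0), the next A rows to the cell one stride to the right, and so on, which matches the comment at the end of single_level_grid_anchors.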
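2. YOLOv2 & YOLOv3
The only difference from Faster R-CNN is the base anchors. YOLO's base anchors are obtained by clustering the box sizes of the dataset. As shown below, there is no need to compute scales and ratios here; what remains is tiling over the grid to generate the other anchors.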
def gen_single_level_base_anchors(self, base_sizes_per_level, center=None):
    """Generate base anchors of a single level.

    Args:
        base_sizes_per_level (list[tuple[int, int]]): Basic sizes of
            anchors.
        center (tuple[float], optional): The center of the base anchor
            related to a single feature grid. Defaults to None.

    Returns:
        torch.Tensor: Anchors in a single-level feature maps.
    """
    x_center, y_center = center
    base_anchors = []
    for base_size in base_sizes_per_level:
        w, h = base_size

        # use float anchor and the anchor's center is aligned with the
        # pixel center
        base_anchor = torch.Tensor([
            x_center - 0.5 * w, y_center - 0.5 * h, x_center + 0.5 * w,
            y_center + 0.5 * h
        ])
        base_anchors.append(base_anchor)
    base_anchors = torch.stack(base_anchors, dim=0)

    return base_anchors
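As a quick illustration, the sketch below builds base anchors from explicit (w, h) pairs. It is not the mmdetection implementation: NumPy replaces torch, yolo_base_anchors is a made-up name, and placing the center at half the stride is an assumption of this example. The three sizes are the coarsest-level COCO anchors from the YOLOv3 paper.

```python
import numpy as np


def yolo_base_anchors(base_sizes, center):
    """Build (N, 4) anchors from explicit (w, h) pairs around one center.

    Hypothetical helper for illustration only.
    """
    sizes = np.asarray(base_sizes, dtype=np.float64)  # (N, 2) as (w, h)
    half = 0.5 * sizes
    centers = np.array(center, dtype=np.float64)      # (2,) as (cx, cy)
    # (x1, y1) = center - half size, (x2, y2) = center + half size
    return np.concatenate([centers - half, centers + half], axis=1)


stride = 32  # assumed stride of the coarsest level
yolo_anchors = yolo_base_anchors([(116, 90), (156, 198), (373, 326)],
                                 center=(stride / 2, stride / 2))
print(yolo_anchors[0])  # box of width 116 and height 90 centered at (16, 16)
```

No scale or ratio appears anywhere: each clustered size maps directly to one base anchor.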
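3. SSD
SSD is similar. The difference is that the anchor scale is no longer fixed but varies by level (see the scale formula in the paper): as the feature map gets smaller, the scale grows (a larger receptive field calls for larger anchors).
The rest is the same as Faster R-CNN. See https://zhuanlan.zhihu.com/p/33544892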
# Anchor sizes on the input image, one per level
# (sides of 60, 111, 162, 213, 264 for input_size=300, min_ratio=20, step=17)
min_sizes = []
max_sizes = []
for ratio in range(int(min_ratio), int(max_ratio) + 1, step):
    min_sizes.append(int(self.input_size * ratio / 100))
    max_sizes.append(int(self.input_size * (ratio + step) / 100))

# Prepend one more, smaller scale for the first level
# (int(300 * 7 / 100) = 21 in the SSD300 COCO branch shown here)
if self.input_size == 300:
    if basesize_ratio_range[0] == 0.15:  # SSD300 COCO
        min_sizes.insert(0, int(self.input_size * 7 / 100))
        max_sizes.insert(0, int(self.input_size * 15 / 100))

# Compute the scales and ratios of each level
anchor_ratios = []
anchor_scales = []
for k in range(len(self.strides)):
    scales = [1., np.sqrt(max_sizes[k] / min_sizes[k])]
    anchor_ratio = [1.]
    for r in ratios[k]:
        anchor_ratio += [1 / r, r]  # 4 or 6 ratio
    anchor_ratios.append(torch.Tensor(anchor_ratio))
    anchor_scales.append(torch.Tensor(scales))
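The excerpt depends on variables defined elsewhere in the class, so here is a self-contained sketch of the size arithmetic in plain Python. The function name ssd_sizes and its parameters are hypothetical; the step formula and the first-level percentages (10 and 20) are assumptions chosen to reproduce the 30, 60, 111, ... sequence quoted in the comments, not values read from the excerpt.

```python
import math


def ssd_sizes(input_size, min_ratio, max_ratio, num_levels, first_level=(10, 20)):
    """Return (min_sizes, max_sizes) in pixels, one entry per feature level.

    min_ratio / max_ratio are percentages of input_size. first_level is the
    (min, max) percentage prepended for the finest level (an assumption here).
    """
    # Spread the ratios evenly over all levels except the prepended one.
    step = int(math.floor(max_ratio - min_ratio) / (num_levels - 2))

    min_sizes, max_sizes = [], []
    for ratio in range(min_ratio, max_ratio + 1, step):
        min_sizes.append(int(input_size * ratio / 100))
        max_sizes.append(int(input_size * (ratio + step) / 100))

    min_sizes.insert(0, int(input_size * first_level[0] / 100))
    max_sizes.insert(0, int(input_size * first_level[1] / 100))
    return min_sizes, max_sizes


min_sizes, max_sizes = ssd_sizes(300, min_ratio=20, max_ratio=90, num_levels=6)
print(min_sizes)  # [30, 60, 111, 162, 213, 264]
print(max_sizes)  # [60, 111, 162, 213, 264, 315]

# The second scale, sqrt(max / min), gives one extra square anchor per level
# whose side is the geometric mean of that level's min and max sizes.
extra_sides = [math.sqrt(lo * hi) for lo, hi in zip(min_sizes, max_sizes)]
```

Each level's max size equals the next level's min size, so the extra square anchor fills the gap between two consecutive scales.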
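II. Anchor assigner
Once the anchors are generated, they need labels that say which are positive samples and which are negative.
1. MaxIoUAssigner
This is what SSD and Faster R-CNN use: compute the IoU between every anchor and every GT. An anchor whose maximum IoU is at least pos_iou_thr is a positive sample; one whose maximum IoU is below neg_iou_thr is background.
The code has one more detail. Under the rule above, some GTs may not match any anchor, so a fallback is added to enlarge the positive set: loop over the GTs, look up the anchor with the largest IoU for each, and mark it positive if that IoU is at least min_pos_iou. Note that this does not actually guarantee that every GT gets an anchor (the result depends on the order in which the GTs are visited), and it can introduce some poor positive samples, so it does not necessarily help. The following post gives a good example:
目标检测(MMdetection)——Retina(Anchor、Focal Loss) - 知乎 (zhihu.com)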
if self.match_low_quality:
    # Low-quality matching will overwrite the assigned_gt_inds assigned
    # in Step 3. Thus, the assigned gt might not be the best one for
    # prediction.
    # For example, if bbox A has 0.9 and 0.8 iou with GT bbox 1 & 2,
    # bbox 1 will be assigned as the best target for bbox A in step 3.
    # However, if GT bbox 2's gt_argmax_overlaps = A, bbox A's
    # assigned_gt_inds will be overwritten to be GT bbox 2.
    # This might be the reason that it is not used in ROI Heads.
    for i in range(num_gts):
        if gt_max_overlaps[i] >= self.min_pos_iou:
            if self.gt_max_assign_all:
                max_iou_inds = overlaps[i, :] == gt_max_overlaps[i]
                assigned_gt_inds[max_iou_inds] = i + 1
            else:
                assigned_gt_inds[gt_argmax_overlaps[i]] = i + 1
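Both caveats (overwriting a better match, and a GT left without any anchor) can be reproduced with a simplified assigner. This is a sketch under assumptions, not the mmdetection class: NumPy replaces torch, the function name and default thresholds are chosen for the example, and only the gt_max_assign_all=True behaviour is mimicked.

```python
import numpy as np


def max_iou_assign(overlaps, pos_iou_thr=0.7, neg_iou_thr=0.3, min_pos_iou=0.3):
    """Label each anchor from a (num_gts, num_anchors) IoU matrix.

    Returns -1 (ignored), 0 (background) or i + 1 (assigned to GT i).
    Hypothetical helper; thresholds are illustrative defaults.
    """
    num_gts, num_anchors = overlaps.shape
    assigned = np.full(num_anchors, -1, dtype=np.int64)

    max_overlaps = overlaps.max(axis=0)        # best IoU of each anchor
    argmax_overlaps = overlaps.argmax(axis=0)  # GT that gives that IoU
    gt_max_overlaps = overlaps.max(axis=1)     # best IoU of each GT

    assigned[max_overlaps < neg_iou_thr] = 0
    pos = max_overlaps >= pos_iou_thr
    assigned[pos] = argmax_overlaps[pos] + 1

    # Low-quality matching: GTs are visited in order, so a later GT can
    # overwrite what an earlier GT (or the step above) assigned.
    for i in range(num_gts):
        if gt_max_overlaps[i] >= min_pos_iou:
            assigned[overlaps[i] == gt_max_overlaps[i]] = i + 1
    return assigned


# Anchor A overlaps GT 1 by 0.9 and GT 2 by 0.8, but A is GT 2's best anchor.
overwrite = max_iou_assign(np.array([[0.9, 0.95, 0.1],
                                     [0.8, 0.20, 0.1]]))
print(overwrite)  # [2 1 0]

# Both GTs have the same best anchor, so GT 1 ends up with no anchor at all.
starved = max_iou_assign(np.array([[0.5, 0.1],
                                   [0.6, 0.1]]))
print(starved)  # [2 0]
```

In the first case anchor A ends up with GT 2 even though GT 1 fits it better; in the second, swapping the two rows would starve the other GT instead, which is the order dependence mentioned above.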
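2. GridAssigner
This is what YOLO uses. It also starts from the IoU between anchors and GTs, and negatives are marked the same way; the difference lies in the positives. An anchor is positive if its center falls in the grid cell responsible for a GT (the cell containing the GT center) and its IoU with its nearest GT is at least pos_iou_thr. In addition, each GT is assigned its best-matching anchor within its own responsible cell. In practice this means each GT usually has only one positive sample.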
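The "one positive per GT" idea can be sketched as follows. This is a deliberately simplified illustration, not mmdetection's GridAssigner: the helper names are invented, NumPy is used instead of torch, every cell is assumed to contain at least one anchor, and the thresholding steps are left out.

```python
import numpy as np


def iou(box, boxes):
    """IoU between one (4,) box and (N, 4) boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def responsible_anchor(gt, anchors, stride):
    """Return (anchor index, cell) for the best anchor in the GT's cell."""
    gt_cx, gt_cy = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    cell = (int(gt_cx // stride), int(gt_cy // stride))  # (col, row)

    a_cx = (anchors[:, 0] + anchors[:, 2]) / 2
    a_cy = (anchors[:, 1] + anchors[:, 3]) / 2
    in_cell = (a_cx // stride == cell[0]) & (a_cy // stride == cell[1])

    # Anchors outside the responsible cell can never win.
    ious = np.where(in_cell, iou(gt, anchors), -1.0)
    return int(ious.argmax()), cell


# A 2 x 2 grid with stride 32 and two anchor shapes per cell (row-major order).
sizes = [(30, 60), (60, 30)]
grid = np.array([[cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2]
                 for cy in (16, 48) for cx in (16, 48) for (w, h) in sizes],
                dtype=np.float64)
gt_box = np.array([30.0, 5.0, 70.0, 35.0])  # center (50, 20) -> cell (1, 0)
best, cell = responsible_anchor(gt_box, grid, stride=32)
print(best, cell)  # 3 (1, 0)
```

Only the wide anchor in cell (1, 0) becomes positive for this GT, however well anchors in neighbouring cells overlap it.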
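III. Box encoding
During training, the raw GT and anchor coordinates are not regressed directly. To speed up convergence they are encoded first, and the encoding differs slightly between detectors.
For this part, see the CSDN blog post 史上最详细的Yolov3边框预测分析 (逍遥王的博客).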
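As one concrete instance of such an encoding, the sketch below implements the center/size delta parameterization from the Faster R-CNN paper. It is an illustration only: the function names are made up, NumPy replaces torch, the mean/std normalization that libraries usually apply is omitted, and YOLO's own variant (covered in the post above) is not shown.

```python
import numpy as np


def to_cxcywh(boxes):
    """Convert (N, 4) boxes from (x1, y1, x2, y2) to center and size."""
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    return boxes[:, 0] + 0.5 * w, boxes[:, 1] + 0.5 * h, w, h


def encode(anchors, gts):
    """Regression targets (dx, dy, dw, dh) of each GT relative to its anchor."""
    ax, ay, aw, ah = to_cxcywh(anchors)
    gx, gy, gw, gh = to_cxcywh(gts)
    # Center offsets are measured in anchor sizes; sizes are log ratios.
    return np.stack([(gx - ax) / aw, (gy - ay) / ah,
                     np.log(gw / aw), np.log(gh / ah)], axis=-1)


def decode(anchors, deltas):
    """Invert encode(): apply predicted deltas to anchors to get boxes."""
    ax, ay, aw, ah = to_cxcywh(anchors)
    cx = ax + deltas[:, 0] * aw
    cy = ay + deltas[:, 1] * ah
    w = aw * np.exp(deltas[:, 2])
    h = ah * np.exp(deltas[:, 3])
    return np.stack([cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h],
                    axis=-1)


enc_anchors = np.array([[0.0, 0.0, 10.0, 10.0]])
enc_gts = np.array([[2.0, 2.0, 12.0, 14.0]])
deltas = encode(enc_anchors, enc_gts)
print(deltas)  # roughly [[0.2, 0.3, 0.0, 0.182]]
```

Because the targets are relative to the anchor, boxes of very different sizes produce values in a similar range, which is what makes the regression easier to fit.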