Object Detection: Encoding Ground-Truth Boxes in SSD

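Since SSD relies on prior boxes to regress ground-truth boxes, two questions have to be settled: which prior box is used to regress a given ground-truth box, and how that regression is expressed.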

1. Which prior box regresses the ground-truth box

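Whichever prior boxes lie close to a ground-truth box are used to regress it. Closeness is measured by IoU, usually with a threshold of 0.5.

If the IoU between a prior box and a ground-truth box exceeds 0.5, that prior is used to regress that ground-truth box. More precisely, it is all such priors, since several can pass the threshold. If none does, the code below falls back to the single prior with the highest IoU.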

iou = self.iou(box)
encoded_box = np.zeros((self.num_priors, 4 + return_iou))

# For this ground-truth box, find the prior boxes with a high overlap
assign_mask = iou > self.overlap_threshold
if not assign_mask.any():
    assign_mask[iou.argmax()] = True
if return_iou:
    encoded_box[:, -1][assign_mask] = iou[assign_mask]

# Select the matched prior boxes
assigned_priors = self.priors[assign_mask]
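The post does not show `self.iou`, so here is a minimal sketch of what such a function might compute, assuming every box is stored as (xmin, ymin, xmax, ymax). The function name `iou_with_priors` and the toy priors are made up for illustration.

```python
import numpy as np

def iou_with_priors(box, priors):
    """IoU between one box and every prior; boxes are (xmin, ymin, xmax, ymax)."""
    # Corners of the intersection rectangle for each prior
    inter_min = np.maximum(priors[:, :2], box[:2])
    inter_max = np.minimum(priors[:, 2:4], box[2:4])
    # Negative extents mean no overlap, so clamp them to zero
    inter_wh = np.maximum(inter_max - inter_min, 0.0)
    inter_area = inter_wh[:, 0] * inter_wh[:, 1]
    box_area = (box[2] - box[0]) * (box[3] - box[1])
    prior_area = (priors[:, 2] - priors[:, 0]) * (priors[:, 3] - priors[:, 1])
    return inter_area / (box_area + prior_area - inter_area)

# Three hypothetical priors and one ground-truth box, all in [0, 1] coordinates
priors = np.array([
    [0.0, 0.0, 0.5, 0.5],
    [0.1, 0.1, 0.6, 0.6],
    [0.5, 0.5, 1.0, 1.0],
])
box = np.array([0.0, 0.0, 0.5, 0.5])

iou = iou_with_priors(box, priors)
assign_mask = iou > 0.5
if not assign_mask.any():
    # Same fallback as in the snippet above: keep the best prior
    assign_mask[iou.argmax()] = True
```

Here only the first prior passes the threshold; the second overlaps the box but stays below 0.5, and the third does not overlap at all.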

2. How to regress the ground-truth box

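For the center coordinates, we regress the offset between the ground-truth box and the prior box. For the width and height, we regress the scaling ratio between the ground-truth box and the prior box.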
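Written as formulas, the encoding performed by the code in this section looks roughly as follows. The symbols are introduced here for illustration only: $g$ is the ground-truth box, $p$ the prior box, and $v_{xy}$, $v_{wh}$ the variances stored with each prior (0.1 and 0.2 according to the code comments).

```latex
\begin{aligned}
t_{cx} &= \frac{g_{cx} - p_{cx}}{p_w \, v_{xy}}, &
t_{cy} &= \frac{g_{cy} - p_{cy}}{p_h \, v_{xy}}, \\
t_w    &= \frac{1}{v_{wh}} \log\frac{g_w}{p_w}, &
t_h    &= \frac{1}{v_{wh}} \log\frac{g_h}{p_h}.
\end{aligned}
```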

assigned_priors = self.priors[assign_mask]
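# Encode in reverse: convert the ground-truth box into the format of an SSD prediction

# First compute the center and the width/height of the ground-truth box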
box_center = 0.5 * (box[:2] + box[2:])
box_wh = box[2:] - box[:2]
# Then compute the center and the width/height of the high-overlap prior boxes
assigned_priors_center = 0.5 * (assigned_priors[:, :2] +
                                assigned_priors[:, 2:4])
assigned_priors_wh = (assigned_priors[:, 2:4] -
                      assigned_priors[:, :2])

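# Work backwards to the prediction SSD should produce
# Ground-truth and prior boxes are both normalized by input_shape here, so the center difference lies in (-1, 1)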
encoded_box[:, :2][assign_mask] = box_center - assigned_priors_center

# Divide by the prior's size: large priors tolerate large offsets, small priors only small ones
encoded_box[:, :2][assign_mask] /= assigned_priors_wh

# Divide by the center variance stored with the prior (0.1, i.e. scale up by 10) to rescale the target
encoded_box[:, :2][assign_mask] /= assigned_priors[:, -4:-2]

# Take the log of the width/height ratio
encoded_box[:, 2:4][assign_mask] = np.log(box_wh / assigned_priors_wh)
# Divide by the width/height variance stored with the prior (0.2)
encoded_box[:, 2:4][assign_mask] /= assigned_priors[:, -2:]
# Flatten to shape (8732 * 5,)
return encoded_box.ravel()
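To check that this encoding is invertible, here is a small round-trip sketch for a single box and a single prior. The variances 0.1 and 0.2 are taken from the comments above; the `encode` and `decode` helpers and the sample coordinates are hypothetical, and the decoder is simply the algebraic inverse, not code from the post.

```python
import numpy as np

# Variances as described in the comments above (assumed constant for every prior)
VAR_XY, VAR_WH = 0.1, 0.2

def encode(box, prior):
    """Encode one (xmin, ymin, xmax, ymax) box relative to one prior."""
    box_center = 0.5 * (box[:2] + box[2:])
    box_wh = box[2:] - box[:2]
    prior_center = 0.5 * (prior[:2] + prior[2:])
    prior_wh = prior[2:] - prior[:2]
    t_xy = (box_center - prior_center) / prior_wh / VAR_XY
    t_wh = np.log(box_wh / prior_wh) / VAR_WH
    return np.concatenate([t_xy, t_wh])

def decode(t, prior):
    """Invert encode(): recover corner coordinates from the encoded target."""
    prior_center = 0.5 * (prior[:2] + prior[2:])
    prior_wh = prior[2:] - prior[:2]
    center = t[:2] * VAR_XY * prior_wh + prior_center
    wh = np.exp(t[2:] * VAR_WH) * prior_wh
    return np.concatenate([center - 0.5 * wh, center + 0.5 * wh])

prior = np.array([0.2, 0.2, 0.6, 0.6])
box = np.array([0.25, 0.3, 0.65, 0.8])

encoded = encode(box, prior)
restored = decode(encoded, prior)
print(encoded)   # encoded offsets (t_cx, t_cy, t_w, t_h)
print(restored)  # should match box up to floating-point error
```

Decoding at inference time follows the same inverse steps, applied to the network output instead of the encoded target.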

An image usually contains several objects and therefore several ground-truth boxes, so every box in the image is encoded.

encoded_boxes = np.apply_along_axis(self.encode_box, 1, boxes[:, :4])
# Encoded values and IoU for each ground-truth box
encoded_boxes = encoded_boxes.reshape(-1, self.num_priors, 5)
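The shapes involved in `np.apply_along_axis` plus `reshape` can be checked with a toy stand-in. `fake_encode_box` below is a hypothetical placeholder that only mimics the output shape of `encode_box`, and `NUM_PRIORS = 6` replaces the real 8732.

```python
import numpy as np

NUM_PRIORS = 6  # toy value; the post uses 8732 priors

def fake_encode_box(box):
    # Hypothetical stand-in for encode_box: returns a flattened (NUM_PRIORS * 5,) vector
    out = np.zeros((NUM_PRIORS, 5))
    out[:, :4] = box  # pretend every prior receives the same 4 values
    out[:, 4] = 0.5   # pretend IoU
    return out.ravel()

boxes = np.array([
    [0.1, 0.1, 0.4, 0.4, 1.0],  # 4 coordinates followed by a label column
    [0.5, 0.5, 0.9, 0.9, 0.0],
])

# One call per row; each call sees only the 4 coordinates of one box
flat = np.apply_along_axis(fake_encode_box, 1, boxes[:, :4])
encoded_boxes = flat.reshape(-1, NUM_PRIORS, 5)

print(flat.shape)           # (2, 30)
print(encoded_boxes.shape)  # (2, 6, 5)
```

The first axis indexes ground-truth boxes, the second indexes priors, and the last holds the 4 encoded values plus the IoU.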

    def assign_boxes(self, boxes):
        assignment = np.zeros((self.num_priors, 4 + self.num_classes + 8))
        assignment[:, 4] = 1.0
        if len(boxes) == 0:
            return assignment
        # Compute the IoU and the encoding for every ground-truth box
        encoded_boxes = np.apply_along_axis(self.encode_box, 1, boxes[:, :4])
        # Encoded values and IoU for each ground-truth box
        encoded_boxes = encoded_boxes.reshape(-1, self.num_priors, 5)
        
        # For each prior, take the highest IoU over all ground-truth boxes; shape (8732,)
        best_iou = encoded_boxes[:, :, -1].max(axis=0)

        # Index of the ground-truth box that gives that highest IoU; shape (8732,)
        best_iou_idx = encoded_boxes[:, :, -1].argmax(axis=0)
        best_iou_mask = best_iou > 0

        # Keep the ground-truth index only for matched priors; shape (number of priors with IoU > 0,)
        best_iou_idx = best_iou_idx[best_iou_mask]

        assign_num = len(best_iou_idx)
        # Keep the expected prediction of the best-matching ground-truth box for each matched prior
        encoded_boxes = encoded_boxes[:, best_iou_mask, :]
        assignment[:, :4][best_iou_mask] = encoded_boxes[best_iou_idx,np.arange(assign_num),:4]
        # Index 4 is the background probability; set it to 0 for matched priors
        assignment[:, 4][best_iou_mask] = 0
        assignment[:, 5:-8][best_iou_mask] = boxes[best_iou_idx, 4:]
        assignment[:, -8][best_iou_mask] = 1
        # assign_boxes thus yields the prediction the network should produce for this input image
        return assignment
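The `max` / `argmax` / fancy-indexing steps in `assign_boxes` are easier to follow on a tiny example. The sketch below uses 2 ground-truth boxes and 4 priors with made-up IoU values; the first 4 entries of each encoded row are filled with a marker (1.0 or 2.0) so it is visible which ground-truth box each prior ends up with.

```python
import numpy as np

# Toy IoU table: rows are ground-truth boxes, columns are priors.
# A 0 means encode_box did not match that prior to that ground-truth box.
iou = np.array([
    [0.7, 0.0, 0.6, 0.0],
    [0.0, 0.0, 0.8, 0.55],
])

# Fake encoded tensor of shape (num_boxes, num_priors, 5)
encoded_boxes = np.zeros((2, 4, 5))
encoded_boxes[0, :, :4] = 1.0  # marker for ground-truth box 0
encoded_boxes[1, :, :4] = 2.0  # marker for ground-truth box 1
encoded_boxes[:, :, 4] = iou

best_iou = encoded_boxes[:, :, -1].max(axis=0)         # best IoU per prior
best_iou_idx = encoded_boxes[:, :, -1].argmax(axis=0)  # which box achieved it
best_iou_mask = best_iou > 0                           # priors with any match

best_iou_idx = best_iou_idx[best_iou_mask]
assign_num = len(best_iou_idx)

# Drop unmatched priors, then pick one row per remaining prior
matched = encoded_boxes[:, best_iou_mask, :]
picked = matched[best_iou_idx, np.arange(assign_num), :4]

print(best_iou_mask)  # [ True False  True  True]
print(best_iou_idx)   # [0 1 1]
print(picked[:, 0])   # [1. 2. 2.]
```

Prior 2 overlaps both boxes, and it is assigned to box 1 because that IoU (0.8) is higher than 0.6.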

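The final encoded target has shape (8732, 4 + num_classes + 8).

4: the encoded offsets for center x, center y, width, and height

num_classes: the number of classes including the background class, e.g. 21 for VOC

8: the first of the last 8 entries indicates whether the prior has an object (1 if it does, 0 if not); the remaining entries are 0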
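As an illustration of this layout, the sketch below builds one target row by hand and slices it back apart. The offsets and the class index 7 are made-up values; only the slice boundaries follow the description above.

```python
import numpy as np

num_classes = 21  # VOC: 20 object classes + background
row = np.zeros(4 + num_classes + 8)

row[:4] = [1.25, 3.75, 0.0, 1.1]  # made-up encoded offsets
row[4 + 7] = 1.0                  # one-hot class; index 0 of this block is background
row[-8] = 1.0                     # flag: this prior has an object

loc = row[:4]            # encoded box offsets
class_probs = row[4:-8]  # background + object classes
has_object = row[-8]     # 1 if matched to a ground-truth box
padding = row[-7:]       # remaining entries, all zero

print(row.shape)             # (33,)
print(class_probs.argmax())  # 7
```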

3. Data preprocessing

Images go through data augmentation; boxes go through the encoding described above.

    def generate(self, train=True):
        while True:
            if train:
                # Shuffle the annotation lines
                shuffle(self.train_lines)
                lines = self.train_lines
            else:
                shuffle(self.val_lines)
                lines = self.val_lines
            inputs = []
            targets = []
            for annotation_line in lines:  
                img,y=self.get_random_data(annotation_line,self.image_size[0:2])
                if len(y)!=0:
                    boxes = np.array(y[:,:4],dtype=np.float32)
                    boxes[:,0] = boxes[:,0]/self.image_size[1]
                    boxes[:,1] = boxes[:,1]/self.image_size[0]
                    boxes[:,2] = boxes[:,2]/self.image_size[1]
                    boxes[:,3] = boxes[:,3]/self.image_size[0]
                    one_hot_label = np.eye(self.num_classes)[np.array(y[:,4],np.int32)]
                    # Skip the sample if any box has non-positive height or width
                    if ((boxes[:,3]-boxes[:,1])<=0).any() or ((boxes[:,2]-boxes[:,0])<=0).any():
                        continue
                    
                    y = np.concatenate([boxes,one_hot_label],axis=-1)

                y = self.bbox_util.assign_boxes(y)
                inputs.append(img)               
                targets.append(y)
                if len(targets) == self.batch_size:
                    tmp_inp = np.array(inputs)
                    tmp_targets = np.array(targets)
                    inputs = []
                    targets = []
                    yield preprocess_input(tmp_inp), tmp_targets
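The box handling inside the loop can be tried in isolation. The sketch below assumes a 300x300 input and 20 object classes (background excluded), and uses two made-up annotations, one of which has zero width. It flags the sample when any box has non-positive width or height.

```python
import numpy as np

image_size = (300, 300)  # (height, width); assumed SSD300 input
num_object_classes = 20  # assumed: VOC classes without background

# Made-up annotations: xmin, ymin, xmax, ymax in pixels, then class id
y = np.array([
    [30.0, 60.0, 150.0, 240.0, 3],
    [90.0, 90.0, 90.0, 200.0, 5],  # zero width: degenerate
])

boxes = np.array(y[:, :4], dtype=np.float32)
boxes[:, [0, 2]] /= image_size[1]  # x coordinates divided by width
boxes[:, [1, 3]] /= image_size[0]  # y coordinates divided by height

# One row of the identity matrix per class id gives the one-hot label
one_hot_label = np.eye(num_object_classes)[y[:, 4].astype(np.int32)]

widths = boxes[:, 2] - boxes[:, 0]
heights = boxes[:, 3] - boxes[:, 1]
has_degenerate = bool((widths <= 0).any() or (heights <= 0).any())

target = np.concatenate([boxes, one_hot_label], axis=-1)

print(target.shape)    # (2, 24)
print(has_degenerate)  # True
```

A zero-width or zero-height box would make the log of the size ratio in the encoder undefined, which is why such samples are worth filtering before `assign_boxes` is called.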

posted @ 2020-12-14 23:19 learningcaiji
