Deep Learning Notes (15): GIoU, DIoU, and CIoU Regression Losses for Object Detection
Papers: Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression
Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression
Code: https://giou.stanford.edu/
https://github.com/Zzh-tju/CIoU
IoU
Intersection over Union (IoU) is an important evaluation metric in object detection. The first figure above outlines a gt box and a predicted box; IoU is the ratio of the intersection area $I$ to the union area $U$ of the two boxes A and B:
\begin{equation}
\label{IoU}
IoU = \frac{|A \cap B|}{|A \cup B|} = \frac{|I|}{|U|}
\end{equation}
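However, existing methods optimize this metric with distance losses (e.g., the smooth_L1 loss in SSD). Logically, "the optimal objective for a metric is the metric itself", so IoU could be used directly as the regression loss. Unfortunately, IoU cannot optimize bboxes that do not overlap.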
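Using IoU as the loss ($\mathcal{L}_{IoU} = 1 - IoU$) has two advantages and one drawback:
1. IoU effectively measures the similarity between two arbitrary shapes.
2. IoU is scale-invariant.
3. If two shapes A and B do not overlap, IoU is always 0, so IoU cannot tell whether A and B are very close or very far apart, and $\mathcal{L}_{IoU}$ stays constant at 1 (no gradient) in that case.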
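To make points 2 and 3 concrete, here is a minimal plain-Python sketch, assuming axis-aligned boxes given as (xmin, ymin, xmax, ymax); the helper name `iou_xyxy` and the sample coordinates are illustrative only. A box 1 unit away from the gt box and one 90 units away both get IoU = 0, while scaling both boxes by the same factor leaves IoU unchanged.

```python
def iou_xyxy(a, b):
    """IoU of two axis-aligned boxes in (xmin, ymin, xmax, ymax) format."""
    # Intersection width/height, clipped at 0 when the boxes do not overlap
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


gt = (0, 0, 10, 10)
near = (11, 0, 21, 10)   # no overlap, 1 unit to the right of gt
far = (100, 0, 110, 10)  # no overlap, 90 units to the right of gt
half = (5, 0, 15, 10)    # covers half of gt

print(1 - iou_xyxy(gt, near), 1 - iou_xyxy(gt, far))  # 1.0 1.0: L_IoU cannot tell them apart
print(iou_xyxy(gt, half))                             # 1/3
print(iou_xyxy((0, 0, 20, 20), (10, 0, 30, 20)))      # both boxes scaled by 2: still 1/3
```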
GIoU
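As an upgraded version of IoU, GIoU keeps both advantages of IoU while fixing its inability to measure the distance between non-overlapping boxes. On top of the IoU computation, it finds the smallest convex shape $C$ enclosing both A and B, and computes: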
\begin{equation}
\label{GIoU}
GIoU = \frac{|A \cap B|}{|A \cup B|} - \frac{|C \setminus (A \cup B)|}{|C|} = IoU - \frac{|C \setminus (A \cup B)|}{|C|}
\end{equation}
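The figure below shows two different detection results, bad and better; clearly, the farther the predicted box is from the gt box, the larger $C$ becomes.
The loss can thus be written as $\mathcal{L}_{GIoU} = 1 - GIoU$. Since $GIoU \in (-1, 1]$, $\mathcal{L}_{GIoU}$ takes values in $[0, 2)$.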
In summary, this generalization keeps the major properties of IoU while rectifying its weakness.
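A minimal sketch of GIoU for axis-aligned boxes, again assuming the (xmin, ymin, xmax, ymax) format. For two such boxes, $C$ is taken here as the smallest enclosing axis-aligned box, which is how the GIoU paper handles box regression; `giou_xyxy` and the sample boxes are illustrative. Unlike plain IoU, GIoU now ranks the nearby non-overlapping box above the distant one.

```python
def giou_xyxy(a, b):
    """GIoU of two axis-aligned boxes in (xmin, ymin, xmax, ymax) format."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    # Smallest enclosing box C of A and B
    cw = max(a[2], b[2]) - min(a[0], b[0])
    ch = max(a[3], b[3]) - min(a[1], b[1])
    area_c = cw * ch
    # GIoU = IoU - |C \ (A U B)| / |C|
    return inter / union - (area_c - union) / area_c


gt = (0, 0, 10, 10)
for name, box in [("near", (11, 0, 21, 10)), ("far", (100, 0, 110, 10))]:
    print(name, "L_GIoU =", round(1 - giou_xyxy(gt, box), 4))
# near L_GIoU = 1.0476
# far L_GIoU = 1.8182
```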
DIoU & CIoU
The DIoU paper points out that the GIoU loss still suffers from slow convergence and inaccurate regression.
In this paper, we propose a Distance-IoU (DIoU) loss by incorporating the normalized distance between the predicted box and the target box, which converges much faster in training than IoU and GIoU losses. Furthermore, this paper summarizes three geometric factors in bounding box regression, i.e., overlap area, central point distance and aspect ratio, based on which a Complete IoU (CIoU) loss is proposed, thereby leading to faster convergence and better performance. Moreover, DIoU can be easily adopted into non-maximum suppression (NMS) to act as the criterion, further boosting performance improvement.
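Analyzing the GIoU loss, the authors found that GIoU first tries to enlarge the predicted box until it overlaps the target bbox, and only then relies on the IoU term to maximize the overlap area, as shown in the left figure below:
Moreover, when one box contains the other, $C$ equals $A \cup B$, the penalty term vanishes, and the GIoU loss degenerates into the IoU loss. Aligning the boxes then becomes harder and convergence slows down.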
In Distance-IoU (DIoU) loss, we simply add a penalty term on IoU loss to directly minimize the normalized distance between central points of two bounding boxes, leading to much faster convergence than GIoU loss.
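The authors argue that a good bbox regression loss should account for three important geometric factors: overlap area, central point distance, and aspect ratio. Combining these, they further propose the Complete IoU (CIoU) loss. DIoU can also replace IoU inside NMS, making detection more robust under occlusion.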
DIoU
Referring to the figure above, the DIoU loss is:
\begin{equation}
\label{DIoU}
\begin{split}
& \mathcal{R}_{DIoU} = \frac{\rho^2(\bf{b}, \bf{b^{gt}})}{c^2} \\
& \mathcal{L}_{DIoU} = 1 - IoU + \frac{\rho^2(\bf{b}, \bf{b^{gt}})}{c^2} \\
& \mathcal{L}_{DIoU} = 1 - IoU + \frac{d^2}{c^2}
\end{split}
\end{equation}
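Here $\bf{b}$ and $\bf{b^{gt}}$ denote the center points of the predicted box and the ground-truth box, $\rho(\cdot)$ is the Euclidean distance, so $d = \rho(\bf{b}, \bf{b^{gt}})$ is the distance between the two centers, and $c$ is the diagonal length of the smallest enclosing shape $C$ introduced for GIoU.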
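The DIoU loss can be written down directly from these definitions. This sketch again assumes (xmin, ymin, xmax, ymax) boxes, and the function name and sample boxes are illustrative: $d^2$ is the squared center distance and $c^2$ the squared diagonal of the smallest enclosing box.

```python
def diou_loss_xyxy(pred, gt):
    """L_DIoU = 1 - IoU + d^2 / c^2 for boxes in (xmin, ymin, xmax, ymax) format."""
    iw = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    ih = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = iw * ih
    union = ((pred[2] - pred[0]) * (pred[3] - pred[1])
             + (gt[2] - gt[0]) * (gt[3] - gt[1]) - inter)
    iou = inter / union
    # d^2: squared Euclidean distance between the two box centers
    d2 = (((pred[0] + pred[2]) / 2 - (gt[0] + gt[2]) / 2) ** 2
          + ((pred[1] + pred[3]) / 2 - (gt[1] + gt[3]) / 2) ** 2)
    # c^2: squared diagonal length of the smallest enclosing box
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])
    c2 = cw ** 2 + ch ** 2
    return 1 - iou + d2 / c2


gt = (0, 0, 10, 10)
print(round(diou_loss_xyxy((11, 0, 21, 10), gt), 4))    # 1 + 121/541, about 1.2237
print(round(diou_loss_xyxy((100, 0, 110, 10), gt), 4))  # 1 + 10000/12200, about 1.8197
```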
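Advantages:
- Like the GIoU loss, the DIoU loss still provides a moving direction for the predicted box when it does not overlap the target box.
- The DIoU loss directly minimizes the distance between the two boxes, so it converges much faster than the GIoU loss.
- When one box contains the other, or when the two boxes are aligned horizontally or vertically, the DIoU loss still regresses very quickly, whereas the GIoU loss almost degenerates into the IoU loss.
- DIoU can also replace plain IoU as the criterion in NMS, making the NMS results more reasonable and effective.
As with $\mathcal{L}_{GIoU}$, $\mathcal{L}_{DIoU}$ also takes values in $[0, 2)$.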
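The containment case from the list above can be checked numerically. In this sketch (same (xmin, ymin, xmax, ymax) assumption, illustrative names and boxes), two equally sized predictions lie inside the gt box: one centered, one pushed into a corner. Because $C$ is then the gt box itself, GIoU reduces to IoU and scores both predictions the same, while DIoU still penalizes the off-center one.

```python
def giou_and_diou_losses(pred, gt):
    """Return (L_GIoU, L_DIoU) for boxes in (xmin, ymin, xmax, ymax) format."""
    iw = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    ih = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = iw * ih
    union = ((pred[2] - pred[0]) * (pred[3] - pred[1])
             + (gt[2] - gt[0]) * (gt[3] - gt[1]) - inter)
    iou = inter / union
    # Smallest enclosing box C
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])
    giou = iou - (cw * ch - union) / (cw * ch)
    # Squared center distance for the DIoU penalty
    d2 = (((pred[0] + pred[2]) / 2 - (gt[0] + gt[2]) / 2) ** 2
          + ((pred[1] + pred[3]) / 2 - (gt[1] + gt[3]) / 2) ** 2)
    return 1 - giou, 1 - iou + d2 / (cw ** 2 + ch ** 2)


gt = (0, 0, 10, 10)
centered = (3, 3, 7, 7)  # 4x4 box sharing gt's center
corner = (0, 0, 4, 4)    # same 4x4 box, moved into gt's corner
for name, box in [("centered", centered), ("corner", corner)]:
    l_giou, l_diou = giou_and_diou_losses(box, gt)
    print(name, round(l_giou, 4), round(l_diou, 4))
# centered 0.84 0.84
# corner 0.84 0.93
```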
CIoU
$\mathcal{L}_{CIoU}$ adds an aspect-ratio term on top of $\mathcal{L}_{DIoU}$:
\begin{equation}
\label{CIoU}
\begin{split}
& \mathcal{R}_{CIoU} = \frac{\rho^2(\bf{b}, \bf{b^{gt}})}{c^2} + \alpha v \\
& v = \frac{4}{\pi^2}\left(\arctan \frac{w^{gt}}{h^{gt}} - \arctan \frac{w}{h}\right)^2 \\
& \alpha = \frac{v}{(1 - IoU) + v} \\
& \mathcal{L}_{CIoU} = 1 - IoU + \frac{d^2}{c^2} + \alpha v
\end{split}
\end{equation}
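Uh... this looks ridiculously complicated.
Here $v$ measures the consistency of the aspect ratios, and $\alpha$ is a positive trade-off parameter that does not take part in differentiation.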
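Written out term by term, it is less scary. The following plain-Python sketch evaluates $\mathcal{L}_{CIoU}$ exactly as in the formula above, assuming (xmin, ymin, xmax, ymax) boxes with positive width and height; the function name and sample boxes are illustrative. It computes only the loss value. In a training framework, $\alpha$ would additionally be detached from the graph (e.g. with `torch.no_grad()` or `tf.stop_gradient`).

```python
import math


def ciou_loss_xyxy(pred, gt):
    """L_CIoU = 1 - IoU + d^2/c^2 + alpha*v, boxes in (xmin, ymin, xmax, ymax) format."""
    w, h = pred[2] - pred[0], pred[3] - pred[1]
    wg, hg = gt[2] - gt[0], gt[3] - gt[1]
    iw = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    ih = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = iw * ih
    iou = inter / (w * h + wg * hg - inter)
    # Distance term d^2 / c^2, same as in DIoU
    d2 = (((pred[0] + pred[2]) - (gt[0] + gt[2])) / 2) ** 2 + (((pred[1] + pred[3]) - (gt[1] + gt[3])) / 2) ** 2
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])
    c2 = cw ** 2 + ch ** 2
    # v: aspect-ratio consistency term
    v = 4 / math.pi ** 2 * (math.atan(wg / hg) - math.atan(w / h)) ** 2
    # alpha: trade-off weight; set to 0 when v == 0 to avoid 0/0 for identical boxes
    alpha = v / ((1 - iou) + v) if v > 0 else 0.0
    return 1 - iou + d2 / c2 + alpha * v


gt = (0, 0, 10, 10)     # square gt box
narrow = (2, 0, 8, 10)  # same center, IoU = 0.6, aspect ratio 0.6 instead of 1
print(round(ciou_loss_xyxy(narrow, gt), 4))  # 0.4014, slightly above the DIoU loss of 0.4
```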
DIoU-NMS
Haven't tried this one yet; to be continued...
Example
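Not tried in practice here either, but for reference: in the DIoU paper, DIoU-NMS keeps the usual greedy NMS loop and changes only the suppression test. A box $B_i$ is suppressed by the current top-scoring box $\mathcal{M}$ when $IoU - \mathcal{R}_{DIoU}(\mathcal{M}, B_i) \ge \varepsilon$, so boxes whose centers lie far from $\mathcal{M}$ survive more easily. The plain-Python sketch below assumes (xmin, ymin, xmax, ymax) boxes; `diou_nms`, its `use_diou` switch, and the sample boxes are illustrative, not the paper's code.

```python
def iou_and_penalty(a, b):
    """Return (IoU, d^2 / c^2) for boxes in (xmin, ymin, xmax, ymax) format."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    d2 = ((a[0] + a[2] - b[0] - b[2]) / 2) ** 2 + ((a[1] + a[3] - b[1] - b[3]) / 2) ** 2
    cw = max(a[2], b[2]) - min(a[0], b[0])
    ch = max(a[3], b[3]) - min(a[1], b[1])
    return inter / union, d2 / (cw ** 2 + ch ** 2)


def diou_nms(boxes, scores, eps=0.5, use_diou=True):
    """Greedy NMS; returns indices of kept boxes in descending score order."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        m = order.pop(0)  # current highest-scoring box M
        keep.append(m)
        remaining = []
        for i in order:
            iou, penalty = iou_and_penalty(boxes[m], boxes[i])
            criterion = iou - penalty if use_diou else iou
            if criterion < eps:  # B_i survives; otherwise it is dropped (score set to 0)
                remaining.append(i)
        order = remaining
    return keep


boxes = [(0, 0, 10, 10), (2, 2, 12, 12)]  # IoU = 64/136, d^2/c^2 = 8/288
scores = [0.9, 0.8]
print(diou_nms(boxes, scores, eps=0.45, use_diou=False))  # [0]: plain NMS suppresses box 1
print(diou_nms(boxes, scores, eps=0.45))                  # [0, 1]: 0.4706 - 0.0278 < 0.45
```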
```python
import math

import matplotlib.pyplot as plt
import numpy as np
import torch
# The TensorFlow part uses the TF 1.x graph API (tf.placeholder / tf.Session);
# under TF 2.x it is reachable through tf.compat.v1.
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()

epsilon = 1e-5


def IoU(box1, box2, wh=False):
    if wh:  # boxes given as (cx, cy, w, h)
        xmin1, ymin1 = box1[0] - box1[2] / 2.0, box1[1] - box1[3] / 2.0
        xmax1, ymax1 = box1[0] + box1[2] / 2.0, box1[1] + box1[3] / 2.0
        xmin2, ymin2 = box2[0] - box2[2] / 2.0, box2[1] - box2[3] / 2.0
        xmax2, ymax2 = box2[0] + box2[2] / 2.0, box2[1] + box2[3] / 2.0
    else:  # boxes given as (xmin, ymin, xmax, ymax)
        xmin1, ymin1, xmax1, ymax1 = box1
        xmin2, ymin2, xmax2, ymax2 = box2
    # Width and height of the intersection
    W = min(xmax1, xmax2) - max(xmin1, xmin2)
    H = min(ymax1, ymax2) - max(ymin1, ymin2)
    # Areas of the two boxes
    SA = (xmax1 - xmin1) * (ymax1 - ymin1)
    SB = (xmax2 - xmin2) * (ymax2 - ymin2)
    cross = max(0, W) * max(0, H)  # intersection area
    iou = float(cross) / (SA + SB - cross)
    return iou


def GIoU(box1, box2, wh=False):
    if wh:
        xmin1, ymin1 = box1[0] - box1[2] / 2.0, box1[1] - box1[3] / 2.0
        xmax1, ymax1 = box1[0] + box1[2] / 2.0, box1[1] + box1[3] / 2.0
        xmin2, ymin2 = box2[0] - box2[2] / 2.0, box2[1] - box2[3] / 2.0
        xmax2, ymax2 = box2[0] + box2[2] / 2.0, box2[1] + box2[3] / 2.0
    else:
        xmin1, ymin1, xmax1, ymax1 = box1
        xmin2, ymin2, xmax2, ymax2 = box2
    iou = IoU(box1, box2, wh)
    # Area of the smallest enclosing box C
    SC = (max(xmax1, xmax2) - min(xmin1, xmin2)) * (max(ymax1, ymax2) - min(ymin1, ymin2))
    # Width and height of the intersection
    W = min(xmax1, xmax2) - max(xmin1, xmin2)
    H = min(ymax1, ymax2) - max(ymin1, ymin2)
    # Areas of the two boxes
    SA = (xmax1 - xmin1) * (ymax1 - ymin1)
    SB = (xmax2 - xmin2) * (ymax2 - ymin2)
    cross = max(0, W) * max(0, H)  # intersection area
    add_area = SA + SB - cross  # union area
    end_area = (SC - add_area) / SC  # fraction of C not covered by either box
    giou = iou - end_area
    return giou


def DIoU(box1, box2, wh=False):
    if wh:
        inter_diag = (box1[0] - box2[0]) ** 2 + (box1[1] - box2[1]) ** 2
        xmin1, ymin1 = box1[0] - box1[2] / 2.0, box1[1] - box1[3] / 2.0
        xmax1, ymax1 = box1[0] + box1[2] / 2.0, box1[1] + box1[3] / 2.0
        xmin2, ymin2 = box2[0] - box2[2] / 2.0, box2[1] - box2[3] / 2.0
        xmax2, ymax2 = box2[0] + box2[2] / 2.0, box2[1] + box2[3] / 2.0
    else:
        xmin1, ymin1, xmax1, ymax1 = box1
        xmin2, ymin2, xmax2, ymax2 = box2
        center_x1 = (xmax1 + xmin1) / 2
        center_y1 = (ymax1 + ymin1) / 2
        center_x2 = (xmax2 + xmin2) / 2
        center_y2 = (ymax2 + ymin2) / 2
        # Squared distance between the two centers (d^2)
        inter_diag = (center_x1 - center_x2) ** 2 + (center_y1 - center_y2) ** 2
    iou = IoU(box1, box2, wh)
    enclose1 = max(max(xmax1, xmax2) - min(xmin1, xmin2), 0.0)
    enclose2 = max(max(ymax1, ymax2) - min(ymin1, ymin2), 0.0)
    outer_diag = (enclose1 ** 2) + (enclose2 ** 2)  # squared diagonal of C (c^2)
    diou = iou - 1.0 * inter_diag / outer_diag
    return diou


def CIoU(box1, box2, wh=False, normaled=False):
    if wh:
        w1, h1 = box1[2], box1[3]
        w2, h2 = box2[2], box2[3]
        inter_diag = (box1[0] - box2[0]) ** 2 + (box1[1] - box2[1]) ** 2
        xmin1, ymin1 = box1[0] - box1[2] / 2.0, box1[1] - box1[3] / 2.0
        xmax1, ymax1 = box1[0] + box1[2] / 2.0, box1[1] + box1[3] / 2.0
        xmin2, ymin2 = box2[0] - box2[2] / 2.0, box2[1] - box2[3] / 2.0
        xmax2, ymax2 = box2[0] + box2[2] / 2.0, box2[1] + box2[3] / 2.0
    else:
        xmin1, ymin1, xmax1, ymax1 = box1
        xmin2, ymin2, xmax2, ymax2 = box2
        w1, h1 = xmax1 - xmin1, ymax1 - ymin1
        w2, h2 = xmax2 - xmin2, ymax2 - ymin2
        center_x1 = (xmax1 + xmin1) / 2
        center_y1 = (ymax1 + ymin1) / 2
        center_x2 = (xmax2 + xmin2) / 2
        center_y2 = (ymax2 + ymin2) / 2
        inter_diag = (center_x1 - center_x2) ** 2 + (center_y1 - center_y2) ** 2
    iou = IoU(box1, box2, wh)
    enclose1 = max(max(xmax1, xmax2) - min(xmin1, xmin2), 0.0)
    enclose2 = max(max(ymax1, ymax2) - min(ymin1, ymin2), 0.0)
    outer_diag = (enclose1 ** 2) + (enclose2 ** 2)
    u = (inter_diag) / outer_diag  # d^2 / c^2
    arctan = math.atan(w2 / h2) - math.atan(w1 / h1)
    v = (4 / (math.pi ** 2)) * (math.atan(w2 / h2) - math.atan(w1 / h1)) ** 2
    S = 1 - iou
    alpha = v / (S + v)
    # `ar` follows the referenced DIoU-SSD-pytorch code: a surrogate whose gradient
    # w.r.t. (w1, h1) is meant to stand in for that of v, so the value returned
    # here is not exactly IoU - d^2/c^2 - alpha * v.
    w_temp = 2 * w1
    distance = w1 ** 2 + h1 ** 2
    ar = (8 / (math.pi ** 2)) * arctan * ((w1 - w_temp) * h1)
    if not normaled:
        cious = iou - (u + alpha * ar / distance)
    else:
        cious = iou - (u + alpha * ar)
    cious = np.clip(cious, a_min=-1.0, a_max=1.0)
    return cious


def bbox_giou_np(boxes1, boxes2):
    # xywh -> xyxy
    boxes1 = np.concatenate([boxes1[..., :2] - boxes1[..., 2:] * 0.5, boxes1[..., :2] + boxes1[..., 2:] * 0.5], axis=-1)
    boxes2 = np.concatenate([boxes2[..., :2] - boxes2[..., 2:] * 0.5, boxes2[..., :2] + boxes2[..., 2:] * 0.5], axis=-1)
    # Make sure (x1, y1) is the top-left and (x2, y2) the bottom-right corner
    boxes1 = np.concatenate([np.minimum(boxes1[..., :2], boxes1[..., 2:]), np.maximum(boxes1[..., :2], boxes1[..., 2:])], axis=-1)
    boxes2 = np.concatenate([np.minimum(boxes2[..., :2], boxes2[..., 2:]), np.maximum(boxes2[..., :2], boxes2[..., 2:])], axis=-1)
    boxes1_area = (boxes1[..., 2] - boxes1[..., 0]) * (boxes1[..., 3] - boxes1[..., 1])
    boxes2_area = (boxes2[..., 2] - boxes2[..., 0]) * (boxes2[..., 3] - boxes2[..., 1])
    left_up = np.maximum(boxes1[..., :2], boxes2[..., :2])
    right_down = np.minimum(boxes1[..., 2:], boxes2[..., 2:])
    inter_section = np.maximum(right_down - left_up, 0.0)
    inter_area = inter_section[..., 0] * inter_section[..., 1]
    union_area = boxes1_area + boxes2_area - inter_area
    # IoU between the two boxes
    iou = inter_area / union_area
    # Top-left and bottom-right corners of the smallest enclosing box C
    enclose_left_up = np.minimum(boxes1[..., :2], boxes2[..., :2])
    enclose_right_down = np.maximum(boxes1[..., 2:], boxes2[..., 2:])
    enclose = np.maximum(enclose_right_down - enclose_left_up, 0.0)
    # Area of C
    enclose_area = enclose[..., 0] * enclose[..., 1]
    # GIoU from its definition
    giou = iou - 1.0 * (enclose_area - union_area) / enclose_area
    return giou


# https://github.com/YunYang1994/TensorFlow2.0-Examples/blob/4d4a403d00e6e887ecb7229719b1407d2e132811/4-Object_Detection/YOLOV3/core/yolov3.py#L121
def bbox_giou_tf(boxes1, boxes2):
    # pred_xywh, label_xywh -> pred_xyxy, label_xyxy
    boxes1 = tf.concat([boxes1[..., :2] - boxes1[..., 2:] * 0.5, boxes1[..., :2] + boxes1[..., 2:] * 0.5], axis=-1)
    boxes2 = tf.concat([boxes2[..., :2] - boxes2[..., 2:] * 0.5, boxes2[..., :2] + boxes2[..., 2:] * 0.5], axis=-1)
    boxes1 = tf.concat([tf.minimum(boxes1[..., :2], boxes1[..., 2:]), tf.maximum(boxes1[..., :2], boxes1[..., 2:])], axis=-1)
    boxes2 = tf.concat([tf.minimum(boxes2[..., :2], boxes2[..., 2:]), tf.maximum(boxes2[..., :2], boxes2[..., 2:])], axis=-1)
    boxes1_area = (boxes1[..., 2] - boxes1[..., 0]) * (boxes1[..., 3] - boxes1[..., 1])
    boxes2_area = (boxes2[..., 2] - boxes2[..., 0]) * (boxes2[..., 3] - boxes2[..., 1])
    left_up = tf.maximum(boxes1[..., :2], boxes2[..., :2])
    right_down = tf.minimum(boxes1[..., 2:], boxes2[..., 2:])
    inter_section = tf.maximum(right_down - left_up, 0.0)
    inter_area = inter_section[..., 0] * inter_section[..., 1]
    union_area = boxes1_area + boxes2_area - inter_area
    # IoU between the two boxes
    iou = inter_area / union_area
    # Top-left and bottom-right corners of the smallest enclosing box C
    enclose_left_up = tf.minimum(boxes1[..., :2], boxes2[..., :2])
    enclose_right_down = tf.maximum(boxes1[..., 2:], boxes2[..., 2:])
    enclose = tf.maximum(enclose_right_down - enclose_left_up, 0.0)
    # Area of C
    enclose_area = enclose[..., 0] * enclose[..., 1]
    # GIoU from its definition
    giou = iou - 1.0 * (enclose_area - union_area) / enclose_area
    return giou


def bbox_giou_torch(boxes1, boxes2):
    boxes1, boxes2 = torch.from_numpy(boxes1).float(), torch.from_numpy(boxes2).float()
    # pred_xywh, label_xywh -> pred_xyxy, label_xyxy
    boxes1 = torch.cat([boxes1[..., :2] - boxes1[..., 2:] * 0.5, boxes1[..., :2] + boxes1[..., 2:] * 0.5], dim=-1)
    boxes2 = torch.cat([boxes2[..., :2] - boxes2[..., 2:] * 0.5, boxes2[..., :2] + boxes2[..., 2:] * 0.5], dim=-1)
    boxes1 = torch.cat([torch.min(boxes1[..., :2], boxes1[..., 2:]), torch.max(boxes1[..., :2], boxes1[..., 2:])], dim=-1)
    boxes2 = torch.cat([torch.min(boxes2[..., :2], boxes2[..., 2:]), torch.max(boxes2[..., :2], boxes2[..., 2:])], dim=-1)
    boxes1_area = (boxes1[..., 2] - boxes1[..., 0]) * (boxes1[..., 3] - boxes1[..., 1])
    boxes2_area = (boxes2[..., 2] - boxes2[..., 0]) * (boxes2[..., 3] - boxes2[..., 1])
    left_up = torch.max(boxes1[..., :2], boxes2[..., :2])
    right_down = torch.min(boxes1[..., 2:], boxes2[..., 2:])
    inter_section = torch.max(right_down - left_up, torch.tensor(0.0))
    inter_area = inter_section[..., 0] * inter_section[..., 1]
    union_area = boxes1_area + boxes2_area - inter_area
    # IoU between the two boxes
    iou = inter_area / union_area
    # Top-left and bottom-right corners of the smallest enclosing box C
    enclose_left_up = torch.min(boxes1[..., :2], boxes2[..., :2])
    enclose_right_down = torch.max(boxes1[..., 2:], boxes2[..., 2:])
    enclose = torch.max(enclose_right_down - enclose_left_up, torch.tensor(0.0))
    # Area of C
    enclose_area = enclose[..., 0] * enclose[..., 1]
    # GIoU from its definition
    giou = iou - 1.0 * (enclose_area - union_area) / enclose_area
    return giou


# https://github.com/Zzh-tju/DIoU-SSD-pytorch/blob/65b68b53f73173397937d4950ff916a41545c960/utils/box/box_utils.py#L5
def bbox_diou_torch(bboxes1, bboxes2):
    # Boxes are xyxy and compared row by row, so both inputs need the same length
    bboxes1, bboxes2 = torch.from_numpy(bboxes1).float(), torch.from_numpy(bboxes2).float()
    rows = bboxes1.shape[0]
    cols = bboxes2.shape[0]
    dious = torch.zeros((rows, cols))
    if rows * cols == 0:
        return dious
    exchange = False
    if bboxes1.shape[0] > bboxes2.shape[0]:
        bboxes1, bboxes2 = bboxes2, bboxes1
        dious = torch.zeros((cols, rows))
        exchange = True
    w1 = bboxes1[:, 2] - bboxes1[:, 0]
    h1 = bboxes1[:, 3] - bboxes1[:, 1]
    w2 = bboxes2[:, 2] - bboxes2[:, 0]
    h2 = bboxes2[:, 3] - bboxes2[:, 1]
    area1 = w1 * h1
    area2 = w2 * h2
    center_x1 = (bboxes1[:, 2] + bboxes1[:, 0]) / 2
    center_y1 = (bboxes1[:, 3] + bboxes1[:, 1]) / 2
    center_x2 = (bboxes2[:, 2] + bboxes2[:, 0]) / 2
    center_y2 = (bboxes2[:, 3] + bboxes2[:, 1]) / 2
    inter_max_xy = torch.min(bboxes1[:, 2:], bboxes2[:, 2:])
    inter_min_xy = torch.max(bboxes1[:, :2], bboxes2[:, :2])
    out_max_xy = torch.max(bboxes1[:, 2:], bboxes2[:, 2:])
    out_min_xy = torch.min(bboxes1[:, :2], bboxes2[:, :2])
    inter = torch.clamp((inter_max_xy - inter_min_xy), min=0)
    inter_area = inter[:, 0] * inter[:, 1]  # intersection
    inter_diag = (center_x2 - center_x1) ** 2 + (center_y2 - center_y1) ** 2
    outer = torch.clamp((out_max_xy - out_min_xy), min=0)
    outer_diag = (outer[:, 0] ** 2) + (outer[:, 1] ** 2)
    union = area1 + area2 - inter_area  # union
    dious = inter_area / union - (inter_diag) / outer_diag
    dious = torch.clamp(dious, min=-1.0, max=1.0)
    if exchange:
        dious = dious.T
    return dious


def bbox_diou_np(boxes1, boxes2, normaled=False):
    # Squared center distance, computed while the boxes are still xywh
    inter_diag = np.sum(np.square(boxes1[..., :2] - boxes2[..., :2]), axis=1)
    # pred_xywh, label_xywh -> pred_xyxy, label_xyxy
    boxes1 = np.concatenate([boxes1[..., :2] - boxes1[..., 2:] * 0.5, boxes1[..., :2] + boxes1[..., 2:] * 0.5], axis=-1)
    boxes2 = np.concatenate([boxes2[..., :2] - boxes2[..., 2:] * 0.5, boxes2[..., :2] + boxes2[..., 2:] * 0.5], axis=-1)
    boxes1 = np.concatenate([np.minimum(boxes1[..., :2], boxes1[..., 2:]), np.maximum(boxes1[..., :2], boxes1[..., 2:])], axis=-1)
    boxes2 = np.concatenate([np.minimum(boxes2[..., :2], boxes2[..., 2:]), np.maximum(boxes2[..., :2], boxes2[..., 2:])], axis=-1)
    boxes1_area = (boxes1[..., 2] - boxes1[..., 0]) * (boxes1[..., 3] - boxes1[..., 1])
    boxes2_area = (boxes2[..., 2] - boxes2[..., 0]) * (boxes2[..., 3] - boxes2[..., 1])
    left_up = np.maximum(boxes1[..., :2], boxes2[..., :2])
    right_down = np.minimum(boxes1[..., 2:], boxes2[..., 2:])
    inter_section = np.maximum(right_down - left_up, 0.0)
    inter_area = inter_section[..., 0] * inter_section[..., 1]
    union_area = boxes1_area + boxes2_area - inter_area
    # IoU between the two boxes
    iou = inter_area / union_area
    # Top-left and bottom-right corners of the smallest enclosing box C
    enclose_left_up = np.minimum(boxes1[..., :2], boxes2[..., :2])
    enclose_right_down = np.maximum(boxes1[..., 2:], boxes2[..., 2:])
    enclose = np.maximum(enclose_right_down - enclose_left_up, 0.0)
    outer_diag = (enclose[:, 0] ** 2) + (enclose[:, 1] ** 2)
    # DIoU from its definition
    diou = iou - 1.0 * inter_diag / outer_diag
    diou = np.clip(diou, a_min=-1.0, a_max=1.0)
    return diou


def bbox_diou_tf(boxes1, boxes2):
    inter_diag = tf.reduce_sum(tf.square(boxes1[..., :2] - boxes2[..., :2]), axis=1)
    # pred_xywh, label_xywh -> pred_xyxy, label_xyxy
    boxes1 = tf.concat([boxes1[..., :2] - boxes1[..., 2:] * 0.5, boxes1[..., :2] + boxes1[..., 2:] * 0.5], axis=-1)
    boxes2 = tf.concat([boxes2[..., :2] - boxes2[..., 2:] * 0.5, boxes2[..., :2] + boxes2[..., 2:] * 0.5], axis=-1)
    boxes1 = tf.concat([tf.minimum(boxes1[..., :2], boxes1[..., 2:]), tf.maximum(boxes1[..., :2], boxes1[..., 2:])], axis=-1)
    boxes2 = tf.concat([tf.minimum(boxes2[..., :2], boxes2[..., 2:]), tf.maximum(boxes2[..., :2], boxes2[..., 2:])], axis=-1)
    boxes1_area = (boxes1[..., 2] - boxes1[..., 0]) * (boxes1[..., 3] - boxes1[..., 1])
    boxes2_area = (boxes2[..., 2] - boxes2[..., 0]) * (boxes2[..., 3] - boxes2[..., 1])
    left_up = tf.maximum(boxes1[..., :2], boxes2[..., :2])
    right_down = tf.minimum(boxes1[..., 2:], boxes2[..., 2:])
    inter_section = tf.maximum(right_down - left_up, 0.0)
    inter_area = inter_section[..., 0] * inter_section[..., 1]
    union_area = boxes1_area + boxes2_area - inter_area
    # IoU between the two boxes
    iou = inter_area / union_area
    # Top-left and bottom-right corners of the smallest enclosing box C
    enclose_left_up = tf.minimum(boxes1[..., :2], boxes2[..., :2])
    enclose_right_down = tf.maximum(boxes1[..., 2:], boxes2[..., 2:])
    enclose = tf.maximum(enclose_right_down - enclose_left_up, 0.0)
    outer_diag = (enclose[:, 0] ** 2) + (enclose[:, 1] ** 2)
    # DIoU from its definition
    diou = iou - 1.0 * inter_diag / outer_diag
    diou = tf.clip_by_value(diou, clip_value_min=-1.0, clip_value_max=1.0)
    return diou


# https://github.com/Zzh-tju/DIoU-SSD-pytorch/blob/65b68b53f73173397937d4950ff916a41545c960/utils/box/box_utils.py#L47
def bbox_ciou_torch(bboxes1, bboxes2, normaled=False):
    bboxes1, bboxes2 = torch.from_numpy(bboxes1).float(), torch.from_numpy(bboxes2).float()
    rows = bboxes1.shape[0]
    cols = bboxes2.shape[0]
    cious = torch.zeros((rows, cols))
    if rows * cols == 0:
        return cious
    exchange = False
    if bboxes1.shape[0] > bboxes2.shape[0]:
        bboxes1, bboxes2 = bboxes2, bboxes1
        cious = torch.zeros((cols, rows))
        exchange = True
    w1 = bboxes1[:, 2] - bboxes1[:, 0]
    h1 = bboxes1[:, 3] - bboxes1[:, 1]
    w2 = bboxes2[:, 2] - bboxes2[:, 0]
    h2 = bboxes2[:, 3] - bboxes2[:, 1]
    area1 = w1 * h1
    area2 = w2 * h2
    center_x1 = (bboxes1[:, 2] + bboxes1[:, 0]) / 2
    center_y1 = (bboxes1[:, 3] + bboxes1[:, 1]) / 2
    center_x2 = (bboxes2[:, 2] + bboxes2[:, 0]) / 2
    center_y2 = (bboxes2[:, 3] + bboxes2[:, 1]) / 2
    inter_max_xy = torch.min(bboxes1[:, 2:], bboxes2[:, 2:])
    inter_min_xy = torch.max(bboxes1[:, :2], bboxes2[:, :2])
    out_max_xy = torch.max(bboxes1[:, 2:], bboxes2[:, 2:])
    out_min_xy = torch.min(bboxes1[:, :2], bboxes2[:, :2])
    inter = torch.clamp((inter_max_xy - inter_min_xy), min=0)
    inter_area = inter[:, 0] * inter[:, 1]
    inter_diag = (center_x2 - center_x1) ** 2 + (center_y2 - center_y1) ** 2
    outer = torch.clamp((out_max_xy - out_min_xy), min=0)
    outer_diag = (outer[:, 0] ** 2) + (outer[:, 1] ** 2)
    union = area1 + area2 - inter_area
    u = (inter_diag) / outer_diag
    iou = inter_area / union
    # alpha (and the helper terms) are excluded from the gradient
    with torch.no_grad():
        arctan = torch.atan(w2 / h2) - torch.atan(w1 / h1)
        v = (4 / (math.pi ** 2)) * torch.pow((torch.atan(w2 / h2) - torch.atan(w1 / h1)), 2)
        S = 1 - iou
        alpha = v / (S + v)
        w_temp = 2 * w1
        distance = w1 ** 2 + h1 ** 2
    # Surrogate aspect-ratio term from the referenced repo (see the note in CIoU above)
    ar = (8 / (math.pi ** 2)) * arctan * ((w1 - w_temp) * h1)
    if not normaled:
        cious = iou - (u + alpha * ar / distance)
    else:
        cious = iou - (u + alpha * ar)
    cious = torch.clamp(cious, min=-1.0, max=1.0)
    if exchange:
        cious = cious.T
    return cious


def bbox_ciou_np(boxes1, boxes2, normaled=False):
    w1, h1 = boxes1[..., 2], boxes1[..., 3]
    w2, h2 = boxes2[..., 2], boxes2[..., 3]
    inter_diag = np.sum(np.square(boxes1[..., :2] - boxes2[..., :2]), axis=-1)
    # pred_xywh, label_xywh -> pred_xyxy, label_xyxy
    boxes1 = np.concatenate([boxes1[..., :2] - boxes1[..., 2:] * 0.5, boxes1[..., :2] + boxes1[..., 2:] * 0.5], axis=-1)
    boxes2 = np.concatenate([boxes2[..., :2] - boxes2[..., 2:] * 0.5, boxes2[..., :2] + boxes2[..., 2:] * 0.5], axis=-1)
    boxes1 = np.concatenate([np.minimum(boxes1[..., :2], boxes1[..., 2:]), np.maximum(boxes1[..., :2], boxes1[..., 2:])], axis=-1)
    boxes2 = np.concatenate([np.minimum(boxes2[..., :2], boxes2[..., 2:]), np.maximum(boxes2[..., :2], boxes2[..., 2:])], axis=-1)
    boxes1_area = (boxes1[..., 2] - boxes1[..., 0]) * (boxes1[..., 3] - boxes1[..., 1])
    boxes2_area = (boxes2[..., 2] - boxes2[..., 0]) * (boxes2[..., 3] - boxes2[..., 1])
    left_up = np.maximum(boxes1[..., :2], boxes2[..., :2])
    right_down = np.minimum(boxes1[..., 2:], boxes2[..., 2:])
    inter_section = np.maximum(right_down - left_up, 0.0)
    inter_area = inter_section[..., 0] * inter_section[..., 1]
    union_area = boxes1_area + boxes2_area - inter_area
    # IoU between the two boxes
    iou = inter_area / union_area
    # Top-left and bottom-right corners of the smallest enclosing box C
    enclose_left_up = np.minimum(boxes1[..., :2], boxes2[..., :2])
    enclose_right_down = np.maximum(boxes1[..., 2:], boxes2[..., 2:])
    enclose = np.maximum(enclose_right_down - enclose_left_up, 0.0)
    outer_diag = (enclose[:, 0] ** 2) + (enclose[:, 1] ** 2)
    u = (inter_diag) / outer_diag
    # CIoU terms (with the same surrogate `ar` as above)
    arctan = np.arctan(w2 / h2) - np.arctan(w1 / h1)
    v = (4 / (math.pi ** 2)) * np.square(np.arctan(w2 / h2) - np.arctan(w1 / h1))
    S = 1 - iou
    alpha = v / (S + v)
    w_temp = 2 * w1
    distance = w1 ** 2 + h1 ** 2
    ar = (8 / (math.pi ** 2)) * arctan * ((w1 - w_temp) * h1)
    if not normaled:
        cious = iou - (u + alpha * ar / distance)
    else:
        cious = iou - (u + alpha * ar)
    cious = np.clip(cious, a_min=-1.0, a_max=1.0)
    return cious


def bbox_ciou_tf(boxes1, boxes2, normaled=False):
    w1, h1 = boxes1[..., 2], boxes1[..., 3]
    w2, h2 = boxes2[..., 2], boxes2[..., 3]
    inter_diag = tf.reduce_sum(tf.square(boxes1[..., :2] - boxes2[..., :2]), axis=-1)
    # pred_xywh, label_xywh -> pred_xyxy, label_xyxy
    boxes1 = tf.concat([boxes1[..., :2] - boxes1[..., 2:] * 0.5, boxes1[..., :2] + boxes1[..., 2:] * 0.5], axis=-1)
    boxes2 = tf.concat([boxes2[..., :2] - boxes2[..., 2:] * 0.5, boxes2[..., :2] + boxes2[..., 2:] * 0.5], axis=-1)
    boxes1 = tf.concat([tf.minimum(boxes1[..., :2], boxes1[..., 2:]), tf.maximum(boxes1[..., :2], boxes1[..., 2:])], axis=-1)
    boxes2 = tf.concat([tf.minimum(boxes2[..., :2], boxes2[..., 2:]), tf.maximum(boxes2[..., :2], boxes2[..., 2:])], axis=-1)
    boxes1_area = (boxes1[..., 2] - boxes1[..., 0]) * (boxes1[..., 3] - boxes1[..., 1])
    boxes2_area = (boxes2[..., 2] - boxes2[..., 0]) * (boxes2[..., 3] - boxes2[..., 1])
    left_up = tf.maximum(boxes1[..., :2], boxes2[..., :2])
    right_down = tf.minimum(boxes1[..., 2:], boxes2[..., 2:])
    inter_section = tf.maximum(right_down - left_up, 0.0)
    inter_area = inter_section[..., 0] * inter_section[..., 1]
    union_area = boxes1_area + boxes2_area - inter_area
    # IoU between the two boxes
    iou = inter_area / union_area
    # Top-left and bottom-right corners of the smallest enclosing box C
    enclose_left_up = tf.minimum(boxes1[..., :2], boxes2[..., :2])
    enclose_right_down = tf.maximum(boxes1[..., 2:], boxes2[..., 2:])
    enclose = tf.maximum(enclose_right_down - enclose_left_up, 0.0)
    outer_diag = (enclose[:, 0] ** 2) + (enclose[:, 1] ** 2)
    u = (inter_diag) / outer_diag
    # CIoU terms; epsilon guards against division by zero heights
    arctan = tf.atan(w2 / (h2 + epsilon)) - tf.atan(w1 / (h1 + epsilon))
    v = (4 / (math.pi ** 2)) * tf.square(tf.atan(w2 / (h2 + epsilon)) - tf.atan(w1 / (h1 + epsilon)))
    S = 1 - iou
    alpha = tf.stop_gradient(v / (S + v))
    w_temp = tf.stop_gradient(2 * w1)
    distance = tf.stop_gradient(w1 ** 2 + h1 ** 2 + epsilon)
    ar = (8 / (math.pi ** 2)) * arctan * ((w1 - w_temp) * h1)
    if not normaled:
        cious = iou - (u + alpha * ar / distance)
    else:
        cious = iou - (u + alpha * ar)
    cious = tf.clip_by_value(cious, clip_value_min=-1.0, clip_value_max=1.0)
    return cious


img_width = 480.0
img_height = 320.0

gt_bboxes_xyxy = np.array([[50, 40, 200, 200], [270, 70, 400, 180]])  # xyxy
pre_bboxes_xyxy = np.array([[100, 100, 250, 300], [400, 180, 460, 300]])  # xyxy
gt_bboxes_xyxy_nomal = np.zeros(shape=gt_bboxes_xyxy.shape, dtype=np.float64)
pre_bboxes_xyxy_nomal = np.zeros(shape=pre_bboxes_xyxy.shape, dtype=np.float64)
gt_bboxes_xyxy_nomal[..., 0::2] = gt_bboxes_xyxy[..., 0::2] / img_width
gt_bboxes_xyxy_nomal[..., 1::2] = gt_bboxes_xyxy[..., 1::2] / img_height
pre_bboxes_xyxy_nomal[..., 0::2] = pre_bboxes_xyxy[..., 0::2] / img_width
pre_bboxes_xyxy_nomal[..., 1::2] = pre_bboxes_xyxy[..., 1::2] / img_height

gt_bboxes_xywh = np.array([[125, 120, 150, 160], [335, 125, 130, 110]])  # xywh
pre_bboxes_xywh = np.array([[175, 200, 150, 200], [430, 240, 60, 120]])  # xywh
gt_bboxes_xywh_nomal = np.zeros(shape=gt_bboxes_xywh.shape, dtype=np.float64)
pre_bboxes_xywh_nomal = np.zeros(shape=pre_bboxes_xywh.shape, dtype=np.float64)
gt_bboxes_xywh_nomal[..., 0::2] = gt_bboxes_xywh[..., 0::2] / img_width
gt_bboxes_xywh_nomal[..., 1::2] = gt_bboxes_xywh[..., 1::2] / img_height
pre_bboxes_xywh_nomal[..., 0::2] = pre_bboxes_xywh[..., 0::2] / img_width
pre_bboxes_xywh_nomal[..., 1::2] = pre_bboxes_xywh[..., 1::2] / img_height

# ================================================================ #
fig = plt.figure()
ax = fig.add_subplot(111)
currentAxis = plt.gca()
for idx, (gt, pt) in enumerate(zip(gt_bboxes_xywh, pre_bboxes_xywh)):
    iou = IoU(gt, pt, True)
    giou = GIoU(gt, pt, True)
    diou = DIoU(gt, pt, True)
    ciou = CIoU(gt, pt, True)
    currentAxis.text(gt[0] - gt[2] / 2, 20, 'iou={:.4f}, giou={:.4f}'.format(iou, giou), bbox={'facecolor': 'yellow', 'alpha': 0.5})
    currentAxis.text(gt[0] - gt[2] / 2, gt[1] + gt[3] / 2 + 20, 'diou={:.4f}, ciou={:.4f}'.format(diou, ciou), bbox={'facecolor': 'yellow', 'alpha': 0.5})
    currentAxis.add_patch(plt.Rectangle((gt[0] - gt[2] / 2, gt[1] - gt[3] / 2), gt[2], gt[3], fill=False, edgecolor='green', linewidth=2))
    currentAxis.text(gt[0] - gt[2] / 2, gt[1] - gt[3] / 2, 'g{}'.format(idx), bbox={'facecolor': 'green', 'alpha': 0.5})
    currentAxis.add_patch(plt.Rectangle((pt[0] - pt[2] / 2, pt[1] - pt[3] / 2), pt[2], pt[3], fill=False, edgecolor='red', linewidth=2))
    currentAxis.text(pt[0] - pt[2] / 2, pt[1] - pt[3] / 2, 'p{}'.format(idx), bbox={'facecolor': 'red', 'alpha': 0.5})
plt.xticks(np.arange(0, img_width + 1, 40))
plt.yticks(np.arange(0, img_height + 1, 40))
currentAxis.invert_yaxis()
plt.show()

# ================================================================ #
label_bbox = tf.placeholder(dtype=tf.float32, name='label_bbox')
predic_bbox = tf.placeholder(dtype=tf.float32, name='predic_bbox')
label_bbox_normal = tf.placeholder(dtype=tf.float32, name='label_bbox_normal')
predic_bbox_normal = tf.placeholder(dtype=tf.float32, name='predic_bbox_normal')

# ================================================================ #
#                              GIoU                                #
# ================================================================ #
gious = np.expand_dims(bbox_giou_np(gt_bboxes_xywh, pre_bboxes_xywh), axis=-1)
print('numpy publish giou: ', gious)
# ================================================================ #
gious = tf.expand_dims(bbox_giou_tf(predic_bbox, label_bbox), axis=-1)
with tf.Session() as sess:
    result = sess.run(gious, feed_dict={label_bbox: gt_bboxes_xywh, predic_bbox: pre_bboxes_xywh})
    print('tensorflow publish giou: ', result)
# ================================================================ #
gious = bbox_giou_torch(gt_bboxes_xywh, pre_bboxes_xywh).unsqueeze(-1)
print('pytorch publish giou: ', gious.numpy())

# ================================================================ #
#                              DIoU                                #
# ================================================================ #
dious = np.expand_dims(bbox_diou_np(gt_bboxes_xywh, pre_bboxes_xywh), axis=-1)
print('numpy publish diou : ', dious)
# ================================================================ #
dious = bbox_diou_torch(gt_bboxes_xyxy, pre_bboxes_xyxy).unsqueeze(-1)
print('pytorch publish diou: ', dious.numpy())
# ================================================================ #
dious = tf.expand_dims(bbox_diou_tf(label_bbox, predic_bbox), axis=-1)
with tf.Session() as sess:
    result = sess.run(dious, feed_dict={label_bbox: gt_bboxes_xywh, predic_bbox: pre_bboxes_xywh})
    print('tensorflow publish diou: ', result)

# ================================================================ #
#                              CIoU                                #
# ================================================================ #
cious = bbox_ciou_torch(gt_bboxes_xyxy, pre_bboxes_xyxy, False).unsqueeze(-1)
print('pytorch publish ciou unnormaled: ', cious.numpy())
cious = bbox_ciou_torch(gt_bboxes_xyxy_nomal, pre_bboxes_xyxy_nomal, True).unsqueeze(-1)
print('pytorch publish ciou normaled: ', cious.numpy())
# ================================================================ #
cious = np.expand_dims(bbox_ciou_np(gt_bboxes_xywh, pre_bboxes_xywh, False), axis=-1)
print('numpy publish ciou unnormaled: ', cious)
cious = np.expand_dims(bbox_ciou_np(gt_bboxes_xywh_nomal, pre_bboxes_xywh_nomal, True), axis=-1)
print('numpy publish ciou normaled: ', cious)
# ================================================================ #
cious = tf.expand_dims(bbox_ciou_tf(label_bbox, predic_bbox, False), axis=-1)
cious_normal = tf.expand_dims(bbox_ciou_tf(label_bbox_normal, predic_bbox_normal, True), axis=-1)
with tf.Session() as sess:
    cious_tf, cious_tf_normal = sess.run([cious, cious_normal],
                                         feed_dict={label_bbox_normal: gt_bboxes_xywh_nomal,
                                                    predic_bbox_normal: pre_bboxes_xywh_nomal,
                                                    label_bbox: gt_bboxes_xywh,
                                                    predic_bbox: pre_bboxes_xywh})
    print('tensorflow publish ciou unnormaled:', cious_tf)
    print('tensorflow publish ciou normaled: ', cious_tf_normal)
# ================================================================ #
```
```
numpy publish giou: [[ 0.07342657] [-0.50800915]]
tensorflow publish giou: [[ 0.07342657] [-0.50800914]]
pytorch publish giou: [[ 0.07342657] [-0.50800914]]
numpy publish diou : [[ 0.14455897] [-0.25 ]]
pytorch publish diou: [[ 0.14455898] [-0.25 ]]
tensorflow publish diou: [[ 0.14455898] [-0.25 ]]
pytorch publish ciou unnormaled: [[ 0.14428109] [-0.2600825 ]]
pytorch publish ciou normaled: [[ 0.1392411 ] [-0.25120372]]
numpy publish ciou unnormaled: [[ 0.14428107] [-0.26008251]]
numpy publish ciou normaled: [[ 0.13924112] [-0.25120372]]
tensorflow publish ciou unnormaled: [[ 0.14428109] [-0.2600825 ]]
tensorflow publish ciou normaled: [[ 0.13924108] [-0.25120363]]
```
Results from a colleague's experiments:

| method | GIoU | DIoU | CIoU |
| --- | --- | --- | --- |
| mAP | 81.37% | 81.46% | 82.36% |