Object Detection Algorithms: YOLO-V1 Training Code Explained

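The YOLO-V1 network consists of 24 convolutional layers followed by 2 fully connected layers (the code later in this post uses three dense layers, with an extra 512-unit layer in front). The network input is 448×448×3 and the output has dimension S×S×(B×5+C), where S is the number of grid cells along each side of the image, B is the number of bounding boxes each cell predicts, and C is the number of classes.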
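As a quick check of the output dimension, here is a minimal sketch that plugs in the values used later in this post (S = 7, B = 2, C = 20 for the 20 classes). The function name `yolo_output_size` is only illustrative and is not part of the training code.

```python
def yolo_output_size(S, B, C):
    """Return (values per image, values per cell) for an S x S grid,
    where each cell predicts B boxes of 5 values plus C class scores."""
    per_cell = B * 5 + C
    return S * S * per_cell, per_cell


total, per_cell = yolo_output_size(S=7, B=2, C=20)
print(per_cell, total)  # 30 1470
```

This matches the `7 * 7 * 30` units of the last dense layer in the network code.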

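YOLO-V1 divides an image into S×S grid cells. If the center of an object falls inside a cell, that cell is responsible for predicting the object. Each cell predicts B bounding boxes, and each bounding box predicts a confidence value. This confidence carries two pieces of information: how confident the model is that the predicted box contains an object, and how accurate the predicted box is. It is defined as: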

$\Pr(\text{Object}) \times \text{IoU}^{\text{truth}}_{\text{pred}}$

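If an object's center falls in the cell, the first factor is 1, otherwise it is 0. The second factor is the IoU between the predicted bounding box and the ground-truth box. Confidence is defined per bounding box and depends on whether the box's cell contains an object center.

In YOLO-V1 each cell has two bounding boxes, and each bounding box has five predicted values: x, y, w, h and confidence. Each cell also predicts C conditional class probabilities, that is, the probability that the object belongs to each class given that the cell contains an object. (x, y) is the offset of the box center relative to the bounds of the grid cell, normalized to the range (0, 1); w and h are the predicted width and height relative to the whole image, also normalized to (0, 1). The confidence is the per-box value defined above. Multiplying it by the conditional class probability gives the class-specific confidence: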

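$\Pr(\text{Class}_i \mid \text{Object}) \times \Pr(\text{Object}) \times \text{IoU}^{\text{truth}}_{\text{pred}} = \Pr(\text{Class}_i) \times \text{IoU}^{\text{truth}}_{\text{pred}}$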
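To make the product concrete, the following sketch computes class-specific confidence scores for one grid cell. The numbers are made up for illustration (3 classes and 2 boxes instead of 20 and 2), and `class_specific_scores` is a hypothetical helper, not a function from the training code.

```python
def class_specific_scores(class_probs, box_confidences):
    """For one grid cell, multiply every conditional class probability
    Pr(Class_i | Object) by every box confidence Pr(Object) * IoU.
    Returns one list of class scores per box."""
    return [[p * conf for p in class_probs] for conf in box_confidences]


# Hypothetical values for one cell: C = 3 classes, B = 2 boxes.
class_probs = [0.7, 0.2, 0.1]
box_confidences = [0.8, 0.1]

scores = class_specific_scores(class_probs, box_confidences)
# Pick the box whose best class score is highest, then its best class.
best_box = max(range(len(scores)), key=lambda b: max(scores[b]))
best_class = max(range(len(class_probs)), key=lambda c: scores[best_box][c])
print(best_box, best_class, round(scores[best_box][best_class], 2))
```

Note that the class probabilities are shared by both boxes of a cell; only the confidence differs between boxes.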

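The following explains how the predicted x and y are normalized to 0-1 as offsets relative to the corresponding grid cell, and how w and h are normalized to 0-1 using the image width and height. Each cell predicts B vectors of the form (x, y, w, h, confidence). Assume the image is divided into S×S cells with S = 7, and that the image has width $w_i$ and height $h_i$.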

The explanation below refers to a figure that I found very detailed and clear:

 

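1. (x, y) is the offset of the bbox center relative to its cell. Take the blue cell in the figure above, with coordinates ($x_{col} = 1$, $y_{row} = 4$), and suppose its predicted output is the red bbox with center ($x_c$, $y_c$). The final predicted (x, y) is normalized and expresses the offset relative to the cell: $x = \frac{x_c}{w_i} S - x_{col}$, $y = \frac{y_c}{h_i} S - y_{row}$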

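2. (w, h) is the size of the bbox relative to the whole image. If the predicted bbox has width and height $w_b$, $h_b$, then (w, h) is the fraction of the image it covers: $w = \frac{w_b}{w_i}$, $h = \frac{h_b}{h_i}$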
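The two normalization rules above can be sketched as a small encoding function. This is an illustration under assumptions: `encode_box` and the pixel values are hypothetical, chosen so that the box center lands in the cell ($x_{col} = 1$, $y_{row} = 4$) discussed in the text.

```python
def encode_box(xc, yc, wb, hb, img_w, img_h, S=7):
    """Convert a box in pixels (center xc, yc; size wb, hb) into
    YOLO-V1 targets: the responsible cell plus normalized (x, y, w, h)."""
    # Cell containing the center; clamp so a center on the right or
    # bottom image edge still maps to the last cell.
    col = min(int(xc / img_w * S), S - 1)
    row = min(int(yc / img_h * S), S - 1)
    x = xc / img_w * S - col   # offset inside the cell
    y = yc / img_h * S - row
    w = wb / img_w             # size relative to the whole image
    h = hb / img_h
    return col, row, x, y, w, h


# Hypothetical box on a 448x448 image: center (100, 300), size 120x200.
col, row, x, y, w, h = encode_box(100, 300, 120, 200, 448, 448)
print(col, row, round(x, 4), round(y, 4), round(w, 4), round(h, 4))
```

The center offsets depend on the cell, while the width and height depend only on the image size.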

Parameters needed by YOLO-V1

 


 

 

import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder, tf.variable_scope)

# The methods in this post all belong to the YOLO-V1 model class.
def __init__(self):
    self.classes = ["aeroplane", "bicycle", "bird", "boat", "bottle",
                    "bus", "car", "cat", "chair", "cow", "diningtable",
                    "dog", "horse", "motorbike", "person", "pottedplant",
                    "sheep", "sofa", "train", "tvmonitor"]
    # Used for computing coordinates: per-cell column / row indices, shape (7, 7, 2)
    self.x_offset = np.transpose(np.reshape(np.array([np.arange(7)] * 7 * 2, dtype=np.float32), [2, 7, 7]), [1, 2, 0])
    self.y_offset = np.transpose(self.x_offset, [1, 0, 2])
    # Input image size
    self.img_size = (448, 448)
    # IoU threshold
    self.iou_threshold = 0.5
    self.batch_size = 45
    # Weights of the individual loss terms
    self.class_scale = 2.0
    self.object_scale = 1.0
    self.noobject_scale = 1.0
    self.coord_scale = 5.0
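The `x_offset` / `y_offset` expressions are dense, so here is a small standalone check of what they produce. It reuses the same NumPy expression with 7 and 2 replaced by the names `S` and `B`, which are introduced here only for readability.

```python
import numpy as np

S, B = 7, 2
# Same expression as in __init__, with 7 and 2 replaced by S and B.
x_offset = np.transpose(
    np.reshape(np.array([np.arange(S)] * S * B, dtype=np.float32), [B, S, S]),
    [1, 2, 0])
y_offset = np.transpose(x_offset, [1, 0, 2])

print(x_offset.shape)   # (7, 7, 2)
print(x_offset[4, 1])   # column index of the cell at row 4, col 1 (one entry per box)
print(y_offset[4, 1])   # row index of the same cell
```

So `x_offset[row, col, b]` holds the column index and `y_offset[row, col, b]` the row index, which is what the loss layer adds to or subtracts from the per-cell offsets.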

 


Network section: start


 

def _build_net(self):
    # Input: a batch of 448x448 RGB images
    x = tf.placeholder(tf.float32, [None, 448, 448, 3])
    with tf.variable_scope('yolo'):
        net = self.conv_layer(x, 64, 7, 2, 'conv_2')
        net = self.max_pool_layer(net, 2, 2)
        net = self.conv_layer(net, 192, 3, 1, 'conv_4')
        net = self.max_pool_layer(net, 2, 2)
        net = self.conv_layer(net, 128, 1, 1, 'conv_6')
        net = self.conv_layer(net, 256, 3, 1, 'conv_7')
        net = self.conv_layer(net, 256, 1, 1, 'conv_8')
        net = self.conv_layer(net, 512, 3, 1, 'conv_9')
        net = self.max_pool_layer(net, 2, 2)
        net = self.conv_layer(net, 256, 1, 1, 'conv_11')
        net = self.conv_layer(net, 512, 3, 1, 'conv_12')
        net = self.conv_layer(net, 256, 1, 1, 'conv_13')
        net = self.conv_layer(net, 512, 3, 1, 'conv_14')
        net = self.conv_layer(net, 256, 1, 1, 'conv_15')
        net = self.conv_layer(net, 512, 3, 1, 'conv_16')
        net = self.conv_layer(net, 256, 1, 1, 'conv_17')
        net = self.conv_layer(net, 512, 3, 1, 'conv_18')
        net = self.conv_layer(net, 512, 1, 1, 'conv_19')
        net = self.conv_layer(net, 1024, 3, 1, 'conv_20')
        net = self.max_pool_layer(net, 2, 2)
        net = self.conv_layer(net, 512, 1, 1, 'conv_22')
        net = self.conv_layer(net, 1024, 3, 1, 'conv_23')
        net = self.conv_layer(net, 512, 1, 1, 'conv_24')
        net = self.conv_layer(net, 1024, 3, 1, 'conv_25')
        net = self.conv_layer(net, 1024, 3, 1, 'conv_26')
        net = self.conv_layer(net, 1024, 3, 2, 'conv_28')
        net = self.conv_layer(net, 1024, 3, 1, 'conv_29')
        net = self.conv_layer(net, 1024, 3, 1, 'conv_30')
        net = self.flatten_layer(net)
        net = self.dense_layer(net, 512, activation=self.Leaky_Relu, scope='fc_33')
        net = self.dense_layer(net, 4096, activation=self.Leaky_Relu, scope='fc_34')
        # Output: 7 * 7 * 30 = 1470 values per image, no activation
        net = self.dense_layer(net, 7 * 7 * 30, scope='fc_36')
    return net
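It can help to trace how the 448×448 input shrinks to the 7×7 grid. The sketch below is plain arithmetic, not part of the training code: `conv_out` assumes the explicit `kernel_size // 2` padding followed by a VALID convolution used by `conv_layer`, and `pool_out` assumes SAME pooling as in `max_pool_layer`.

```python
def conv_out(size, kernel, stride):
    """Output size of conv_layer: pad kernel // 2 on each side,
    then apply a VALID convolution."""
    pad = kernel // 2
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, stride):
    """Output size of a SAME max pool: ceil(size / stride)."""
    return -(-size // stride)


# Only the layers that change the spatial size are listed; every other
# convolution in _build_net has stride 1 and keeps the size unchanged.
layers = [("conv", 7, 2), ("pool", 2, 2), ("pool", 2, 2),
          ("pool", 2, 2), ("pool", 2, 2), ("conv", 3, 2)]

size = 448
for kind, kernel, stride in layers:
    size = conv_out(size, kernel, stride) if kind == "conv" else pool_out(size, stride)

print(size, size * size * 1024)  # 7 50176
```

The final feature map is 7×7×1024, so `flatten_layer` feeds 50176 values into the first dense layer.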

 

Helper layers used by the network

# Activation function: Leaky ReLU with slope 0.1 for negative inputs
def Leaky_Relu(self, x):
    return tf.maximum(x * 0.1, x)

# Convolution layer
def conv_layer(self, x, filter, kernel_size, stride, scope):
    channel = x.get_shape().as_list()[-1]
    weight = tf.Variable(tf.truncated_normal(shape=[kernel_size, kernel_size, channel, filter], stddev=0.1),
                         name="weights")
    bias = tf.Variable(tf.zeros([filter, ]), name="biases")
    # Pad explicitly by kernel_size // 2 on each side, then run a VALID convolution
    pad_size = kernel_size // 2
    x = tf.pad(x, paddings=[[0, 0], [pad_size, pad_size], [pad_size, pad_size], [0, 0]])

    conv = tf.nn.conv2d(x, weight, strides=[1, stride, stride, 1], padding="VALID", name=scope)
    output = self.Leaky_Relu(tf.nn.bias_add(conv, bias))
    return output

# Max pooling layer
def max_pool_layer(self, x, pool_size, stride):
    return tf.nn.max_pool(x, [1, pool_size, pool_size, 1], strides=[1, stride, stride, 1], padding="SAME")

# Fully connected layer
def dense_layer(self, x, filter, activation=None, scope=None):
    channel = x.get_shape().as_list()[-1]
    weight = tf.Variable(tf.truncated_normal(shape=[channel, filter], stddev=0.1), name="weights")
    bias = tf.Variable(tf.zeros([filter, ]), name="biases")
    output = tf.nn.xw_plus_b(x, weight, bias, name=scope)
    if activation:
        output = activation(output)
    return output

# Flatten layer: move channels first (NHWC -> NCHW), then flatten everything but the batch axis
def flatten_layer(self, x):
    x = tf.transpose(x, [0, 3, 1, 2])
    shape = x.get_shape().as_list()[1:]
    nums = np.prod(shape)  # np.product is deprecated and removed in NumPy 2.0
    return tf.reshape(x, [-1, nums])
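Two of these helpers are easy to check outside TensorFlow. The NumPy sketch below mirrors `Leaky_Relu` and shows how the transpose in `flatten_layer` changes the order of the flattened values. The tiny 2×2×3 feature map is made up for the demo. The post does not say why the transpose is there; one plausible reason, stated here only as an assumption, is compatibility with weights stored in channel-first order.

```python
import numpy as np


def leaky_relu(x, alpha=0.1):
    """NumPy equivalent of Leaky_Relu: max(alpha * x, x)."""
    return np.maximum(alpha * x, x)


activated = leaky_relu(np.array([-2.0, 0.0, 3.0]))
print(activated)

# A tiny NHWC feature map: batch 1, height 2, width 2, 3 channels.
feat = np.arange(12).reshape(1, 2, 2, 3)
flat_nhwc = feat.reshape(1, -1)                        # plain flatten
flat_nchw = feat.transpose(0, 3, 1, 2).reshape(1, -1)  # what flatten_layer does
print(flat_nhwc[0])
print(flat_nchw[0])
```

After the transpose, all values of channel 0 come first, then channel 1, and so on, instead of the channels being interleaved per pixel.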

Network section: end


 

 

 

Loss function section

The YOLO-V1 loss function:
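For reference, the sum-squared-error loss as given in the YOLO paper can be written as follows. Here $\mathbb{1}_{ij}^{obj}$ is 1 when box $j$ of cell $i$ is responsible for an object, $\mathbb{1}_{i}^{obj}$ is 1 when cell $i$ contains an object center, and hatted symbols are the targets. The paper uses $\lambda_{coord} = 5$ and $\lambda_{noobj} = 0.5$, whereas the code in this post uses its own weights (`coord_scale = 5.0`, `object_scale = 1.0`, `noobject_scale = 1.0`, `class_scale = 2.0`).

```latex
\begin{aligned}
\text{loss} ={}& \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj}
    \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
&+ \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj}
    \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
&+ \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left( C_i - \hat{C}_i \right)^2 \\
&+ \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj} \left( C_i - \hat{C}_i \right)^2 \\
&+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{obj} \sum_{c \in \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
```

The five lines correspond, in order, to the center-coordinate loss, the width/height loss, the confidence loss for responsible boxes, the confidence loss for all other boxes, and the classification loss.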

 

 

 

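(1) The class prediction is penalized only when the cell contains an object.

(2) The box coordinate prediction is penalized only when a bounding box is responsible for a ground-truth box. Which box is responsible is decided by IoU: among all boxes of that cell, the one with the largest IoU with the ground-truth box is responsible.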
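Rule (2) can be sketched for a single cell in plain Python. `responsible_mask` is a hypothetical helper with made-up IoU values; it mirrors the comparison against the per-cell maximum, multiplied by the cell's object indicator, that the loss layer uses to build its mask.

```python
def responsible_mask(ious, has_object):
    """Return a 0/1 flag per box of one cell: 1 only for the box with
    the highest IoU, and only if the cell contains an object center."""
    if not has_object:
        return [0] * len(ious)
    best = max(ious)
    return [1 if iou >= best else 0 for iou in ious]


print(responsible_mask([0.3, 0.6], has_object=True))   # [0, 1]
print(responsible_mask([0.3, 0.6], has_object=False))  # [0, 0]
```

Because the comparison is `>=`, two boxes with exactly equal IoU would both be marked responsible, which is also how the mask in the loss layer behaves.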

 

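Why does the formula take the square root of w and h?

The black box is the predicted bounding box, and the red and green boxes are ground-truth boxes. Without the square root on w and h, the width/height loss between the bounding box and either ground-truth box is the same. In terms of area, however, the black box is 25 times the green box, while the red box is only 81/25 times the black box, so the size deviation between the black and green boxes is much larger and the two cases should not receive the same loss. Taking the square root of w and h brings the loss closer to this intuition.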
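The effect is easy to verify numerically. The side lengths below are an assumption inferred from the area ratios in the text (black 5, green 1, red 9, in arbitrary units, treating the boxes as squares), and only one dimension is compared.

```python
import math

# Side lengths inferred from the area ratios 25 and 81/25 (assumption).
black, green, red = 5.0, 1.0, 9.0

# Squared error on the raw side length: identical for both ground truths.
plain_green = (black - green) ** 2
plain_red = (black - red) ** 2

# Squared error on the square roots: the small box is penalized more.
sqrt_green = (math.sqrt(black) - math.sqrt(green)) ** 2
sqrt_red = (math.sqrt(black) - math.sqrt(red)) ** 2

print(plain_green, plain_red)                    # 16.0 16.0
print(round(sqrt_green, 3), round(sqrt_red, 3))  # 1.528 0.584
```

With the square root, the same absolute error of 4 units costs more against the small green box than against the large red box.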

Function for computing IoU

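def calc_iou(self, bboxes1, bboxes2):
    # loss_layer passes boxes as (x_center, y_center, w, h) tensors, so convert them to
    # (xmin, ymin, xmax, ymax) corners first and use TensorFlow ops instead of NumPy ones
    bboxes1 = tf.stack([bboxes1[..., 0] - bboxes1[..., 2] / 2.0,
                        bboxes1[..., 1] - bboxes1[..., 3] / 2.0,
                        bboxes1[..., 0] + bboxes1[..., 2] / 2.0,
                        bboxes1[..., 1] + bboxes1[..., 3] / 2.0], axis=-1)
    bboxes2 = tf.stack([bboxes2[..., 0] - bboxes2[..., 2] / 2.0,
                        bboxes2[..., 1] - bboxes2[..., 3] / 2.0,
                        bboxes2[..., 0] + bboxes2[..., 2] / 2.0,
                        bboxes2[..., 1] + bboxes2[..., 3] / 2.0], axis=-1)

    # Intersection of the two boxes: its top-left corner is the max of the two top-left corners,
    # its bottom-right corner is the min of the two bottom-right corners
    int_xmin = tf.maximum(bboxes1[..., 0], bboxes2[..., 0])
    int_ymin = tf.maximum(bboxes1[..., 1], bboxes2[..., 1])
    int_xmax = tf.minimum(bboxes1[..., 2], bboxes2[..., 2])
    int_ymax = tf.minimum(bboxes1[..., 3], bboxes2[..., 3])

    # Width and height of the intersection: if the boxes do not overlap the raw values
    # are negative, so take the max with 0
    int_h = tf.maximum(int_ymax - int_ymin, 0.)
    int_w = tf.maximum(int_xmax - int_xmin, 0.)

    # Compute the IoU
    int_vol = int_h * int_w  # intersection area
    vol1 = (bboxes1[..., 2] - bboxes1[..., 0]) * (bboxes1[..., 3] - bboxes1[..., 1])  # area of bboxes1
    vol2 = (bboxes2[..., 2] - bboxes2[..., 0]) * (bboxes2[..., 3] - bboxes2[..., 1])  # area of bboxes2
    # IoU = intersection / union; clamp the union so zero-area boxes cannot produce 0/0
    iou = int_vol / tf.maximum(vol1 + vol2 - int_vol, 1e-10)
    return iou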
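The same intersection-over-union arithmetic can be tried on single boxes in plain Python. This is a standalone sketch, not the training code: `iou` works on corner-format boxes (ymin, xmin, ymax, xmax), and `center_to_corners` is a hypothetical helper for the (x_center, y_center, w, h) format that the loss layer works with.

```python
def iou(box1, box2):
    """IoU of two boxes given as (ymin, xmin, ymax, xmax)."""
    inter_h = max(min(box1[2], box2[2]) - max(box1[0], box2[0]), 0.0)
    inter_w = max(min(box1[3], box2[3]) - max(box1[1], box2[1]), 0.0)
    inter = inter_h * inter_w
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    return inter / (area1 + area2 - inter)


def center_to_corners(x, y, w, h):
    """Convert (x_center, y_center, w, h) to (ymin, xmin, ymax, xmax)."""
    return (y - h / 2, x - w / 2, y + h / 2, x + w / 2)


overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))   # 1 / (4 + 4 - 1)
disjoint = iou((0, 0, 1, 1), (2, 2, 3, 3))  # no overlap
same = iou(center_to_corners(0.5, 0.5, 0.2, 0.4),
           center_to_corners(0.5, 0.5, 0.2, 0.4))
print(round(overlap, 4), disjoint, round(same, 4))
```

Mixing up the two box formats silently produces wrong IoU values, so it is worth checking which format a given IoU function expects before calling it.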
Function for computing the loss

def loss_layer(self, predicts, labels, scope='loss_layer'):
    # labels has shape (batch_size, 7, 7, 25): the first 5 values are the box info
    # (c, x, y, w, h), the last 20 are the class labels
    with tf.variable_scope(scope):
        # Predicted values
        # The network output has shape (batch_size, 1470)
        # class scores: 20 per cell
        predict_classes = tf.reshape(
            predicts[:, :7 * 7 * 20],
            [self.batch_size, 7, 7, 20])
        # confidences: 2 per cell
        predict_confidence = tf.reshape(
            predicts[:, 7 * 7 * 20:7 * 7 * 20 + 7 * 7 * 2],
            [self.batch_size, 7, 7, 2])
        # bounding boxes: 2 * 4 per cell
        predict_boxes = tf.reshape(
            predicts[:, 7 * 7 * 20 + 7 * 7 * 2:],
            [self.batch_size, 7, 7, 2, 4])

        # Ground-truth values
        # shape (45, 7, 7, 1)
        # response is 0 or 1: 1 if the cell contains an object, 0 otherwise.
        # "Contains an object" means it contains the object's center, not just part of the object,
        # so only the cell holding the center is 1 and all other cells are 0
        response = tf.reshape(
            labels[..., 0],
            [self.batch_size, 7, 7, 1])
        # shape (45, 7, 7, 1, 4)
        boxes = tf.reshape(
            labels[..., 1:5],
            [self.batch_size, 7, 7, 1, 4])
        # shape (45, 7, 7, 2, 4); the four box values are scaled to the range 0~1
        # (__init__ defines the image size as self.img_size, not self.img_shape)
        boxes = tf.tile(
            boxes, [1, 1, 1, 2, 1]) / self.img_size[0]
        # shape (45, 7, 7, 20)
        classes = labels[..., 5:]

        # self.x_offset has shape (7, 7, 2); add a batch axis to get (1, 7, 7, 2)
        # before tiling, because tf.tile needs one multiple per axis
        # shape (45, 7, 7, 2)
        x_offset = tf.tile(
            tf.reshape(self.x_offset, [1, 7, 7, 2]),
            [self.batch_size, 1, 1, 1])
        # shape (45, 7, 7, 2)
        y_offset = tf.transpose(x_offset, (0, 2, 1, 3))

        # shape (45, 7, 7, 2, 4) -> (x, y, w, h) relative to the whole image;
        # the network predicts sqrt(w) and sqrt(h), hence the squares
        predict_boxes_tran = tf.stack(
            [(predict_boxes[..., 0] + x_offset) / 7,
             (predict_boxes[..., 1] + y_offset) / 7,
             tf.square(predict_boxes[..., 2]),
             tf.square(predict_boxes[..., 3])], axis=-1)

        # IoU between predicted and ground-truth boxes, shape (45, 7, 7, 2)
        iou_predict_truth = self.calc_iou(predict_boxes_tran, boxes)

        # shape (45, 7, 7, 1)
        # During training, if a cell really contains an object, only the bounding box with the
        # largest IoU is responsible for it; the other boxes are treated as containing no object
        object_mask = tf.reduce_max(iou_predict_truth, axis=3, keep_dims=True)
        # object_mask shape (45, 7, 7, 2)
        object_mask = tf.cast(
            (iou_predict_truth >= object_mask), tf.float32) * response

        # no-object mask, shape (45, 7, 7, 2):
        # 1 where a box is not responsible for any object, 0 where it is
        noobject_probs = tf.ones_like(
            object_mask, dtype=tf.float32) - object_mask

        # shape (45, 7, 7, 2, 4): put the ground-truth boxes into the network's output form,
        # xy relative to the cell's top-left corner, wh as square roots, all in the range 0~1
        boxes_tran = tf.stack(
            [boxes[..., 0] * 7 - x_offset,
             boxes[..., 1] * 7 - y_offset,
             tf.sqrt(boxes[..., 2]),
             tf.sqrt(boxes[..., 3])], axis=-1)

        # class_loss, class_delta shape (45, 7, 7, 20)
        class_delta = response * (predict_classes - classes)
        class_loss = tf.reduce_mean(
            tf.reduce_sum(tf.square(class_delta), axis=[1, 2, 3]),
            name='class_loss') * self.class_scale

        # object_loss: the target confidence is iou * p(object),
        # where p(object) is 1 or 0
        object_delta = object_mask * (predict_confidence - iou_predict_truth)
        object_loss = tf.reduce_mean(
            tf.reduce_sum(tf.square(object_delta), axis=[1, 2, 3]),
            name='object_loss') * self.object_scale

        # noobject_loss: p(object) is 0, so the target confidence is 0
        noobject_delta = noobject_probs * predict_confidence
        noobject_loss = tf.reduce_mean(
            tf.reduce_sum(tf.square(noobject_delta), axis=[1, 2, 3]),
            name='noobject_loss') * self.noobject_scale

        # coord_loss: only responsible boxes contribute
        coord_mask = tf.expand_dims(object_mask, 4)
        boxes_delta = coord_mask * (predict_boxes - boxes_tran)
        coord_loss = tf.reduce_mean(
            tf.reduce_sum(tf.square(boxes_delta), axis=[1, 2, 3, 4]),
            name='coord_loss') * self.coord_scale

        return class_loss + object_loss + noobject_loss + coord_loss
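The first step of the loss layer, cutting the flat 1470-vector into class scores, confidences and boxes, can be checked with NumPy alone. The sketch below assumes the layout used by `loss_layer` (classes first, then confidences, then boxes); the random input and the batch size of 3 are made up for the demo.

```python
import numpy as np

S, B, C = 7, 2, 20
batch = 3  # hypothetical batch size for the demo
predicts = np.random.rand(batch, S * S * (B * 5 + C)).astype(np.float32)

n_class = S * S * C  # 980 class scores
n_conf = S * S * B   # 98 confidences

classes = predicts[:, :n_class].reshape(batch, S, S, C)
confidence = predicts[:, n_class:n_class + n_conf].reshape(batch, S, S, B)
boxes = predicts[:, n_class + n_conf:].reshape(batch, S, S, B, 4)

print(classes.shape, confidence.shape, boxes.shape)
```

The remaining 1470 - 980 - 98 = 392 values are exactly 7 × 7 × 2 × 4, one (x, y, w, h) tuple per box.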

Loss function section: end


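Drawbacks of YOLO-V1
1. Each cell has only 2 bounding boxes, so objects with unusual aspect ratios (ones the training data does not cover) are detected poorly.

2. The image is divided into only 7×7 cells, so results are poor when two objects are very close together.

3. Each cell ends up with a single class, so objects are easily missed (not detected), e.g. when two objects share the same center.

4. Small objects in the image are detected poorly.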




 

 

 

 

posted @ 2020-04-29 15:31  蓉儿不是小妖女