A Guide to Implementing YOLO v3 from Scratch in PyTorch, Part 2: Building the Network Layers

This part is translated from: https://blog.paperspace.com/how-to-implement-a-yolo-v3-object-detector-from-scratch-in-pytorch-part-2/

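Prerequisites:

Part 1 of this tutorial, which explains how YOLO works
Basic working knowledge of PyTorch, including how to build custom architectures with classes such as nn.Module, nn.Sequential and torch.nn.Parameter

From here on I assume you have some experience with PyTorch. If you are just starting out, I recommend getting familiar with the framework first.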

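Getting started:

First, create a folder to hold the code for the detector.

Then create a file named darknet.py. Darknet is the name of the underlying architecture of YOLO, and this file will contain the code that builds the YOLO network. We will also have a file named util.py, which holds various helper functions used while building the network. Put both files in the folder you created. You can use git to keep track of your changes.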

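Configuration file:

The official code (written in C) uses a configuration file to build the network. The cfg file describes the layout of the network block by block. If you have used Caffe before, it plays the same role as the .prototxt file that describes a network.

We will use the official cfg file to build our network as well. Download it and put it in a folder named cfg. If you are on Linux, cd into your working directory and type: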

mkdir cfg
cd cfg
wget https://raw.githubusercontent.com/pjreddie/darknet/master/cfg/yolov3.cfg

Open the configuration file and you will see something like this:

[convolutional]
batch_normalize=1
filters=64
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=32
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

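There are 4 blocks above. The first three describe convolutional layers, and the last one is a shortcut layer. A shortcut layer is a skip connection, like the one used in ResNet. YOLO uses 5 types of layers: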

Convolutional

[convolutional]
batch_normalize=1  
filters=64  
size=3  
stride=1  
pad=1  
activation=leaky

Shortcut

[shortcut]
from=-3  
activation=linear

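The from parameter is -3 here, which means the output of the shortcut layer is obtained by adding the feature map of the previous layer to the feature map of the third layer counting backwards from the shortcut layer.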
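
To make the relative indexing concrete, here is a minimal sketch. The helper name shortcut_sources and the layer position 4 are hypothetical and chosen only for illustration; the function simply computes which two layer outputs a shortcut would add.

```python
def shortcut_sources(index, from_offset):
    """Return the absolute indices of the two layers a shortcut adds.

    index:       position of the shortcut layer in the network (assumed)
    from_offset: value of the `from` attribute (negative means relative)
    """
    previous = index - 1             # output of the layer just before the shortcut
    skipped = index + from_offset    # layer reached by counting backwards
    return previous, skipped


# A shortcut sitting at index 4 with from=-3 adds layers 3 and 1
print(shortcut_sources(4, -3))       # (3, 1)
```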

Upsample

[upsample]
stride=2

Upsamples the feature map of the previous layer by a factor of stride, using bilinear upsampling.

Route:

[route]
layers = -4

[route]
layers = -1, 61

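The route layer deserves some explanation. It has an attribute layers, which can hold either one value or two.

When layers has a single value, the route layer outputs the feature map of the layer at that index. In our example the value is -4, so the output is the feature map of the 4th layer counting backwards from the route layer.

When layers has two values, the route layer returns the feature maps of the indexed layers concatenated together. In our example the values are -1 and 61, so the output is the feature map of the layer just before the route layer (-1) concatenated along the depth dimension with the feature map of layer 61.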
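
The two cases can be sketched as a small index-resolution helper. This is only an illustration: resolve_route is a hypothetical name, and the route positions 83 and 86 passed to it are assumed values, not something taken from the text above.

```python
def resolve_route(index, layers):
    """Convert the `layers` attribute of a route block to absolute indices.

    index:  position of the route layer itself (assumed for the example)
    layers: the string as it appears in the cfg, e.g. "-4" or "-1, 61"
    """
    resolved = []
    for value in layers.split(","):
        value = int(value)           # int() tolerates surrounding spaces
        # Negative values are relative to the route layer, others are absolute
        resolved.append(index + value if value < 0 else value)
    return resolved


print(resolve_route(83, "-4"))       # [79]
print(resolve_route(86, "-1, 61"))   # [85, 61]
```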

YOLO:

[yolo]
mask = 0,1,2
anchors = 10,13,  16,30,  33,23,  30,61,  62,45,  59,119,  116,90,  156,198,  373,326
classes=80
num=9
jitter=.3
ignore_thresh = .5
truth_thresh = 1
random=1

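The YOLO block corresponds to the detection layer described in part 1. The anchors attribute defines 9 anchors, but only the anchors indexed by the mask attribute are used. Here mask is 0,1,2, which means the first, second and third anchors are used. This makes sense because each cell of a detection layer predicts 3 bounding boxes. There are detection layers at 3 scales, which adds up to 9 anchors in total.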
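
As a rough sketch of how mask picks anchors out of the full list, the snippet below pairs up the values of the anchors attribute and then indexes them. The function name select_anchors is hypothetical, and the mask "6,7,8" is used only as a second illustrative input.

```python
anchors_attr = "10,13,  16,30,  33,23,  30,61,  62,45,  59,119,  116,90,  156,198,  373,326"


def select_anchors(anchors_attr, mask_attr):
    """Return the (width, height) anchor pairs selected by the mask."""
    values = [int(v) for v in anchors_attr.split(",")]
    # Group the flat list into (width, height) pairs
    pairs = [(values[i], values[i + 1]) for i in range(0, len(values), 2)]
    return [pairs[int(m)] for m in mask_attr.split(",")]


print(select_anchors(anchors_attr, "0,1,2"))   # [(10, 13), (16, 30), (33, 23)]
print(select_anchors(anchors_attr, "6,7,8"))   # [(116, 90), (156, 198), (373, 326)]
```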

Net:

[net]
# Testing
batch=1
subdivisions=1
# Training
# batch=64
# subdivisions=16
width= 320
height = 320
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

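There is another block type in the cfg called net, but we do not treat it as a layer, because it only describes the network input and the training parameters. It is not used in the forward pass of YOLO. It does, however, tell us the input size of the network, which we use to adjust the anchors in the forward pass.

Parsing the configuration file:

Before we start, add the necessary imports at the top of darknet.py.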

from __future__ import division

import torch 
import torch.nn as nn
import torch.nn.functional as F 
from torch.autograd import Variable
import numpy as np

We define a function called parse_cfg, which takes the path of the configuration file as its input.

def parse_cfg(cfgfile):
    """
    Takes a configuration file
    
    Returns a list of blocks. Each blocks describes a block in the neural
    network to be built. Block is represented as a dictionary in the list
    
    """

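The idea is to parse the cfg file and store every block as a dictionary. The attributes of a block and their values are stored as key-value pairs in that dictionary. As we work through the cfg, we append each finished dictionary to a list named blocks, and the function returns this list at the end.

We begin by saving the contents of the cfg file in a list of strings. The following code does that: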

    with open(cfgfile, 'r') as file:                       # the file is closed automatically
        lines = file.read().split('\n')                    # store the lines in a list
    lines = [x.strip() for x in lines]                     # get rid of fringe whitespace first
    lines = [x for x in lines if len(x) > 0]               # get rid of the empty lines
    lines = [x for x in lines if x[0] != '#']              # get rid of comments

Then we loop over the resulting list to build the blocks.

    block = {}
    blocks = []

    for line in lines:
        if line[0] == "[":               # This marks the start of a new block
            if len(block) != 0:          # If block is not empty, it holds the values of the previous block
                blocks.append(block)     # add it to the blocks list
                block = {}               # re-init the block
            block["type"] = line[1:-1].rstrip()
        else:
            key, value = line.split("=", 1)      # split on the first "=" only
            block[key.rstrip()] = value.lstrip()
    blocks.append(block)                 # do not forget the last block

    return blocks
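
To see the shape of the result without needing the cfg file on disk, here is a self-contained sketch that applies the same logic to a string. The name parse_cfg_text and the sample text are hypothetical and exist only for this illustration.

```python
def parse_cfg_text(text):
    """Parse cfg-formatted text into a list of block dictionaries."""
    lines = [line.strip() for line in text.split("\n")]
    # Drop empty lines and comment lines
    lines = [line for line in lines if line and not line.startswith("#")]

    blocks, block = [], {}
    for line in lines:
        if line.startswith("["):             # start of a new block
            if block:
                blocks.append(block)
                block = {}
            block["type"] = line[1:-1].strip()
        else:
            key, value = line.split("=", 1)
            block[key.strip()] = value.strip()
    if block:
        blocks.append(block)                 # the last block
    return blocks


sample = """
[convolutional]
batch_normalize=1
filters=64

# a comment line
[shortcut]
from=-3
activation=linear
"""

blocks = parse_cfg_text(sample)
for block in blocks:
    print(block)
```

Note that every value is kept as a string, which is why the module-building code has to convert values such as filters with int().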

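Creating the building blocks:

Now we use the list returned by parse_cfg above to construct PyTorch modules for the blocks described in the configuration file.

As mentioned above, the list contains 5 types of layers. PyTorch provides pre-built implementations for the convolutional and upsample layers. We have to write the remaining layers ourselves by extending the nn.Module class.

The create_modules function takes the blocks list returned by parse_cfg as its input.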

def create_modules(blocks):
    net_info = blocks[0]     #Captures the information about the input and pre-processing    
    module_list = nn.ModuleList()
    prev_filters = 3
    output_filters = []

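Before iterating over the blocks list, we define a variable net_info to store information about the network.

nn.ModuleList:

Our function will return an nn.ModuleList. This class is almost like a normal list containing nn.Module objects. However, when an nn.ModuleList is added as a member of an nn.Module object (that is, when we add modules to our network), the parameters of all nn.Module objects inside the nn.ModuleList are registered as parameters of that nn.Module as well.

When we define a new convolutional layer, we must specify the dimensions of its kernel. The cfg file gives the height and width of the kernel, while its depth is the number of filters in the previous layer (the depth of that layer's feature map). This means we need to keep track of the number of filters in the layer the convolution is applied to, and we use the variable prev_filters for that. It is initialised to 3, for the three RGB channels of the input image.

A route layer brings in (possibly concatenated) feature maps from earlier layers. If a convolutional layer comes right after a route layer, its kernel is applied to exactly those earlier feature maps. So we need to keep track of the number of filters not only in the previous layer but in every preceding layer. As we iterate, we append the number of output filters of each block to the list output_filters.

The idea now is to iterate over the blocks list and create a PyTorch module for each block.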
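
The difference from a plain list can be checked with a small experiment. This is a sketch that assumes PyTorch is installed; the two classes are hypothetical and are not part of the detector.

```python
import torch.nn as nn


class WithPlainList(nn.Module):
    def __init__(self):
        super().__init__()
        # A plain Python list hides the layers from PyTorch
        self.layers = [nn.Linear(4, 4), nn.Linear(4, 2)]


class WithModuleList(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.ModuleList registers each layer, and therefore its parameters
        self.layers = nn.ModuleList([nn.Linear(4, 4), nn.Linear(4, 2)])


plain_count = len(list(WithPlainList().parameters()))
registered_count = len(list(WithModuleList().parameters()))
print(plain_count)        # 0
print(registered_count)   # 4 (two weights and two biases)
```

Parameters that are not registered would be invisible to an optimizer and to state_dict(), which is why the modules are collected in an nn.ModuleList.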

    for index, x in enumerate(blocks[1:]):
        module = nn.Sequential()

        #check the type of block
        #create a new module for the block
        #append to module_list

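The nn.Sequential class is used to execute a number of nn.Module objects in sequence. If you look at the cfg, you will notice that a block may contain more than one layer. For example, a block of type convolutional has a batch norm layer and a leaky ReLU activation layer in addition to the convolutional layer. We string these layers together using nn.Sequential and its add_module function. The following code shows how the convolutional and upsample layers are created.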

        if (x["type"] == "convolutional"):
            #Get the info about the layer
            activation = x["activation"]
            try:
                batch_normalize = int(x["batch_normalize"])
                bias = False
            except KeyError:
                #No batch_normalize key in this block, so the conv layer needs a bias
                batch_normalize = 0
                bias = True

            filters = int(x["filters"])
            padding = int(x["pad"])
            kernel_size = int(x["size"])
            stride = int(x["stride"])

            if padding:
                pad = (kernel_size - 1) // 2
            else:
                pad = 0

            #Add the convolutional layer
            conv = nn.Conv2d(prev_filters, filters, kernel_size, stride, pad, bias = bias)
            module.add_module("conv_{0}".format(index), conv)

            #Add the Batch Norm Layer
            if batch_normalize:
                bn = nn.BatchNorm2d(filters)
                module.add_module("batch_norm_{0}".format(index), bn)

            #Check the activation.
            #It is either Linear or a Leaky ReLU for YOLO
            if activation == "leaky":
                activn = nn.LeakyReLU(0.1, inplace = True)
                module.add_module("leaky_{0}".format(index), activn)

        #If it's an upsampling layer
        #We use bilinear upsampling with the factor given by stride
        elif (x["type"] == "upsample"):
            stride = int(x["stride"])
            upsample = nn.Upsample(scale_factor = stride, mode = "bilinear")
            module.add_module("upsample_{}".format(index), upsample)
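
As a quick sanity check of the padding rule and of add_module, the sketch below builds one convolutional block by hand and pushes a dummy tensor through it. It assumes PyTorch is installed; the 32x32 input size and the module names are arbitrary values chosen for the example.

```python
import torch
import torch.nn as nn

kernel_size, stride = 3, 2
pad = (kernel_size - 1) // 2          # same rule as in the code above, gives 1

block = nn.Sequential()
block.add_module("conv_0", nn.Conv2d(3, 64, kernel_size, stride, pad, bias=False))
block.add_module("batch_norm_0", nn.BatchNorm2d(64))
block.add_module("leaky_0", nn.LeakyReLU(0.1, inplace=True))

x = torch.randn(1, 3, 32, 32)         # dummy batch: one 32x32 RGB image
with torch.no_grad():
    out = block(x)
print(out.shape)                       # torch.Size([1, 64, 16, 16])
```

With a 3x3 kernel and a padding of 1, a stride of 2 halves the spatial size, while the number of channels becomes the number of filters.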

Route Layer / Shortcut Layer:

Next, we write the code that creates the route and shortcut layers.

        #If it is a route layer
        elif (x["type"] == "route"):
            x["layers"] = x["layers"].split(',')
            #Start of a route
            start = int(x["layers"][0])
            #end, if there exists one.
            try:
                end = int(x["layers"][1])
            except IndexError:
                end = 0
            #Positive annotation: turn absolute indices into relative offsets
            if start > 0:
                start = start - index
            if end > 0:
                end = end - index
            route = EmptyLayer()
            module.add_module("route_{0}".format(index), route)
            if end < 0:
                filters = output_filters[index + start] + output_filters[index + end]
            else:
                filters = output_filters[index + start]

        #shortcut corresponds to skip connection
        elif x["type"] == "shortcut":
            shortcut = EmptyLayer()
            module.add_module("shortcut_{}".format(index), shortcut)

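The code that creates the route layer deserves a closer look. First, we extract the value of the layers attribute, cast it to integers and store it in a list.

Then we use a new layer called EmptyLayer which, as the name suggests, is just an empty layer.

route = EmptyLayer()

It is defined as: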

class EmptyLayer(nn.Module):
    def __init__(self):
        super(EmptyLayer, self).__init__()

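Why do we need to define an empty layer?

An empty layer may look odd, given that it does nothing. In fact, the route layer performs an operation just like any other layer (bringing forward an earlier layer, or concatenating feature maps). In PyTorch, defining a new layer means subclassing nn.Module and writing the operation the layer performs in the forward function of that class.

To design a layer for the route block, we would have to build an nn.Module object initialised with the value of the layers attribute as its member. We could then write the code that concatenates or brings forward the feature maps in its forward function.

However, since the concatenation code is very short (a call to torch.cat on the feature maps), a dedicated layer would be an unnecessary abstraction that only adds boilerplate. Instead, we put a dummy layer in place of the proposed route layer and perform the concatenation directly in the forward function of the nn.Module object that represents darknet.

The convolutional layer that comes right after a route layer applies its kernel to the (possibly concatenated) feature maps of earlier layers. The following code updates the filters variable to hold the number of filters output by the route layer.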
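
For reference, the concatenation itself can be sketched in a few lines. This assumes PyTorch is installed, and the tensor shapes are arbitrary values picked for the illustration.

```python
import torch

# Feature maps are laid out as (batch, channels, height, width)
map_a = torch.zeros(1, 256, 26, 26)
map_b = torch.zeros(1, 512, 26, 26)

# Dimension 1 is the channel (depth) dimension
merged = torch.cat((map_a, map_b), 1)
print(merged.shape)                    # torch.Size([1, 768, 26, 26])
```

The spatial sizes of the two maps must match, and the channel counts add up.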

if end < 0:
    #If we are concatenating maps
    filters = output_filters[index + start] + output_filters[index + end]
else:
    filters= output_filters[index + start]

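The shortcut layer also uses an empty layer, because its operation is very simple as well (an addition). There is no need to update the filters variable for it, since it merely adds the feature map of an earlier layer to that of the layer just before it.

YOLO Layer:

Finally, we write the code that creates the YOLO layer.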
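
The bookkeeping for both layer types can be sketched without PyTorch. The list contents and the helper name route_filters below are hypothetical and serve only to show how the channel counts combine.

```python
# Output channel counts recorded so far, one entry per layer (illustrative)
output_filters = [32, 64, 32, 64, 64, 128]
index = len(output_filters)            # position of the layer being created


def route_filters(output_filters, index, start, end=0):
    """Channels produced by a route layer with relative offsets start and end."""
    filters = output_filters[index + start]
    if end < 0:                        # two sources: the channel counts add up
        filters += output_filters[index + end]
    return filters


print(route_filters(output_filters, index, -1))        # 128
print(route_filters(output_filters, index, -1, -4))    # 160

# A shortcut adds two maps element-wise, so the channel count is unchanged
shortcut_out = output_filters[index - 1]
print(shortcut_out)                                     # 128
```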

        #Yolo is the detection layer
        elif x["type"] == "yolo":
            mask = x["mask"].split(",")
            mask = [int(m) for m in mask]      #do not reuse the loop variable x here

            anchors = x["anchors"].split(",")
            anchors = [int(a) for a in anchors]
            anchors = [(anchors[i], anchors[i+1]) for i in range(0, len(anchors),2)]
            anchors = [anchors[i] for i in mask]

            detection = DetectionLayer(anchors)
            module.add_module("Detection_{}".format(index), detection)

We define a new layer, DetectionLayer, which holds the anchors used to detect bounding boxes.

DetectionLayer is defined as:

class DetectionLayer(nn.Module):
    def __init__(self, anchors):
        super(DetectionLayer, self).__init__()
        self.anchors = anchors

At the end of the loop, we do some bookkeeping.

        module_list.append(module)
        prev_filters = filters
        output_filters.append(filters)

That concludes the body of the loop. At the end of the create_modules function, we return a tuple containing net_info and module_list.

    return (net_info, module_list)

Testing the code:

You can test the code by adding the following lines at the end of darknet.py and running the file.

blocks = parse_cfg("cfg/yolov3.cfg")
print(create_modules(blocks))

You will see a long list (107 modules, indexed from 0 to 106) whose elements look like this:

.
.

  (9): Sequential(
     (conv_9): Conv2d (128, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
     (batch_norm_9): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
     (leaky_9): LeakyReLU(0.1, inplace)
   )
   (10): Sequential(
     (conv_10): Conv2d (64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
     (batch_norm_10): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
     (leaky_10): LeakyReLU(0.1, inplace)
   )
   (11): Sequential(
     (shortcut_11): EmptyLayer(
     )
   )
.
.
.

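That is it for this part. In the next part, we will assemble these building blocks and run an image through the network to produce an output.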

pengcw
