Week 4 Assignment: Convolutional Neural Networks (Part 2)

I. Theory

1. AlexNet's main improvements: Dropout added after the fully connected layers, ReLU to mitigate vanishing gradients, and max pooling.

  

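  The upper and lower halves of the figure are the computations on the two GPUs. The input is a 227*227*3 image (resized from 224*224*3). The first convolutional layer has 96 kernels of size 11*11*3 with stride 4 and padding=0, so (227-11)/4+1=55 and the feature map size is 55*55*96. Because the work is split across two GPUs, the data is divided into two groups of 55*55*48 each.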

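  The pooling that follows uses stride 2 and a 3*3 window: (55-3)/2+1=27, so the pooled feature map is 27*27*96.

  The second convolutional layer takes the 27*27*96 feature map as input, with stride 1, 256 kernels of size 5*5, and padding=2; the feature map is 27*27*256.

  The pooling that follows uses stride 2 and a 3*3 window: (27-3)/2+1=13, so the feature map is 13*13*256.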

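  The third convolutional layer takes the 13*13*256 feature map as input (128 channels per GPU; this layer reads from both GPUs), with stride 1, 384 kernels of size 3*3, and padding=1; the feature map is 13*13*384.

  The fourth convolutional layer takes a 13*13*192 feature map per GPU as input (384 channels in total), with stride 1, 384 kernels of size 3*3, and padding=1; the feature map is 13*13*384.

  The fifth convolutional layer takes a 13*13*192 feature map per GPU as input, with stride 1, 256 kernels of size 3*3, and padding=1; the feature map is 13*13*256.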

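  The pooling that follows uses stride 2 and a 3*3 window: (13-3)/2+1=6, so the feature map is 6*6*256.

  Layers six and seven are fully connected layers with 4096 neurons each. The eighth layer is a fully connected layer with 1000 neurons, one per ImageNet class.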
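  Every size in the walkthrough above comes from one formula, floor((n + 2*padding - kernel) / stride) + 1. A minimal sketch that traces it through the listed layers; the helper names `out_size` and `alexnet_feature_sizes` are made up for illustration:

```python
def out_size(n, kernel, stride=1, padding=0):
    """Output height/width of a convolution or pooling layer."""
    return (n + 2 * padding - kernel) // stride + 1


def alexnet_feature_sizes(n=227):
    """Trace the spatial size through the conv/pool layers described above."""
    # (name, kernel, stride, padding), taken from the walkthrough
    layers = [
        ("conv1", 11, 4, 0),
        ("pool1", 3, 2, 0),
        ("conv2", 5, 1, 2),
        ("pool2", 3, 2, 0),
        ("conv3", 3, 1, 1),
        ("conv4", 3, 1, 1),
        ("conv5", 3, 1, 1),
        ("pool5", 3, 2, 0),
    ]
    sizes = {}
    for name, kernel, stride, padding in layers:
        n = out_size(n, kernel, stride, padding)
        sizes[name] = n
    return sizes


sizes = alexnet_feature_sizes()
print(sizes)
# Flattened input to the first fully connected layer: 256 channels * 6 * 6
print(256 * sizes["pool5"] ** 2)
```

  The last number, 9216, is the input width of the first 4096-neuron layer for a 227*227 input.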

    from torch import nn

    # Single-channel, 10-class variant of AlexNet for 224x224 inputs
    # (the network described above takes 3 channels and outputs 1000 classes).
    net = nn.Sequential(
        nn.Conv2d(1, 96, kernel_size=11, stride=4, padding=1), nn.ReLU(),
        nn.MaxPool2d(kernel_size=3, stride=2),
        nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
        nn.MaxPool2d(kernel_size=3, stride=2),
        nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
        nn.MaxPool2d(kernel_size=3, stride=2), nn.Flatten(),
        # 256 channels * 5 * 5 = 6400 features after the last pooling layer
        nn.Linear(6400, 4096), nn.ReLU(), nn.Dropout(p=0.5),
        nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(p=0.5),
        nn.Linear(4096, 10))

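  In addition, AlexNet uses a local response normalization (LRN) layer. Intuitively, for a 13×13×256 feature map, the idea is that not too many neurons should be highly activated: LRN picks a spatial position, takes the 256 values that run through the channels at that position, and normalizes them. (Strictly, the AlexNet paper normalizes each value over a small window of adjacent channels rather than over all 256.) In practice LRN does not help much.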
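  A small sketch of LRN on a feature map of that shape, using PyTorch's `nn.LocalResponseNorm`. The hyperparameters are the ones reported in the AlexNet paper; the random input is only a stand-in for real post-ReLU activations:

```python
import torch
from torch import nn

torch.manual_seed(0)
# Non-negative values, standing in for post-ReLU activations
x = torch.rand(1, 256, 13, 13)

# size=5 adjacent channels, alpha=1e-4, beta=0.75, k=2 (AlexNet paper values)
lrn = nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0)
y = lrn(x)

print(y.shape)                 # same shape as the input
print((y <= x).all().item())   # each value is divided by a factor above 1
```

  The layer keeps the shape and only rescales each activation by a factor that depends on its neighbours along the channel dimension.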

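2. VGG: a bigger and deeper AlexNet, built from reusable convolutional blocks. The network is very large, and so is the number of parameters to train. As the network gets deeper, the height and width of the feature map shrink in a regular way, halving after every pooling layer, while the number of channels keeps growing, doubling after each group of convolutions until it reaches 512. In other words, the rate at which the image shrinks and the rate at which the channels grow both follow a regular pattern.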

    from torch import nn

    # VGG block: num_convs 3x3 convolutions followed by one 2x2 max pooling
    def vgg_block(num_convs, in_channels, out_channels):
        layers = []
        for _ in range(num_convs):
            layers.append(
                nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1))
            layers.append(nn.ReLU())
            in_channels = out_channels
        layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        return nn.Sequential(*layers)

    # VGG network: (number of convolutions, output channels) for each block
    conv_arch = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))

    def vgg(conv_arch):
        conv_blks = []
        in_channels = 1
        for (num_convs, out_channels) in conv_arch:
            conv_blks.append(vgg_block(num_convs, in_channels, out_channels))
            in_channels = out_channels

        # A 224x224 input is halved five times, leaving a 7x7 feature map
        return nn.Sequential(*conv_blks, nn.Flatten(),
                             nn.Linear(out_channels * 7 * 7, 4096), nn.ReLU(),
                             nn.Dropout(0.5), nn.Linear(4096, 4096), nn.ReLU(),
                             nn.Dropout(0.5), nn.Linear(4096, 10))
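  The halving and doubling pattern described for VGG can be traced without building the network. A minimal sketch, assuming a 224x224 input and the same `conv_arch` as above; `vgg_shapes` is a helper name invented here:

```python
conv_arch = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))


def vgg_shapes(conv_arch, size=224):
    """Return (channels, height, width) after each VGG block."""
    shapes = []
    for _num_convs, out_channels in conv_arch:
        # The 3x3 convolutions with padding=1 keep the size;
        # the 2x2 max pooling with stride 2 halves it.
        size //= 2
        shapes.append((out_channels, size, size))
    return shapes


shapes = vgg_shapes(conv_arch)
for shape in shapes:
    print(shape)
```

  The last entry is (512, 7, 7), which is where the `out_channels * 7 * 7` input width of the first fully connected layer comes from.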

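3. NiN: fully connected layers have many parameters and are prone to overfitting.

  NiN has no fully connected layers (inside each NiN block the convolution is followed by two 1x1 convolutions, which act as per-pixel fully connected layers). It alternates NiN blocks with stride-2 max pooling layers and finishes with global average pooling to produce the output.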

    from torch import nn

    # NiN block: one ordinary convolution followed by two 1x1 convolutions
    def nin_block(in_channels, out_channels, kernel_size, strides, padding):
        return nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size, strides, padding),
            nn.ReLU(), nn.Conv2d(out_channels, out_channels, kernel_size=1),
            nn.ReLU(), nn.Conv2d(out_channels, out_channels, kernel_size=1),
            nn.ReLU())

    net = nn.Sequential(
        nin_block(1, 96, kernel_size=11, strides=4, padding=0),
        nn.MaxPool2d(3, stride=2),
        nin_block(96, 256, kernel_size=5, strides=1, padding=2),
        nn.MaxPool2d(3, stride=2),
        nin_block(256, 384, kernel_size=3, strides=1, padding=1),
        nn.MaxPool2d(3, stride=2), nn.Dropout(0.5),
        # The last block outputs one channel per class (10 classes)
        nin_block(384, 10, kernel_size=3, strides=1, padding=1),
        # Global average pooling turns each channel into a single score
        nn.AdaptiveAvgPool2d((1, 1)),
        nn.Flatten())
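  The claim that a 1x1 convolution behaves like a fully connected layer applied at every pixel can be checked directly. A small sketch with arbitrary sizes (4 input channels, 3 output channels, 5x5 feature map), chosen only for illustration:

```python
import torch
from torch import nn

torch.manual_seed(0)
conv = nn.Conv2d(4, 3, kernel_size=1)
fc = nn.Linear(4, 3)

# Give the linear layer the same weights as the 1x1 convolution
with torch.no_grad():
    fc.weight.copy_(conv.weight.view(3, 4))
    fc.bias.copy_(conv.bias)

x = torch.rand(2, 4, 5, 5)
y_conv = conv(x)
# Move channels last so the linear layer acts on each pixel's channel vector,
# then move them back to the (batch, channels, height, width) layout
y_fc = fc(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)

print(torch.allclose(y_conv, y_fc, atol=1e-6))
```

  Both paths compute the same values, which is why the NiN block can replace fully connected layers while keeping the spatial layout.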

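4. GoogLeNet

  Inception block: four paths extract information at different levels, and their outputs are concatenated along the channel dimension. The input is copied to four branches: the first applies a 1x1 convolution; the second applies a 1x1 convolution followed by a 3x3 convolution (pad=1); the third applies a 1x1 convolution followed by a 5x5 convolution (pad=2); the fourth applies 3x3 max pooling (pad=1) followed by a 1x1 convolution. Different kernel sizes mean different receptive fields, and the final concatenation fuses features at different scales. All four branches keep the output height and width equal to the input's; only the number of channels changes. GoogLeNet uses 9 Inception blocks, and the channel count keeps growing until it reaches 1024.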

  

    import torch
    from torch import nn
    from torch.nn import functional as F

    class Inception(nn.Module):
        # c1..c4 are the output channels of the four paths
        def __init__(self, in_channels, c1, c2, c3, c4, **kwargs):
            super(Inception, self).__init__(**kwargs)
            # Path 1: a single 1x1 convolution
            self.p1_1 = nn.Conv2d(in_channels, c1, kernel_size=1)
            # Path 2: 1x1 convolution followed by 3x3 convolution
            self.p2_1 = nn.Conv2d(in_channels, c2[0], kernel_size=1)
            self.p2_2 = nn.Conv2d(c2[0], c2[1], kernel_size=3, padding=1)
            # Path 3: 1x1 convolution followed by 5x5 convolution
            self.p3_1 = nn.Conv2d(in_channels, c3[0], kernel_size=1)
            self.p3_2 = nn.Conv2d(c3[0], c3[1], kernel_size=5, padding=2)
            # Path 4: 3x3 max pooling followed by 1x1 convolution
            self.p4_1 = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)
            self.p4_2 = nn.Conv2d(in_channels, c4, kernel_size=1)

        def forward(self, x):
            p1 = F.relu(self.p1_1(x))
            p2 = F.relu(self.p2_2(F.relu(self.p2_1(x))))
            p3 = F.relu(self.p3_2(F.relu(self.p3_1(x))))
            p4 = F.relu(self.p4_2(self.p4_1(x)))
            # Concatenate the four outputs along the channel dimension
            return torch.cat((p1, p2, p3, p4), dim=1)

    b1 = nn.Sequential(nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),
                       nn.ReLU(), nn.MaxPool2d(kernel_size=3, stride=2,
                                               padding=1))

    # A ReLU is added after the 3x3 convolution, which had no activation
    b2 = nn.Sequential(nn.Conv2d(64, 64, kernel_size=1), nn.ReLU(),
                       nn.Conv2d(64, 192, kernel_size=3, padding=1), nn.ReLU(),
                       nn.MaxPool2d(kernel_size=3, stride=2, padding=1))

    b3 = nn.Sequential(Inception(192, 64, (96, 128), (16, 32), 32),
                       Inception(256, 128, (128, 192), (32, 96), 64),
                       nn.MaxPool2d(kernel_size=3, stride=2, padding=1))

    b4 = nn.Sequential(Inception(480, 192, (96, 208), (16, 48), 64),
                       Inception(512, 160, (112, 224), (24, 64), 64),
                       Inception(512, 128, (128, 256), (24, 64), 64),
                       Inception(512, 112, (144, 288), (32, 64), 64),
                       Inception(528, 256, (160, 320), (32, 128), 128),
                       nn.MaxPool2d(kernel_size=3, stride=2, padding=1))

    b5 = nn.Sequential(Inception(832, 256, (160, 320), (32, 128), 128),
                       Inception(832, 384, (192, 384), (48, 128), 128),
                       nn.AdaptiveAvgPool2d((1, 1)), nn.Flatten())

    net = nn.Sequential(b1, b2, b3, b4, b5, nn.Linear(1024, 10))
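  Because the four paths are concatenated along the channel dimension, the output channel count of an Inception block is simply the sum of the last channel count of each path. A short sketch that checks this for the nine blocks configured above; `inception_out_channels` is a helper name invented here:

```python
def inception_out_channels(c1, c2, c3, c4):
    """Channels after concatenating the four Inception paths."""
    return c1 + c2[1] + c3[1] + c4


# (c1, c2, c3, c4) for the nine Inception blocks in b3, b4 and b5
blocks = [
    (64, (96, 128), (16, 32), 32),
    (128, (128, 192), (32, 96), 64),
    (192, (96, 208), (16, 48), 64),
    (160, (112, 224), (24, 64), 64),
    (128, (128, 256), (24, 64), 64),
    (112, (144, 288), (32, 64), 64),
    (256, (160, 320), (32, 128), 128),
    (256, (160, 320), (32, 128), 128),
    (384, (192, 384), (48, 128), 128),
]

channels = [inception_out_channels(*block) for block in blocks]
print(channels)
```

  Each value matches the `in_channels` of the next block, and the last one, 1024, matches the input width of the final `nn.Linear(1024, 10)`.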

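5. Batch normalization: the layers near the input (the bottom) receive small gradients and train slowly, while the later layers receive larger gradients. Whenever the bottom layers change, all the layers after them have to adapt again, which slows convergence. For a fully connected layer, batch normalization acts on the feature dimension; for a convolutional layer, it acts on the channel dimension. It can be placed on the output of a layer, before the activation function, or applied to the layer's input. In PyTorch it is usually done with the `nn.BatchNorm1d` and `nn.BatchNorm2d` modules.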
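  To make "feature dimension" and "channel dimension" concrete, the sketch below recomputes the training-mode output of `nn.BatchNorm1d` and `nn.BatchNorm2d` by hand. The tensor sizes are arbitrary, and the learnable scale and shift are left at their initial values of 1 and 0:

```python
import torch
from torch import nn

torch.manual_seed(0)

# Fully connected case: statistics per feature, computed over the batch
x_fc = torch.randn(8, 5)
bn1d = nn.BatchNorm1d(5)
mean_fc = x_fc.mean(dim=0, keepdim=True)
var_fc = x_fc.var(dim=0, unbiased=False, keepdim=True)
manual_fc = (x_fc - mean_fc) / torch.sqrt(var_fc + bn1d.eps)

# Convolutional case: statistics per channel, over batch, height and width
x_conv = torch.randn(8, 3, 4, 4)
bn2d = nn.BatchNorm2d(3)
mean_conv = x_conv.mean(dim=(0, 2, 3), keepdim=True)
var_conv = x_conv.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
manual_conv = (x_conv - mean_conv) / torch.sqrt(var_conv + bn2d.eps)

# Freshly created modules are in training mode, so they use batch statistics
print(torch.allclose(bn1d(x_fc), manual_fc, atol=1e-5))
print(torch.allclose(bn2d(x_conv), manual_conv, atol=1e-5))
```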

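6. ResNet: a residual block adds a shortcut connection so that its output is f(x)=x+g(x), which makes very deep networks easier to train. As networks get deeper, gradients gradually vanish, so ResNet introduces residual units that connect the output of one layer directly to the layer two steps later, which suppresses the degradation problem. ResNet is very deep and borrows the idea of Highway Networks: instead of fitting the desired mapping f(x) directly, the stacked layers fit the difference between output and input, g(x)=f(x)-x, where f(x) is the mapping the block was originally expected to learn and x is its input.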

  

    import torch
    from torch import nn
    from torch.nn import functional as F

    class Residual(nn.Module):
        def __init__(self, input_channels, num_channels, use_1x1conv=False,
                     strides=1):
            super().__init__()
            self.conv1 = nn.Conv2d(input_channels, num_channels, kernel_size=3,
                                   padding=1, stride=strides)
            self.conv2 = nn.Conv2d(num_channels, num_channels, kernel_size=3,
                                   padding=1)
            # Optional 1x1 convolution so the shortcut matches the main path's shape
            if use_1x1conv:
                self.conv3 = nn.Conv2d(input_channels, num_channels,
                                       kernel_size=1, stride=strides)
            else:
                self.conv3 = None
            self.bn1 = nn.BatchNorm2d(num_channels)
            self.bn2 = nn.BatchNorm2d(num_channels)
            self.relu = nn.ReLU(inplace=True)

        def forward(self, X):
            Y = F.relu(self.bn1(self.conv1(X)))
            Y = self.bn2(self.conv2(Y))
            if self.conv3 is not None:
                X = self.conv3(X)
            # Shortcut connection: add the input to the residual branch
            Y += X
            return F.relu(Y)

    b1 = nn.Sequential(nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
                       nn.BatchNorm2d(64), nn.ReLU(),
                       nn.MaxPool2d(kernel_size=3, stride=2, padding=1))

    def resnet_block(input_channels, num_channels, num_residuals,
                     first_block=False):
        blk = []
        for i in range(num_residuals):
            # Except in the first stage, the first block halves the size
            # and changes the number of channels
            if i == 0 and not first_block:
                blk.append(
                    Residual(input_channels, num_channels, use_1x1conv=True,
                             strides=2))
            else:
                blk.append(Residual(num_channels, num_channels))
        return blk

    b2 = nn.Sequential(*resnet_block(64, 64, 2, first_block=True))
    b3 = nn.Sequential(*resnet_block(64, 128, 2))
    b4 = nn.Sequential(*resnet_block(128, 256, 2))
    b5 = nn.Sequential(*resnet_block(256, 512, 2))

    net = nn.Sequential(b1, b2, b3, b4, b5, nn.AdaptiveAvgPool2d((1, 1)),
                        nn.Flatten(), nn.Linear(512, 10))
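  One way to see why f(x)=x+g(x) helps against degradation: if the residual branch g outputs zero, the block is exactly the identity, so adding a block need not make the network worse. A minimal sketch with a simplified block (no batch normalization, no final ReLU); `TinyResidual` is a name made up for this illustration, not the `Residual` class above:

```python
import torch
from torch import nn


class TinyResidual(nn.Module):
    """Simplified residual block: output = x + g(x)."""

    def __init__(self, channels):
        super().__init__()
        self.g = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1))

    def forward(self, x):
        return x + self.g(x)


torch.manual_seed(0)
block = TinyResidual(4)
# Zero the last convolution so that g(x) = 0 for every input
nn.init.zeros_(block.g[2].weight)
nn.init.zeros_(block.g[2].bias)

x = torch.randn(2, 4, 8, 8)
y = block(x)
print(torch.equal(y, x))  # the block reduces to the identity mapping
```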

II. Cats vs. Dogs with ResNet

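 1. First I ran Cats vs. Dogs with Teacher Li's ResNet. The network turned out to be very unstable: the loss swung up and down, and the classification results were worse than AlexNet's.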

  

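 2. Next I used the resnet50 that ships with torchvision's `models` for Cats vs. Dogs. The accuracy was still not high, which really surprised me~

It was Xiaochen who spotted the missing net.eval(). Calling net.eval() switches the BatchNorm and Dropout layers to inference behaviour, and once it was applied the accuracy shot straight up. Very nice.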

    import torch
    from torch import nn, optim
    from torchvision import models

    # `device` and `train_loader` are assumed to be defined earlier.
    net = models.resnet50(pretrained=True)
    # Freeze the pretrained backbone (the original had a typo: requirse_grad)
    for param in net.parameters():
        param.requires_grad = False
    net.fc = nn.Linear(2048, 2, bias=True)  # binary classification

    print(net)

    # Move the network to the GPU
    net = net.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(net.fc.parameters(), lr=0.001)
    for epoch in range(1):  # repeat for several epochs
        net.train()
        for i, (inputs, labels) in enumerate(train_loader):
            inputs = inputs.to(device)
            labels = labels.to(device)
            # Reset the optimizer's gradients
            optimizer.zero_grad()
            # Forward pass + backward pass + optimization step
            outputs = net(inputs)
            loss = criterion(outputs, labels.long())
            loss.backward()
            optimizer.step()
        print('Epoch: %d loss: %.6f' % (epoch + 1, loss.item()))
    print('Finished Training')

    # Switch to evaluation mode before testing
    net.eval()
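 The effect of net.eval() can be seen on two small layers in isolation. The sketch below uses toy tensors rather than the cats-vs-dogs data, and only shows how Dropout and BatchNorm behave differently in the two modes:

```python
import torch
from torch import nn

torch.manual_seed(0)

# Dropout: random zeroing in training mode, identity in evaluation mode
drop = nn.Dropout(p=0.5)
ones = torch.ones(1000)
drop.train()
drop_train = drop(ones)  # entries are either 0 or scaled by 1 / (1 - p)
drop.eval()
drop_eval = drop(ones)   # unchanged

# BatchNorm: batch statistics in training mode, running estimates in eval mode
bn = nn.BatchNorm1d(4)
x = torch.randn(16, 4) * 3 + 5
bn.train()
bn_train = bn(x)  # normalized with this batch's mean and variance
bn.eval()
bn_eval = bn(x)   # normalized with the running mean and variance

print(torch.equal(drop_eval, ones))
print(bn_train.mean().item(), bn_eval.mean().item())
```

 If a network is left in training mode at test time, its BatchNorm layers normalize each test batch with that batch's own statistics, which is a plausible reason for the low accuracy seen before net.eval() was added.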

 

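 3. Haopeng's handling of the binary classification head is much more careful. Going straight from 2048 to 2 probably loses quite a lot of information, so next I will try reducing the dimension more gently~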
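 A gentler reduction could look like the sketch below. The intermediate widths (512 and 64) and the dropout rate are assumptions picked for illustration, not Haopeng's actual design:

```python
import torch
from torch import nn

# Hypothetical head: 2048 -> 512 -> 64 -> 2 instead of 2048 -> 2
head = nn.Sequential(
    nn.Linear(2048, 512), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(512, 64), nn.ReLU(),
    nn.Linear(64, 2))

# Stand-in for the 2048-dimensional features resnet50 feeds to its fc layer
features = torch.rand(4, 2048)
logits = head(features)
print(logits.shape)
```

 With the resnet50 above, this would be assigned as `net.fc = head`, and the optimizer would still be built from `net.fc.parameters()`.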

posted @ 2021-09-26 21:58  古幽月兮