ResNet
Batch Normalization
Let a batch be \(\mathcal{B}=\{\bm{x}_1,...,\bm{x}_m\}\), where \(\bm{x}_i\in\mathbb{R}^d\).
Compute the mean and variance (the square is taken element-wise):
\[\bm{\mu}:=\frac{1}{m}\sum\limits_{i=1}^m\bm{x}_i
\]
\[\bm{\sigma}^2:=\frac{1}{m}\sum\limits_{i=1}^m(\bm{x}_i-\bm{\mu})^2
\]
Take a small constant \(\epsilon>0\) and standardize:
\[\hat{\bm{x}}_i=\frac{\bm{x}_i-\bm{\mu}}{\sqrt{\bm{\sigma}^2+\epsilon}}
\]
Finally, apply a linear transformation, where the scale parameter \(\bm{\gamma}\) and shift parameter \(\bm{\beta}\) are learned (\(\odot\) denotes element-wise multiplication):
\[\bm{y}_i=\bm{\gamma}\odot\hat{\bm{x}}_i+\bm{\beta}
\]
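To make the four steps concrete, here is a minimal sketch (assuming PyTorch; the tensor and variable names are illustrative) that computes them by hand and checks the result against nn.BatchNorm1d in training mode, where \(\bm{\gamma}=\bm{1}\) and \(\bm{\beta}=\bm{0}\) at initialization:

import torch
from torch import nn

torch.manual_seed(0)
m, d = 8, 4                          # batch size m, feature dimension d
x = torch.randn(m, d)

eps = 1e-5
mu = x.mean(dim=0)                   # per-dimension mean over the batch
var = x.var(dim=0, unbiased=False)   # biased variance, matching the formula
x_hat = (x - mu) / torch.sqrt(var + eps)

# A freshly initialized BatchNorm1d has gamma = 1 and beta = 0,
# so its training-mode output should equal x_hat
bn = nn.BatchNorm1d(d, eps=eps)
print(torch.allclose(x_hat, bn(x), atol=1e-5))  # True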
Residual Block
\(x\rightarrow\)conv1\(\rightarrow\)BN\(\rightarrow\)ReLU\(\rightarrow\)conv2\(\rightarrow\)BN\(\stackrel{+x}{\rightarrow}\)ReLU\(\rightarrow y\).
Note that when the input and output shapes differ (in channel count, or in spatial size when the stride exceeds 1), \(x\) must first pass through a \(1\times 1\) convolution before the addition.
import torch
from torch import nn

class Residual(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.relu = nn.ReLU()
        # Main path: two 3x3 convolutions; the first may downsample via stride
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                               padding=1, stride=stride)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               padding=1)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.bn2 = nn.BatchNorm2d(out_channels)
        # Shortcut: a 1x1 convolution matches channel count and spatial size
        # whenever either differs from the input
        if in_channels != out_channels or stride != 1:
            self.conv3 = nn.Conv2d(in_channels, out_channels, kernel_size=1,
                                   stride=stride)
        else:
            self.conv3 = None

    def forward(self, x):
        y = self.relu(self.bn1(self.conv1(x)))
        y = self.bn2(self.conv2(y))
        if self.conv3 is not None:
            x = self.conv3(x)
        return self.relu(y + x)
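Continuing from the code above, a quick shape check (the sizes are illustrative) confirms both cases: with matching shapes the input is added directly, while a channel or stride change routes \(x\) through the \(1\times 1\) shortcut:

blk = Residual(64, 64)                         # identity shortcut
print(blk(torch.randn(1, 64, 56, 56)).shape)   # torch.Size([1, 64, 56, 56])

blk = Residual(64, 128, stride=2)              # 1x1 shortcut, halves H and W
print(blk(torch.randn(1, 64, 56, 56)).shape)   # torch.Size([1, 128, 28, 28])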
ResNet
First, as usual, an initial convolution stage (conv\(\rightarrow\)BN\(\rightarrow\)ReLU, then max pooling).
self.b1 = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=64, kernel_size=7, stride=2, padding=3),
    nn.BatchNorm2d(64),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
Next come four similar stages, each consisting of two residual blocks. In each of the last three stages, the first block doubles the channel count and halves the height and width.
self.b2 = nn.Sequential(Residual(64, 64, stride=1),
                        Residual(64, 64, stride=1))
self.b3 = nn.Sequential(Residual(64, 128, stride=2),
                        Residual(128, 128, stride=1))
self.b4 = nn.Sequential(Residual(128, 256, stride=2),
                        Residual(256, 256, stride=1))
self.b5 = nn.Sequential(Residual(256, 512, stride=2),
                        Residual(512, 512, stride=1))
Finally, as usual, average pooling plus a fully connected layer.
self.b6 = nn.Sequential(nn.AdaptiveAvgPool2d((1, 1)),
                        nn.Flatten(),
                        nn.Linear(512, 10))
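Putting the pieces together, a sketch of the full network (the class name ResNet18 is an assumption; the single input channel and 10 output classes follow the snippets above, and Residual is the block defined earlier):

import torch
from torch import nn

class ResNet18(nn.Module):
    def __init__(self):
        super().__init__()
        self.b1 = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
        self.b2 = nn.Sequential(Residual(64, 64), Residual(64, 64))
        self.b3 = nn.Sequential(Residual(64, 128, stride=2), Residual(128, 128))
        self.b4 = nn.Sequential(Residual(128, 256, stride=2), Residual(256, 256))
        self.b5 = nn.Sequential(Residual(256, 512, stride=2), Residual(512, 512))
        self.b6 = nn.Sequential(nn.AdaptiveAvgPool2d((1, 1)),
                                nn.Flatten(),
                                nn.Linear(512, 10))

    def forward(self, x):
        # Shape trace for a 1x224x224 input:
        # b1: 64x56x56 -> b2: 64x56x56 -> b3: 128x28x28
        # -> b4: 256x14x14 -> b5: 512x7x7 -> b6: 10
        for blk in (self.b1, self.b2, self.b3, self.b4, self.b5, self.b6):
            x = blk(x)
        return x

net = ResNet18()
print(net(torch.randn(1, 1, 224, 224)).shape)  # torch.Size([1, 10])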