Batch Normalization
1. Basic concepts
2. Code implementation
1. Basic concepts
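Standardizing the input (shallow models)
After standardization, every feature has mean 0 and standard deviation 1 across all samples in the dataset.
Standardizing the input data makes the distributions of the individual features similar.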
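As a minimal sketch of input standardization, assuming a made-up toy dataset `X` whose three features live on very different scales (the data and variable names are illustrative, not from the original post):

```python
import torch

torch.manual_seed(0)
# Hypothetical toy dataset: 1000 samples, 3 features on very different scales.
scale = torch.tensor([1.0, 10.0, 100.0])
shift = torch.tensor([0.0, 5.0, -50.0])
X = torch.randn(1000, 3) * scale + shift

# Per-feature statistics over the whole dataset (dim=0 runs over the samples).
mean = X.mean(dim=0)
std = ((X - mean) ** 2).mean(dim=0).sqrt()

# Standardized features: each column now has mean ~0 and standard deviation ~1.
X_std = (X - mean) / std

print(X_std.mean(dim=0))
print(((X_std - X_std.mean(dim=0)) ** 2).mean(dim=0).sqrt())
```

Both printed vectors should be close to 0 and 1 respectively, up to floating-point error.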
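Batch normalization (deep models)
Using the mean and standard deviation of each mini-batch, batch normalization continually readjusts the intermediate outputs of the network, so that the intermediate outputs of every layer stay numerically more stable.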
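1.1 Batch normalization for fully connected layers
Position: between the affine transformation and the activation function of the fully connected layer.
Fully connected layer: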
\[\boldsymbol{x} = \boldsymbol{W}\boldsymbol{u} + \boldsymbol{b} \\
output =\phi(\boldsymbol{x})
\]
With batch normalization:
\[output=\phi(\text{BN}(\boldsymbol{x}))
\]
\[\boldsymbol{y}^{(i)} = \text{BN}(\boldsymbol{x}^{(i)})
\]
\[\boldsymbol{\mu}_\mathcal{B} \leftarrow \frac{1}{m}\sum_{i = 1}^{m} \boldsymbol{x}^{(i)},
\]
\[\boldsymbol{\sigma}_\mathcal{B}^2 \leftarrow \frac{1}{m} \sum_{i=1}^{m}(\boldsymbol{x}^{(i)} - \boldsymbol{\mu}_\mathcal{B})^2,
\]
\[\hat{\boldsymbol{x}}^{(i)} \leftarrow \frac{\boldsymbol{x}^{(i)} - \boldsymbol{\mu}_\mathcal{B}}{\sqrt{\boldsymbol{\sigma}_\mathcal{B}^2 + \epsilon}},
\]
Here ϵ > 0 is a small constant that keeps the denominator greater than 0.
\[{\boldsymbol{y}}^{(i)} \leftarrow \boldsymbol{\gamma} \odot
\hat{\boldsymbol{x}}^{(i)} + \boldsymbol{\beta}.
\]
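Two learnable parameters are introduced: the scale parameter γ and the shift parameter β. If the network learns \(\boldsymbol{\gamma} = \sqrt{\boldsymbol{\sigma}_\mathcal{B}^2 + \epsilon}\) and \(\boldsymbol{\beta} = \boldsymbol{\mu}_\mathcal{B}\), then \(\boldsymbol{y}^{(i)} = \boldsymbol{x}^{(i)}\), so batch normalization has no effect.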
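The formulas above can be traced step by step on a small tensor. This is a sketch under assumptions: `x` is a random stand-in for the affine output, and the sizes `m` and `d` are arbitrary. It also checks the special case in which γ and β undo the normalization.

```python
import torch

torch.manual_seed(0)
m, d = 8, 4                    # batch size and feature count (arbitrary choices)
x = torch.randn(m, d) * 3 + 2  # stand-in for the affine output Wu + b
eps = 1e-5

mu = x.mean(dim=0)                        # mini-batch mean, one value per feature
var = ((x - mu) ** 2).mean(dim=0)         # mini-batch variance (divides by m)
x_hat = (x - mu) / torch.sqrt(var + eps)  # standardized values

# With gamma = 1 and beta = 0 the output is just the standardized input.
gamma, beta = torch.ones(d), torch.zeros(d)
y = gamma * x_hat + beta

# Special case from the text: gamma = sqrt(var + eps), beta = mu recovers x.
y_identity = torch.sqrt(var + eps) * x_hat + mu

print(y.mean(dim=0))                             # close to 0 per feature
print(torch.allclose(y_identity, x, atol=1e-4))  # identity up to rounding
```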
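1.2 Batch normalization for convolutional layers
Position: after the convolution and before the activation function.
If the convolution outputs multiple channels, batch normalization is applied to each channel's output separately, and every channel has its own scale and shift parameters.
Computation: for a single channel, let the batch size be m and the convolution output be p×q.
The m×p×q elements of that channel are normalized together, using the same mean and variance.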
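A short sketch of the per-channel computation, assuming arbitrary sizes `m`, `c`, `p`, `q` and a random input. It pools the statistics over the batch and both spatial axes, then compares the result with PyTorch's built-in `nn.BatchNorm2d` in training mode (whose default γ = 1 and β = 0 leave the standardized values unchanged).

```python
import torch
from torch import nn

torch.manual_seed(0)
m, c, p, q = 4, 3, 5, 5          # batch size, channels, output height and width
x = torch.randn(m, c, p, q) * 2 + 1
eps = 1e-5

# One mean and one variance per channel, shared by all m*p*q elements of it.
mean = x.mean(dim=(0, 2, 3), keepdim=True)                 # shape (1, c, 1, 1)
var = ((x - mean) ** 2).mean(dim=(0, 2, 3), keepdim=True)
y_manual = (x - mean) / torch.sqrt(var + eps)

# Built-in layer in training mode for comparison.
bn = nn.BatchNorm2d(c, eps=eps)
bn.train()
y_builtin = bn(x)

print(mean.shape)
print(torch.allclose(y_manual, y_builtin, atol=1e-4))
```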
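1.3 Batch normalization at prediction time
Training: statistics are computed per batch, so each batch is normalized with its own mean and variance.
Prediction: the sample mean and variance of the whole training set are estimated with a moving average and used in place of the batch statistics.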
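The moving-average idea can be illustrated with plain numbers. This sketch makes strong simplifying assumptions: every mini-batch is pretended to have mean 2.0 and variance 9.0, and the helper names `ema_update` and `predict_normalize` are hypothetical. The momentum of 0.9 and the zero initialization follow the implementation in section 2.

```python
import math

def ema_update(moving, batch_stat, momentum=0.9):
    """One exponential-moving-average step for a running statistic."""
    return momentum * moving + (1.0 - momentum) * batch_stat

# Training: the moving statistics drift toward the batch statistics.
moving_mean, moving_var = 0.0, 0.0
for _ in range(100):
    moving_mean = ema_update(moving_mean, 2.0)  # assumed batch mean
    moving_var = ema_update(moving_var, 9.0)    # assumed batch variance

def predict_normalize(x, eps=1e-5):
    """Prediction: normalize a single value with the moving statistics."""
    return (x - moving_mean) / math.sqrt(moving_var + eps)

print(moving_mean, moving_var)   # close to 2.0 and 9.0
print(predict_normalize(5.0))    # close to (5 - 2) / 3 = 1.0
```

Because the estimates start at zero, the first few updates are biased toward zero; after enough batches the effect is negligible.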
2. Code implementation
import torch
from torch import nn


class BatchNorm(nn.Module):
    def __init__(self, *, num_features, num_dims):
        super(BatchNorm, self).__init__()
        # num_dims == 2: fully connected input (N, C); otherwise conv input (N, C, H, W)
        if num_dims == 2:
            shape = (1, num_features)
        else:
            shape = (1, num_features, 1, 1)
        # Learnable scale and shift parameters
        self.gamma = nn.Parameter(torch.ones(shape))
        self.beta = nn.Parameter(torch.zeros(shape))
        # Moving statistics used at prediction time (not updated by gradients)
        self.moving_mean = torch.zeros(shape)
        self.moving_var = torch.zeros(shape)
        self.momentum = 0.9

    def forward(self, X):
        # Keep the moving statistics on the same device as the input
        if self.moving_mean.device != X.device:
            self.moving_mean = self.moving_mean.to(X.device)
            self.moving_var = self.moving_var.to(X.device)
        Y, self.moving_mean, self.moving_var = self._batch_norm(
            self.training, X, self.gamma, self.beta, self.moving_mean,
            self.moving_var, eps=1e-5, momentum=self.momentum)
        return Y

    def _batch_norm(self, is_training, X, gamma, beta, moving_mean, moving_var, eps, momentum):
        if not is_training:
            # Prediction mode: normalize with the moving statistics
            X_hat = (X - moving_mean) / torch.sqrt(moving_var + eps)
        else:
            assert len(X.shape) in (2, 4)
            if len(X.shape) == 2:
                # Fully connected: one mean/variance per feature, over the batch
                mean = X.mean(dim=0)
                var = ((X - mean) ** 2).mean(dim=0)
            else:
                # Convolution: one mean/variance per channel, over batch and spatial axes
                mean = X.mean(dim=(0, 2, 3), keepdim=True)
                var = ((X - mean) ** 2).mean(dim=(0, 2, 3), keepdim=True)
            # Training mode: normalize with the current batch statistics
            X_hat = (X - mean) / torch.sqrt(var + eps)
            # Update the moving averages; detach so no autograd graph is kept across steps
            moving_mean = momentum * moving_mean + (1.0 - momentum) * mean.detach()
            moving_var = momentum * moving_var + (1.0 - momentum) * var.detach()
        Y = gamma * X_hat + beta  # scale and shift
        return Y, moving_mean, moving_var
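For comparison, PyTorch ships built-in layers (`nn.BatchNorm1d`, `nn.BatchNorm2d`) that follow the same train/predict split. The sketch below, on an arbitrary random batch, shows that switch with `nn.BatchNorm1d`. One difference worth noting: the built-in `momentum` is the weight of the new batch statistic, so `momentum=0.1` there corresponds to `momentum = 0.9` in the hand-written layer above.

```python
import torch
from torch import nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(num_features=4, momentum=0.1)
x = torch.randn(16, 4) * 3 + 2  # arbitrary batch with non-trivial mean and scale

bn.train()
y_train = bn(x)  # uses this batch's statistics and updates the running ones

bn.eval()
with torch.no_grad():
    y_eval = bn(x)  # uses running_mean / running_var instead

# running_mean starts at zero, so after one training step it equals
# (1 - 0.1) * 0 + 0.1 * batch_mean.
expected_running_mean = 0.1 * x.mean(dim=0)
print(torch.allclose(bn.running_mean, expected_running_mean, atol=1e-5))
print(torch.allclose(y_train, y_eval))  # the two modes give different outputs
```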
LeNet with batch normalization
class BLeNet(nn.Module):
    def __init__(self, *, channels, fig_size, num_class):
        super(BLeNet, self).__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, 6, 5, padding=2),
            BatchNorm(num_features=6, num_dims=4),
            nn.Sigmoid(),
            nn.AvgPool2d(2, 2),
            nn.Conv2d(6, 16, 5),
            BatchNorm(num_features=16, num_dims=4),
            nn.Sigmoid(),
            nn.AvgPool2d(2, 2),
        )
        # Track the spatial size through the convolutional part
        fig_size = (fig_size - 5 + 1 + 4) // 1  # conv 5x5, padding 2
        fig_size = (fig_size - 2 + 2) // 2      # avg pool 2x2, stride 2
        fig_size = (fig_size - 5 + 1) // 1      # conv 5x5, no padding
        fig_size = (fig_size - 2 + 2) // 2      # avg pool 2x2, stride 2
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * fig_size * fig_size, 120),
            BatchNorm(num_features=120, num_dims=2),
            nn.Sigmoid(),
            nn.Linear(120, 84),
            BatchNorm(num_features=84, num_dims=2),
            nn.Sigmoid(),
            nn.Linear(84, num_class),
        )

    def forward(self, X):
        conv_features = self.conv(X)
        output = self.fc(conv_features)
        return output
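The `fig_size` bookkeeping in the constructor can be checked against an actual forward pass. This sketch assumes a single-channel 28×28 input and swaps in the built-in `nn.BatchNorm2d` so that it runs on its own; the helper `conv_out` is a hypothetical name, not part of the original code.

```python
import torch
from torch import nn

def conv_out(size, kernel, padding=0, stride=1):
    """Output side length of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1

size = 28                            # assumed 28x28 input image
size = conv_out(size, 5, padding=2)  # Conv2d(1, 6, 5, padding=2)
size = conv_out(size, 2, stride=2)   # AvgPool2d(2, 2)
size = conv_out(size, 5)             # Conv2d(6, 16, 5)
size = conv_out(size, 2, stride=2)   # AvgPool2d(2, 2)

# Same convolutional stack as above, with the built-in batch norm layer.
conv = nn.Sequential(
    nn.Conv2d(1, 6, 5, padding=2), nn.BatchNorm2d(6), nn.Sigmoid(), nn.AvgPool2d(2, 2),
    nn.Conv2d(6, 16, 5), nn.BatchNorm2d(16), nn.Sigmoid(), nn.AvgPool2d(2, 2),
)
features = conv(torch.randn(2, 1, 28, 28))

print(size)                   # side length predicted by the arithmetic
print(tuple(features.shape))  # (batch, channels, height, width) from the forward pass
```

With these assumptions the predicted side length is 5, so the first linear layer receives 16 × 5 × 5 = 400 inputs.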