Normalization in Deep Learning
Swin Transformer
Author: elfin
1. Batch Normalization
When using BN, we only need to call torch.nn.BatchNorm2d() with the number of channels. It computes the mean and variance for each channel separately and then normalizes.
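As a minimal sketch of what BatchNorm2d does in training mode (assuming the default eps=1e-5 and the affine parameters at their initial values weight=1, bias=0), the per-channel computation looks roughly like the following; batch_norm_2d_sketch is a hypothetical helper, not the library implementation:

import torch

def batch_norm_2d_sketch(x, eps=1e-5):
    # Statistics are computed per channel, over the batch and spatial dims (N, H, W).
    mean = x.mean(dim=(0, 2, 3), keepdim=True)                # shape (1, C, 1, 1)
    var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)  # biased variance, as BN uses for normalization
    return (x - mean) / torch.sqrt(var + eps)

x = torch.rand((10, 3, 2, 2))
print(torch.allclose(batch_norm_2d_sketch(x), torch.nn.BatchNorm2d(3)(x), atol=1e-6))  # expected: True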
1.1 Data Preparation
import torch
BatchNorm2d = torch.nn.BatchNorm2d
test = torch.rand((1,3,2,2))
1.2 Data Display
test[0,:,:,:]
tensor([[[6.7027e-01, 5.3149e-01],
         [4.6797e-01, 3.1028e-02]],

        [[4.1371e-01, 1.2022e-04],
         [2.3150e-01, 2.5120e-01]],

        [[5.2258e-01, 9.6350e-02],
         [4.6467e-01, 3.6091e-01]]])
1.3 BN Transformation
BatchNorm2d(3)(test)
Out:
tensor([[[[ 1.0252e+00, 4.4465e-01],
          [ 1.7894e-01, -1.6488e+00]],

         [[ 1.2858e+00, -1.5194e+00],
          [ 4.9993e-02, 1.8359e-01]],

         [[ 9.8744e-01, -1.6194e+00],
          [ 6.3328e-01, -1.3302e-03]]]], grad_fn=<NativeBatchNormBackward>)
(test[:,0,:,:] - test[:,0,:,:].numpy().mean()) / test[:,0,:,:].numpy().std()
Out:
tensor([[ 1.0253, 0.4447],
        [ 0.1790, -1.6489]])
Here we can clearly see that the two results agree (the tiny differences in the last digit come from the eps=1e-5 that BatchNorm2d adds to the variance before taking the square root). Next, let's directly test the case where batch_size is not 1:
test = torch.rand((10,3,2,2))
BatchNorm2d(3)(test)[0,0,:,:]
Out:
tensor([[ 1.6257, 0.5479],
        [-1.3761, 0.8000]], grad_fn=<SliceBackward>)
res = (test[:,0,:,:] - test[:,0,:,:].numpy().mean()) / test[:,0,:,:].numpy().std()
res[0,:,:]
Out:
tensor([[ 1.6258, 0.5479],
        [-1.3762, 0.8000]])
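The remaining differences in the last digit come from that eps term. A quick check, assuming the default eps=1e-5, that adding it to the manual computation reproduces the BatchNorm2d output:

eps = 1e-5
mean = test[:, 0, :, :].mean()
var = test[:, 0, :, :].var(unbiased=False)   # biased variance, as BN uses
manual = (test[:, 0, :, :] - mean) / torch.sqrt(var + eps)
print(torch.allclose(manual, BatchNorm2d(3)(test)[:, 0, :, :], atol=1e-6))  # expected: True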
2. Layer Normalization
LN (Layer Normalization) also performs normalization, but not across samples: the statistics are collected only within a single sample.
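A rough way to see the difference in reduction axes, assuming an image-style input of shape (N, C, H, W) and a LayerNorm configured with normalized_shape=(C, H, W):

x = torch.rand((10, 3, 2, 2))         # (N, C, H, W)
# BN: one mean/var per channel, computed over the batch and spatial dims
print(x.mean(dim=(0, 2, 3)).shape)    # torch.Size([3])
# LN with normalized_shape=(C, H, W): one mean/var per sample, computed within that sample
print(x.mean(dim=(1, 2, 3)).shape)    # torch.Size([10])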
The parameters of torch.nn.LayerNorm() can be configured in several ways:
>>> import torch.nn as nn
>>> input = torch.randn(20, 5, 10, 10)
>>> # With Learnable Parameters
>>> m = nn.LayerNorm(input.size()[1:])
>>> # Without Learnable Parameters
>>> m = nn.LayerNorm(input.size()[1:], elementwise_affine=False)
>>> # Normalize over last two dimensions
>>> m = nn.LayerNorm([10, 10])
>>> # Normalize over last dimension of size 10
>>> m = nn.LayerNorm(10)
>>> # Activating the module
>>> output = m(input)
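In Transformer-style models such as Swin Transformer, the input is typically (batch, num_tokens, channels) and LayerNorm is applied over the last (channel) dimension only. A minimal usage sketch with hypothetical sizes:

tokens = torch.rand((2, 49, 96))      # (batch, num_tokens, embed_dim), hypothetical sizes
norm = torch.nn.LayerNorm(96)         # normalize each token's 96-dim feature vector independently
print(norm(tokens).shape)             # torch.Size([2, 49, 96])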
2.1 Data Display
squence = torch.rand((2,3,10))
squence
Out:
tensor([[[0.1151, 0.9571, 0.5986, 0.4692, 0.7029, 0.5159, 0.4494, 0.9428,
          0.9714, 0.9938],
         [0.6456, 0.5997, 0.7542, 0.7266, 0.7021, 0.2900, 0.7044, 0.1627,
          0.3725, 0.9454],
         [0.9398, 0.3861, 0.5276, 0.8783, 0.8319, 0.1181, 0.6185, 0.9689,
          0.6393, 0.7770]],

        [[0.2786, 0.8901, 0.7228, 0.3740, 0.4186, 0.6857, 0.8438, 0.4762,
          0.4106, 0.4823],
         [0.5199, 0.7644, 0.2987, 0.3745, 0.6000, 0.7266, 0.0854, 0.1954,
          0.5413, 0.1656],
         [0.5487, 0.2655, 0.9256, 0.7352, 0.4081, 0.8017, 0.7130, 0.5364,
          0.5441, 0.8483]]])
2.2 Specifying One Dimension
LN = torch.nn.LayerNorm
LN(10)(squence)
Out:
tensor([[[-1.9932, 1.0227, -0.2616, -0.7251, 0.1120, -0.5578, -0.7961,
          0.9712, 1.0739, 1.1540],
         [ 0.2423, 0.0411, 0.7180, 0.5971, 0.4899, -1.3160, 0.5000,
          -1.8739, -0.9546, 1.5561],
         [ 1.0619, -1.1060, -0.5519, 0.8214, 0.6396, -2.1551, -0.1960,
          1.1760, -0.1146, 0.4246]],

        [[-1.3968, 1.6568, 0.8218, -0.9200, -0.6974, 0.6363, 1.4258,
          -0.4100, -0.7372, -0.3793],
         [ 0.4093, 1.4885, -0.5673, -0.2324, 0.7629, 1.3218, -1.5084,
          -1.0233, 0.5037, -1.1548],
         [-0.4265, -1.8654, 1.4884, 0.5210, -1.1411, 0.8590, 0.4084,
          -0.4893, -0.4500, 1.0955]]], grad_fn=<NativeLayerNormBackward>)
(squence[0,0,:] - squence[0,0,:].numpy().mean()) / squence[0,0,:].numpy().std()
Out:
tensor([-1.9934, 1.0227, -0.2617, -0.7252, 0.1120, -0.5578, -0.7961,
         0.9713, 1.0740, 1.1540])
Comparing the two results, we see that the operation is applied only over the last dimension!
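The same check can be run over the whole tensor at once; a sketch assuming the default eps=1e-5 that LayerNorm adds to the variance (which again explains the tiny last-digit differences above):

eps = 1e-5
mean = squence.mean(dim=-1, keepdim=True)
var = squence.var(dim=-1, unbiased=False, keepdim=True)
manual = (squence - mean) / torch.sqrt(var + eps)
print(torch.allclose(manual, LN(10)(squence), atol=1e-6))  # expected: True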
2.3 Specifying Two Dimensions
squence2 = torch.rand((2,2,7))
LN([2,7])(squence2)
Out:
tensor([[[-0.1525, -0.3791, 1.9005, 0.9187, -1.2562, -0.9069, 0.4788],
         [-0.9507, -0.5147, -1.1867, 1.9212, 0.4739, -0.4837, 0.1374]],

        [[-0.4490, -1.2532, 1.2571, -0.7904, -0.7550, -1.0003, 0.2586],
         [ 1.2673, -0.8106, -0.2374, 1.4318, 0.0237, 1.8428, -0.7854]]],
       grad_fn=<NativeLayerNormBackward>)
(squence2[0,:,:] - squence2[0,:,:].numpy().mean()) / squence2[0,:,:].numpy().std()
Out:
tensor([[-0.1525, -0.3791, 1.9006, 0.9188, -1.2563, -0.9070, 0.4788],
        [-0.9508, -0.5148, -1.1867, 1.9214, 0.4739, -0.4838, 0.1374]])
The two results are consistent here as well, which shows that when two dimensions are specified, normalization is performed over the last two dimensions!
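As a side note, with elementwise_affine=True (the default) LayerNorm also creates learnable weight and bias parameters whose shape equals normalized_shape; a quick check for the two configurations used above:

print(torch.nn.LayerNorm([2, 7]).weight.shape)   # torch.Size([2, 7])
print(torch.nn.LayerNorm(10).bias.shape)         # torch.Size([10])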
To be continued!
Pure love, only for China.