Normalization Variants

1 Understanding BatchNorm, InstanceNorm, and LayerNorm

[1] Batch Normalization, Instance Normalization, Layer Normalization: Structural Nuances
• The Transformer encoder uses Layer Normalization
• There is also Group Normalization; see 《全面解读Group Normalization》
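The structural difference between these three layers is just which axes the statistics are computed over. A minimal NumPy sketch (assuming the usual `(N, C, H, W)` layout; affine parameters and running statistics omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(4, 8, 16, 16))  # (N, C, H, W)

def normalize(x, axes, eps=1e-5):
    # Subtract the mean and divide by the std computed over the given axes.
    mean = x.mean(axis=axes, keepdims=True)
    std = x.std(axis=axes, keepdims=True)
    return (x - mean) / (std + eps)

bn = normalize(x, (0, 2, 3))   # BatchNorm: per channel, shared across the batch
inn = normalize(x, (2, 3))     # InstanceNorm: per sample AND per channel
ln = normalize(x, (1, 2, 3))   # LayerNorm: per sample, across all channels
```

Each output has (approximately) zero mean and unit variance along its own normalization axes, which is what distinguishes the three layers.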

2 BatchNorm

2.1 The momentum parameter acts as an importance factor when computing the running mean and running variance

[2] https://stats.stackexchange.com/questions/219808/how-and-why-does-batch-normalization-use-moving-averages-to-track-the-accuracy-o
[3] Batch Normalization Explained

running_mean = momentum * running_mean + (1 - momentum) * new_mean
running_var = momentum * running_var + (1 - momentum) * new_var

Momentum is the weight given to the existing running statistics, a.k.a. the "lag"; (1 - momentum) is the weight given to the latest mini-batch. If momentum is set to 0, the running mean and variance are simply those of the last seen mini-batch, which may be noisy and not desirable for testing. Conversely, if momentum is set to 1, the running statistics are never updated and stay at their initial values. In between, momentum controls how much each new mini-batch contributes to the running averages.
Ideally, momentum should be set close to 1 (> 0.9) so that the running mean and variance evolve slowly and the noise of any single mini-batch is averaged out.
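A small simulation of the update rule above (note: this is the convention of reference [2]; PyTorch's `nn.BatchNorm2d` defines `momentum` the opposite way, as the weight on the *new* batch statistics, with default 0.1):

```python
import numpy as np

momentum = 0.9  # close to 1: running stats move slowly, mini-batch noise is smoothed out
running_mean, running_var = 0.0, 1.0  # typical initial values

rng = np.random.default_rng(0)
for _ in range(1000):
    # Each mini-batch is drawn from a distribution with mean 5 and variance 4.
    batch = rng.normal(loc=5.0, scale=2.0, size=32)
    running_mean = momentum * running_mean + (1 - momentum) * batch.mean()
    running_var = momentum * running_var + (1 - momentum) * batch.var()
```

After many iterations the running statistics converge to the true population mean and variance, even though each individual mini-batch estimate is noisy.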

2.2 How torch.utils.checkpoint interacts with batch normalization

[4] Trading compute for memory in PyTorch models using Checkpointing

A batch normalization layer maintains running mean and variance statistics that are updated from the current mini-batch on every forward pass, weighted by the momentum value. With checkpointing, the forward pass over a checkpointed model segment runs twice in the same iteration, so the running statistics get updated twice. To compensate, use new_momentum = sqrt(momentum) as the momentum value, so that two updates with the adjusted momentum are equivalent to one update with the original momentum.
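The sqrt(momentum) trick can be verified algebraically: applying `running = m·running + (1-m)·new` twice with m = sqrt(momentum) collapses to a single update with the original momentum (assuming the batch statistics are the same on both passes, which holds when the same segment is re-run on the same mini-batch):

```python
import math

momentum = 0.9
running, new = 10.0, 2.0  # arbitrary illustrative values

# One forward pass with the original momentum.
once = momentum * running + (1 - momentum) * new

# Two forward passes (checkpointed segment re-run) with the adjusted momentum.
m_adj = math.sqrt(momentum)
twice = m_adj * running + (1 - m_adj) * new
twice = m_adj * twice + (1 - m_adj) * new

# m_adj^2 * running + (1 - m_adj)(1 + m_adj) * new == momentum * running + (1 - momentum) * new
```

So the two schedules leave the running statistics in the same state at the end of the iteration.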

3 AdaIN (Adaptive Instance Normalization)

AdaIN is a normalization frequently used in style transfer.

AdaIN receives a content input x and a style input y, and simply aligns the channel-wise mean and variance of x to match those of y. Unlike BN, IN or CIN, AdaIN has no learnable affine parameters.

AdaIN(x, y) = σ(y) * ( (x − μ(x)) / σ(x) ) + μ(y)
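A direct NumPy sketch of this formula, where μ and σ are computed per sample and per channel (over the spatial axes), exactly as in Instance Normalization:

```python
import numpy as np

def adain(x, y, eps=1e-5):
    """Align the channel-wise mean/std of content x to style y; x, y: (N, C, H, W)."""
    mu_x = x.mean(axis=(2, 3), keepdims=True)
    sigma_x = x.std(axis=(2, 3), keepdims=True)
    mu_y = y.mean(axis=(2, 3), keepdims=True)
    sigma_y = y.std(axis=(2, 3), keepdims=True)
    # Whiten the content features, then re-color them with the style statistics.
    return sigma_y * (x - mu_x) / (sigma_x + eps) + mu_y

rng = np.random.default_rng(0)
content = rng.normal(0.0, 1.0, size=(1, 3, 8, 8))
style = rng.normal(2.0, 3.0, size=(1, 3, 8, 8))
out = adain(content, style)
# out keeps the spatial structure of content but carries the style's per-channel statistics
```

Note there are no learnable parameters anywhere: the affine transform is driven entirely by the style input's statistics.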

A corollary from IBN-Net about Instance Normalization and Batch Normalization:

IN learns features that are invariant to appearance changes, such as colors, styles, and virtuality/reality, while BN is essential for preserving content-related information.

IBN-Net is widely used in ReID (person re-identification) models.

posted @ 渐渐的笔记本