attention, transformers

What even is this? I'll work through it slowly.

Attention

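Attention originated in Sequence to Sequence models for NLP machine translation. In the earlier encoder-decoder architecture, translation quality dropped as sentences got longer: the encoder had to compress the entire source sentence into a single fixed-length vector, so the model effectively could not remember long sentences. A human translator works differently, looking at one part of the sentence, translating it, and focusing only on the small piece currently being translated. That is the core idea of attention. The Transformer is a network architecture built on this attention mechanism. I mostly care about CV, so I want to collect related material here.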
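To make "only look at the relevant part" concrete, here is a minimal sketch of scaled dot-product attention for a single query, written in plain Python. Note that this is the dot-product form later used in the Transformer, not the additive attention of the original seq2seq work, and it leaves out learned projections, batching and multiple heads. The function names `softmax` and `attention` are my own.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector (illustrative sketch).

    query:  list of d floats
    keys:   list of n key vectors, each with d floats
    values: list of n value vectors, each with d_v floats
    Returns (output vector, attention weights).
    """
    d = len(query)
    # Similarity between the query and every key, scaled by sqrt(d)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    # Non-negative weights that sum to 1: how much to "look at" each position
    weights = softmax(scores)
    # The output is the weighted average of the value vectors
    d_v = len(values[0])
    output = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(d_v)]
    return output, weights

# Usage: the query points mostly along the second key, so that key gets the largest weight
keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
out, w = attention([0.0, 2.0], keys, values)
print([round(x, 3) for x in w])
```

The weights act as a soft "where to look" distribution: positions whose keys match the query dominate the output, while the others still contribute a little.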

“the CBAM paper was the first to successfully showcase the wide applicability of the module, especially for Image Classification and Object Detection tasks.”

Common attention modules in CV include CBAM...

Terms:
GAP - global average pooling
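A minimal sketch of what GAP computes, assuming a single feature map stored as a nested list of shape [C][H][W] (channels, height, width): each channel collapses to one number, the mean over all H*W positions, so the result is a length-C vector. The function name `global_avg_pool` is my own; real frameworks do the same thing on tensors.

```python
def global_avg_pool(feature_map):
    """Global average pooling (illustrative sketch).

    feature_map: nested list of shape [C][H][W].
    Returns a list of C floats, the mean of each channel over all H*W positions.
    """
    pooled = []
    for channel in feature_map:
        values = [x for row in channel for x in row]  # flatten the H x W grid
        pooled.append(sum(values) / len(values))
    return pooled

# Usage: 2 channels, each a 2x2 grid
fmap = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 8.0]],
]
print(global_avg_pool(fmap))  # [2.5, 2.0]
```

The spatial layout is thrown away and only a per-channel summary remains, which is why it serves as a cheap channel descriptor.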

BAM

CBAM

SE-Net

Ref: