Swin Transformer
Paper title: Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Swin Transformer has three main features:
- First, the image is partitioned into windows, and self-attention is computed only within each window. The advantage is that the computational complexity of self-attention grows linearly with image size rather than quadratically; see the first sketch after this list. (Swin Transformer builds hierarchical feature maps by merging image patches in deeper layers and has linear computation complexity to input image size due to computation of self-attention only within each local window.)
- Second, patches in a later layer merge patches from the preceding layer, so the deeper the layer, the larger its patch size and field of view, which builds hierarchical feature maps; see the second sketch after this list. (Swin Transformer constructs a hierarchical representation by starting from small-sized patches (outlined in gray) and gradually merging neighboring patches in deeper Transformer layers.)
- Third, shifted windows: the window partitions of consecutive layers are offset from each other. Each Swin Transformer block contains two layers: the first is W-MSA (window multi-head self-attention), and the second is SW-MSA (shifted window multi-head self-attention). The shifted partition in each layer bridges the windows that the other layer's partition keeps separate; see the third sketch after this list. (The shifted windows bridge the windows of the preceding layer, providing connections among them that significantly enhance modeling power.)
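To make the linear-complexity claim concrete, here is a minimal PyTorch sketch of window partitioning and per-window attention. The helper name `window_partition` and the single-matrix attention (no Q/K/V projections, no relative position bias, no multi-head split) are simplifications for illustration, not the paper's full implementation:

```python
import torch

def window_partition(x, window_size):
    # x: (B, H, W, C) -> (num_windows * B, window_size * window_size, C)
    B, H, W, C = x.shape
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size * window_size, C)

# Self-attention inside one M x M window costs O(M^4 * C); with HW / M^2
# windows, the total is O(H * W * M^2 * C) -- linear in image size for fixed M,
# instead of the O((HW)^2 * C) of global self-attention.
x = torch.randn(1, 56, 56, 96)       # feature map at Swin-T stage-1 resolution
windows = window_partition(x, 7)     # (64, 49, 96): 64 windows of 7x7 tokens
attn = torch.softmax(windows @ windows.transpose(1, 2) / 96 ** 0.5, dim=-1)
out = attn @ windows                 # attention computed per window only
print(windows.shape, out.shape)      # (64, 49, 96) twice
```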
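Patch merging is the downsampling step between stages: each 2x2 group of neighboring patches is concatenated along the channel dimension (giving 4C channels) and linearly projected to 2C, halving the spatial resolution. A minimal sketch of such a layer, assuming the module name `PatchMerging` for illustration:

```python
import torch
import torch.nn as nn

class PatchMerging(nn.Module):
    """Concatenate each 2x2 group of neighboring patches (4C channels)
    and project to 2C, halving the spatial resolution."""
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(4 * dim)
        self.reduction = nn.Linear(4 * dim, 2 * dim, bias=False)

    def forward(self, x):                 # x: (B, H, W, C), H and W even
        x0 = x[:, 0::2, 0::2, :]          # top-left patch of each 2x2 group
        x1 = x[:, 1::2, 0::2, :]          # bottom-left
        x2 = x[:, 0::2, 1::2, :]          # top-right
        x3 = x[:, 1::2, 1::2, :]          # bottom-right
        x = torch.cat([x0, x1, x2, x3], dim=-1)   # (B, H/2, W/2, 4C)
        return self.reduction(self.norm(x))       # (B, H/2, W/2, 2C)

x = torch.randn(1, 56, 56, 96)
print(PatchMerging(96)(x).shape)          # torch.Size([1, 28, 28, 192])
```

Stacking these layers is what gives the deeper stages their larger effective patch size and receptive field.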
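The shifted partition is commonly implemented as a cyclic shift of the feature map by half a window before the second attention layer, so the same window-partition code can be reused; attention across wrapped-around regions is then blocked with a mask, which this sketch omits. `shift_windows` is a hypothetical helper name for illustration:

```python
import torch

def shift_windows(x, window_size):
    # Cyclically shift the feature map by half a window so that the next
    # window partition straddles the previous layer's window boundaries.
    shift = window_size // 2
    return torch.roll(x, shifts=(-shift, -shift), dims=(1, 2))

x = torch.randn(1, 56, 56, 96)
shifted = shift_windows(x, 7)             # shift by 3 along H and W
# ... run window attention on `shifted` (with a mask for wrapped regions),
# then reverse the shift to restore the original layout:
restored = torch.roll(shifted, shifts=(3, 3), dims=(1, 2))
print(torch.allclose(x, restored))        # True
```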
Paper walkthrough materials:
Zhihu: CV+Transformer之Swin Transformer
Category:
Deep Learning