Mamba
🥥 Table of Contents
Part 1: The Issues with Transformers
Part 2: State Space Model (SSM)
Part 3: Mamba: A Selective SSM
- A selective scan algorithm, which allows the model to filter out irrelevant information and retain relevant information.
- A hardware-aware algorithm that allows for efficient storage of (intermediate) results through parallel scan, kernel fusion, and recomputation.
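The two ideas above can be illustrated with a toy selective SSM. The following is a minimal pure-Python sketch, not Mamba's implementation. It assumes a single input channel, a diagonal and negative state matrix A, the common simplification B̄ ≈ ΔB, and scalar weights `w_delta`, `b_delta`, `w_B`, `w_C`. These weights are hypothetical stand-ins for Mamba's learned projections, which in the real model read all channels of x_t. Because Δ_t, B_t and C_t are computed from x_t, the recurrence changes at every step. `sequential_scan` is the recurrent form. `parallel_scan` is a Hillis-Steele prefix scan over the associative operator `combine`, which is the property that lets such a time-varying recurrence be parallelized.

```python
import math


def softplus(z):
    # Numerically stable softplus: log(1 + e^z), always > 0
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_params(x, A, w_delta, b_delta, w_B, w_C):
    """Discretized, input-dependent parameters for ONE input channel.

    w_delta, b_delta, w_B, w_C are hypothetical toy weights. Because Delta_t,
    B_t and C_t depend on x_t, the model can choose per token what to write
    into and read from its state: this is the 'selection' mechanism.
    """
    a_bar, bx, c = [], [], []
    for x_t in x:
        delta = softplus(w_delta * x_t + b_delta)      # step size Delta_t > 0
        B_t = [w * x_t for w in w_B]                    # input-dependent B_t
        C_t = [w * x_t for w in w_C]                    # input-dependent C_t
        a_bar.append([math.exp(delta * a) for a in A])  # A_bar_t = exp(Delta_t * A), diagonal A
        bx.append([delta * b * x_t for b in B_t])       # simplified B_bar_t x_t ~ Delta_t B_t x_t
        c.append(C_t)
    return a_bar, bx, c


def sequential_scan(a_bar, bx, n_state):
    """Recurrent form: h_t = A_bar_t * h_{t-1} + B_bar_t x_t, elementwise over the state."""
    h = [0.0] * n_state
    hs = []
    for a_t, b_t in zip(a_bar, bx):
        h = [a * h_n + b for a, h_n, b in zip(a_t, h, b_t)]
        hs.append(h)
    return hs


def combine(left, right):
    """Associative operator: apply h -> a1*h + b1, then h -> a2*h + b2."""
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)


def parallel_scan(pairs):
    """Hillis-Steele inclusive scan using `combine`.

    Runs in ceil(log2(L)) rounds; within a round every position is updated
    independently, which is the work a GPU could do in parallel.
    """
    out = list(pairs)
    step = 1
    while step < len(out):
        out = [out[i] if i < step else combine(out[i - step], out[i])
               for i in range(len(out))]
        step *= 2
    return out


def selective_ssm(x, A, w_delta, b_delta, w_B, w_C, parallel=False):
    """Return y_t = C_t . h_t for one channel; A holds the diagonal of the state matrix."""
    a_bar, bx, c = selective_params(x, A, w_delta, b_delta, w_B, w_C)
    n_state = len(A)
    if parallel:
        hs = [[0.0] * n_state for _ in x]
        # Because A is diagonal, each state dimension is an independent scalar recurrence
        for n in range(n_state):
            prefix = parallel_scan([(a_t[n], b_t[n]) for a_t, b_t in zip(a_bar, bx)])
            for t, (_, h) in enumerate(prefix):
                hs[t][n] = h  # with h_{-1} = 0, the offset term of the prefix is h_t
    else:
        hs = sequential_scan(a_bar, bx, n_state)
    return [sum(c_n * h_n for c_n, h_n in zip(c_t, h_t)) for c_t, h_t in zip(c, hs)]
```

Calling `selective_ssm(..., parallel=True)` should match the sequential result up to floating-point rounding. In the real model, the hardware-aware part goes further. Discretization, scan, and output are fused into one GPU kernel that works in fast on-chip memory. Intermediate states are recomputed in the backward pass instead of being stored.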
data:image/s3,"s3://crabby-images/39c7f/39c7fe9af79f060c2ff3397c6ddc5125b7c2941f" alt=""
data:image/s3,"s3://crabby-images/bbb6d/bbb6d88505719cf06e567311175e48c974cf968c" alt=""
data:image/s3,"s3://crabby-images/4188d/4188d8bb815dda2a291d4aa85fd8d12d4d69d121" alt=""
🥑 Get Started!
Article 1: Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Article 2: Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Article 3: Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference
Article 4: MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
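Article 5: S4: Efficiently Modeling Long Sequences with Structured State Spaces
Blog 1: A Thorough Guide to Mamba, the Model Aiming to Overturn the Transformer: From SSM, HiPPO, S4 to Mamba | CSDN
Blog 2: Thoroughly Understanding FlashAttention and FlashAttention2: Cutting GPU Memory Reads/Writes and Speeding Up Computation
Video 1: The Next Big Thing? Deriving Mamba's Formulas by Hand & Coding It from Scratch | Bilibili
Video 2: [PhD Vlog] Mamba, the Latest Model of 2024, Explained in Detail: The Transformer Is Dead, Everything You Want to Know Is Here! | Bilibili
Video 3: Vision in Ten Minutes | The Mamba Model Explained (Covering Transformer, RNN, SSM, and S4) | Bilibili
Video 4: Mamba and S4 Explained: Architecture, Parallel Scan, Kernel Fusion, Recurrence, Convolution, Math | Bilibili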
State Space Model (SSM)
Model | Training Phase | Inference Phase | Additional Issue |
---|---|---|---|
RNN (1986) | Slow (not parallelizable) | Fast (scales linearly with sequence length) | Rapid forgetting |
LSTM (1997) | Slow | Fast | Forgetting |
Transformer (2017) | Fast (parallelizable) | Slow (scales quadratically with sequence length) | RAM & time: quadratic in sequence length |
Mamba (2023) | Fast | Fast (scales linearly with sequence length; unbounded context length) | RAM & time: linear in sequence length |
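The table credits SSMs with both fast training and linear-time inference. For a linear time-invariant SSM (the S4 setting), this is possible because the same model can be evaluated in two equivalent ways. As a recurrence, each token costs one fixed-size state update, like an RNN. As a convolution with kernel K_k = C Ā^k B̄, every output can be computed independently, which allows parallel training. Below is a minimal sketch, assuming one input channel, a diagonal A, and zero-order-hold discretization. All function names are hypothetical.

```python
import math


def discretize_zoh(A, B, delta):
    """Zero-order hold for a diagonal A: A_bar = exp(delta*A), B_bar = (exp(delta*A) - 1) / A * B."""
    a_bar = [math.exp(delta * a) for a in A]
    b_bar = [(ab - 1.0) / a * b for ab, a, b in zip(a_bar, A, B)]
    return a_bar, b_bar


def ssm_recurrent(x, a_bar, b_bar, c):
    """RNN-like mode: one O(N) state update per token, O(L) over the whole sequence."""
    h = [0.0] * len(a_bar)
    y = []
    for x_t in x:
        h = [a * h_n + b * x_t for a, h_n, b in zip(a_bar, h, b_bar)]
        y.append(sum(c_n * h_n for c_n, h_n in zip(c, h)))
    return y


def ssm_kernel(length, a_bar, b_bar, c):
    """Convolution kernel K_k = C A_bar^k B_bar for k = 0 .. length-1."""
    return [sum(c_n * a ** k * b for c_n, a, b in zip(c, a_bar, b_bar))
            for k in range(length)]


def ssm_convolutional(x, a_bar, b_bar, c):
    """Convolutional mode: y_t = sum_{k<=t} K_k x_{t-k}; each y_t is independent of the others."""
    K = ssm_kernel(len(x), a_bar, b_bar, c)
    return [sum(K[k] * x[t - k] for k in range(t + 1)) for t in range(len(x))]
```

The convolution here is the naive O(L²) version. S4 computes it with FFTs instead. The equivalence also depends on Ā, B̄ and C being the same at every step. Mamba's selective, input-dependent parameters remove that property, which is why it relies on a hardware-aware scan rather than a convolution.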
Flash Attention