Collection: Multimodal Large Model Paper Reading

Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

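Summary: Research problem. 1. The authors argue that LLMs (GPT-4, Gemini) are already highly advanced, while a performance gap remains between large vision-modality models and LLMs. 2. On the vision side, image resolution is a core factor, but raising the resolution increases the compute and cost requirements. Given both points, the authors ask "how to push forward the VLMs approach ...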

posted @ 2024-06-29 21:36 沐沐mu Views (147) Comments (0) Recommendations (0)

Visual Instruction Tuning (LLaVA)

Summary: Paper link: https://proceedings.neurips.cc/paper_files/paper/2023/file/6dcf277ea32ce3288914faf369fe6de0-Paper-Conference.pdf Code link: https://github.com/haotian ...

posted @ 2024-06-30 21:27 沐沐mu Views (271) Comments (0) Recommendations (0)