Multimodal Pre-trained Models

DALL-E

Keywords: dVAE, image generation, Transformer, codebook
https://github.com/openai/DALL-E
https://openai.com/blog/dall-e/
https://arxiv.org/abs/2102.12092
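
The linked repo releases only the discrete VAE (dVAE) that tokenizes images into codebook indices; the autoregressive Transformer itself is not released. Below is a minimal encode/decode round-trip sketch based on the usage notebook in that repo (the helpers `load_model`, `map_pixels`, `unmap_pixels`, the checkpoint URLs, and `vocab_size` are taken from that notebook and should be verified against the current code):

```python
import torch
import torch.nn.functional as F
from dall_e import map_pixels, unmap_pixels, load_model  # pip install DALL-E

dev = torch.device("cuda" if torch.cuda.is_available() else "cpu")
enc = load_model("https://cdn.openai.com/dall-e/encoder.pkl", dev)
dec = load_model("https://cdn.openai.com/dall-e/decoder.pkl", dev)

# Stand-in for a real (1, 3, 256, 256) image with values in [0, 1].
x = map_pixels(torch.rand(1, 3, 256, 256, device=dev))

# Encode the image into a 32x32 grid of discrete codebook indices.
z_logits = enc(x)
z = torch.argmax(z_logits, dim=1)

# Decode the token grid back into pixels.
z_onehot = F.one_hot(z, num_classes=enc.vocab_size).permute(0, 3, 1, 2).float()
x_rec = unmap_pixels(torch.sigmoid(dec(z_onehot)[:, :3]))
```

In the full DALL-E pipeline, the grid of token indices produced here is what the Transformer learns to predict autoregressively after the text tokens.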

CLIP

Keywords: contrastive learning, image-text matching, feature-space alignment
CLIP: Contrastive Language-Image Pre-training
https://github.com/OpenAI/CLIP
https://arxiv.org/abs/2103.00020v1
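
Zero-shot classification with CLIP embeds an image and a set of candidate captions into the shared feature space and compares their similarities. The snippet below follows the usage example in the linked repository's README (install via `pip install git+https://github.com/openai/CLIP.git`; the image path is a placeholder):

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(["a diagram", "a dog", "a cat"]).to(device)

with torch.no_grad():
    # Similarity logits between the image and each candidate caption.
    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print(probs)  # highest probability should go to the caption matching the image
```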

BLIP

Keywords: Captioning and Filtering (CapFilt), bootstrapping, understanding and generation
https://github.com/salesforce/BLIP
https://arxiv.org/abs/2201.12086
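
CapFilt bootstraps the noisy web training data: a captioner generates synthetic captions for web images, and a filter (an image-text matching head) discards pairs it judges as mismatched. The sketch below only illustrates that loop; `captioner`, `itm_filter`, and `threshold` are hypothetical placeholders, not the repo's actual API:

```python
def capfilt(web_pairs, captioner, itm_filter, threshold=0.5):
    """Return a bootstrapped list of (image, caption) pairs.

    web_pairs  : iterable of (image, noisy_web_caption)
    captioner  : callable image -> synthetic caption (hypothetical)
    itm_filter : callable (image, caption) -> match score in [0, 1] (hypothetical)
    """
    bootstrapped = []
    for image, web_caption in web_pairs:
        # Keep the noisy web caption only if the filter believes it matches.
        if itm_filter(image, web_caption) > threshold:
            bootstrapped.append((image, web_caption))
        # Generate a synthetic caption and keep it if it passes the filter.
        synthetic = captioner(image)
        if itm_filter(image, synthetic) > threshold:
            bootstrapped.append((image, synthetic))
    return bootstrapped
```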

CogView

Keywords: Chinese text-to-image generation, transfer learning, Sandwich-LN, VQ-VAE, GPT
https://github.com/THUDM/CogView
https://arxiv.org/abs/2105.13290
https://wudao.aminer.cn/CogView/index.html
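
Sandwich-LN, as described in the CogView paper, stabilizes training of the large text-to-image Transformer by adding a LayerNorm at the end of each residual branch on top of the usual pre-LN. A minimal sketch of one such residual block (illustrative only, not the CogView codebase):

```python
import torch
import torch.nn as nn

class SandwichBlock(nn.Module):
    """One residual sublayer with Sandwich LayerNorm: normalize the input
    (as in pre-LN) and also normalize the sublayer output before the
    residual addition. `sublayer` can be self-attention or an FFN."""

    def __init__(self, dim, sublayer):
        super().__init__()
        self.ln_in = nn.LayerNorm(dim)
        self.ln_out = nn.LayerNorm(dim)
        self.sublayer = sublayer

    def forward(self, x):
        return x + self.ln_out(self.sublayer(self.ln_in(x)))

# Example: an FFN sublayer wrapped with Sandwich-LN.
block = SandwichBlock(256, nn.Sequential(nn.Linear(256, 1024), nn.GELU(), nn.Linear(1024, 256)))
y = block(torch.randn(2, 16, 256))
```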

BriVL/WenLan

Keywords: weak semantic correlation, RoBERTa, InfoNCE, MoCo, Chinese image-text retrieval, Faster R-CNN, EfficientNet-B7
https://arxiv.org/abs/2103.06561 (WenLan 1.0)
https://arxiv.org/abs/2110.14378 (WenLan 2.0)
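
WenLan trains a two-tower model (RoBERTa text encoder; Faster R-CNN / EfficientNet-B7 based image encoder) with an InfoNCE objective and MoCo-style momentum queues to enlarge the pool of negatives. The sketch below shows only the basic bidirectional in-batch InfoNCE loss; the momentum encoders and queues are omitted:

```python
import torch
import torch.nn.functional as F

def info_nce(image_emb, text_emb, temperature=0.07):
    """Bidirectional InfoNCE over a batch of paired image/text embeddings.
    Matched pairs share a batch index; all other items in the batch serve
    as negatives (WenLan additionally uses MoCo-style queues, not shown)."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature              # (B, B) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)                  # image -> text
    loss_t2i = F.cross_entropy(logits.t(), targets)              # text -> image
    return (loss_i2t + loss_t2i) / 2

# Example: a batch of 8 paired 256-d embeddings.
loss = info_nce(torch.randn(8, 256), torch.randn(8, 256))
```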


UniT

Keywords: multimodal, multi-task, unified Transformer, DETR, BERT
Facebook
https://arxiv.org/abs/2102.10772
https://mmf.sh/

UNITER

Keywords: integration, BERT, Faster R-CNN, image-text matching, multimodal
ECCV 2020
Linjie Li, "UNITER: a multimodal pre-trained model for universal image-text representation learning"
UNITER: UNiversal Image-TExt Representation Learning
https://arxiv.org/abs/1909.11740
https://github.com/ChenRocks/UNITER

ALIGN

ALIGN: A Large-scale ImaGe and Noisy-text embedding
https://arxiv.org/abs/2102.05918
Code not publicly released

Trained on over 1 billion noisy image-text pairs

ViLT

Keywords: image patches, concatenation, fast, the simplest multimodal model
ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision
NAVER AI Lab
ViLT: the simplest multimodal Transformer
ICML 2021
Paper: https://arxiv.org/abs/2102.03334
Code: https://github.com/dandelin/vilt
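
ViLT drops the convolutional/region feature extractor entirely: image patches are linearly embedded (as in ViT), concatenated with the text token embeddings, and processed by a single shared Transformer, which is what makes it the fastest and simplest of these models. A self-contained sketch of that input pipeline (hyperparameters and module names are illustrative, not the official implementation; positional embeddings are omitted for brevity):

```python
import torch
import torch.nn as nn

class TinyViLT(nn.Module):
    """Minimal ViLT-style input pipeline: patch-embed the image, embed the
    text tokens, add modality-type embeddings, and run the concatenated
    sequence through one shared Transformer encoder."""

    def __init__(self, vocab_size=30522, dim=256, patch=32, depth=4):
        super().__init__()
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.text_embed = nn.Embedding(vocab_size, dim)
        self.type_embed = nn.Embedding(2, dim)  # 0 = text, 1 = image
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, pixels, token_ids):
        img_tok = self.patch_embed(pixels).flatten(2).transpose(1, 2)  # (B, N, D)
        txt_tok = self.text_embed(token_ids)                           # (B, L, D)
        img_tok = img_tok + self.type_embed.weight[1]
        txt_tok = txt_tok + self.type_embed.weight[0]
        return self.encoder(torch.cat([txt_tok, img_tok], dim=1))

# Example: a batch of 2 images (224x224) with 16 text tokens each.
out = TinyViLT()(torch.randn(2, 3, 224, 224), torch.randint(0, 30522, (2, 16)))
```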


CPT

Cross-Modal Prompt Tuning
