Multimodal Chinese Datasets

(1) Huawei Wukong
Wukong (悟空), Huawei: https://wukong-dataset.github.io/wukong-dataset/
The dataset contains 100 million <image, text> pairs.
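Large pair datasets like this are often shipped as manifests that list an image reference next to its caption. The sketch below shows one way to read such a manifest into <image, text> pairs while dropping incomplete rows. It assumes a CSV file with hypothetical columns `url` and `caption`; the real Wukong release may use a different layout, so check the downloaded files before reusing it.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterator, TextIO


@dataclass(frozen=True)
class ImageTextPair:
    """One <image, text> sample: an image reference plus its caption."""
    image_url: str
    caption: str


def read_pairs(stream: TextIO, url_col: str = "url",
               text_col: str = "caption") -> Iterator[ImageTextPair]:
    """Yield pairs from a CSV manifest, skipping rows with an empty URL or caption.

    The default column names are hypothetical; adjust them to the actual files.
    """
    reader = csv.DictReader(stream)
    for row in reader:
        # Missing fields come back as None from DictReader, so normalise to "".
        url = (row.get(url_col) or "").strip()
        caption = (row.get(text_col) or "").strip()
        if url and caption:
            yield ImageTextPair(url, caption)


if __name__ == "__main__":
    sample = io.StringIO(
        "url,caption\n"
        "https://example.com/a.jpg,一只猫坐在沙发上\n"
        "https://example.com/b.jpg,\n"
    )
    pairs = list(read_pairs(sample))
    print(len(pairs))  # 1: the second row has an empty caption and is skipped
```

Streaming with a generator keeps memory flat, which matters at the scale of 100 million rows.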

(2) Chinese versions of Flickr
flickr30k-cn, flickr8k-cn
https://github.com/weiyuk/fluent-cap

(3) Chinese version of COCO
https://github.com/li-xirong/coco-cn

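(4) MUGE
https://github.com/MUGE-2021, e-commerce dataset: ECommerce-IC
MUGE (牧歌, Multimodal Understanding and Generation Evaluation) is presented as the first large-scale Chinese multimodal evaluation benchmark. It was jointly released by DAMO Academy, Zhejiang University and Alibaba Cloud's Tianchi platform, with support from the CCF Technical Committee on Computer Vision (CCF-CV). It currently covers both multimodal understanding and generation tasks: image captioning, image-text retrieval, and text-to-image generation.
Models: M6, OFA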

M6-Corpus: J. Lin, R. Men, A. Yang, C. Zhou, M. Ding, Y. Zhang, P. Wang, A. Wang, L. Jiang, X. Jia, et al. M6: A Chinese multimodal pretrainer. arXiv preprint arXiv:2103.00823, 2021.
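Image-text retrieval, one of the MUGE tasks, is commonly scored with Recall@K: the share of queries whose correct match appears among the top K ranked candidates. The sketch below computes Recall@K and its average over K = 1, 5, 10 from a precomputed query-by-candidate similarity matrix. Whether MUGE averages exactly these cutoffs is an assumption here; the benchmark's own evaluation scripts are authoritative.

```python
from typing import Sequence


def recall_at_k(scores: Sequence[Sequence[float]], gold: Sequence[int], k: int) -> float:
    """Fraction of queries whose gold candidate ranks in the top-k by score.

    scores[i][j] is the similarity between query i and candidate j (higher is better);
    gold[i] is the index of the correct candidate for query i.
    """
    hits = 0
    for row, target in zip(scores, gold):
        # Rank candidate indices by descending similarity and keep the first k.
        top_k = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        if target in top_k:
            hits += 1
    return hits / len(gold)


def mean_recall(scores: Sequence[Sequence[float]], gold: Sequence[int],
                ks: Sequence[int] = (1, 5, 10)) -> float:
    """Average of Recall@K over several cutoffs (assumed to be 1, 5 and 10)."""
    return sum(recall_at_k(scores, gold, k) for k in ks) / len(ks)


if __name__ == "__main__":
    # Three text queries against three images; gold[i] is the matching image.
    scores = [[0.9, 0.1, 0.3],
              [0.2, 0.4, 0.8],
              [0.5, 0.6, 0.1]]
    gold = [0, 1, 0]
    print(round(recall_at_k(scores, gold, 1), 3))  # 0.333
    print(round(mean_recall(scores, gold), 3))     # 0.778
```

With only three candidates, Recall@5 and Recall@10 are trivially 1.0; on a real test split the candidate pool is far larger.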

(5) WuDaoCorpora
Related models: CogView, WuDao 2.0 (悟道2.0), WenLan 2.0 (文澜2.0)

WuDaoMM: a large-scale multimodal dataset for pre-training models
https://github.com/BAAI-WuDao/WuDaoMM/

(6) Product1M
1 million image-text pairs
X. Zhan, Y. Wu, X. Dong, Y. Wei, M. Lu, Y. Zhang, H. Xu, and X. Liang. Product1M: Towards weakly supervised instance-level product retrieval via cross-modal pretraining. In International Conference on Computer Vision (ICCV), 2021.

Posted on 2022-03-09 09:20 by 宋岳庭