Summary: VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text 2021-07-22 08:54:20 Paper: https://arxiv.org/pdf/2104.11178
posted @ 2021-07-22 11:38 AHU-WangXiao