Archive: July 2021
This post is password-protected.
Summary: VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text 2021-07-22 08:54:20 Paper: https://arxiv.org/pdf/2104.11178.
Summary: OPT: Omni-Perception Pre-Trainer for Cross-Modal Understanding and Generation 2021-07-21 20:23:07 Paper: https://arxiv.org/pdf/2107.00249.pdf Code: No
Summary: AST: Audio Spectrogram Transformer 2021-07-21 19:38:36 Paper: https://arxiv.org/pdf/2104.01778.pdf Code: https://github.com/YuanGongND/ast 1. Backgrou
Summary: Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts 2021-07-20 08:58:37 Paper: CVPR 2021 Code: https://git