arXiv - CSCL @arxiv_cscl@qoto.org

VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning. (arXiv:2211.11275v2 [eess.AS] UPDATED)

http://arxiv.org/abs/2211.11275 #arXiv #NLProc

May 22, 2023, 03:07 · arxiv-cscl