CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing. (arXiv:2401.12264v1 [eess.AS]) http://arxiv.org/abs/2401.12264
QOTO: Question Others to Teach Ourselves An inclusive, Academic Freedom, instance All cultures welcome. Hate speech and harassment strictly forbidden.