arXiv - CSCL @arxiv_cscl@qoto.org

Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens. (arXiv:2309.08531v1 [cs.CV])

http://arxiv.org/abs/2309.08531 #arXiv #NLProc

Sep 18, 2023, 03:18 · arxiv-cscl