arXiv Computer Science @arxiv_cs@qoto.org

Chitranuvad: Adapting Multi-Lingual LLMs for Multimodal Translation https://arxiv.org/abs/2502.20420 #cs.CL #cs.CV

In this work, we provide the system description of our submission to the English to Lowres Multimodal Translation Task at the Workshop on Asian Translation (WAT2024). We introduce Chitranuvad, a multimodal model that effectively integrates a multilingual LLM with a vision module for multimodal translation. Our method uses a ViT image encoder to extract visual representations as visual token embeddings, which are projected into the LLM embedding space by an adapter layer; the model then generates the translation autoregressively. We participated in all three tracks (Image Captioning, Text-only, and Multimodal Translation) for Indic languages (i.e., translation from English into Hindi, Bengali, and Malayalam) and achieved state-of-the-art results for Hindi in all of them on the Challenge set, while remaining competitive for the other languages in the shared task.
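
The abstract describes a common adapter-style multimodal design: ViT features are linearly projected into the LLM's embedding space and joined with the text embeddings before autoregressive decoding. Below is a minimal PyTorch sketch of that idea, not the paper's actual implementation; the dimensions (768 and 4096), the single linear adapter, and the prepend-visual-tokens layout are all illustrative assumptions.

```python
import torch
import torch.nn as nn


class VisionAdapter(nn.Module):
    """Maps ViT patch embeddings into the LLM's embedding space.
    The dimensions (768 -> 4096) are illustrative assumptions,
    not values taken from the paper."""

    def __init__(self, vit_dim: int = 768, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(vit_dim, llm_dim)

    def forward(self, visual_feats: torch.Tensor) -> torch.Tensor:
        # visual_feats: (batch, num_patches, vit_dim) from a ViT encoder
        return self.proj(visual_feats)  # (batch, num_patches, llm_dim)


def build_multimodal_inputs(visual_feats: torch.Tensor,
                            text_embeds: torch.Tensor,
                            adapter: VisionAdapter) -> torch.Tensor:
    # Project visual features to "visual token embeddings" and prepend
    # them to the embedded source sentence; the LLM would then decode
    # the translation autoregressively from this combined sequence.
    visual_tokens = adapter(visual_feats)
    return torch.cat([visual_tokens, text_embeds], dim=1)


if __name__ == "__main__":
    vit_feats = torch.randn(1, 196, 768)    # 14x14 patch grid from a ViT
    text_embeds = torch.randn(1, 32, 4096)  # embedded English source text
    inputs = build_multimodal_inputs(vit_feats, text_embeds, VisionAdapter())
    print(inputs.shape)  # torch.Size([1, 228, 4096])
```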

March 4, 2025 at 3:00 AM · feed2toot