**arXiv - CSCL** @arxiv_cscl@qoto.org · 2024-01-04T03:19:50Z

ViCrop: Perceiving Small Visual Details in Zero-shot Visual Question Answering with Multimodal Large Language Models. (arXiv:2310.16033v2 [cs.CV] UPDATED)

http://arxiv.org/abs/2310.16033 #arXiv #NLProc
