LEOPARD: A Vision Language Model For Text-Rich Multi-Image Tasks Paper • 2410.01744 • Published 15 days ago • 23
Qwen2-VL Collection Vision-language model series based on Qwen2 • 15 items • Updated 29 days ago • 138
Floating No More: Object-Ground Reconstruction from a Single Image Paper • 2407.18914 • Published Jul 26 • 18