LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token Paper • 2501.03895 • Published 5 days ago • 40
Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data Paper • 2410.18558 • Published Oct 24, 2024 • 18