🔥🔥 Introducing Ola! A state-of-the-art omni-modal understanding model with an advanced progressive modality alignment strategy! Ola ranks #1 on the OpenCompass Leaderboard (<10B).
📜 Paper: https://arxiv.org/abs/2502.04328
🛠️ Code: https://github.com/Ola-Omni/Ola
🛠️ We have fully released our video & audio training data and intermediate image & video models at THUdyh/ola-67b8220eb93406ec87aeec37. Try building your own powerful omni-modal model with our data and models!
Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment Paper • 2502.04328 • Published 22 days ago • 27
Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives Paper • 2501.04003 • Published Jan 7 • 25
🚀🚀🚀 Introducing Insight-V! An early attempt at o1-like multi-modal reasoning. We offer a structured long-chain visual reasoning data generation pipeline and a multi-agent system to unleash the reasoning potential of MLLMs.
📜 Paper: https://arxiv.org/abs/2411.14432
🛠️ GitHub: https://github.com/dongyh20/Insight-V
💼 Model Weights: THUdyh/insight-v-673f5e1dd8ab5f2d8d332035
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models Paper • 2411.14432 • Published Nov 21, 2024 • 23