Agentless: Demystifying LLM-based Software Engineering Agents Paper • 2407.01489 • Published Jul 1, 2024 • 53
Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives Paper • 2501.04003 • Published 5 days ago • 20
An Empirical Study of Autoregressive Pre-training from Videos Paper • 2501.05453 • Published 3 days ago • 29
InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection Paper • 2501.04575 • Published 4 days ago • 21
Agent Laboratory: Using LLM Agents as Research Assistants Paper • 2501.04227 • Published 5 days ago • 66
LLM4SR: A Survey on Large Language Models for Scientific Research Paper • 2501.04306 • Published 5 days ago • 29
URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics Paper • 2501.04686 • Published 4 days ago • 44
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519 • Published 4 days ago • 189
Cosmos World Foundation Model Platform for Physical AI Paper • 2501.03575 • Published 6 days ago • 55
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models Paper • 2501.03262 • Published 9 days ago • 72
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models Paper • 2501.02955 • Published 6 days ago • 39
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos Paper • 2501.04001 • Published 5 days ago • 36
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token Paper • 2501.03895 • Published 5 days ago • 43
BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning Paper • 2501.03226 • Published 6 days ago • 34
STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution Paper • 2501.02976 • Published 6 days ago • 46
VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation Paper • 2412.21059 • Published 13 days ago • 18
Virgo: A Preliminary Exploration on Reproducing o1-like MLLM Paper • 2501.01904 • Published 9 days ago • 29
VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction Paper • 2501.01957 • Published 9 days ago • 34
EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation Paper • 2501.01895 • Published 9 days ago • 44