UI-TARS: Pioneering Automated GUI Interaction with Native Agents Paper • 2501.12326 • Published 2 days ago • 43
OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking Paper • 2501.09751 • Published 7 days ago • 45
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training Paper • 2501.11425 • Published 4 days ago • 70
PaSa: An LLM Agent for Comprehensive Academic Paper Search Paper • 2501.10120 • Published 7 days ago • 37
Do generative video models learn physical principles from watching videos? Paper • 2501.09038 • Published 9 days ago • 29
FAST: Efficient Action Tokenization for Vision-Language-Action Models Paper • 2501.09747 • Published 7 days ago • 22
Learnings from Scaling Visual Tokenizers for Reconstruction and Generation Paper • 2501.09755 • Published 7 days ago • 33
MMDocIR: Benchmarking Multi-Modal Retrieval for Long Documents Paper • 2501.08828 • Published 9 days ago • 28
Towards Best Practices for Open Datasets for LLM Training Paper • 2501.08365 • Published 9 days ago • 47
Padding Tone: A Mechanistic Analysis of Padding Tokens in T2I Models Paper • 2501.06751 • Published 12 days ago • 31
MiniMax-01: Scaling Foundation Models with Lightning Attention Paper • 2501.08313 • Published 9 days ago • 268
Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction Paper • 2501.03218 • Published 17 days ago • 35
VideoRAG: Retrieval-Augmented Generation over Video Corpus Paper • 2501.05874 • Published 14 days ago • 66
Centurio: On Drivers of Multilingual Ability of Large Vision-Language Model Paper • 2501.05122 • Published 15 days ago • 18
Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives Paper • 2501.04003 • Published 16 days ago • 24
Agent Laboratory: Using LLM Agents as Research Assistants Paper • 2501.04227 • Published 16 days ago • 80
TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization Paper • 2412.21037 • Published 25 days ago • 23