Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass Paper • 2501.13928 • Published about 8 hours ago • 3
Pairwise RM: Perform Best-of-N Sampling with Knockout Tournament Paper • 2501.13007 • Published 1 day ago • 13
IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI Systems Paper • 2501.11067 • Published 4 days ago • 7
O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning Paper • 2501.12570 • Published 2 days ago • 13
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding Paper • 2501.13106 • Published 1 day ago • 44
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback Paper • 2501.12895 • Published 1 day ago • 42
FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces Paper • 2501.12909 • Published 1 day ago • 43
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published 1 day ago • 113
MSTS: A Multimodal Safety Test Suite for Vision-Language Models Paper • 2501.10057 • Published 7 days ago • 7
Video Depth Anything: Consistent Depth Estimation for Super-Long Videos Paper • 2501.12375 • Published 2 days ago • 16
Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments Paper • 2501.10893 • Published 5 days ago • 20
Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models Paper • 2501.11873 • Published 3 days ago • 59
Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise Paper • 2501.08331 • Published 9 days ago • 16
EMO2: End-Effector Guided Audio-Driven Avatar Video Generation Paper • 2501.10687 • Published 6 days ago • 10
Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation Paper • 2501.12202 • Published 2 days ago • 22
UI-TARS: Pioneering Automated GUI Interaction with Native Agents Paper • 2501.12326 • Published 2 days ago • 42