Mobius: Text to Seamless Looping Video Generation via Latent Shift Paper • 2502.20307 • Published 1 day ago • 6 • 1
FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute Paper • 2502.20126 • Published 1 day ago • 8 • 1
R1-T1: Fully Incentivizing Translation Capability in LLMs via Reasoning Learning Paper • 2502.19735 • Published 1 day ago • 5 • 1
TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding Paper • 2502.19400 • Published 2 days ago • 34 • 2
An Overview of Large Language Models for Statisticians Paper • 2502.17814 • Published 4 days ago • 3 • 2
WebGames: Challenging General-Purpose Web-Browsing AI Agents Paper • 2502.18356 • Published 3 days ago • 8 • 2
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution Paper • 2502.18449 • Published 3 days ago • 54 • 4
X-Dancer: Expressive Music to Human Dance Video Generation Paper • 2502.17414 • Published 4 days ago • 9 • 3
VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing Paper • 2502.17258 • Published 4 days ago • 58 • 4
RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers Paper • 2502.15894 • Published 7 days ago • 16 • 3
One-step Diffusion Models with $f$-Divergence Distribution Matching Paper • 2502.15681 • Published 7 days ago • 5 • 2
SIFT: Grounding LLM Reasoning in Contexts via Stickers Paper • 2502.14922 • Published 9 days ago • 28 • 3
Think Inside the JSON: Reinforcement Strategy for Strict LLM Schema Adherence Paper • 2502.14905 • Published 10 days ago • 9 • 2
InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models via Human Feedback Paper • 2502.15027 • Published 8 days ago • 6 • 2
The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer Paper • 2502.15631 • Published 7 days ago • 7 • 2
Dynamic Concepts Personalization from Single Videos Paper • 2502.14844 • Published 8 days ago • 14 • 2
Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation Paper • 2502.14846 • Published 8 days ago • 13 • 2
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published 8 days ago • 118 • 7