DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published 1 day ago • 131
VideoWorld: Exploring Knowledge Learning from Unlabeled Videos Paper • 2501.09781 • Published 7 days ago • 20
Do generative video models learn physical principles from watching videos? Paper • 2501.09038 • Published 9 days ago • 29
Textoon: Generating Vivid 2D Cartoon Characters from Text Descriptions Paper • 2501.10020 • Published 7 days ago • 21
Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps Paper • 2501.09732 • Published 8 days ago • 64
RepVideo: Rethinking Cross-Layer Representation for Video Generation Paper • 2501.08994 • Published 9 days ago • 15
The Lessons of Developing Process Reward Models in Mathematical Reasoning Paper • 2501.07301 • Published 11 days ago • 85
MiniMax-01: Scaling Foundation Models with Lightning Attention Paper • 2501.08313 • Published 10 days ago • 268
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs Paper • 2501.06186 • Published 13 days ago • 59
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos Paper • 2501.04001 • Published 16 days ago • 41
Large Motion Video Autoencoding with Cross-modal Video VAE Paper • 2412.17805 • Published Dec 23, 2024 • 24
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling Paper • 2412.05271 • Published Dec 6, 2024 • 129
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published Dec 13, 2024 • 139
STIV: Scalable Text and Image Conditioned Video Generation Paper • 2412.07730 • Published Dec 10, 2024 • 71
PaliGemma 2: A Family of Versatile VLMs for Transfer Paper • 2412.03555 • Published Dec 4, 2024 • 124
Open-Sora Plan: Open-Source Large Video Generation Model Paper • 2412.00131 • Published Nov 28, 2024 • 33
Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations Paper • 2410.10792 • Published Oct 14, 2024 • 29
SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory Paper • 2411.11922 • Published Nov 18, 2024 • 18