RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers Paper • 2502.15894 • Published 16 days ago • 19
RelaCtrl: Relevance-Guided Efficient Control for Diffusion Transformers Paper • 2502.14377 • Published 17 days ago • 12
Which of These Best Describes Multiple Choice Evaluation with LLMs? A) Forced B) Flawed C) Fixable D) All of the Above Paper • 2502.14127 • Published 18 days ago • 2
OctoTools: An Agentic Framework with Extensible Tools for Complex Reasoning Paper • 2502.11271 • Published 21 days ago • 16
ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models Paper • 2502.09696 • Published 24 days ago • 38
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU Paper • 2502.08910 • Published 25 days ago • 143
Skrr: Skip and Re-use Text Encoder Layers for Memory Efficient Text-to-Image Generation Paper • 2502.08690 • Published 25 days ago • 41
The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding Paper • 2502.08946 • Published 24 days ago • 182
Goku: Flow Based Video Generative Foundation Models Paper • 2502.04896 • Published about 1 month ago • 95
Lumina-Video: Efficient and Flexible Video Generation with Multi-scale Next-DiT Paper • 2502.06782 • Published 27 days ago • 13
Generating Symbolic World Models via Test-time Scaling of Large Language Models Paper • 2502.04728 • Published about 1 month ago • 19
VideoRoPE: What Makes for Good Video Rotary Position Embedding? Paper • 2502.05173 • Published 30 days ago • 64
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published 30 days ago • 121