Slamming: Training a Speech Language Model on One GPU in a Day Paper • 2502.15814 • Published 9 days ago • 56
DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks Paper • 2502.17157 • Published 4 days ago • 48
RelaCtrl: Relevance-Guided Efficient Control for Diffusion Transformers Paper • 2502.14377 • Published 8 days ago • 11
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU Paper • 2502.08910 • Published 16 days ago • 142
An Open Recipe: Adapting Language-Specific LLMs to a Reasoning Model in One Day via Model Merging Paper • 2502.09056 • Published 15 days ago • 30
ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates Paper • 2502.06772 • Published 18 days ago • 19
ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning Paper • 2502.04689 • Published 22 days ago • 7
Article Fine-tune Deepseek-R1 with a Synthetic Reasoning Dataset By sdiazlor • 18 days ago • 44
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling Paper • 2502.06703 • Published 18 days ago • 140
On-device Sora: Enabling Diffusion-Based Text-to-Video Generation for Mobile Devices Paper • 2502.04363 • Published 24 days ago • 11
Generating Symbolic World Models via Test-time Scaling of Large Language Models Paper • 2502.04728 • Published 22 days ago • 18
BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation Paper • 2502.03860 • Published 22 days ago • 23
DeepRAG: Thinking to Retrieval Step by Step for Large Language Models Paper • 2502.01142 • Published 25 days ago • 23