Can this Model Also Recognize Dogs? Zero-Shot Model Search from Weights Paper • 2502.09619 • Published 15 days ago • 31
TransMLA: Multi-head Latent Attention Is All You Need Paper • 2502.07864 • Published 17 days ago • 45
LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters! Paper • 2502.07374 • Published 17 days ago • 35
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling Paper • 2502.06703 • Published 18 days ago • 140
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published 21 days ago • 120
Article π0 and π0-FAST: Vision-Language-Action Models for General Robot Control 25 days ago • 109
Large Language Models Think Too Fast To Explore Effectively Paper • 2501.18009 • Published 30 days ago • 23
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs Paper • 2501.18585 • Published 29 days ago • 56
O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning Paper • 2501.12570 • Published Jan 22 • 24
Article Brain-Inspired Efficient Pruning: Exploiting Criticality in Spiking Neural Networks By mikelabs • Nov 22, 2024 • 1
LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models Paper • 2411.09595 • Published Nov 14, 2024 • 72
Article Efficient Deep Learning: A Comprehensive Overview of Optimization Techniques 👐 📚 By Isayoften • Aug 26, 2024 • 50
Article A failed experiment: Infini-Attention, and why we should keep trying? Aug 14, 2024 • 59
Article Shape Rotation 101: An Intro to Einsum and Jax Transformers By dejavucoder • Jun 22, 2024 • 3
MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing Paper • 2306.10012 • Published Jun 16, 2023 • 35