Top papers ⭐
MiniMax-01: Scaling Foundation Models with Lightning Attention
Paper • 2501.08313 • Published • 273
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
Paper • 2501.04519 • Published • 258
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
Paper • 2412.13663 • Published • 134
Apollo: An Exploration of Video Understanding in Large Multimodal Models
Paper • 2412.10360 • Published • 140
Phi-4 Technical Report
Paper • 2412.08905 • Published • 109
Byte Latent Transformer: Patches Scale Better Than Tokens
Paper • 2412.09871 • Published • 93
Qwen2.5 Technical Report
Paper • 2412.15115 • Published • 347
Humanity's Last Exam
Paper • 2501.14249 • Published • 63
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
Paper • 2502.02737 • Published • 196
The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding
Paper • 2502.08946 • Published • 182
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU
Paper • 2502.08910 • Published • 142
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
Paper • 2502.11089 • Published • 135
MLGym: A New Framework and Benchmark for Advancing AI Research Agents
Paper • 2502.14499 • Published • 167