Rethinking Optimization and Architecture for Tiny Language Models Paper • 2402.02791 • Published Feb 5 • 12
Specialized Language Models with Cheap Inference from Limited Domain Data Paper • 2402.01093 • Published Feb 2 • 45
Scavenging Hyena: Distilling Transformers into Long Convolution Models Paper • 2401.17574 • Published Jan 31 • 15
Understanding LLMs: A Comprehensive Overview from Training to Inference Paper • 2401.02038 • Published Jan 4 • 61
The Efficiency Spectrum of Large Language Models: An Algorithmic Survey Paper • 2312.00678 • Published Dec 1, 2023 • 2
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism Paper • 2401.02954 • Published Jan 5 • 40
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling Paper • 2401.16380 • Published Jan 29 • 47
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection Paper • 2403.03507 • Published Mar 6 • 182
Beyond Language Models: Byte Models are Digital World Simulators Paper • 2402.19155 • Published Feb 29 • 49