Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction • arXiv:2409.17422 • Published Sep 25, 2024
BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline • arXiv:2408.15079 • Published Aug 27, 2024
Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models • arXiv:2408.15518 • Published Aug 28, 2024
Better Alignment with Instruction Back-and-Forth Translation • arXiv:2408.04614 • Published Aug 8, 2024
Gemma 2: Improving Open Language Models at a Practical Size • arXiv:2408.00118 • Published Jul 31, 2024
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention • arXiv:2407.02490 • Published Jul 2, 2024
Self-Play Preference Optimization for Language Model Alignment • arXiv:2405.00675 • Published May 1, 2024
Simple Linear Attention Language Models Balance the Recall-Throughput Tradeoff • arXiv:2402.18668 • Published Feb 28, 2024
Training-Free Long-Context Scaling of Large Language Models • arXiv:2402.17463 • Published Feb 27, 2024
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases • arXiv:2402.14905 • Published Feb 22, 2024