Collections including paper arxiv:2410.13166

- rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
  Paper • 2501.04519 • Published • 245
- Learning an evolved mixture model for task-free continual learning
  Paper • 2207.05080 • Published • 1
- EVOLvE: Evaluating and Optimizing LLMs For Exploration
  Paper • 2410.06238 • Published • 1
- Smaller Language Models Are Better Instruction Evolvers
  Paper • 2412.11231 • Published • 27

- VILA^2: VILA Augmented VILA
  Paper • 2407.17453 • Published • 40
- Octopus v4: Graph of language models
  Paper • 2404.19296 • Published • 117
- Octo-planner: On-device Language Model for Planner-Action Agents
  Paper • 2406.18082 • Published • 48
- Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models
  Paper • 2408.15518 • Published • 43

- How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study
  Paper • 2404.14047 • Published • 45
- LiteSearch: Efficacious Tree Search for LLM
  Paper • 2407.00320 • Published • 38
- Cut Your Losses in Large-Vocabulary Language Models
  Paper • 2411.09009 • Published • 44
- LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models
  Paper • 2411.09595 • Published • 72

- TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
  Paper • 2404.11912 • Published • 17
- SnapKV: LLM Knows What You are Looking for Before Generation
  Paper • 2404.14469 • Published • 24
- LLM in a flash: Efficient Large Language Model Inference with Limited Memory
  Paper • 2312.11514 • Published • 258
- An Evolved Universal Transformer Memory
  Paper • 2410.13166 • Published • 3

- Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
  Paper • 2404.07143 • Published • 106
- RULER: What's the Real Context Size of Your Long-Context Language Models?
  Paper • 2404.06654 • Published • 35
- An Evolved Universal Transformer Memory
  Paper • 2410.13166 • Published • 3