Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment
Paper: arXiv:2405.03594
Sparse pre-trained and fine-tuned Llama models made by Neural Magic + Cerebras
Note: Our sparse Llama 2 7B base model, pruned to 50% sparsity and retrained on 50B tokens.
Note: Our sparse Llama 2 7B base model, pruned to 70% sparsity and retrained on 150B tokens.
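The checkpoints in this collection use the standard Llama 2 architecture with pruned (zeroed) weights, so they can be loaded like any other Hugging Face model. Below is a minimal sketch using Transformers; the repo id is an assumption for illustration, so substitute the actual model id from this collection. Loading this way treats the weights as dense, so it will not by itself yield sparsity speedups; a sparsity-aware runtime is needed for that.

```python
# Minimal sketch: load and prompt one of the sparse Llama 2 7B checkpoints
# with Hugging Face Transformers. The repo id below is an assumption --
# replace it with the model id listed in this collection.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "neuralmagic/Llama-2-7b-pruned50-retrained"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to fit a single GPU
    device_map="auto",          # requires the `accelerate` package
)

prompt = "Sparse language models are useful because"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```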