Abstract
Humans generally acquire new skills without compromising the old; however, the opposite holds for Large Language Models (LLMs), e.g., from LLaMA to CodeLLaMA. To this end, we propose a new post-pretraining method for LLMs with an expansion of Transformer blocks. We tune the expanded blocks using only the new corpus, efficiently and effectively improving the model's knowledge without catastrophic forgetting. In this paper, we experiment on a corpus of code and math, yielding LLaMA Pro-8.3B, a versatile foundation model initialized from LLaMA2-7B that excels in general tasks, programming, and mathematics. LLaMA Pro and its instruction-following counterpart (LLaMA Pro-Instruct) achieve advanced performance across various benchmarks, demonstrating superiority over existing open models in the LLaMA family and immense potential for reasoning and for addressing diverse tasks as an intelligent agent. Our findings provide valuable insights into integrating natural and programming languages, laying a solid foundation for developing advanced language agents that operate effectively in various environments.
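To make the block-expansion idea concrete, here is a minimal PyTorch sketch, assuming the expanded blocks are identity-initialized copies interleaved among the frozen original layers (output projections zeroed so each copy starts as a no-op, leaving the base model's behavior intact until the new blocks are tuned on the new corpus). The `Block` class is a toy pre-norm stand-in (LayerNorm/GELU rather than LLaMA's RMSNorm/SwiGLU), and `expand_blocks`, the group count, and the specific zero-initialized layers are illustrative assumptions, not the released implementation.

```python
import copy
import torch
import torch.nn as nn

class Block(nn.Module):
    """Toy pre-norm Transformer block standing in for a LLaMA decoder layer."""
    def __init__(self, dim: int, n_heads: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.mlp(self.norm2(x))
        return x

def expand_blocks(blocks: nn.ModuleList, groups: int) -> nn.ModuleList:
    """Interleave one identity-initialized copy after every group of original blocks.

    Zeroing the copy's output projections makes it a no-op at initialization, so
    the expanded stack reproduces the original model's outputs exactly. Original
    blocks are frozen; only the inserted copies receive gradients.
    """
    group_size = len(blocks) // groups
    expanded = []
    for g in range(groups):
        group = list(blocks[g * group_size:(g + 1) * group_size])
        for blk in group:
            blk.requires_grad_(False)               # freeze the original knowledge
        new_blk = copy.deepcopy(group[-1])          # copy the topmost block of the group
        nn.init.zeros_(new_blk.attn.out_proj.weight)  # attention output projection -> 0
        nn.init.zeros_(new_blk.attn.out_proj.bias)
        nn.init.zeros_(new_blk.mlp[-1].weight)        # MLP down projection -> 0
        nn.init.zeros_(new_blk.mlp[-1].bias)
        new_blk.requires_grad_(True)                # only the new blocks are tuned
        expanded.extend(group + [new_blk])
    return nn.ModuleList(expanded)

# Usage: expand an 8-block toy model into 12 blocks (4 groups, +1 copy per group).
blocks = nn.ModuleList(Block(64) for _ in range(8))
expanded = expand_blocks(blocks, groups=4)
x = torch.randn(2, 16, 64)
y = x
for blk in expanded:
    y = blk(y)
print(len(expanded), y.shape)  # 12 torch.Size([2, 16, 64])
```

Because the copies are exact no-ops at initialization, continued pretraining on the new domain can only add capacity on top of the frozen base, which is the property the abstract credits for avoiding catastrophic forgetting.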
Community
If the authors ever come to Hugging Face, consider adding arxiv:2308.06103 to the citation list.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention (2023)
- ChatGPT's One-year Anniversary: Are Open-Source Large Language Models Catching up? (2023)
- Boosting Large Language Model for Speech Synthesis: An Empirical Study (2023)
- DeepSeek LLM: Scaling Open-Source Language Models with Longtermism (2024)
- LM-Cocktail: Resilient Tuning of Language Models via Model Merging (2023)
How LLaMA Pro Revolutionizes AI with Block Expansion
Models citing this paper: 56
Datasets citing this paper: 0