Model Summary
Introducing the world's first Llama-3 base model with 6B parameters. This is an untrained model, created from Meta-Llama-3-8B using a technique called downcycling.
You can find the trained version of this model here: https://huggingface.co./prince-canuma/Llama-3-6B-v0.1
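The core idea behind downcycling is to initialize a smaller model from a subset of a larger pretrained model's layers, keeping their weights, and then (optionally) continue pretraining. Below is a minimal sketch of that mechanics using a toy PyTorch decoder stack as a stand-in; the layer counts (32 → 24) and the "keep the first k layers" selection strategy are illustrative assumptions, not the exact recipe used for this model.

```python
import torch
import torch.nn as nn


class ToyDecoder(nn.Module):
    """Toy stand-in for a transformer decoder stack (layers only)."""

    def __init__(self, num_layers: int, hidden: int):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Linear(hidden, hidden) for _ in range(num_layers)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = layer(x)
        return x


def downcycle(model: ToyDecoder, keep: int) -> ToyDecoder:
    """Build a smaller model by copying the first `keep` layers,
    weights included, from the larger pretrained model."""
    hidden = model.layers[0].in_features
    small = ToyDecoder(keep, hidden)
    for i in range(keep):
        small.layers[i].load_state_dict(model.layers[i].state_dict())
    return small


big = ToyDecoder(num_layers=32, hidden=64)  # stands in for the 32-layer 8B model
small = downcycle(big, keep=24)             # a shallower model inheriting 8B weights
```

With real checkpoints, the same pattern applies to `model.model.layers` of a `LlamaForCausalLM`, followed by updating the config's `num_hidden_layers` and saving; the resulting model still requires further training, which is why the checkpoint above is released as untrained.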
Model Description
- Developed by: Prince Canuma
- Sponsored by: General Catalyst (GC)
- Model type: Llama
- License: Llama-3
Model Sources
- Repository: https://github.com/Blaizzy/Coding-LLMs-from-scratch/tree/main/Llama-3
- Video: https://youtube.com/playlist?list=PLDn_JsyofyfTH5_5V1MNb8UYKxMl6IMNy&si=5Y4cm-6wrMOD1Abr
Citation
BibTeX:
@misc{prince2024downcycling,
title={Efficient LLM Downcycling: Generating Diverse Model Sizes from Pretrained Giants},
author={Prince Canuma},
year={2024},
}
Thank You!
I want to extend my heartfelt thanks to the community for the invaluable expertise and unwavering support.
Additionally, I would like to thank Viet from General Catalyst (GC) for providing me with the much-needed compute.
This is my most ambitious project yet, and it wouldn't have been possible without the incredible open-source ML community!
Developers, I am eager to see and hear about the innovative fine-tunes and applications you create.
Users, I am excited to learn about your experiences and use cases.
Thank you for your interest and support!
References:
@misc{komatsuzaki2023sparse,
title={Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints},
author={Aran Komatsuzaki and Joan Puigcerver and James Lee-Thorp and Carlos Riquelme Ruiz and Basil Mustafa and Joshua Ainslie and Yi Tay and Mostafa Dehghani and Neil Houlsby},
year={2023},
eprint={2212.05055},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
@misc{sanyal2024pretraining,
title={Pre-training Small Base LMs with Fewer Tokens},
author={Sunny Sanyal and Sujay Sanghavi and Alexandros G. Dimakis},
year={2024},
eprint={2404.08634},
archivePrefix={arXiv},
primaryClass={cs.CL}
}