# Mixture of Attentions for Speculative Decoding

This checkpoint was obtained from "[Mixture of Attentions For Speculative Decoding](https://arxiv.org/abs/2410.03804)" by Matthieu Zimmer*, Milan Gritta*, Gerasimos Lampouras, Haitham Bou Ammar, and Jun Wang. The paper introduces a novel architecture for speculative decoding that enhances the speed of large language model (LLM) inference. It is supported in vLLM; see our [GitHub repository](https://github.com/huawei-noah/HEBO/tree/mixture-of-attentions/) and the usage sketch at the end of this card.

### Checkpoints

| Base Model | MOA Spec on Hugging Face | Base Model Parameters | MOA Spec Parameters |
|------|------|------|------|
| meta-llama/Meta-Llama-3-8B-Instruct | [huawei-noah/MOASpec-Llama-3-8B-Instruct](https://huggingface.co./huawei-noah/MOASpec-Llama-3-8B-Instruct) | 8B | 0.25B |

## Citation

If you use this code or this checkpoint in your research, please cite our paper:

```bibtex
@misc{zimmer2024mixtureattentionsspeculativedecoding,
      title={Mixture of Attentions For Speculative Decoding},
      author={Matthieu Zimmer and Milan Gritta and Gerasimos Lampouras and Haitham Bou Ammar and Jun Wang},
      year={2024},
      eprint={2410.03804},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2410.03804},
}
```

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for more details.

Disclaimer: This open source project is not an official Huawei product, and Huawei is not expected to provide support for this project.
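
## Usage (sketch)

The sketch below shows how such a draft checkpoint is typically attached to its base model for speculative decoding in vLLM. It assumes the vLLM build from our fork accepts the standard speculative-decoding arguments (`speculative_model`, `num_speculative_tokens`); the argument names, the token count, and the prompt are illustrative assumptions, so please consult the [GitHub repository](https://github.com/huawei-noah/HEBO/tree/mixture-of-attentions/) for the exact interface and recommended settings.

```python
from vllm import LLM, SamplingParams

# Base model plus the MOA Spec draft model (argument names assumed to follow
# vLLM's standard speculative-decoding interface; check the fork for specifics).
llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    speculative_model="huawei-noah/MOASpec-Llama-3-8B-Instruct",
    num_speculative_tokens=5,  # illustrative value, tune per the repository's guidance
)

sampling_params = SamplingParams(temperature=0.0, max_tokens=256)

outputs = llm.generate(
    ["Explain speculative decoding in one paragraph."],
    sampling_params,
)
print(outputs[0].outputs[0].text)
```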