Update README.md
README.md
CHANGED
@@ -1,6 +1,6 @@
 # Mixture of Attentions for Speculative Decoding
 
-This
+This checkpoint was obtained from "[Mixture of Attentions For Speculative Decoding](https://arxiv.org/abs/2410.03804)" by Matthieu Zimmer*, Milan Gritta*, Gerasimos Lampouras, Haitham Bou Ammar, and Jun Wang.
 The paper introduces a novel architecture for speculative decoding that enhances the speed of large language model (LLM) inference.
 
 It is supported in vLLM; see our [GitHub repository](https://github.com/huawei-noah/HEBO/tree/mixture-of-attentions/).
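For orientation, below is a minimal sketch of how a draft checkpoint like this one is typically plugged into vLLM's offline speculative-decoding interface. The target model name, the checkpoint path, and the `num_speculative_tokens` value are placeholders, and the parameter names follow standard vLLM releases; the linked fork may expose a different or extended API, so refer to the GitHub repository for the actual usage.

```python
# Sketch only: assumes vLLM's standard speculative-decoding arguments.
# The target model, draft checkpoint path, and token count are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # target model (assumed)
    speculative_model="path/to/this-checkpoint",  # draft/speculator checkpoint
    num_speculative_tokens=5,                     # draft tokens proposed per step (assumed)
)

outputs = llm.generate(
    ["Speculative decoding speeds up LLM inference by"],
    SamplingParams(temperature=0.0, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```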