Update README.md
README.md CHANGED
@@ -17,7 +17,7 @@ We build LLaMA-MoE with the following two steps:
 1. Partition LLaMA's FFNs into sparse experts and insert top-K gate for each layer of experts.
 2. Continually pre-train the initialized MoE model with an optimized data sampling weights from [Sheared LLaMA](https://arxiv.org/abs/2310.06694) and filtered datasets from [SlimPajama](https://www.cerebras.net/blog/slimpajama-a-627b-token-cleaned-and-deduplicated-version-of-redpajama).
 
-The number of activated model parameters is only 3.0~3.5B, which is friendly for deployment and research usage.
+The number of activated model parameters is only 3.0~3.5B, which is friendly for deployment and research usage. Please refer to our [technical report](https://github.com/pjlab-sys4nlp/llama-moe/blob/main/docs/LLaMA_MoE.pdf) for more details.
 
 
 | Model | \#Activated Experts | \#Experts | \#Activated Params | Links |
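For readers unfamiliar with the two construction steps named in the README excerpt above, the following is a minimal PyTorch sketch of the general idea: slice a SwiGLU-style FFN's intermediate neurons into several experts and route each token through the top-K experts chosen by a learned linear gate. This is not the repository's implementation; all class, module, and parameter names (`TopKGatedFFN`, `router`, `num_experts`, `top_k`, etc.) are illustrative assumptions.

```python
# Sketch only: FFN neurons partitioned into experts + a top-K gate per layer.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKGatedFFN(nn.Module):
    def __init__(self, hidden_size: int, intermediate_size: int,
                 num_experts: int = 16, top_k: int = 4):
        super().__init__()
        assert intermediate_size % num_experts == 0
        expert_size = intermediate_size // num_experts
        self.num_experts, self.top_k = num_experts, top_k
        # Each expert owns a slice of the original gate/up/down projections.
        self.gate_proj = nn.ModuleList(nn.Linear(hidden_size, expert_size, bias=False)
                                       for _ in range(num_experts))
        self.up_proj = nn.ModuleList(nn.Linear(hidden_size, expert_size, bias=False)
                                     for _ in range(num_experts))
        self.down_proj = nn.ModuleList(nn.Linear(expert_size, hidden_size, bias=False)
                                       for _ in range(num_experts))
        # Top-K gate: a linear router scoring every expert per token.
        self.router = nn.Linear(hidden_size, num_experts, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, hidden_size)
        scores = F.softmax(self.router(x), dim=-1)       # (B, S, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # (B, S, top_k)
        out = torch.zeros_like(x)
        for e in range(self.num_experts):
            mask = (idx == e).any(dim=-1)                # tokens routed to expert e
            if not mask.any():
                continue
            w = (weights * (idx == e)).sum(dim=-1)[mask].unsqueeze(-1)
            h = F.silu(self.gate_proj[e](x[mask])) * self.up_proj[e](x[mask])
            out[mask] += w * self.down_proj[e](h)
        return out
```

Because only `top_k` of the `num_experts` expert slices run per token, the activated parameter count per forward pass stays well below the dense FFN's, which is the property the README's "3.0~3.5B activated parameters" figure refers to.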