Spico committed
Commit e0704ea
1 Parent(s): 7f97e19

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -17,7 +17,7 @@ We build LLaMA-MoE with the following two steps:
 1. Partition LLaMA's FFNs into sparse experts and insert top-K gate for each layer of experts.
 2. Continually pre-train the initialized MoE model with an optimized data sampling weights from [Sheared LLaMA](https://arxiv.org/abs/2310.06694) and filtered datasets from [SlimPajama](https://www.cerebras.net/blog/slimpajama-a-627b-token-cleaned-and-deduplicated-version-of-redpajama).
 
- The total number of model parameters is only 6.7B, which is friendly for deployment and research usage.
+ The number of activated model parameters is only 3.0~3.5B, which is friendly for deployment and research usage.
 
 
 | Model | \#Activated Experts | \#Experts | \#Activated Params | Links |
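For context on the corrected figure, here is a rough, hypothetical back-of-the-envelope sketch (not part of the repository or this commit) of why a LLaMA-7B-sized model totals about 6.7B parameters while a top-K-of-N expert split activates only roughly 3.0~3.5B per token. The layer dimensions and expert counts below are assumptions chosen to resemble common LLaMA-7B settings, not values taken from the diff.

```python
# Hypothetical sketch (not from the LLaMA-MoE repository): estimate total vs.
# activated parameters when LLaMA-7B-style FFNs are split evenly into
# n_experts experts and a top-k gate activates only k of them per token.
# All dimensions are assumptions for illustration; router and norm weights
# (a few million parameters) are ignored.

d_model = 4096    # hidden size (LLaMA-7B style, assumed)
d_ff = 11008      # FFN intermediate size (assumed)
n_layers = 32     # transformer layers (assumed)
vocab = 32000     # vocabulary size (assumed)

attn_params = n_layers * 4 * d_model * d_model   # q, k, v, o projections
ffn_params = n_layers * 3 * d_model * d_ff       # gate, up, down projections
embed_params = 2 * vocab * d_model               # input + output embeddings


def activated_params(n_experts: int, top_k: int) -> float:
    """Parameters used per token when only top_k of n_experts FFN slices run."""
    return attn_params + embed_params + ffn_params * top_k / n_experts


total = attn_params + embed_params + ffn_params
print(f"total params:         {total / 1e9:.1f}B")                    # ~6.7B
print(f"activated (top-2/16): {activated_params(16, 2) / 1e9:.1f}B")  # ~3.0B
print(f"activated (top-4/16): {activated_params(16, 4) / 1e9:.1f}B")  # ~3.5B
```

Only the FFN slices scale down with the expert split; attention and embedding weights run for every token, which is why the activated count stays around half of the 6.7B total rather than shrinking proportionally to top_k / n_experts.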