Update README.md
README.md CHANGED
@@ -17,7 +17,7 @@ We build LLaMA-MoE with the following two steps:
 1. Partition LLaMA's FFNs into sparse experts and insert a top-K gate for each layer of experts.
 2. Continually pre-train the initialized MoE model with optimized data sampling weights from [Sheared LLaMA](https://arxiv.org/abs/2310.06694) and filtered datasets from [SlimPajama](https://www.cerebras.net/blog/slimpajama-a-627b-token-cleaned-and-deduplicated-version-of-redpajama).
 
-The
+The number of activated model parameters is only 3.0~3.5B, which is friendly for deployment and research usage.
 
 
 | Model | \#Activated Experts | \#Experts | \#Activated Params | Links |
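
As a rough illustration of the first step in the README excerpt above (partitioning an FFN into sparse experts behind a top-K gate), here is a minimal PyTorch sketch. The class names, expert counts, and the dense routing loop are illustrative assumptions, not the repository's actual implementation.

```python
# Minimal sketch: split a LLaMA-style SwiGLU FFN's intermediate neurons into equal
# slices (one slice per expert) and route each token through the top-K experts chosen
# by a learned linear gate. Shapes, defaults, and the routing loop are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ExpertSlice(nn.Module):
    """One expert = one contiguous slice of the original FFN's intermediate neurons."""

    def __init__(self, hidden_size: int, slice_size: int):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, slice_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, slice_size, bias=False)
        self.down_proj = nn.Linear(slice_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))


class TopKMoEFFN(nn.Module):
    """FFN partitioned into `num_experts` slices, with a top-K gate per token."""

    def __init__(self, hidden_size: int, intermediate_size: int,
                 num_experts: int = 16, top_k: int = 4):
        super().__init__()
        assert intermediate_size % num_experts == 0
        slice_size = intermediate_size // num_experts
        self.experts = nn.ModuleList(
            [ExpertSlice(hidden_size, slice_size) for _ in range(num_experts)]
        )
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, hidden)
        scores = F.softmax(self.router(x), dim=-1)               # (B, S, E)
        topk_scores, topk_idx = scores.topk(self.top_k, dim=-1)  # (B, S, K)
        out = torch.zeros_like(x)
        # Naive dense loop for readability: every expert processes every token,
        # and the mask keeps only the tokens routed to that expert.
        for e, expert in enumerate(self.experts):
            expert_out = expert(x)                                # (B, S, H)
            for k in range(self.top_k):
                mask = (topk_idx[..., k] == e).unsqueeze(-1)      # (B, S, 1)
                weight = topk_scores[..., k].unsqueeze(-1)        # (B, S, 1)
                out = out + mask * weight * expert_out
        return out
```

With LLaMA-7B-like sizes (hidden 4096, intermediate 11008), this splits the FFN into 16 experts of 688 intermediate neurons each, and only the top-4 experts contribute to each token's output.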
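
The data side of the second step can be pictured as weighted sampling over source domains during continual pre-training. The domain names and weights below are placeholder stand-ins, not the published Sheared LLaMA weights or the actual SlimPajama splits.

```python
# Placeholder sketch of domain-weighted sampling for continual pre-training.
# Domain names and weights are made-up stand-ins, not the published values.
import random

domain_weights = {"web": 0.55, "code": 0.20, "books": 0.15, "academic": 0.10}


def sample_domain(rng: random.Random) -> str:
    """Pick the source domain of the next training document in proportion to its weight."""
    domains, weights = zip(*domain_weights.items())
    return rng.choices(domains, weights=weights, k=1)[0]


# Example: over many draws, the counts land roughly in proportion to the weights.
rng = random.Random(0)
counts = {d: 0 for d in domain_weights}
for _ in range(10_000):
    counts[sample_domain(rng)] += 1
print(counts)
```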