Update README.md
README.md
CHANGED
@@ -50,7 +50,7 @@ print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
 | Model | #Experts | #Activated Experts | #Params | #Activated Params | Flops(T) per sample (seq=2048) | Model Weights |
 | ------------------- | -------- | ------------------ | ------- | ------------------ | ------------------------------ | ------------------------------------------------------------ |
 | 265M | - | - | 265M | 265M | 0.48 | [🤗 llama-265m](https://huggingface.co/JuncaiL/llama-265m) |
-| 8 $\times$ 265M MoE |
+| 8 $\times$ 265M MoE | 8 | 2 | 970M | 332M | 0.76 | [🤗 llama-8x265m-moe](https://huggingface.co/JuncaiL/llama-8x265m-moe) |
 | llama-7b | - | - | 7B | 7B | 25.29 | |

 **Model Evaluation**
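The new `llama-8x265m-moe` checkpoint can be exercised the same way as the existing ones. Below is a minimal usage sketch assuming the standard `transformers` Auto API; the prompt and the `trust_remote_code=True` flag are assumptions (custom MoE architectures on the Hub often require it), and only the final `decode` line is taken from the hunk context above:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo name comes from the table row added in this commit.
model_id = "JuncaiL/llama-8x265m-moe"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# trust_remote_code=True is an assumption for a custom MoE model class.
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# The prompt is a hypothetical placeholder, not from the README.
inputs = tokenizer("Beijing is a", return_tensors="pt")
pred = model.generate(**inputs, max_new_tokens=50)

# This line matches the README's existing generation snippet.
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
```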