Update README.md
README.md
CHANGED
@@ -25,7 +25,7 @@ The table below compares the vocabulary size, percentage of unknown tokens (tokens
 Employing `Llama-2 7b` with the extended tokenizer as the foundational model, we develop four different expert models with four separate training datasets.
 As the figure shows, the four expert models are then incorporated into one model using the `Mixtral` architecture, resulting in a 19b MoE model.

-<img width="500" alt="architecture" src="https://github.com/user-attachments/assets/
+<img width="500" alt="architecture" src="https://github.com/user-attachments/assets/c261117c-5c47-4b40-a20a-ba2c824e6f23">
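The changed section describes composing four Llama-2-based experts into a roughly 19b Mixtral-style MoE, but this commit does not show how the composition is done. The snippet below is only a minimal sketch of the target architecture: a `MixtralConfig` with four local experts and Llama-2 7b dimensions, instantiated on PyTorch's meta device to sanity-check the parameter count. The vocabulary size (`EXTENDED_VOCAB_SIZE`), the top-2 routing, and the use of Hugging Face `transformers` are assumptions for illustration, not details taken from this repository.

```python
# Hypothetical sketch (not the project's actual merge code): build a
# Mixtral-style config with 4 experts on Llama-2 7b dimensions and check
# the resulting parameter count without allocating real weights.
import torch
from transformers import MixtralConfig, MixtralForCausalLM

EXTENDED_VOCAB_SIZE = 32_000  # placeholder; the real value comes from the extended tokenizer

config = MixtralConfig(
    vocab_size=EXTENDED_VOCAB_SIZE,
    hidden_size=4096,            # Llama-2 7b hidden size
    intermediate_size=11008,     # Llama-2 7b MLP size, reused per expert
    num_hidden_layers=32,
    num_attention_heads=32,
    num_key_value_heads=32,      # Llama-2 7b uses full multi-head attention
    num_local_experts=4,         # one expert per domain-specific model
    num_experts_per_tok=2,       # assumed top-2 routing, as in Mixtral
)

# Instantiate on the meta device so no real memory is allocated.
with torch.device("meta"):
    model = MixtralForCausalLM(config)

n_params = sum(p.numel() for p in model.parameters())
print(f"total parameters: {n_params / 1e9:.1f}B")  # on the order of 19-20B with these dimensions
```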