Update README.md
README.md
CHANGED
@@ -25,7 +25,7 @@ The table below compares the vocabulary size, percentage of unknown tokens (tokens
 Employing `Llama-2 7b` with the extended tokenizer as the foundational model, we develop four different expert models with four separate training datasets.
 As the figure shows, the four expert models are then incorporated into one model using the `Mixtral` architecture, resulting in a 19b MoE model.

-<img width="500" alt="architecture" src="https://github.com/user-attachments/assets/
+<img width="500" alt="architecture" src="https://github.com/user-attachments/assets/c261117c-5c47-4b40-a20a-ba2c824e6f23">
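The changed section describes composing four Llama-2-based experts into a roughly 19b Mixtral-style MoE, but this commit does not show how the composition is done. The snippet below is only a minimal sketch of the target architecture: a `MixtralConfig` with four local experts and Llama-2 7b dimensions, instantiated on PyTorch's meta device to sanity-check the parameter count. The vocabulary size (`EXTENDED_VOCAB_SIZE`), the top-2 routing, and the use of Hugging Face `transformers` are assumptions for illustration, not details taken from this repository.

```python
# Hypothetical sketch (not the project's actual merge code): build a
# Mixtral-style config with 4 experts on Llama-2 7b dimensions and check
# the resulting parameter count without allocating real weights.
import torch
from transformers import MixtralConfig, MixtralForCausalLM

EXTENDED_VOCAB_SIZE = 32_000  # placeholder; the real value comes from the extended tokenizer

config = MixtralConfig(
    vocab_size=EXTENDED_VOCAB_SIZE,
    hidden_size=4096,            # Llama-2 7b hidden size
    intermediate_size=11008,     # Llama-2 7b MLP size, reused per expert
    num_hidden_layers=32,
    num_attention_heads=32,
    num_key_value_heads=32,      # Llama-2 7b uses full multi-head attention
    num_local_experts=4,         # one expert per domain-specific model
    num_experts_per_tok=2,       # assumed top-2 routing, as in Mixtral
)

# Instantiate on the meta device so no real memory is allocated.
with torch.device("meta"):
    model = MixtralForCausalLM(config)

n_params = sum(p.numel() for p in model.parameters())
print(f"total parameters: {n_params / 1e9:.1f}B")  # on the order of 19-20B with these dimensions
```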