JunxiongWang/Llama3.2-Mamba2-3B-distill

JunxiongWang committed on Oct 15, 2024

Commit 27b9c9a (verified) · 1 parent: 33d8081

Update README.md

Files changed (1): README.md +26 -0

@@ -1,3 +1,29 @@
 ---
 license: apache-2.0
 ---
+Zero-shot results using [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) as the teacher model and [Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) as the initialization model:
+| Model | [Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) | [Llama-3.2-Mamba2-0.5-3B-sft](https://huggingface.co/JunxiongWang/Mamba2InLlama3B_Half) | [Llama-3.2-Mamba2-0.5-3B-dpo](https://huggingface.co/JunxiongWang/Mamba2InLlama3B_Half_DPO) |
+|---|---|---|---|
+| Initialization Model | N/A | Llama-3.2-3B-Instruct | Llama-3.2-3B-Instruct |
+| Teacher Model | N/A | Llama-3.1-8B-Instruct | Llama-3.1-8B-Instruct |
+| arc_challenge | 0.459 | 0.4667 | 0.541 |
+| arc_easy | 0.7407 | 0.7668 | 0.8026 |
+| hellaswag | 0.7043 | 0.6913 | 0.7445 |
+| mmlu | 0.6043 | 0.5271 | 0.5247 |
+| openbookqa | 0.36 | 0.388 | 0.424 |
+| piqa | 0.7568 | 0.7601 | 0.7769 |
+| pubmedqa | 0.696 | 0.638 | 0.654 |
+| race | 0.4067 | 0.3981 | 0.4344 |
+| winogrande | 0.6748 | 0.6606 | 0.6732 |
+```
+@article{junxiongdaniele2024mambainllama,
+  title   = {The Mamba in the Llama: Distilling and Accelerating Hybrid Models},
+  author  = {Junxiong Wang and Daniele Paliotta and Avner May and Alexander M. Rush and Tri Dao},
+  journal = {arXiv preprint arXiv:2408.15237},
+  year    = {2024}
+}
+```
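For a quick overall comparison, the nine zero-shot scores in the README table can be collapsed into one number per model. The sketch below is illustrative only. It hard-codes the values from the table and computes an unweighted mean (every benchmark weighted equally). That mean is an assumption made here for illustration and is not a metric reported in the README. It also lists per-benchmark differences against the Llama-3.2-3B-Instruct baseline. The helper names `mean_score` and `deltas` are hypothetical.

```python
# Zero-shot scores copied from the README table (one value per benchmark, same order).
BENCHMARKS = [
    "arc_challenge", "arc_easy", "hellaswag", "mmlu", "openbookqa",
    "piqa", "pubmedqa", "race", "winogrande",
]

SCORES = {
    "Llama-3.2-3B-Instruct": [0.459, 0.7407, 0.7043, 0.6043, 0.36, 0.7568, 0.696, 0.4067, 0.6748],
    "Llama-3.2-Mamba2-0.5-3B-sft": [0.4667, 0.7668, 0.6913, 0.5271, 0.388, 0.7601, 0.638, 0.3981, 0.6606],
    "Llama-3.2-Mamba2-0.5-3B-dpo": [0.541, 0.8026, 0.7445, 0.5247, 0.424, 0.7769, 0.654, 0.4344, 0.6732],
}


def mean_score(model: str) -> float:
    """Hypothetical helper: unweighted mean over the nine benchmarks (an assumption, not a reported metric)."""
    values = SCORES[model]
    return sum(values) / len(values)


def deltas(model: str, baseline: str = "Llama-3.2-3B-Instruct") -> dict[str, float]:
    """Hypothetical helper: per-benchmark score of `model` minus `baseline`, rounded to 4 decimals."""
    return {
        name: round(m - b, 4)
        for name, m, b in zip(BENCHMARKS, SCORES[model], SCORES[baseline])
    }


if __name__ == "__main__":
    for model in SCORES:
        print(f"{model:30s} mean = {mean_score(model):.4f}")
    dpo_deltas = deltas("Llama-3.2-Mamba2-0.5-3B-dpo")
    below = [name for name, d in dpo_deltas.items() if d < 0]
    print("DPO model below the baseline on:", ", ".join(below))
```

With these numbers, the DPO model has the highest unweighted mean (about 0.6195). The SFT model (about 0.5885) sits slightly below the baseline (about 0.6003). The DPO model still trails the baseline on mmlu, pubmedqa, and winogrande, so a single average hides the MMLU gap.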