---
license: apache-2.0
---

Zero-shot results using [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) as the teacher model and [Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) as the initialization model.

| Model | [Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) | [Llama-3.2-Mamba2-0.5-3B-sft](https://huggingface.co/JunxiongWang/Mamba2InLlama3B_Half) | [Llama-3.2-Mamba2-0.5-3B-dpo](https://huggingface.co/JunxiongWang/Mamba2InLlama3B_Half_DPO) |
|----------------------|--------|-----------------------|-----------------------|
| Initialization Model | N/A | Llama-3.2-3B-Instruct | Llama-3.2-3B-Instruct |
| Teacher Model | N/A | Llama-3.1-8B-Instruct | Llama-3.1-8B-Instruct |
| arc_challenge | 0.459 | 0.4667 | 0.541 |
| arc_easy | 0.7407 | 0.7668 | 0.8026 |
| hellaswag | 0.7043 | 0.6913 | 0.7445 |
| mmlu | 0.6043 | 0.5271 | 0.5247 |
| openbookqa | 0.36 | 0.388 | 0.424 |
| piqa | 0.7568 | 0.7601 | 0.7769 |
| pubmedqa | 0.696 | 0.638 | 0.654 |
| race | 0.4067 | 0.3981 | 0.4344 |
| winogrande | 0.6748 | 0.6606 | 0.6732 |

```bibtex
@article{junxiongdaniele2024mambainllama,
  title   = {The Mamba in the Llama: Distilling and Accelerating Hybrid Models},
  author  = {Junxiong Wang and Daniele Paliotta and Avner May and Alexander M. Rush and Tri Dao},
  journal = {arXiv preprint arXiv:2408.15237},
  year    = {2024}
}
```
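The task names above (arc_challenge, arc_easy, hellaswag, etc.) match the task identifiers used by EleutherAI's lm-evaluation-harness. A minimal sketch of how one might evaluate a checkpoint on a subset of these tasks, assuming `lm_eval` is installed and the checkpoint loads through the standard `hf` backend — note the hybrid Mamba2 checkpoints may instead require the loading code from the authors' MambaInLlama repository:

```shell
# Hedged sketch: score one of the distilled checkpoints on two of the tasks above.
# Assumes `pip install lm-eval` and a GPU; the hybrid checkpoints may need the
# authors' own inference wrapper rather than the plain HF backend used here.
lm_eval --model hf \
  --model_args pretrained=JunxiongWang/Mamba2InLlama3B_Half_DPO \
  --tasks arc_challenge,arc_easy \
  --batch_size 8
```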