Commit 445456c (parent: e95e508) by aashish1904: Upload README.md with huggingface_hub
Files changed (1): README.md added (+238 lines)

---
license: other
license_name: qwen
license_link: https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/LICENSE
datasets:
- Magpie-Align/Magpie-Qwen2.5-Pro-1M-v0.1
base_model:
- Qwen/Qwen2.5-7B-Instruct
library_name: transformers
tags:
- generated_from_trainer
language:
- en
model-index:
- name: cybertron-v4-qw7B-MGS
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 62.64
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=fblgit/cybertron-v4-qw7B-MGS
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 37.04
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=fblgit/cybertron-v4-qw7B-MGS
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 27.72
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=fblgit/cybertron-v4-qw7B-MGS
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 8.05
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=fblgit/cybertron-v4-qw7B-MGS
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 13.2
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=fblgit/cybertron-v4-qw7B-MGS
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 38.59
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=fblgit/cybertron-v4-qw7B-MGS
      name: Open LLM Leaderboard
---

[![QuantFactory Banner](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)](https://hf.co/QuantFactory)

# QuantFactory/cybertron-v4-qw7B-MGS-GGUF

This is a quantized version of [fblgit/cybertron-v4-qw7B-MGS](https://huggingface.co/fblgit/cybertron-v4-qw7B-MGS), created using llama.cpp.
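
As a quick orientation, GGUF files like these are typically loaded with llama.cpp or its Python bindings. Below is a minimal sketch using `llama-cpp-python`; the quant filename pattern is an assumption, so substitute an actual `.gguf` file from this repository's file list.

```python
# Minimal sketch (llama-cpp-python); the filename pattern below is an assumption --
# pick a real .gguf quant from this repo's file list.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="QuantFactory/cybertron-v4-qw7B-MGS-GGUF",
    filename="*Q4_K_M.gguf",  # hypothetical quant choice
    n_ctx=4096,               # context window to allocate
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize what this model is in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```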

# Original Model Card

# cybertron-v4-qw7B-MGS

**WE ARE BACK** Cybertron v4, the #1 LLM in its class, based on the amazing Qwen2.5 7B.

**Scoring #1 among 7B and 8B LLMs as of 30.10.2024.**

![cybertron-v4-MGS](https://huggingface.co/fblgit/cybertron-v4-qw7B-MGS/resolve/main/cybertron_v4MGS.png)

Here we use our novel approach called `MGS`. It's up to you to figure out what it means.

Cybertron v4 went through SFT over `Magpie-Align/Magpie-Qwen2.5-Pro-1M-v0.1`.

## Quants

Available at https://huggingface.co/bartowski/cybertron-v4-qw7B-MGS-GGUF

## MGS

To be fair, the relevant background is here:

https://arxiv.org/pdf/2410.21228

MGS is, among other things, a strategy for tackling forgetting of the training corpora.

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_fblgit__cybertron-v4-qw7B-MGS).

| Metric              | Value |
|---------------------|------:|
| Avg.                | 31.21 |
| IFEval (0-Shot)     | 62.64 |
| BBH (3-Shot)        | 37.04 |
| MATH Lvl 5 (4-Shot) | 27.72 |
| GPQA (0-shot)       |  8.05 |
| MuSR (0-shot)       | 13.20 |
| MMLU-PRO (5-shot)   | 38.59 |

## Try Cybertron v4!

Thanks to @rombodawg for contributing a free-to-use inference Space, hosted at:

https://huggingface.co/spaces/rombodawg/Try_fblgit_cybertron-v4-qw7B-MGS
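
To run the original, unquantized model locally instead of the hosted Space, a minimal Transformers sketch follows; the sampling settings are illustrative assumptions, not values taken from this card.

```python
# Minimal sketch for the original model with Transformers; sampling settings
# are illustrative assumptions, not values from this card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "fblgit/cybertron-v4-qw7B-MGS"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Explain GGUF quantization in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```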

## Training procedure

Trained for 1 epoch, as usual.

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)

### Training hyperparameters

The following hyperparameters were used during training (a short sanity-check sketch follows the list):
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 128
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- num_epochs: 1

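For orientation only, the batch-size arithmetic and the optimizer line above can be checked in plain PyTorch; this is an illustrative sketch, not the Axolotl training code that was actually used.

```python
# Illustrative sketch only; the actual run was driven by Axolotl, not this code.
import torch

num_devices = 8
total_train_batch_size = 128
# per-device micro-batch x gradient-accumulation must multiply out to 16 per device
per_device_effective_batch = total_train_batch_size // num_devices
assert per_device_effective_batch == 16

# A tiny placeholder module stands in for the 7B model, just to show the optimizer call.
model = torch.nn.Linear(16, 16)
optimizer = torch.optim.Adam(model.parameters(), betas=(0.9, 0.999), eps=1e-08)
```
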
### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.7405        | 0.0007 | 1    | 0.5760          |
| 0.6146        | 0.0502 | 71   | 0.5045          |
| 0.5908        | 0.1003 | 142  | 0.4930          |
| 0.5669        | 0.1505 | 213  | 0.4854          |
| 0.5575        | 0.2007 | 284  | 0.4811          |
| 0.535         | 0.2508 | 355  | 0.4765          |
| 0.5161        | 0.3010 | 426  | 0.4736          |
| 0.5268        | 0.3511 | 497  | 0.4726          |
| 0.5119        | 0.4013 | 568  | 0.4701          |
| 0.5329        | 0.4515 | 639  | 0.4687          |
| 0.5167        | 0.5016 | 710  | 0.4673          |
| 0.5105        | 0.5518 | 781  | 0.4660          |
| 0.5203        | 0.6020 | 852  | 0.4653          |
| 0.5035        | 0.6521 | 923  | 0.4646          |
| 0.4903        | 0.7023 | 994  | 0.4641          |
| 0.5031        | 0.7525 | 1065 | 0.4628          |
| 0.5147        | 0.8026 | 1136 | 0.4629          |
| 0.5037        | 0.8528 | 1207 | 0.4620          |
| 0.5029        | 0.9029 | 1278 | 0.4620          |
| 0.492         | 0.9531 | 1349 | 0.4621          |

### Framework versions

- PEFT 0.13.2
- Transformers 4.45.2
- Pytorch 2.3.0+cu121
- Datasets 3.0.1
- Tokenizers 0.20.1

## Citations

```
@misc{thebeagle-v2,
  title={TheBeagle v2: MGS},
  author={Xavier Murias},
  year={2024},
  publisher={HuggingFace},
  journal={HuggingFace repository},
  howpublished={\url{https://huggingface.co/fblgit/TheBeagle-v2beta-32B-MGS}},
}

@misc{qwen2.5,
  title={Qwen2.5: A Party of Foundation Models},
  url={https://qwenlm.github.io/blog/qwen2.5/},
  author={Qwen Team},
  month={September},
  year={2024}
}

@article{qwen2,
  title={Qwen2 Technical Report},
  author={An Yang and Baosong Yang and Binyuan Hui and Bo Zheng and Bowen Yu and Chang Zhou and Chengpeng Li and Chengyuan Li and Dayiheng Liu and Fei Huang and Guanting Dong and Haoran Wei and Huan Lin and Jialong Tang and Jialin Wang and Jian Yang and Jianhong Tu and Jianwei Zhang and Jianxin Ma and Jin Xu and Jingren Zhou and Jinze Bai and Jinzheng He and Junyang Lin and Kai Dang and Keming Lu and Keqin Chen and Kexin Yang and Mei Li and Mingfeng Xue and Na Ni and Pei Zhang and Peng Wang and Ru Peng and Rui Men and Ruize Gao and Runji Lin and Shijie Wang and Shuai Bai and Sinan Tan and Tianhang Zhu and Tianhao Li and Tianyu Liu and Wenbin Ge and Xiaodong Deng and Xiaohuan Zhou and Xingzhang Ren and Xinyu Zhang and Xipin Wei and Xuancheng Ren and Yang Fan and Yang Yao and Yichang Zhang and Yu Wan and Yunfei Chu and Yuqiong Liu and Zeyu Cui and Zhenru Zhang and Zhihao Fan},
  journal={arXiv preprint arXiv:2407.10671},
  year={2024}
}
```