---
library_name: transformers
tags:
- mergekit
- merge
license: llama3.3
---
# about

The Teaz series is my third attempt at making merges, after the Kostume and Kermes series.

This time, the goal was to make a smart model with low perplexity, in accordance with the principles of the Kermes series, but built as a merge of 3 merged models, as in the Kostume series.

Huihui's abliterated models were used (an illustrative config sketch follows the list):
- Llama 3.3 70b as the pivot of the first/main model.
- Nemotron 3.1 70b and Deepseek R1 Distill 70b as the pillars.
- and Tulu 3 70b as the backers of the 2nd and 3rd models.

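
A minimal, purely illustrative mergekit `model_stock` sketch of what one such intermediary merge could look like; every repo id below is a placeholder, not the actual Teaz recipe:

```yaml
# Hypothetical sketch only: the first/main intermediary merge (pivot + pillars).
# Repo ids are placeholders; check the exact Huihui abliterated repositories.
merge_method: model_stock
base_model: placeholder/Llama-3.3-70B-Instruct-abliterated        # pivot
models:
  - model: placeholder/Llama-3.1-Nemotron-70B-abliterated         # pillar
  - model: placeholder/DeepSeek-R1-Distill-Llama-70B-abliterated  # pillar
dtype: bfloat16
# The 2nd and 3rd intermediary merges would follow the same pattern with
# Tulu 3 70b as backer, and a final model_stock pass over the three outputs
# (run with `mergekit-yaml config.yml output_dir`) would give the final model.
```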
Bingo again. I hit 3.45 ppl512 wikieng, 62+ on ARC-C, and 82+ on ARC-E. Absolute top of the class for L3.x 70b, like Kermes is for L3.2 3b.

No cheating, no contamination, just the wonderful MergeKit model-stock merge technique leveraged to a new level (compared to what I had already seen being done, at least).

Next projects will involve this model as the "smarts pillar" of further merges, aimed at any use case.

---
# credits

Kudos go to the model authors, to the Arcee / MergeKit folks, and to HF for hosting the MergeKit App.
Also a big-up to SteelSkull: watching him cook Nevoria is what got me to try making some merges myself.

---
# history

1) With the Kostume series, started on 11/02/2025, I tried to make a triple stock merge of 3 intermediary stock merges of a dozen models or so.
This, to see if I could pile up their abilities.

Not bad, but nothing special about it; it's a bit hard for me to judge at 3b.

2) With the Kermes series, started the day after, I defined a simpler approach:

- Perplexity is the main constraint. Usual L3.2 3b finetunes are around 10.5-11 ppl512 wikieng; Hermes is around 9.5.
- I also measure in French and Serbian to observe the variances.

- ARC Challenge and ARC Easy are the second constraint, to judge basic logic.
- Usual L3.2 3b finetunes hit 40 and 60-65 respectively; Hermes 3 hits 47+ and 70+.

- Lack of censorship. I always keep in mind to pick models compatible with that, as much as possible.
- This, whether through the picked models' abliteration or through the datasets they use.

- And of course, the tests, both in Kobold/Croco.CPP (spamming very offensive requests) and in ST (a 10k prompt with a big lorebook).

The Kermes series are basically stock merges on top of one another.
- The goal was to preserve as much as possible the qualities of the models used, so I stay at 1+2 models for the first merge, and 1+2 for the second as well (sketched below).

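
To illustrate that stacking (all names below are placeholders, not the actual Kermes recipe), the second pass simply reuses the local output of the first merge as its base model, e.g. two separate configs run one after the other with `mergekit-yaml`:

```yaml
# kermes-pass-1.yml (hypothetical): base + 2 donors via model_stock.
merge_method: model_stock
base_model: placeholder/llama-3.2-3b-instruct
models:
  - model: placeholder/donor-finetune-a
  - model: placeholder/donor-finetune-b
dtype: bfloat16
```

```yaml
# kermes-pass-2.yml (hypothetical): the pass-1 output becomes the new base.
merge_method: model_stock
base_model: ./kermes-pass-1          # local output directory of the first merge
models:
  - model: placeholder/donor-finetune-c
  - model: placeholder/donor-finetune-d
dtype: bfloat16
```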
And bingo. Perplexity still goes down, ARC remains stable, it's still quite unhinged, and... quite coherent, even at 10k+ context.

---
# quantizations

GGUF static quantizations (thanks Mradermacher!):

https://huggingface.co/mradermacher/Llama_3.2_3b_Kermes_v2.1-GGUF

GGUF iMatrix quantizations (thanks Mradermacher!):

https://huggingface.co/mradermacher/Llama_3.2_3b_Kermes_v2.1-i1-GGUF

GGUF custom iMatrix quantizations:

https://huggingface.co/Nexesenex/Llama_3.2_3b_Kermes_v2.1-iMat-CQ-GGUF

---
# merge

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).