Nexesenex committed
Commit df797df · verified · 1 parent: cd2ad2d

Update README.md

Files changed (1): README.md (+67, -1)
README.md CHANGED

---
library_name: transformers
tags:
- mergekit
- merge
license: llama3.3
---
# about

The Teaz series is my third attempt at making merges, after the Kostume and Kermes series.

This time, the goal was to make a smart model with low perplexity, in accordance with the principles of the Kermes series, but built as a merge of three merged models, as in the Kostume series.

Huihui's abliterated models were used (see the config sketch below):
- Llama 3.3 70b as the pivot of the first/main model.
- Nemotron 3.1 70b and Deepseek R1 Distill 70b as the pillars.
- and Tulu 3 70b as the backer of the 2nd and 3rd models.
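
In mergekit terms, a merge of this shape is a model-stock config. Here is a minimal sketch of what the first/main merge could look like; the repo IDs and dtype are illustrative placeholders rather than the exact recipe.

```yaml
# Illustrative model_stock sketch for the first/main merge (placeholder repo ids, not the exact recipe).
# The pivot goes in base_model; the two pillars go in the models list.
merge_method: model_stock
base_model: huihui-ai/Llama-3.3-70B-Instruct-abliterated
models:
  - model: huihui-ai/Llama-3.1-Nemotron-70B-Instruct-HF-abliterated
  - model: huihui-ai/DeepSeek-R1-Distill-Llama-70B-abliterated
dtype: bfloat16
```

The 2nd and 3rd intermediary merges would follow the same pattern with Tulu 3 70b as the backer, before the three results are themselves merged together.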

Bingo again. I hit 3.45 ppl512 wikieng, 62+ on ARC-C, and 82+ on ARC-E. Absolute top of the class for L3.x 70b, just as Kermes is for L3.2 3b.

No cheating, no contamination, just the wonderful MergeKit model-stock merge technique leveraged to a new level (compared to what I had already seen being done, at least).

Next projects will use this model as the "smarts pillar" of further merges, aimed at any use case.

---
# credits

Kudos go to the model authors and to the Arcee / MergeKit folks, as well as to HF for hosting the MergeKit App.
Also a big-up to SteelSkull: watching him cook Nevoria is what decided me to try making some merges myself.

---
# historic

1) With the Kostume series, started on 11/02/2025, I tried to make a triple stock merge of 3 intermediary stock merges of a dozen models or so, to see if I could pile up their abilities.

Not bad, but nothing special about it; it's a bit hard for me to judge at 3b.

2) With the Kermes series, started the day after, I defined a simpler approach:

- Perplexity is the main constraint. Usual L3.2 3b finetunes are around 10.5-11 ppl512wikieng; Hermes is around 9.5.
- I also measure in French and Serbian to observe the variance across languages.

- ARC Challenge and ARC Easy are the second constraint, used to judge basic logic.
- Usual L3.2 3b finetunes hit 40 and 60-65 respectively; Hermes 3 hits 47+ and 70+.

- Lack of censorship. I always keep in mind to pick models compatible with that, as much as possible.
- This can come through the picked models' abliteration or the datasets they use.

- And of course, manual testing, both in Kobold/Croco.CPP (spamming very offensive requests) and in ST (a 10k prompt with a big lorebook).

The Kermes series is basically a stack of stock merges, one on top of the other (see the sketch below):
- The goal was to maintain as much as possible the qualities of the models used, so I stay with 1+2 models for the first merge, and 1+2 for the second as well.
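
To make the 1+2 then 1+2 stacking concrete, here is a minimal two-pass model-stock sketch with placeholder model names; this is one plausible reading of the approach, not the exact Kermes configs.

```yaml
# Pass 1 (placeholders): 1 base + 2 donors -> intermediate merge.
merge_method: model_stock
base_model: placeholder/base-model
models:
  - model: placeholder/finetune-a
  - model: placeholder/finetune-b
dtype: bfloat16
```

```yaml
# Pass 2 (placeholders): the pass-1 output is reused as the base, with 2 more donors.
merge_method: model_stock
base_model: ./pass-1-output
models:
  - model: placeholder/finetune-c
  - model: placeholder/finetune-d
dtype: bfloat16
```

Each pass is a separate mergekit run; the point is simply that every stage stays at 1 base + 2 donors.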

And bingo. Perplexity still goes down, ARC scores remain stable, it's still quite unhinged, and... quite coherent, even at 10k+ context.

---
# quantizations

GGUF static quantizations (thanks Mradermacher!):

https://huggingface.co/mradermacher/Llama_3.2_3b_Kermes_v2.1-GGUF

GGUF iMatrix quantizations (thanks Mradermacher!):

https://huggingface.co/mradermacher/Llama_3.2_3b_Kermes_v2.1-i1-GGUF

GGUF custom iMatrix quantizations:

https://huggingface.co/Nexesenex/Llama_3.2_3b_Kermes_v2.1-iMat-CQ-GGUF

---
# merge

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).