|
--- |
|
base_model: |
|
- Nexesenex/Llama_3.x_70b_Smarteaz_0.2_R1 |
|
- Nexesenex/Llama_3.x_70b_Smarteaz_0.2_NMT |
|
- Nexesenex/Llama_3.x_70b_Smarteaz_0.1 |
|
library_name: transformers |
|
tags: |
|
- mergekit |
|
- merge |
|
license: llama3.3 |
|
--- |
|
# about |
|
|
|
The Teaz series is my third attempt at making merges, after the Kostume and Kermes series. |
|
|
|
This time, the goal was to make a smart model with low perplexity, in accordance with the principles of the Kermes series, but built as a merge of 3 merged models, as in the Kostume series.
|
|
|
Huihui's abliterated models were used (a sketch of one intermediate merge follows the list):
|
- Llama 3.3 70b as the pivot of the first/main model. |
|
- Nemotron 3.1 70b and Deepseek R1 Distill 70b as the pillars. |
|
- and Tulu 3 70b as the backer of the 2nd and 3rd models.
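As a rough illustration of how one of those intermediate merges could be set up, here is a hypothetical mergekit config for the 0.2_R1 model. The Huihui repo names and the choice of base are my assumptions, mirroring the final config below; the actual recipes are on the Smarteaz_0.2_R1 / 0.2_NMT model cards.

```yaml
# Hypothetical sketch of the 0.2_R1 intermediate merge; the real
# config may differ (check the Smarteaz_0.2_R1 model card).
merge_method: model_stock
models:
  - model: huihui-ai/DeepSeek-R1-Distill-Llama-70B-abliterated  # pillar
    parameters:
      weight: 1.0
  - model: huihui-ai/Llama-3.1-Tulu-3-70B-abliterated           # backer (assumed repo name)
    parameters:
      weight: 1.0
base_model: Nexesenex/Llama_3.x_70b_Smarteaz_0.1                # assumed pivot
dtype: bfloat16
normalize: true
```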
|
|
|
Bingo again. I hit 3.45 ppl512 wikieng, 62+ on ARC-C, and 82+ on ARC-E. Absolute top of the class for L3.x 70b, like Kermes is for L3.2 3b.
|
|
|
No cheating, no contamination, just the wonderful MergeKit model-stock merge technique leveraged to a new level (compared to what I had already seen done, at least).
|
|
|
Next projects will use this model as the "smarts pillar" of further merges, aimed at various use cases.
|
|
|
--- |
|
# credits |
|
|
|
Kudos go to the model authors, and to the Arcee / MergeKit folks, as well as to HF for hosting the MergeKit App.
|
Also a big-up to SteelSkull: watching him cook Nevoria is what convinced me to try making merges myself.
|
|
|
--- |
|
# history
|
|
|
First: with the Kostume series, started on 11/02/2025, I tried to make a triple stock merge of 3 intermediary stock merges of a dozen models or so.

This was to see if I could pile up their abilities.
|
|
|
- Not bad, but nothing special about it; it's a bit hard for me to judge at 3b.
|
|
|
Second: with the Kermes series, started the day after, I defined a simpler approach:
|
|
|
- Perplexity is the main constraint. Usual L3.2 3b finetunes are around 10.5-11 ppl512 wikieng; Hermes3 is around 9.5.
|
- I also measure in French and Serbian to observe the variances. |
|
|
|
- ARC Challenge and ARC Easy are the second constraint, to judge basic logic.

- Usual L3.2 3b finetunes hit 40 and 60-65 respectively; Hermes3 hits 47+ and 70+.
|
|
|
- Lack of censorship. I always keep in mind to pick models compatible with that goal, as much as possible.

- This can come through the picked models' abliteration or through the datasets they use.
|
|
|
- And of course, the actual testing, both in Kobold/Croco.CPP (spamming very offensive requests) and in ST (a 10k-token prompt with a big lorebook).
|
|
|
The Kermes series is basically stock merges stacked on top of one another.
|
- The goal was to preserve as much as possible of the qualities of the models used, so I stick to 1+2 models for the first merge, and 1+2 for the second as well (see the sketch below).
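A minimal sketch of that stacking, with placeholder model names (mergekit runs one merge per config file, so the second stage is a separate config that reuses the first stage's output as its base):

```yaml
# Stage 1: one base + two finetunes (all names are placeholders).
merge_method: model_stock
models:
  - model: finetune_A
    parameters:
      weight: 1.0
  - model: finetune_B
    parameters:
      weight: 1.0
base_model: base_0
dtype: bfloat16
normalize: true
# Stage 2 (separate config, not shown): point base_model at the
# stage-1 output directory and list two further finetunes as models.
```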
|
|
|
And bingo: perplexity still goes down, ARC scores remain stable, it's still quite unhinged, and... quite coherent, even at 10k+ context.
|
|
|
--- |
|
# quantizations |
|
|
|
GGUF static quantizations (thanks Mradermacher!):
|
|
|
https://huggingface.co./mradermacher/Llama_3.x_70b_Smarteaz_V1-GGUF |
|
|
|
GGUF iMatrix quantizations (thanks Mradermacher!):
|
|
|
https://huggingface.co./mradermacher/Llama_3.x_70b_Smarteaz_V1-i1-GGUF |
|
|
|
--- |
|
# merge
|
|
|
This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit). |
|
|
|
## Merge Details |
|
### Merge Method |
|
|
|
This model was merged using the [Model Stock](https://arxiv.org/abs/2403.19522) merge method using [Nexesenex/Llama_3.x_70b_Smarteaz_0.1](https://huggingface.co./Nexesenex/Llama_3.x_70b_Smarteaz_0.1) as a base. |
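For context, Model Stock merges the fine-tuned models by averaging their weights and then interpolating that average back toward the base model. In rough terms (this is a paraphrase of the paper's per-layer procedure, so treat it as a sketch):

$$
w_H = t \, w_{\text{avg}} + (1 - t) \, w_0,
\qquad
t = \frac{k \cos\theta}{1 + (k - 1)\cos\theta}
$$

where $w_0$ is the base model's weights, $w_{\text{avg}}$ is the average of the $k$ fine-tuned models' weights, and $\theta$ is the angle between the fine-tuned models' weight deltas from the base.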
|
|
|
### Models Merged |
|
|
|
The following models were included in the merge: |
|
* [Nexesenex/Llama_3.x_70b_Smarteaz_0.2_R1](https://huggingface.co./Nexesenex/Llama_3.x_70b_Smarteaz_0.2_R1) |
|
* [Nexesenex/Llama_3.x_70b_Smarteaz_0.2_NMT](https://huggingface.co./Nexesenex/Llama_3.x_70b_Smarteaz_0.2_NMT) |
|
|
|
### Configuration |
|
|
|
The following YAML configuration was used to produce this model: |
|
|
|
```yaml |
|
merge_method: model_stock |
|
models: |
|
- model: Nexesenex/Llama_3.x_70b_Smarteaz_0.2_NMT |
|
parameters: |
|
weight: 1.0 |
|
- model: Nexesenex/Llama_3.x_70b_Smarteaz_0.2_R1 |
|
parameters: |
|
weight: 1.0 |
|
base_model: Nexesenex/Llama_3.x_70b_Smarteaz_0.1 |
|
dtype: bfloat16 |
|
normalize: true |
|
|
|
``` |