---
base_model:
- Nexesenex/Llama_3.x_70b_Smarteaz_0.2_R1
- Nexesenex/Llama_3.x_70b_Smarteaz_0.2_NMT
- Nexesenex/Llama_3.x_70b_Smarteaz_0.1
library_name: transformers
tags:
- mergekit
- merge
license: llama3.3
---
# about

The Teaz series is my third attempt at making merges, after the Kostume and Kermes series.

This time, the goal was to make a smart model with low perplexity, in accordance with the principles of the Kermes series, but as a merge of 3 merged models, like in the Kostume series.

Huihui's abliterated models were used:
- Llama 3.3 70b as the pivot of the first/main model.
- Nemotron 3.1 70b and Deepseek R1 Distill 70b as the pillars.
- and Tulu 3 70b as the backer of the 2nd and 3rd models.

An illustrative sketch of this first-stage layout is given below, under Merge Details.

Bingo again. I hit 3.45 ppl512 wikieng, 62+ on ARC-C, and 82+ on ARC-E. Absolute top of the class for L3.x 70b, like Kermes is for L3.2 3b.

No cheating, no contamination, just the wonderful MergeKit model-stock merge technique leveraged to a new level (compared to what I had already seen being done, at least).

Next projects will involve that model as the "smarts pillar" of further merges, aimed at any use case.

---
# credits

Kudos go to the model authors, and to the Arcee / MergeKit folks, as well as to HF for hosting the MergeKit App.

Also a big-up to SteelSkull: watching him cook Nevoria is what decided me to try making some merges myself.

---
# historic

First: in the Kostume series, started on 11/02/2025, I tried to make a triple stock merge of 3 intermediary stock merges of a dozen models or so, to see if I could pile up their abilities.
- Not bad, but nothing special about it; it's a bit hard for me to judge at 3b.

Second: in the Kermes series, started the day after, I defined a simpler approach:
- Perplexity is the main constraint. Usual L3.2 3b finetunes are around 10.5-11 ppl512 wikieng; Hermes is around 9.5.
- I also measure in French and Serbian to observe the variances.
- ARC Challenge and ARC Easy are the second constraint, to judge basic logic.
- Usual L3.2 3b finetunes hit 40 and 60-65 respectively; Hermes 3 hits 47+ and 70+.
- Lack of censorship. I always keep in mind to pick models compatible with that, as much as possible.
- Be it through the picked models' abliteration or through the datasets they use.
- And of course, the actual testing, both in Kobold/Croco.CPP (spamming very offensive requests) and in ST (a 10k prompt with a big lorebook).

The Kermes series are basically stock merges stacked on top of one another.
- The goal was to maintain as much as possible of the qualities of the models used, so I stayed on 1+2 models for the first merge, and 1+2 for the second as well.

And bingo. Perplexity still goes down, ARC scores remain stable, it's still quite unhinged, and... quite coherent, even at 10k+ context.

---
# quantizations

GGUF static quantizations (Thanks Mradermacher!): https://huggingface.co./mradermacher/Llama_3.x_70b_Smarteaz_V1-GGUF

GGUF iMatrix quantizations (Thanks Mradermacher!): https://huggingface.co./mradermacher/Llama_3.x_70b_Smarteaz_V1-i1-GGUF

---
# merge

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

## Merge Details
### Merge Method

This model was merged using the [Model Stock](https://arxiv.org/abs/2403.19522) merge method using [Nexesenex/Llama_3.x_70b_Smarteaz_0.1](https://huggingface.co./Nexesenex/Llama_3.x_70b_Smarteaz_0.1) as a base.
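As described in the "about" section above, the Smarteaz_0.1 base was itself a model-stock merge: abliterated Llama 3.3 70b as the pivot (base), with Nemotron and Deepseek R1 Distill as the pillars. The sketch below only illustrates that layout; the repository IDs and settings are indicative placeholders, not the published 0.1 configuration.

```yaml
# Illustrative sketch of the first-stage (Smarteaz_0.1) merge layout.
# Repository IDs are placeholders for Huihui's abliterated models and
# may not match the exact repos or settings actually used.
merge_method: model_stock
models:
  - model: huihui-ai/Llama-3.1-Nemotron-70B-Instruct-HF-abliterated  # pillar (assumed ID)
    parameters:
      weight: 1.0
  - model: huihui-ai/DeepSeek-R1-Distill-Llama-70B-abliterated       # pillar (assumed ID)
    parameters:
      weight: 1.0
base_model: huihui-ai/Llama-3.3-70B-Instruct-abliterated             # pivot (assumed ID)
dtype: bfloat16
normalize: true
```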
### Models Merged

The following models were included in the merge:
* [Nexesenex/Llama_3.x_70b_Smarteaz_0.2_R1](https://huggingface.co./Nexesenex/Llama_3.x_70b_Smarteaz_0.2_R1)
* [Nexesenex/Llama_3.x_70b_Smarteaz_0.2_NMT](https://huggingface.co./Nexesenex/Llama_3.x_70b_Smarteaz_0.2_NMT)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
merge_method: model_stock
models:
  - model: Nexesenex/Llama_3.x_70b_Smarteaz_0.2_NMT
    parameters:
      weight: 1.0
  - model: Nexesenex/Llama_3.x_70b_Smarteaz_0.2_R1
    parameters:
      weight: 1.0
base_model: Nexesenex/Llama_3.x_70b_Smarteaz_0.1
dtype: bfloat16
normalize: true
```
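Per the "about" section, the two 0.2 intermediates listed above follow the same recipe: the 0.1 base, backed by Tulu 3 70b, plus one pillar each. Below is a minimal sketch of what the Smarteaz_0.2_R1 stage could look like under that assumption; the pairing and repository IDs are inferred placeholders rather than the published configuration.

```yaml
# Illustrative sketch of a second-stage merge (Smarteaz_0.2_R1).
# The pairing (R1 Distill pillar + Tulu 3 backer on the 0.1 base) is inferred
# from the model names and the "about" section; repo IDs are placeholders.
merge_method: model_stock
models:
  - model: huihui-ai/DeepSeek-R1-Distill-Llama-70B-abliterated  # pillar (assumed ID)
    parameters:
      weight: 1.0
  - model: huihui-ai/Llama-3.1-Tulu-3-70B-abliterated           # backer (assumed ID)
    parameters:
      weight: 1.0
base_model: Nexesenex/Llama_3.x_70b_Smarteaz_0.1
dtype: bfloat16
normalize: true
```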