--- base_model: - cstr/llama3.1-8b-spaetzle-v85 - cstr/llama3.1-8b-spaetzle-v86 - cstr/llama3.1-8b-spaetzle-v74 tags: - merge - mergekit - lazymergekit - cstr/llama3.1-8b-spaetzle-v85 - cstr/llama3.1-8b-spaetzle-v86 - cstr/llama3.1-8b-spaetzle-v74 license: llama3 language: - en - de --- # llama3.1-8b-spaetzle-v90 llama3.1-8b-spaetzle-v90 is a progressive merge of merges. # evaluation German EQ-Bench v2_de: 69.93 (171/171). English (v2): 77.88 (171/171) [Open LLM Leaderboard Evaluation Results](https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard) Detailed results can be found [here](https://huggingface.co./datasets/open-llm-leaderboard/details_cstr__llama3.1-8b-spaetzle-v90) | Metric |Value| |-------------------|----:| |Avg. |27.59| |IFEval (0-Shot) |73.56| |BBH (3-Shot) |32.76| |MATH Lvl 5 (4-Shot)|13.37| |GPQA (0-shot) | 4.36| |MuSR (0-shot) |11.15| |MMLU-PRO (5-shot) |30.34| # merge tree The merge tree involves the following models: - NousResearch/Hermes-3-Llama-3.1-8B - Undi95/Meta-Llama-3.1-8B-Claude - Dampfinchen/Llama-3.1-8B-Ultra-Instruct - VAGOsolutions/Llama-3.1-SauerkrautLM-8b-Instruct - akjindal53244/Llama-3.1-Storm-8B - nbeerbower/llama3.1-gutenberg-8B - Undi95/Meta-Llama-3.1-8B-Claude - DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1 - nbeerbower/llama-3-wissenschaft-8B-v2 - Azure99/blossom-v5-llama3-8b - VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct - princeton-nlp/Llama-3-Instruct-8B-SimPO - Locutusque/llama-3-neural-chat-v1-8b - Locutusque/Llama-3-Orca-1.0-8B - DiscoResearch/Llama3_DiscoLM_German_8b_v0.1_experimental - seedboxai/Llama-3-Kafka-8B-v0.2 - VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct - nbeerbower/llama-3-wissenschaft-8B-v2 - mlabonne/Daredevil-8B-abliterated-dpomix There have been a number of steps involved, among which, slep merging of only middle layers compensating for tokenizer / chat template differences. An illustration below. ## 🧩 Configuration The final merge for this was: ```yaml models: - model: cstr/llama3.1-8b-spaetzle-v59 # no parameters necessary for base model - model: cstr/llama3.1-8b-spaetzle-v85 parameters: density: 0.65 weight: 0.3 - model: cstr/llama3.1-8b-spaetzle-v86 parameters: density: 0.65 weight: 0.3 - model: cstr/llama3.1-8b-spaetzle-v74 parameters: density: 0.65 weight: 0.3 merge_method: dare_ties base_model: cstr/llama3.1-8b-spaetzle-v59 parameters: int8_mask: true dtype: bfloat16 random_seed: 0 tokenizer_source: base ``` Among the previous steps: ```yaml models: - model: NousResearch/Hermes-3-Llama-3.1-8B merge_method: slerp base_model: cstr/llama3.1-8b-spaetzle-v74 parameters: t: - value: [0, 0, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0, 0] dtype: float16 ``` ## 💻 Usage Use with llama3 chat template as common. Here are GGUF quants for use with llama.cpp & wrappers as e.g. ollama: [cstr/llama3.1-8b-spaetzle-v90-GGUF](https://huggingface.co./cstr/llama3.1-8b-spaetzle-v90-GGUF)