---
license: llama3.1
language:
- en
library_name: transformers
tags:
- mergekit
- merge
base_model:
- meta-llama/Meta-Llama-3.1-70B-Instruct
- turboderp/Cat-Llama-3-70B-instruct
- Nexusflow/Athene-70B
---

![image/png](https://cdn-uploads.huggingface.co/production/uploads/649dc85249ae3a68334adcc6/KxaiZ7rDKkYlix99O9j5H.png)

**Cathallama**
=====================================

Awesome model, my new daily driver.

Edit: I am seeing a lot of generations with tokens pointing to unknown Unicode code points that did not show up during testing for this model, so I have stopped using it and am working on a new version.

**Notable Performance**

* 9 percentage point increase in overall MMLU-PRO success rate over LLaMA 3.1 70B at Q4_0 (51.0% vs. 42.0%)
* Best or tied-best score in 11 of the 14 MMLU-PRO categories tested
* Strong manual-testing results: 11 of 14 test cases passed, the most of any model tested

**Creation workflow**
=====================

**Models merged**

* meta-llama/Meta-Llama-3.1-70B-Instruct
* turboderp/Cat-Llama-3-70B-instruct
* Nexusflow/Athene-70B

```mermaid
flowchart TD
    A[Nexusflow_Athene] -->|Merge with| B[Meta-Llama-3.1]
    C[turboderp_Cat] -->|Merge with| D[Meta-Llama-3.1]
    B --> E[Merge]
    D --> E
    E -->|Result| F[Cathallama]
```

![image/png](https://cdn-uploads.huggingface.co/production/uploads/649dc85249ae3a68334adcc6/bBcB194tAtsZjPUnI1pDQ.png)
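As a concrete illustration of the two-stage workflow in the flowchart, a first-stage merge could be expressed in mergekit roughly as below. This is a sketch, not the published recipe: the SLERP method, the `t` weight, and the file names are all assumptions.

```yaml
# stage1-athene.yaml -- illustrative only; the actual Cathallama recipe
# is not published. An analogous config would blend
# turboderp/Cat-Llama-3-70B-instruct into the base, and a final config
# would merge the two intermediate outputs the same way.
merge_method: slerp              # assumed merge method
base_model: meta-llama/Meta-Llama-3.1-70B-Instruct
slices:
  - sources:
      - model: meta-llama/Meta-Llama-3.1-70B-Instruct
        layer_range: [0, 80]     # Llama 3.1 70B has 80 transformer layers
      - model: Nexusflow/Athene-70B
        layer_range: [0, 80]
parameters:
  t: 0.5                         # equal interpolation weight (assumed)
dtype: bfloat16
```

Each stage would be run with `mergekit-yaml <config>.yaml <output-dir>`, feeding the two intermediate output directories into the final merge.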
**Testing**
=====================

**Hyperparameters**
---------------

* **Temperature**: 0.0 for automated tests, 0.9 for manual tests
* **Penalize repeat sequence**: 1.05
* **Consider N tokens for penalize**: 256
* **Penalize repetition of newlines**: enabled
* **Top-K sampling**: 40
* **Top-P sampling**: 0.95
* **Min-P sampling**: 0.05

**llama.cpp Version**
------------------

* b3527-2-g2d5dd7bb
* `-fa -ngl -1 -ctk f16 --no-mmap` (see the example invocation sketched at the end of this card)

**Tested Files**
------------------

* Cathallama-70B.Q4_0.gguf
* Nexusflow_Athene-70B.Q4_0.gguf
* turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf
* Meta-Llama-3.1-70B-Instruct.Q4_0.gguf

**Tests**
--------------

**Manual testing**

| Category | Test Case | Cathallama-70B.Q4_0.gguf | Nexusflow_Athene-70B.Q4_0.gguf | turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf | Meta-Llama-3.1-70B-Instruct.Q4_0.gguf |
| --- | --- | --- | --- | --- | --- |
| **Common Sense** | Ball on cup | OK | KO | KO | OK |
| | Big duck small horse | KO | OK | KO | OK |
| | Killers | OK | OK | KO | OK |
| | Strawberry r's | OK | KO | KO | KO |
| | 9.11 or 9.9 bigger | KO | OK | OK | KO |
| | Dragon or lens | KO | KO | KO | KO |
| | Shirts | OK | OK | KO | KO |
| | Sisters | OK | KO | KO | KO |
| | Jane faster | OK | OK | OK | OK |
| **Programming** | JSON | OK | OK | OK | OK |
| | Python snake game | OK | KO | KO | KO |
| **Math** | Door window combination | OK | OK | KO | KO |
| **Smoke** | Poem | OK | OK | OK | OK |
| | Story | OK | OK | KO | OK |

*Note: see [sample_generations.txt](https://huggingface.co./gbueno86/Cathallama-70B/blob/main/sample_generations.txt) in the root folder of the repo for the raw generations.*

**MMLU-PRO**

| Model | Success % |
| --- | --- |
| Cathallama-70B.Q4_0.gguf | **51.0%** |
| turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf | 37.0% |
| Nexusflow_Athene-70B.Q4_0.gguf | 41.0% |
| Meta-Llama-3.1-70B-Instruct.Q4_0.gguf | 42.0% |

| MMLU-PRO category | Cathallama-70B.Q4_0.gguf | Nexusflow_Athene-70B.Q4_0.gguf | turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf | Meta-Llama-3.1-70B-Instruct.Q4_0.gguf |
| --- | --- | --- | --- | --- |
| Business | **50.0%** | 45.0% | 20.0% | 40.0% |
| Law | **40.0%** | 30.0% | 30.0% | 35.0% |
| Psychology | **85.0%** | 80.0% | 70.0% | 75.0% |
| Biology | 80.0% | 70.0% | **85.0%** | 80.0% |
| Chemistry | **55.0%** | 40.0% | 35.0% | 35.0% |
| History | **65.0%** | 60.0% | 55.0% | **65.0%** |
| Other | **55.0%** | 50.0% | 45.0% | 50.0% |
| Health | **75.0%** | 40.0% | 60.0% | 65.0% |
| Economics | **80.0%** | 75.0% | 65.0% | 70.0% |
| Math | **45.0%** | 35.0% | 15.0% | 40.0% |
| Physics | **50.0%** | 45.0% | 45.0% | 45.0% |
| Computer Science | **60.0%** | 55.0% | 55.0% | **60.0%** |
| Philosophy | 55.0% | **60.0%** | 45.0% | 50.0% |
| Engineering | 35.0% | **40.0%** | 25.0% | 35.0% |

*Note: MMLU-PRO overall was tested with 100 questions; each category was tested with 20 questions.*

**PubmedQA**

| Model | Success % |
| --- | --- |
| Cathallama-70B.Q4_0.gguf | 73.00% |
| turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf | **76.00%** |
| Nexusflow_Athene-70B.Q4_0.gguf | 67.00% |
| Meta-Llama-3.1-70B-Instruct.Q4_0.gguf | 72.00% |

**Request**
--------------

If you are hiring in the EU or can sponsor a visa, PM me :D

PS: Thank you, mradermacher, for the GGUFs!
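For reference, the llama.cpp flags and sampler settings from the Testing section combine into an invocation along these lines. This is a sketch, assuming the `llama-cli` binary from that era of llama.cpp; the model path and prompt are placeholders, and the automated runs used `--temp 0.0` instead.

```bash
# Manual-testing settings from the Hyperparameters section above.
./llama-cli -m ./Cathallama-70B.Q4_0.gguf \
    -fa -ngl -1 -ctk f16 --no-mmap \
    --temp 0.9 \
    --repeat-penalty 1.05 --repeat-last-n 256 --penalize-nl \
    --top-k 40 --top-p 0.95 --min-p 0.05 \
    -p "Write a short poem."
```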