---
license: llama3.1
language:
  - en
library_name: transformers
tags:
  - mergekit
  - merge
base_model:
  - meta-llama/Meta-Llama-3.1-70B-Instruct
  - turboderp/Cat-Llama-3-70B-instruct
  - Nexusflow/Athene-70B
---


# Cathallama

Awesome model, my new daily driver.

## Notable Performance

- 9-percentage-point overall gain on MMLU-PRO over Meta-Llama-3.1-70B-Instruct (51.0% vs. 42.0%)
- Leads or ties the other tested models in most MMLU-PRO categories
- Best manual-testing pass rate of the four models (11 of 14 test cases)

## Creation workflow

### Models merged

- meta-llama/Meta-Llama-3.1-70B-Instruct
- turboderp/Cat-Llama-3-70B-instruct
- Nexusflow/Athene-70B

```mermaid
flowchart TD
    A[Nexusflow_Athene] -->|Merge with| B[Meta-Llama-3.1]
    C[turboderp_Cat] -->|Merge with| D[Meta-Llama-3.1]
    B --> E[Merge]
    D --> E[Merge]
    E[Merge] -->|Result| F[Cathallama]
```
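
The exact merge recipe is not published in this card, but a two-stage mergekit run matching the flowchart above could be scripted roughly as follows. The merge method (SLERP), interpolation factor `t`, layer range, and paths are illustrative assumptions, not the actual settings used:

```bash
#!/usr/bin/env bash
# Hypothetical sketch of one pairwise step from the flowchart above.
# merge_method, t, layer_range, and paths are assumptions, not the published recipe.
cat > athene-llama.yml <<'EOF'
slices:
  - sources:
      - model: Nexusflow/Athene-70B
        layer_range: [0, 80]
      - model: meta-llama/Meta-Llama-3.1-70B-Instruct
        layer_range: [0, 80]
merge_method: slerp
base_model: meta-llama/Meta-Llama-3.1-70B-Instruct
parameters:
  t: 0.5
dtype: bfloat16
EOF

# Run the merge; repeat analogously for the turboderp_Cat step,
# then merge the two intermediate models for the final result.
mergekit-yaml athene-llama.yml ./athene-llama-merged --cuda
```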


## Testing

### Hyperparameters

- Temperature: 0.0 for automated tests, 0.9 for manual tests
- Penalize repeat sequence: 1.05
- Consider N tokens for penalize: 256
- Penalize repetition of newlines
- Top-K sampling: 40
- Top-P sampling: 0.95
- Min-P sampling: 0.05

### llama.cpp version

- b3527-2-g2d5dd7bb
- Flags: `-fa -ngl -1 -ctk f16 --no-mmap`
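
For reproduction, the sampler settings above map onto llama.cpp's CLI flags roughly as follows. This invocation is a sketch, not the exact command used: the model path and prompt are placeholders, and `--penalize-nl` (for "penalize repetition of newlines") reflects llama.cpp builds of this era; the flag was removed in later releases.

```bash
# Hypothetical llama.cpp invocation combining the flags and sampler
# settings listed above; model path and prompt are placeholders.
./llama-cli -m ./Cathallama-70B.Q4_0.gguf \
    -fa -ngl -1 -ctk f16 --no-mmap \
    --temp 0.9 \
    --repeat-penalty 1.05 \
    --repeat-last-n 256 \
    --penalize-nl \
    --top-k 40 \
    --top-p 0.95 \
    --min-p 0.05 \
    -p "Write a short poem about ducks and horses."
```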

### Tested Files

- Cathallama-70B.Q4_0.gguf
- Nexusflow_Athene-70B.Q4_0.gguf
- turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf
- Meta-Llama-3.1-70B-Instruct.Q4_0.gguf
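
All four are standard Q4_0 GGUF quantizations. As a reference only (file paths are placeholders), a Q4_0 file like these can be produced from a full-precision GGUF with llama.cpp's quantize tool:

```bash
# Hypothetical quantization step producing a Q4_0 file like those tested;
# input and output paths are placeholders.
./llama-quantize ./Cathallama-70B-F16.gguf ./Cathallama-70B.Q4_0.gguf Q4_0
```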

## Tests

### Manual testing

| Category | Test Case | Cathallama-70B.Q4_0.gguf | Nexusflow_Athene-70B.Q4_0.gguf | turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf | Meta-Llama-3.1-70B-Instruct.Q4_0.gguf |
| --- | --- | --- | --- | --- | --- |
| Common Sense | Ball on cup | OK | KO | KO | OK |
| | Big duck small horse | KO | OK | KO | OK |
| | Killers | OK | OK | KO | OK |
| | Strawberry r's | OK | KO | KO | KO |
| | 9.11 or 9.9 bigger | KO | OK | OK | KO |
| | Dragon or lens | KO | KO | KO | KO |
| | Shirts | OK | OK | KO | KO |
| | Sisters | OK | KO | KO | KO |
| | Jane faster | OK | OK | OK | OK |
| Programming | JSON | OK | OK | OK | OK |
| | Python snake game | OK | KO | KO | KO |
| Math | Door window combination | OK | OK | KO | KO |
| Smoke | Poem | OK | OK | OK | OK |
| | Story | OK | OK | KO | OK |

Note: See sample_generations.txt in the root folder of the repo for the raw generations.

### MMLU-PRO

| Model | Success % |
| --- | --- |
| Cathallama-70B | 51.0% |
| turboderp_Cat-Llama-3-70B-instruct | 37.0% |
| Nexusflow_Athene-70B | 41.0% |
| Meta-Llama-3.1-70B-Instruct | 42.0% |

| MMLU-PRO category | Cathallama-70B.Q4_0.gguf | Nexusflow_Athene-70B.Q4_0.gguf | turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf | Meta-Llama-3.1-70B-Instruct.Q4_0.gguf |
| --- | --- | --- | --- | --- |
| Business | 50.0% | 45.0% | 20.0% | 40.0% |
| Law | 40.0% | 30.0% | 30.0% | 35.0% |
| Psychology | 85.0% | 80.0% | 70.0% | 75.0% |
| Biology | 80.0% | 70.0% | 85.0% | 80.0% |
| Chemistry | 55.0% | 40.0% | 35.0% | 35.0% |
| History | 65.0% | 60.0% | 55.0% | 65.0% |
| Other | 55.0% | 50.0% | 45.0% | 50.0% |
| Health | 75.0% | 40.0% | 60.0% | 65.0% |
| Economics | 80.0% | 75.0% | 65.0% | 70.0% |
| Math | 45.0% | 35.0% | 15.0% | 40.0% |
| Physics | 50.0% | 45.0% | 45.0% | 45.0% |
| Computer Science | 60.0% | 55.0% | 55.0% | 60.0% |
| Philosophy | 55.0% | 60.0% | 45.0% | 50.0% |
| Engineering | 35.0% | 40.0% | 25.0% | 35.0% |

Note: MMLU-PRO overall was tested with 100 questions; each category was tested with 20 questions from that category.

### PubmedQA

| Model Name | Success % |
| --- | --- |
| Cathallama-70B.Q4_0.gguf | 73.00% |
| turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf | 76.00% |
| Nexusflow_Athene-70B.Q4_0.gguf | 67.00% |
| Meta-Llama-3.1-70B-Instruct.Q4_0.gguf | 72.00% |

## Request

If you are hiring in the EU or can sponsor a visa, PM me :D

PS. Thank you mradermacher for the GGUFs!