---
license: cc0-1.0
datasets:
  - JeanKaddour/minipile
language:
  - en
library_name: transformers
---

A GPT-NeoX model trained on MiniPile, intended as a baseline to compare my MANN models against. It uses NeelNanda/gpt-neox-tokenizer-digits for tokenization.

The exact model configuration is as follows:

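from transformers import AutoTokenizer, GPTNeoXConfig

# Assumption: the tokenizer is loaded from the Hub repo named above.
tokenizer = AutoTokenizer.from_pretrained("NeelNanda/gpt-neox-tokenizer-digits")

cfg = GPTNeoXConfig(
    vocab_size=len(tokenizer),
    hidden_size=768,
    intermediate_size=768 * 4,  # 3072
    num_hidden_layers=12,
    num_attention_heads=12,  # 768 / 12 = 64 dimensions per head
    tie_word_embeddings=True,
    hidden_act="gelu_new",
    # Not a standard GPTNeoXConfig field; kept as extra metadata.
    tokenizer="NeelNanda/gpt-neox-tokenizer-digits",
)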
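To get a feel for the model size this configuration implies, here is a rough sketch that counts parameters by hand. It assumes the standard Hugging Face GPT-NeoX layout (two layer norms per block, a fused QKV projection with bias, an attention output projection, and a two-layer MLP, plus a final layer norm). The function name `gpt_neox_param_count` and the vocabulary size of 50,000 are hypothetical placeholders; the real vocabulary size is `len(tokenizer)`, which is not stated here.

```python
def gpt_neox_param_count(vocab_size, hidden_size=768, intermediate_size=768 * 4,
                         num_hidden_layers=12, tie_word_embeddings=True):
    """Approximate trainable-parameter count for a GPT-NeoX style model."""
    h, i = hidden_size, intermediate_size
    embed_in = vocab_size * h                 # input token embeddings
    layer_norms = 2 * (2 * h)                 # two LayerNorms per block (weight + bias)
    qkv = h * 3 * h + 3 * h                   # fused query/key/value projection
    attn_out = h * h + h                      # attention output projection
    mlp = (h * i + i) + (i * h + h)           # up-projection and down-projection
    per_layer = layer_norms + qkv + attn_out + mlp
    final_norm = 2 * h                        # final LayerNorm
    # With tied embeddings the output head reuses the input embedding matrix.
    embed_out = 0 if tie_word_embeddings else vocab_size * h
    return embed_in + num_hidden_layers * per_layer + final_norm + embed_out


# Hypothetical vocabulary size, for illustration only.
HYPOTHETICAL_VOCAB_SIZE = 50_000
print(gpt_neox_param_count(HYPOTHETICAL_VOCAB_SIZE))
```

Under these assumptions the transformer blocks and final norm account for roughly 85M parameters regardless of vocabulary size; the embedding matrix contributes the rest.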

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric              | Value |
|---------------------|-------|
| Avg.                | 25.1  |
| ARC (25-shot)       | 20.73 |
| HellaSwag (10-shot) | 27.03 |
| MMLU (5-shot)       | 25.31 |
| TruthfulQA (0-shot) | 49.19 |
| Winogrande (5-shot) | 52.33 |
| GSM8K (5-shot)      | 0.0   |
| DROP (3-shot)       | 1.09  |
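
The reported average appears to be the unweighted mean of the seven benchmark scores. A quick check, using only the values listed above (the variable names are illustrative):

```python
# Benchmark scores copied from the results above.
scores = {
    "ARC (25-shot)": 20.73,
    "HellaSwag (10-shot)": 27.03,
    "MMLU (5-shot)": 25.31,
    "TruthfulQA (0-shot)": 49.19,
    "Winogrande (5-shot)": 52.33,
    "GSM8K (5-shot)": 0.0,
    "DROP (3-shot)": 1.09,
}

# Unweighted mean across all seven benchmarks, rounded to one decimal place.
average = round(sum(scores.values()) / len(scores), 1)
print(average)
```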