Update with float16 evals
README.md CHANGED

@@ -15,7 +15,7 @@ library_name: transformers
 
 A 124M parameter GPT2 model trained with the 10B fineweb-edu dataset using [https://github.com/karpathy/llm.c](https://github.com/karpathy/llm.c)
 
-Training took 20 hours on a single 4090 GPU, giving the following graphs:
+Training took 20 hours on a single 4090 GPU (limited to 350W), giving the following graphs:
 
 ![gpt2-124M-edu-fineweb-10B](https://huggingface.co/rhysjones/gpt2-124M-edu-fineweb-10B/resolve/main/graph.png)
 
@@ -47,10 +47,10 @@ The model has had no further finetuning.
 Evals using [Eleuther AI Harness](https://github.com/EleutherAI/lm-evaluation-harness/tree/b281b0921b636bc36ad05c0b0b0763bd6dd43463) gives:
 | Eval Test | Score |
 | --------- | ----- |
-| arc_challenge (25 shot) | 24.
-| gsm8k (5 shot) | 0.
-| hellaswag (10 shot) | 32.
-| mmlu (5 shot) |
+| arc_challenge (25 shot) | 24.49 |
+| gsm8k (5 shot) | 0.08 |
+| hellaswag (10 shot) | 32.64 |
+| mmlu (5 shot) | 26.06 |
 | truthfulqa (0 shot) | 42.45 |
-| winogrande (5 shot) |
-| **Overall Score** | **29.
+| winogrande (5 shot) | 52.17 |
+| **Overall Score** | **29.65** |
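The updated overall score is consistent with the unweighted mean of the six benchmark scores. A minimal sanity check (this averaging rule is an observation from the numbers, not stated in the model card):

```python
# Updated float16 eval scores from the README table above.
scores = {
    "arc_challenge (25 shot)": 24.49,
    "gsm8k (5 shot)": 0.08,
    "hellaswag (10 shot)": 32.64,
    "mmlu (5 shot)": 26.06,
    "truthfulqa (0 shot)": 42.45,
    "winogrande (5 shot)": 52.17,
}

# Unweighted mean across the six tasks, rounded to two decimals.
overall = round(sum(scores.values()) / len(scores), 2)
print(overall)  # 29.65, matching the table's Overall Score
```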