---
datasets: HuggingFaceFW/fineweb-edu
widget:
- example_title: Example interaction
  text: During photosynthesis in green plants
library_name: transformers
---

# Model Card for gpt2-124M-edu-fineweb-10B

A 124M parameter GPT2 model trained on the 10B-token fineweb-edu dataset using [https://github.com/karpathy/llm.c](https://github.com/karpathy/llm.c).

Training took 20 hours on a single 4090 GPU, giving the following graphs:

![gpt2-124M-edu-fineweb-10B](https://huggingface.co/rhysjones/gpt2-124M-edu-fineweb-10B/resolve/main/graph.png)

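As a usage sketch (an assumption, not from the card: it presumes the uploaded checkpoint loads as a standard GPT2 model via the `transformers` text-generation pipeline), the model can be tried with:

```python
from transformers import pipeline

# Hypothetical usage sketch: assumes the checkpoint exported from llm.c is a
# standard GPT2 model loadable by transformers' text-generation pipeline.
generator = pipeline("text-generation", model="rhysjones/gpt2-124M-edu-fineweb-10B")

# Same prompt as the model card's example widget.
out = generator("During photosynthesis in green plants", max_new_tokens=40)
print(out[0]["generated_text"])
```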
The training parameters were:
```
./train_gpt2cu \
-i "dev/data/edu_fineweb10B/edu_fineweb_train_*.bin" \
-j "dev/data/edu_fineweb10B/edu_fineweb_val_*.bin" \
-o log124M \
-e "d12" \
-b 56 -t 1024 \
-d 458752 \
-r 1 \
-z 1 \
-c 0.1 \
-l 0.002 \
-q 0.0 \
-u 700 \
-n 5000 \
-v 250 -s 20000 \
-h 1
```
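For context, an illustrative calculation (my reading of the flags, not stated in the card): taking `-b` as micro-batch sequences, `-t` as sequence length, and `-d` as total batch size in tokens, the settings above imply 8 gradient-accumulation micro-steps per optimizer update:

```python
# Batch-size arithmetic implied by the command above, assuming
# -b = micro-batch sequences, -t = tokens per sequence, -d = tokens per update.
micro_batch = 56          # -b
seq_len = 1024            # -t
total_batch = 458752      # -d

tokens_per_micro_step = micro_batch * seq_len           # 56 * 1024 = 57344
grad_accum_steps = total_batch // tokens_per_micro_step # 458752 / 57344 = 8

print(tokens_per_micro_step)  # 57344
print(grad_accum_steps)       # 8
```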