---
datasets: HuggingFaceFW/fineweb-edu
widget:
- example_title: Example interaction
  text: During photosynthesis in green plants
library_name: transformers
---

# Model Card for gpt2-124M-edu-fineweb-10B

A 124M parameter GPT2 model trained on the 10B-token fineweb-edu dataset using [https://github.com/karpathy/llm.c](https://github.com/karpathy/llm.c).

Training took 20 hours on a single 4090 GPU, giving the following graphs:

![gpt2-124M-edu-fineweb-10B](https://huggingface.co/rhysjones/gpt2-124M-edu-fineweb-10B/resolve/main/graph.png)

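As a usage sketch (an assumption, not from the card: it presumes the uploaded checkpoint loads as a standard GPT2 model via the `transformers` text-generation pipeline), the model can be tried with:

```python
from transformers import pipeline

# Hypothetical usage sketch: assumes the checkpoint exported from llm.c is a
# standard GPT2 model loadable by transformers' text-generation pipeline.
generator = pipeline("text-generation", model="rhysjones/gpt2-124M-edu-fineweb-10B")

# Same prompt as the model card's example widget.
out = generator("During photosynthesis in green plants", max_new_tokens=40)
print(out[0]["generated_text"])
```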
The training parameters were:
```
./train_gpt2cu \
-i "dev/data/edu_fineweb10B/edu_fineweb_train_*.bin" \
-j "dev/data/edu_fineweb10B/edu_fineweb_val_*.bin" \
-o log124M \
-e "d12" \
-b 56 -t 1024 \
-d 458752 \
-r 1 \
-z 1 \
-c 0.1 \
-l 0.002 \
-q 0.0 \
-u 700 \
-n 5000 \
-v 250 -s 20000 \
-h 1
```
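For context, an illustrative calculation (my reading of the flags, not stated in the card): taking `-b` as micro-batch sequences, `-t` as sequence length, and `-d` as total batch size in tokens, the settings above imply 8 gradient-accumulation micro-steps per optimizer update:

```python
# Batch-size arithmetic implied by the command above, assuming
# -b = micro-batch sequences, -t = tokens per sequence, -d = tokens per update.
micro_batch = 56          # -b
seq_len = 1024            # -t
total_batch = 458752      # -d

tokens_per_micro_step = micro_batch * seq_len           # 56 * 1024 = 57344
grad_accum_steps = total_batch // tokens_per_micro_step # 458752 / 57344 = 8

print(tokens_per_micro_step)  # 57344
print(grad_accum_steps)       # 8
```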