Commit 7ecae3b by rhysjones · verified · 1 parent: 1ade5d1

Update README.md

Files changed (1)
  1. README.md +26 -1
README.md CHANGED
@@ -1,4 +1,5 @@
  ---
+ datasets: HuggingFaceFW/fineweb-edu
  widget:
  - example_title: Example interaction
  text: During photosynthesis in green plants
@@ -12,4 +13,28 @@ library_name: transformers

  # Model Card for gpt2-124M-edu-fineweb-10B

- A 124M parameter GPT2 model trained with 10B fineweb-edu dataset using [https://github.com/karpathy/llm.c](https://github.com/karpathy/llm.c)
+ A 124M parameter GPT2 model trained with the 10B fineweb-edu dataset using [https://github.com/karpathy/llm.c](https://github.com/karpathy/llm.c)
+
+ Training took 20 hours on a single 4090 GPU, giving the following graphs:
+
+ ![gpt2-124M-edu-fineweb-10B](https://huggingface.co/rhysjones/gpt2-124M-edu-fineweb-10B/resolve/main/graph.png)
+
+ The training parameters were:
+ ```
+ ./train_gpt2cu \
+ -i "dev/data/edu_fineweb10B/edu_fineweb_train_*.bin" \
+ -j "dev/data/edu_fineweb10B/edu_fineweb_val_*.bin" \
+ -o log124M \
+ -e "d12" \
+ -b 56 -t 1024 \
+ -d 458752 \
+ -r 1 \
+ -z 1 \
+ -c 0.1 \
+ -l 0.002 \
+ -q 0.0 \
+ -u 700 \
+ -n 5000 \
+ -v 250 -s 20000 \
+ -h 1
+ ```
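
Reviewer note on the batch flags in the command above: they imply a gradient-accumulation count. The following is a minimal sketch of that arithmetic, assuming the usual llm.c reading of the flags (`-b` is the micro-batch size in sequences, `-t` is the sequence length in tokens, `-d` is the total batch size in tokens) and a single GPU as stated in the README text. The function and variable names are illustrative and are not taken from llm.c.

```python
def grad_accum_steps(micro_batch_size: int, seq_len: int,
                     total_batch_tokens: int, num_gpus: int = 1) -> int:
    """Number of micro-steps accumulated per optimizer step.

    Assumes the total batch size is expressed in tokens and must be an
    exact multiple of the tokens processed in one micro-step.
    """
    tokens_per_micro_step = micro_batch_size * seq_len * num_gpus
    if total_batch_tokens % tokens_per_micro_step != 0:
        raise ValueError("total batch size must be a multiple of B * T * num_gpus")
    return total_batch_tokens // tokens_per_micro_step


# Values copied from the command: -b 56 -t 1024 -d 458752, one GPU.
tokens_per_micro_step = 56 * 1024            # 57344 tokens per forward/backward pass
steps = grad_accum_steps(56, 1024, 458752)   # 8 micro-steps per optimizer update
print(tokens_per_micro_step, steps)
```

Under these assumptions each optimizer update covers 458752 tokens, built from 8 passes of 57344 tokens each, which is how a batch of that size can be trained on one 4090.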