PhilipQuirke
committed on
Update README.md
README.md CHANGED
@@ -2,48 +2,28 @@
 license: apache-2.0
 ---
 
-
-
-
-
-Accuracy of "Six9s" means model predicts 99.9999% of questions correctly, so it has at most one prediction failure per million. "Five9s" means 99.999% etc
-
-The available models are:
-
-# Addition models
-- add_d5_l1_h3_t15K_s372001.pth AddAccuracy=Two9s. Inaccurate as only has one layer. Can predict S0, S1 and S2 complexity questions.
-- add_d5_l2_h3_t15K_s372001.pth AddAccuracy=Six9s. AvgFinalLoss=1.6e-08
-- add_d6_l2_h3_t15K_s372001.pth AddAccuracy=Six9s. AvgFinalLoss=1.7e-08. MAIN FOCUS
-- add_d6_l2_h3_t20K_s173289.pth AddAccuracy=Six9s. AvgFinalLoss=1.5e-08
-- add_d6_l2_h3_t20K_s572091.pth AddAccuracy=Six9s. AvgFinalLoss=7e-09
-- add_d5_l2_h3_t40K_s372001.pth AddAccuracy=Six9s. AvgFinalLoss=2e-09. Fewer nodes
-- add_d6_l2_h3_t40K_s372001.pth AddAccuracy=Six9s. AvgFinalLoss 2e-09. Fewer nodes
-- add_d10_l2_h3_t40K_s572091.pth AddAccuracy=Six9s. AvgFinalLoss=8e-09. (1/M fail: 0000000555+0000000445=+00000001000 ModelAnswer: +00000000900)
-
-# Subtraction models
-- sub_d6_l2_h3_t30K_s372001.pth SubAccuracy=Six9s. AvgFinalLoss=5.8e-06
-- sub_d10_l2_h3_t75K_s173289.pth SubAccuracy=Two9s. (6672/M fails) AvgFinalLoss=0.002002022
-
-# Mixed (addition and subtraction) models
-- mix_d6_l3_h4_t40K_s372001.pth Add/SubAccuracy=Six9s/Six9s. AvgFinalLoss=5e-09. (1/M fail: 463687+166096=+0629783 ModelAnswer: +0639783)
-- mix_d10_l3_h4_t75K_s173289.pth Add/SubAccuracy=Five9s/Two9s. AvgFinalLoss=1.125e-06 (2/M fail: 3301956441+6198944455=+09500900896 ModelAnswer: +09500800896) (295/M fail: 8531063649-0531031548=+08000032101 ModelAnswer: +07900032101)
-
-# insert-mode-1 Mixed models initialised with addition model
-- ins1_mix_d6_l3_h4_t40K_s372001.pth Add/SubAccuracy=Six9s/Six9s. AvgFinalLoss=8e-09. MAIN FOCUS
-- ins1_mix_d6_l3_h4_t40K_s173289.pth Add/SubAccuracy=Five9s/Five9s. AvgFinalLoss=1.4e-08. (3/M fails e.g. 850038+159060=+1009098 ModelAnswer: +0009098) (2 fails e.g. 77285-477285=+0100000 Q: ModelAnswer: +0000000).
-- ins1_mix_d6_l3_h4_t50K_s572091.pth Add/SubAccuracy=Six9s/Five9s. AvgFinalLoss=2.9e-08. (4/M fails e.g. 986887-286887=+0700000 ModelAnswer: +0600000)
-- ins1_mix_d6_l3_h3_t40K_s572091.pth Add/SubAccuracy=Six9s/Five9s. AvgFinalLoss=1.7e-08. (3/M fails e.g. 072074-272074=-0200000 ModelAnswer: +0200000)
-- ins1_mix_d10_l3_h3_t50K_s572091.pth Add/SubAccuracy=Five9s/Five9s. AvgFinalLoss 6.3e-7 (6/M fails e.g. 5068283822+4931712829=+09999996651 ModelAnswer: +19099996651) (7/M fails e.g. 3761900218-0761808615=+03000091603 ModelAnswer: +02000091603)
-- ins1_mix_d6_l2_h3_t40K_s572091.pth Add/SubAccuracy=Six9s/Five9s. AvgLoss = 2.4e-08 (5/M fails e.g. 565000-364538=+0200462 ModelAnswer: +0100462)
-- ins1_mix_d6_l3_h3_t80K_s572091.pth Add/SubAccuracy=Six9s/Five9s. Fewer nodes?
-
-# insert-mode-2 Mixed model initialised with addition model. Reset useful heads every 100 epochs.
-- ins2_mix_d6_l4_h4_t40K_s372001.pth Add/SubAccuracy=Five9s/Five9s. AvgFinalLoss=1.7e-08. (3/M fails e.g. 530757+460849=+0991606 ModelAnswer: +0091606) (8 fails e.g. 261926-161857=+0100069 ModelAnswer: +0000069)
-
-# insert-mode-3 Mixed model initialized with addition model. Reset useful heads & MLPs every 100 epochs.
-- ins3_mix_d6_l4_h3_t40K_s372001.pth Add/SubAccuracy=Four9s/Two9s. AvgFinalLoss=3.0e-04. (17/M fails e.g. 273257+056745=+0330002 ModelAnswer: +0320002) (3120 fails e,g. 09075-212133=-0003058 ModelAnswer: +0003058)
-
-
-These files are generated by the Colabs:
-- https://github.com/PhilipQuirke/transformer-maths/blob/main/assets/VerifiedArithmeticTrain.ipynb trains the model outputing a ".pt" file of model weights and a "_train.json" file of training losses over epochs
-- https://github.com/PhilipQuirke/transformer-maths/blob/main/assets/VerifiedArithmeticAnalyse.ipynb analyses the trained model. It outputs "_behaviour.json" of observed behaviours of nodes. It outputs "_algorithm.json" of each node's algorithmic purposes.
+This repository contains ~45 folders.
+Each folder contains a transformer model that can predict addition questions, subtraction questions or both.
+
+The folder name (e.g. sub_d6_l2_h3_t20K_s173289) contains:
+- "add", "sub", or "mix": Shows the types of questions the model can predict.
+- "d5" to "d20": How many digits the model handles, e.g. a d6 sub model can predict the answer in 123450-345670=-0222220
+- "l1", "l2" or "l3": The number of layers in the model
+- "h3" or "h4": The number of attention heads in the model
+- "t15K" to "t85K" etc: The number of batches the model was trained on
+- "s372001" etc: The random seed used in model training
+
+Some folder names also contain:
+- "ins1": Before training, the model was initialized with a smaller, accurate addition model
+- "ins2": As per ins1, but the inserted, useful attention heads were not allowed to change
+- "ins3": As per ins2, but the inserted MLP layers were also not allowed to change
+
+Each folder contains:
+- model.pth: The transformer model as described above
+- training_loss.json: Data gathered during model training. Used to plot "loss over training batches" graphs
+- behaviors.json: Data gathered about the behavior of the model by direct inspection.
+- features.json: Data gathered about hypothesised algorithm features via experimentation.
+
+The first 2 files were created by the https://github.com/PhilipQuirke/quanta_maths/blob/main/notebooks/VerifiedArithmeticTrain.ipynb notebook.
+The last 2 files were created by the https://github.com/PhilipQuirke/quanta_maths/blob/main/notebooks/VerifiedArithmeticAnalyse.ipynb notebook.
 
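
The folder naming convention introduced in the new README text is regular enough to parse mechanically. Below is a minimal sketch, assuming Python 3 and the standard `re` module; the `NAME_PATTERN` regex and `parse_folder_name` helper are illustrative names, not part of the repository.

```python
import re

# Hypothetical helper: split a folder name such as "ins1_mix_d6_l3_h4_t40K_s372001"
# into the components described in the README above.
NAME_PATTERN = re.compile(
    r"^(?:(?P<insert>ins[123])_)?"  # optional insert-mode prefix (ins1/ins2/ins3)
    r"(?P<op>add|sub|mix)_"         # question types the model can predict
    r"d(?P<digits>\d+)_"            # how many digits the model handles
    r"l(?P<layers>\d+)_"            # number of transformer layers
    r"h(?P<heads>\d+)_"             # number of attention heads
    r"t(?P<batches>\d+)K_"          # training batches, in thousands
    r"s(?P<seed>\d+)$"              # random seed used in training
)

def parse_folder_name(name: str) -> dict:
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Unrecognised folder name: {name}")
    return match.groupdict()

print(parse_folder_name("sub_d6_l2_h3_t20K_s173289"))
# {'insert': None, 'op': 'sub', 'digits': '6', 'layers': '2', 'heads': '3', 'batches': '20', 'seed': '173289'}
```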
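The four files listed for each folder can be read with standard tooling. A minimal sketch, assuming PyTorch and Python's built-in `json` module are installed; the folder name is just one example from the repository, and the exact structure of the JSON contents is defined by the notebooks linked above.

```python
import json
import torch

# Example folder; any of the ~45 folders in the repository follows the same layout.
folder = "add_d6_l2_h3_t15K_s372001"

# model.pth holds the trained transformer weights. Depending on how it was saved,
# torch.load may return a state dict or a full pickled model, so inspect the result.
weights = torch.load(f"{folder}/model.pth", map_location="cpu")

with open(f"{folder}/training_loss.json") as f:
    training_loss = json.load(f)  # training data, used for "loss over training batches" graphs

with open(f"{folder}/behaviors.json") as f:
    behaviors = json.load(f)      # model behavior found by direct inspection

with open(f"{folder}/features.json") as f:
    features = json.load(f)       # hypothesised algorithm features found via experimentation

print(type(weights), type(training_loss), type(behaviors), type(features))
```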