PhilipQuirke committed
Commit 248a83c · verified · 1 Parent(s): 92810c3

Update README.md

Files changed (1): README.md +24 -44
README.md CHANGED
@@ -2,48 +2,28 @@
  license: apache-2.0
  ---
 
- Contains ".pth" and ".json" files for Transformer models that answer n-digit addition and/or subtraction questions (e.g. 123450-345670=-0123230).
- The associated Colabs support models doing addition, subtraction or both (aka "mixed"), n_digits >= 4, n_layers = 1 .. 4, n_heads >= 3.
- An untrained mixed model can be initialised with (re-using) a previously trained addition model.
 
- Accuracy of "Six9s" means the model predicts 99.9999% of questions correctly, so it has at most one prediction failure per million. "Five9s" means 99.999%, etc.
-
- The available models are:
-
- # Addition models
- - add_d5_l1_h3_t15K_s372001.pth AddAccuracy=Two9s. Inaccurate as it has only one layer. Can predict S0, S1 and S2 complexity questions.
- - add_d5_l2_h3_t15K_s372001.pth AddAccuracy=Six9s. AvgFinalLoss=1.6e-08
- - add_d6_l2_h3_t15K_s372001.pth AddAccuracy=Six9s. AvgFinalLoss=1.7e-08. MAIN FOCUS
- - add_d6_l2_h3_t20K_s173289.pth AddAccuracy=Six9s. AvgFinalLoss=1.5e-08
- - add_d6_l2_h3_t20K_s572091.pth AddAccuracy=Six9s. AvgFinalLoss=7e-09
- - add_d5_l2_h3_t40K_s372001.pth AddAccuracy=Six9s. AvgFinalLoss=2e-09. Fewer nodes
- - add_d6_l2_h3_t40K_s372001.pth AddAccuracy=Six9s. AvgFinalLoss=2e-09. Fewer nodes
- - add_d10_l2_h3_t40K_s572091.pth AddAccuracy=Six9s. AvgFinalLoss=8e-09. (1/M fail: 0000000555+0000000445=+00000001000 ModelAnswer: +00000000900)
-
- # Subtraction models
- - sub_d6_l2_h3_t30K_s372001.pth SubAccuracy=Six9s. AvgFinalLoss=5.8e-06
- - sub_d10_l2_h3_t75K_s173289.pth SubAccuracy=Two9s. (6672/M fails) AvgFinalLoss=0.002002022
-
- # Mixed (addition and subtraction) models
- - mix_d6_l3_h4_t40K_s372001.pth Add/SubAccuracy=Six9s/Six9s. AvgFinalLoss=5e-09. (1/M fail: 463687+166096=+0629783 ModelAnswer: +0639783)
- - mix_d10_l3_h4_t75K_s173289.pth Add/SubAccuracy=Five9s/Two9s. AvgFinalLoss=1.125e-06. (2/M fails e.g. 3301956441+6198944455=+09500900896 ModelAnswer: +09500800896) (295/M fails e.g. 8531063649-0531031548=+08000032101 ModelAnswer: +07900032101)
-
- # insert-mode-1 Mixed models initialised with an addition model
- - ins1_mix_d6_l3_h4_t40K_s372001.pth Add/SubAccuracy=Six9s/Six9s. AvgFinalLoss=8e-09. MAIN FOCUS
- - ins1_mix_d6_l3_h4_t40K_s173289.pth Add/SubAccuracy=Five9s/Five9s. AvgFinalLoss=1.4e-08. (3/M fails e.g. 850038+159060=+1009098 ModelAnswer: +0009098) (2/M fails e.g. 77285-477285=+0100000 ModelAnswer: +0000000)
- - ins1_mix_d6_l3_h4_t50K_s572091.pth Add/SubAccuracy=Six9s/Five9s. AvgFinalLoss=2.9e-08. (4/M fails e.g. 986887-286887=+0700000 ModelAnswer: +0600000)
- - ins1_mix_d6_l3_h3_t40K_s572091.pth Add/SubAccuracy=Six9s/Five9s. AvgFinalLoss=1.7e-08. (3/M fails e.g. 072074-272074=-0200000 ModelAnswer: +0200000)
- - ins1_mix_d10_l3_h3_t50K_s572091.pth Add/SubAccuracy=Five9s/Five9s. AvgFinalLoss=6.3e-07. (6/M fails e.g. 5068283822+4931712829=+09999996651 ModelAnswer: +19099996651) (7/M fails e.g. 3761900218-0761808615=+03000091603 ModelAnswer: +02000091603)
- - ins1_mix_d6_l2_h3_t40K_s572091.pth Add/SubAccuracy=Six9s/Five9s. AvgFinalLoss=2.4e-08. (5/M fails e.g. 565000-364538=+0200462 ModelAnswer: +0100462)
- - ins1_mix_d6_l3_h3_t80K_s572091.pth Add/SubAccuracy=Six9s/Five9s. Fewer nodes?
-
- # insert-mode-2 Mixed model initialised with an addition model. Useful heads reset every 100 epochs.
- - ins2_mix_d6_l4_h4_t40K_s372001.pth Add/SubAccuracy=Five9s/Five9s. AvgFinalLoss=1.7e-08. (3/M fails e.g. 530757+460849=+0991606 ModelAnswer: +0091606) (8/M fails e.g. 261926-161857=+0100069 ModelAnswer: +0000069)
-
- # insert-mode-3 Mixed model initialised with an addition model. Useful heads & MLPs reset every 100 epochs.
- - ins3_mix_d6_l4_h3_t40K_s372001.pth Add/SubAccuracy=Four9s/Two9s. AvgFinalLoss=3.0e-04. (17/M fails e.g. 273257+056745=+0330002 ModelAnswer: +0320002) (3120/M fails e.g. 09075-212133=-0003058 ModelAnswer: +0003058)
-
- These files are generated by the Colabs:
- - https://github.com/PhilipQuirke/transformer-maths/blob/main/assets/VerifiedArithmeticTrain.ipynb trains the model, outputting a ".pth" file of model weights and a "_train.json" file of training losses over epochs
- - https://github.com/PhilipQuirke/transformer-maths/blob/main/assets/VerifiedArithmeticAnalyse.ipynb analyses the trained model. It outputs "_behaviour.json" of observed behaviours of nodes and "_algorithm.json" of each node's algorithmic purposes.
 
  license: apache-2.0
  ---
 
+ This repository contains ~45 folders.
+ Each folder contains a transformer model that can predict addition questions, subtraction questions, or both.
+
+ The folder name (e.g. sub_d6_l2_h3_t20K_s173289) contains:
+ - "add", "sub", or "mix": The types of questions the model can predict
+ - "d5" to "d20": How many digits the model handles, e.g. a d6 sub model can predict the answer to 123450-345670=-0123230
+ - "l1", "l2" or "l3": The number of layers in the model
+ - "h3" or "h4": The number of attention heads in the model
+ - "t15K" to "t85K" etc: The number of batches the model was trained on
+ - "s372001" etc: The random seed used in model training
+
+ Some folder names also contain:
+ - "ins1": Before training, the model was initialized with a smaller, accurate addition model
+ - "ins2": As per ins1, but the inserted, useful attention heads were not allowed to change
+ - "ins3": As per ins2, but the inserted MLP layers were also not allowed to change
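The naming scheme above can be parsed mechanically. A minimal sketch (the regex and `parse_folder_name` helper are illustrative, not part of this repository):

```python
import re

# Hypothetical helper: split a folder name such as
# "ins1_mix_d6_l3_h4_t40K_s372001" into the fields described above.
FOLDER_RE = re.compile(
    r"^(?:(?P<insert>ins[123])_)?"  # optional insertion mode (ins1/ins2/ins3)
    r"(?P<op>add|sub|mix)_"         # question types the model predicts
    r"d(?P<digits>\d+)_"            # digits handled
    r"l(?P<layers>\d+)_"            # transformer layers
    r"h(?P<heads>\d+)_"             # attention heads
    r"t(?P<kbatches>\d+)K_"         # training batches, in thousands
    r"s(?P<seed>\d+)$"              # training random seed
)

def parse_folder_name(name: str) -> dict:
    match = FOLDER_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised folder name: {name}")
    fields = match.groupdict()
    for key in ("digits", "layers", "heads", "kbatches", "seed"):
        fields[key] = int(fields[key])
    return fields

print(parse_folder_name("sub_d6_l2_h3_t20K_s173289"))
# {'insert': None, 'op': 'sub', 'digits': 6, 'layers': 2, 'heads': 3, 'kbatches': 20, 'seed': 173289}
```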
+
+ Each folder contains:
+ - model.pth: The transformer model as described above
+ - training_loss.json: Data gathered during model training, used to plot "loss over training batches" graphs
+ - behaviors.json: Data gathered about the behavior of the model by direct inspection
+ - features.json: Data gathered about hypothesized algorithm features via experimentation
+
+ The first two files were created by the https://github.com/PhilipQuirke/quanta_maths/blob/main/notebooks/VerifiedArithmeticTrain.ipynb notebook.
+ The last two files were created by the https://github.com/PhilipQuirke/quanta_maths/blob/main/notebooks/VerifiedArithmeticAnalyse.ipynb notebook.
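Assuming PyTorch is installed, the four files in a folder can be loaded as sketched below. This is an illustrative helper, not repository code, and it assumes model.pth holds weights saved with torch.save that must still be loaded into a matching model class via load_state_dict:

```python
import json
from pathlib import Path

import torch

def load_model_folder(folder: str):
    """Load the four files described above from one model folder.

    Assumption: model.pth was saved with torch.save. The returned weights
    still need a matching model architecture (load_state_dict) to be used.
    """
    root = Path(folder)
    weights = torch.load(root / "model.pth", map_location="cpu")
    training_loss = json.loads((root / "training_loss.json").read_text())
    behaviors = json.loads((root / "behaviors.json").read_text())
    features = json.loads((root / "features.json").read_text())
    return weights, training_loss, behaviors, features
```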