PhilipQuirke committed
Commit 248a83c · verified · 1 Parent(s): 92810c3

Update README.md

Files changed (1): README.md +24 -44
README.md CHANGED
@@ -2,48 +2,28 @@
  license: apache-2.0
  ---
 
- Contains ".pth" and ".json" files for Transformer models that answer n-digit addition and/or subtraction questions (e.g. 123450-345670=-0123230).
- The associated Colabs support models doing addition, subtraction or both (aka "mixed"), n_digits >= 4, n_layers = 1 .. 4, n_heads >= 3.
- An untrained mixed model can be initialised with (re-using) a previously trained addition model.
 
- Accuracy of "Six9s" means the model predicts 99.9999% of questions correctly, so it has at most one prediction failure per million. "Five9s" means 99.999%, etc.
-
- The available models are:
-
- # Addition models
- - add_d5_l1_h3_t15K_s372001.pth AddAccuracy=Two9s. Inaccurate as it has only one layer. Can predict S0, S1 and S2 complexity questions.
- - add_d5_l2_h3_t15K_s372001.pth AddAccuracy=Six9s. AvgFinalLoss=1.6e-08
- - add_d6_l2_h3_t15K_s372001.pth AddAccuracy=Six9s. AvgFinalLoss=1.7e-08. MAIN FOCUS
- - add_d6_l2_h3_t20K_s173289.pth AddAccuracy=Six9s. AvgFinalLoss=1.5e-08
- - add_d6_l2_h3_t20K_s572091.pth AddAccuracy=Six9s. AvgFinalLoss=7e-09
- - add_d5_l2_h3_t40K_s372001.pth AddAccuracy=Six9s. AvgFinalLoss=2e-09. Fewer nodes
- - add_d6_l2_h3_t40K_s372001.pth AddAccuracy=Six9s. AvgFinalLoss=2e-09. Fewer nodes
- - add_d10_l2_h3_t40K_s572091.pth AddAccuracy=Six9s. AvgFinalLoss=8e-09. (1/M fail: 0000000555+0000000445=+00000001000 ModelAnswer: +00000000900)
-
- # Subtraction models
- - sub_d6_l2_h3_t30K_s372001.pth SubAccuracy=Six9s. AvgFinalLoss=5.8e-06
- - sub_d10_l2_h3_t75K_s173289.pth SubAccuracy=Two9s. (6672/M fails) AvgFinalLoss=0.002002022
-
- # Mixed (addition and subtraction) models
- - mix_d6_l3_h4_t40K_s372001.pth Add/SubAccuracy=Six9s/Six9s. AvgFinalLoss=5e-09. (1/M fail: 463687+166096=+0629783 ModelAnswer: +0639783)
- - mix_d10_l3_h4_t75K_s173289.pth Add/SubAccuracy=Five9s/Two9s. AvgFinalLoss=1.125e-06. (2/M fails e.g. 3301956441+6198944455=+09500900896 ModelAnswer: +09500800896) (295/M fails e.g. 8531063649-0531031548=+08000032101 ModelAnswer: +07900032101)
-
- # insert-mode-1 Mixed models initialised with an addition model
- - ins1_mix_d6_l3_h4_t40K_s372001.pth Add/SubAccuracy=Six9s/Six9s. AvgFinalLoss=8e-09. MAIN FOCUS
- - ins1_mix_d6_l3_h4_t40K_s173289.pth Add/SubAccuracy=Five9s/Five9s. AvgFinalLoss=1.4e-08. (3/M fails e.g. 850038+159060=+1009098 ModelAnswer: +0009098) (2/M fails e.g. 77285-477285=+0100000 ModelAnswer: +0000000)
- - ins1_mix_d6_l3_h4_t50K_s572091.pth Add/SubAccuracy=Six9s/Five9s. AvgFinalLoss=2.9e-08. (4/M fails e.g. 986887-286887=+0700000 ModelAnswer: +0600000)
- - ins1_mix_d6_l3_h3_t40K_s572091.pth Add/SubAccuracy=Six9s/Five9s. AvgFinalLoss=1.7e-08. (3/M fails e.g. 072074-272074=-0200000 ModelAnswer: +0200000)
- - ins1_mix_d10_l3_h3_t50K_s572091.pth Add/SubAccuracy=Five9s/Five9s. AvgFinalLoss=6.3e-07. (6/M fails e.g. 5068283822+4931712829=+09999996651 ModelAnswer: +19099996651) (7/M fails e.g. 3761900218-0761808615=+03000091603 ModelAnswer: +02000091603)
- - ins1_mix_d6_l2_h3_t40K_s572091.pth Add/SubAccuracy=Six9s/Five9s. AvgFinalLoss=2.4e-08. (5/M fails e.g. 565000-364538=+0200462 ModelAnswer: +0100462)
- - ins1_mix_d6_l3_h3_t80K_s572091.pth Add/SubAccuracy=Six9s/Five9s. Fewer nodes?
-
- # insert-mode-2 Mixed model initialised with an addition model. Useful heads reset every 100 epochs.
- - ins2_mix_d6_l4_h4_t40K_s372001.pth Add/SubAccuracy=Five9s/Five9s. AvgFinalLoss=1.7e-08. (3/M fails e.g. 530757+460849=+0991606 ModelAnswer: +0091606) (8/M fails e.g. 261926-161857=+0100069 ModelAnswer: +0000069)
-
- # insert-mode-3 Mixed model initialised with an addition model. Useful heads & MLPs reset every 100 epochs.
- - ins3_mix_d6_l4_h3_t40K_s372001.pth Add/SubAccuracy=Four9s/Two9s. AvgFinalLoss=3.0e-04. (17/M fails e.g. 273257+056745=+0330002 ModelAnswer: +0320002) (3120/M fails e.g. 09075-212133=-0003058 ModelAnswer: +0003058)
-
- These files are generated by the Colabs:
- - https://github.com/PhilipQuirke/transformer-maths/blob/main/assets/VerifiedArithmeticTrain.ipynb trains the model, outputting a ".pth" file of model weights and a "_train.json" file of training losses over epochs
- - https://github.com/PhilipQuirke/transformer-maths/blob/main/assets/VerifiedArithmeticAnalyse.ipynb analyses the trained model. It outputs "_behaviour.json" of observed behaviours of nodes and "_algorithm.json" of each node's algorithmic purposes.
 
  license: apache-2.0
  ---
 
+ This repository contains ~45 folders.
+ Each folder contains a transformer model that can predict addition questions, subtraction questions, or both.
+
+ The folder name (e.g. sub_d6_l2_h3_t20K_s173289) contains:
+ - "add", "sub", or "mix": The types of questions the model can predict
+ - "d5" to "d20": How many digits the model handles, e.g. a d6 sub model can predict the answer to 123450-345670=-0123230
+ - "l1", "l2" or "l3": The number of layers in the model
+ - "h3" or "h4": The number of attention heads in the model
+ - "t15K" to "t85K" etc: The number of batches the model was trained on
+ - "s372001" etc: The random seed used in model training
+
+ Some folder names also contain:
+ - "ins1": Before training, the model was initialized with a smaller, accurate addition model
+ - "ins2": As per ins1, but the inserted, useful attention heads were not allowed to change
+ - "ins3": As per ins2, but the inserted MLP layers were also not allowed to change
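The naming scheme above can be parsed mechanically. A minimal sketch (the regex and `parse_folder_name` helper are illustrative, not part of this repository):

```python
import re

# Hypothetical helper: split a folder name such as
# "ins1_mix_d6_l3_h4_t40K_s372001" into the fields described above.
FOLDER_RE = re.compile(
    r"^(?:(?P<insert>ins[123])_)?"  # optional insertion mode (ins1/ins2/ins3)
    r"(?P<op>add|sub|mix)_"         # question types the model predicts
    r"d(?P<digits>\d+)_"            # digits handled
    r"l(?P<layers>\d+)_"            # transformer layers
    r"h(?P<heads>\d+)_"             # attention heads
    r"t(?P<kbatches>\d+)K_"         # training batches, in thousands
    r"s(?P<seed>\d+)$"              # training random seed
)

def parse_folder_name(name: str) -> dict:
    match = FOLDER_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised folder name: {name}")
    fields = match.groupdict()
    for key in ("digits", "layers", "heads", "kbatches", "seed"):
        fields[key] = int(fields[key])
    return fields

print(parse_folder_name("sub_d6_l2_h3_t20K_s173289"))
# {'insert': None, 'op': 'sub', 'digits': 6, 'layers': 2, 'heads': 3, 'kbatches': 20, 'seed': 173289}
```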
+
+ Each folder contains:
+ - model.pth: The transformer model as described above
+ - training_loss.json: Data gathered during model training, used to plot "loss over training batches" graphs
+ - behaviors.json: Data gathered about the behavior of the model by direct inspection
+ - features.json: Data gathered about hypothesized algorithm features via experimentation
+
+ The first two files were created by the https://github.com/PhilipQuirke/quanta_maths/blob/main/notebooks/VerifiedArithmeticTrain.ipynb notebook.
+ The last two files were created by the https://github.com/PhilipQuirke/quanta_maths/blob/main/notebooks/VerifiedArithmeticAnalyse.ipynb notebook.
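Assuming PyTorch is installed, the four files in a folder can be loaded as sketched below. This is an illustrative helper, not repository code, and it assumes model.pth holds weights saved with torch.save that must still be loaded into a matching model class via load_state_dict:

```python
import json
from pathlib import Path

import torch

def load_model_folder(folder: str):
    """Load the four files described above from one model folder.

    Assumption: model.pth was saved with torch.save. The returned weights
    still need a matching model architecture (load_state_dict) to be used.
    """
    root = Path(folder)
    weights = torch.load(root / "model.pth", map_location="cpu")
    training_loss = json.loads((root / "training_loss.json").read_text())
    behaviors = json.loads((root / "behaviors.json").read_text())
    features = json.loads((root / "features.json").read_text())
    return weights, training_loss, behaviors, features
```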