euclaise committed on
Commit
9fac2b5
·
1 Parent(s): 2af004d

Update README.md

Files changed (1)
  1. README.md +10 -3
README.md CHANGED
@@ -5,10 +5,17 @@ datasets:
  - euclaise/prm800k_preferences
  ---

- Experiments in preference learning.
+ Experiments in large-scale preference learning.

- Trained with PRO on SuperMC and PRM800K for 3 epochs, using my supertrainer2000 framework.
+ Trained with PRO (preference ranking optimization, see https://arxiv.org/abs/2306.17492) on SuperMC and PRM800K for 3 epochs, using my supertrainer2000 framework.

  This is an experimental model.

- Benchmarks coming soon.
+ Benchmarks coming soon.
+
+ Hyperparameters:
+ - AdamW, weight decay of 0.01, otherwise default hyperparams
+ - Maximum LR of 1e-5
+ - Cosine schedule with a warmup of 5400 steps
+ - Batch size of 4 (2 real x 2 accumulated)
+ - Maximum of 5 epochs, early stopping (visual observation), stopped after 3
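
The hyperparameters added in this commit can be illustrated with a minimal, standard-library-only sketch of a warmup-plus-cosine learning-rate schedule. Only the peak LR (1e-5), the warmup length (5400 steps), and the 2 x 2 batch layout come from the README. The linear warmup shape, the decay to zero, the function name `lr_at`, and the `TOTAL_STEPS` value are assumptions made for illustration; this is not the supertrainer2000 implementation.

```python
import math

# Values stated in the README.
MAX_LR = 1e-5        # "Maximum LR of 1e-5"
WARMUP_STEPS = 5400  # "warmup of 5400 steps"
MICRO_BATCH = 2      # "2 real"
GRAD_ACCUM = 2       # "2 accumulated"
EFFECTIVE_BATCH = MICRO_BATCH * GRAD_ACCUM  # batch size of 4

# Hypothetical: the README does not state the total number of optimizer steps.
TOTAL_STEPS = 54_000


def lr_at(step, total_steps=TOTAL_STEPS, max_lr=MAX_LR,
          warmup_steps=WARMUP_STEPS, min_lr=0.0):
    """Learning rate at a given optimizer step.

    Assumes linear warmup from 0 to max_lr, then cosine decay to min_lr.
    The README only says "cosine schedule with a warmup", so both the
    warmup shape and min_lr are assumptions.
    """
    if total_steps <= warmup_steps:
        raise ValueError("total_steps must exceed warmup_steps")
    if step < warmup_steps:
        # Linear ramp during warmup.
        return max_lr * step / warmup_steps
    # Fraction of the post-warmup phase completed, clamped to 1.0.
    progress = min(1.0, (step - warmup_steps) / (total_steps - warmup_steps))
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for s in (0, 2700, 5400, 29_700, 54_000):
        print(f"step {s:>6}: lr = {lr_at(s):.3e}")
```

Because training was stopped after 3 of a maximum of 5 epochs, a schedule sized for the full 5 epochs would not have decayed all the way to its minimum at the stopping point. Whether the schedule was sized that way is not stated in the README.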