Update README.md
README.md
CHANGED
@@ -5,10 +5,17 @@ datasets:
 - euclaise/prm800k_preferences
 ---

-Experiments in preference learning.
+Experiments in large-scale preference learning.

-Trained with PRO on SuperMC and PRM800K for 3 epochs, using my supertrainer2000 framework.
+Trained with PRO (preference ranking optimization, see https://arxiv.org/abs/2306.17492) on SuperMC and PRM800K for 3 epochs, using my supertrainer2000 framework.

 This is an experimental model.

-Benchmarks coming soon.
+Benchmarks coming soon.
+
+Hyperparameters:
+- AdamW, weight decay of 0.01, otherwise default hyperparams
+- Maximum LR of 1e-5
+- Cosine schedule with a warmup of 5400 steps
+- Batch size of 4 (2 real x 2 accumulated)
+- Maximum of 5 epochs, early stopping (visual observation), stopped after 3
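For readers who want to see what the listed schedule implies, here is a minimal sketch in plain Python. It is not taken from supertrainer2000. The peak learning rate, warmup length, and the 2 x 2 batch split come from the hyperparameter list. The linear warmup shape, the decay to zero, and the `total_steps` argument are assumptions made for illustration, since the README does not state them.

```python
import math

MAX_LR = 1e-5        # "Maximum LR of 1e-5"
WARMUP_STEPS = 5400  # "warmup of 5400 steps"
MICRO_BATCH = 2      # "2 real"
ACCUM_STEPS = 2      # "2 accumulated"


def lr_at(step, total_steps, max_lr=MAX_LR, warmup_steps=WARMUP_STEPS):
    """Learning rate at an optimizer step.

    Assumed shape: linear warmup from 0 to max_lr, then cosine decay
    from max_lr to 0 at total_steps (total_steps is hypothetical).
    """
    if step < warmup_steps:
        return max_lr * step / warmup_steps
    # Fraction of the post-warmup phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return max_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


def effective_batch_size(micro_batch=MICRO_BATCH, accum_steps=ACCUM_STEPS):
    """Samples contributing to each optimizer update under gradient accumulation."""
    return micro_batch * accum_steps
```

With these assumptions the rate peaks exactly at step 5400, and each optimizer update sees 2 x 2 = 4 samples, matching the stated batch size.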