---
license: apache-2.0
datasets:
- euclaise/SuperMC
- euclaise/prm800k_preferences
---
Experiments in large-scale preference learning with small-scale models.
**This one was a failure: it benchmarks horribly, despite responding okay to trivia questions in testing.**
falcon-rw-1b trained with PRO (Preference Ranking Optimization, see https://arxiv.org/abs/2306.17492) on SuperMC and PRM800K (stage 1 only) for 3 epochs, using my supertrainer2000 framework.
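For intuition, here is a minimal sketch of the PRO ranking objective as described in the paper. It is not the supertrainer2000 implementation; in particular, I'm assuming `beta` weights an SFT term on the top-ranked response, which is how the paper uses it.

```python
import torch
import torch.nn.functional as F

def pro_loss(seq_logprobs: torch.Tensor, beta: float = 4.0) -> torch.Tensor:
    # seq_logprobs: (n,) log-probabilities of n candidate responses for one
    # prompt, sorted best-first by human preference.
    n = seq_logprobs.shape[0]
    ranking_loss = seq_logprobs.new_zeros(())
    for k in range(n - 1):
        # Treat candidate k as the positive against all lower-ranked candidates:
        # -log( exp(lp_k) / sum_{i >= k} exp(lp_i) )
        ranking_loss = ranking_loss - F.log_softmax(seq_logprobs[k:], dim=0)[0]
    sft_loss = -seq_logprobs[0]  # NLL of the top-ranked response (assumed role of beta)
    return ranking_loss + beta * sft_loss

# Example with three ranked candidates:
loss = pro_loss(torch.tensor([-1.0, -2.0, -3.0], requires_grad=True))
```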
This is an experimental model.
Benchmarks coming soon.
Hyperparameters (see the training-loop sketch after this list):
- AdamW, weight decay of 0.01, otherwise default hyperparams
- Maximum LR of 1e-5
- Cosine schedule with a warmup of 5400 steps
- Batch size of 4 (2 real x 2 accumulated)
- Maximum of 5 epochs, early stopping (visual observation), stopped after 3
- Gradient clipping norm value of 1.0
- PRO beta of 4
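The optimizer and schedule above can be reproduced roughly as follows. This is a sketch, not the actual training code: `model`, `total_steps`, and the placeholder loss are stand-ins.

```python
import torch
from transformers import get_cosine_schedule_with_warmup

model = torch.nn.Linear(8, 8)  # stand-in for falcon-rw-1b
total_steps = 20_000           # hypothetical; the real value depends on dataset size

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5, weight_decay=0.01)
scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=5400, num_training_steps=total_steps
)

for step in range(total_steps):
    for _ in range(2):  # 2 micro-batches of 2, accumulated -> effective batch of 4
        loss = model(torch.randn(2, 8)).pow(2).mean()  # placeholder loss
        (loss / 2).backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```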
Training prompt format:
```
### Query
[insert instruction here]
### Answer
[insert response here]
```
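A small helper (hypothetical, not part of the repo) showing how to fill this template; leave the response empty at inference time so the model completes after `### Answer`:

```python
def format_prompt(instruction: str, response: str = "") -> str:
    # Fill the training template; an empty response yields an inference prompt.
    return f"### Query\n{instruction}\n### Answer\n{response}"

print(format_prompt("What is the capital of France?"))
```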