---
license: apache-2.0
datasets:
- euclaise/SuperMC
- euclaise/prm800k_preferences
---

Experiments in large-scale small-scale preference learning.

**This one was a failure: it benchmarks horribly, despite responding acceptably to trivia questions in testing.**

falcon-rw-1b trained with PRO (Preference Ranking Optimization, see https://arxiv.org/abs/2306.17492) on SuperMC and PRM800K (stage 1 only) for 3 epochs, using my supertrainer2000 framework.

This is an experimental model.

Benchmarks coming soon.

Hyperparameters:
- AdamW, weight decay of 0.01, otherwise default hyperparams
- Maximum LR of 1e-5
- Cosine schedule with a warmup of 5400 steps
- Batch size of 4 (2 real x 2 accumulated)
- Maximum of 5 epochs, early stopping (visual observation), stopped after 3
- Gradient clipping norm value of 1.0
- PRO beta of 4
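For reference, PRO's ranking objective contrasts each response against all lower-ranked responses via a list-wise softmax. Below is a minimal dependency-free sketch of that loss over scalar response scores (e.g. length-normalized sequence log-probabilities), ordered best-first. How supertrainer2000 applies the beta of 4 is an assumption here; this sketch treats it as a temperature-like scale on the scores.

```python
import math

def pro_loss(scores, beta=4.0):
    """Sketch of the PRO (Preference Ranking Optimization) ranking loss.

    `scores` are model scores for candidate responses, ordered best-first.
    For each prefix position k, the best remaining response is contrasted
    against all lower-ranked ones with a softmax (a list-wise extension of
    the pairwise Bradley-Terry objective). `beta` scales the scores; its
    exact role in the training framework is an assumption.
    """
    loss = 0.0
    n = len(scores)
    for k in range(n - 1):
        scaled = [beta * s for s in scores[k:]]
        m = max(scaled)  # subtract the max for numerical stability
        log_denom = m + math.log(sum(math.exp(s - m) for s in scaled))
        loss += -(scaled[0] - log_denom)  # -log softmax prob of the top response
    return loss / (n - 1)
```

A correctly ordered candidate list yields a small loss; reversing the ranking makes it large, which is what pushes the model's likelihoods toward the preference ordering.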

Training prompt format:

```
### Query
[insert instruction here]

### Answer
[insert response here]
```
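A small helper can assemble this format; the function name is illustrative, not part of the training framework. At inference time, pass only the instruction and let the model continue after `### Answer`.

```python
def build_prompt(instruction: str, response: str = "") -> str:
    """Assemble the prompt format shown above (helper name is hypothetical).

    With an empty `response`, the string ends right after the "### Answer"
    header, ready for the model to complete.
    """
    return f"### Query\n{instruction}\n\n### Answer\n{response}"
```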