---
license: apache-2.0
datasets:
- euclaise/SuperMC
- euclaise/prm800k_preferences
---
Experiments in preference learning.
Trained with PRO on SuperMC and PRM800K for 3 epochs, using my supertrainer2000 framework.
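
For readers unfamiliar with ranking-based preference objectives, here is a minimal sketch of a list-wise ranking loss. It assumes "PRO" refers to Preference Ranking Optimization, and it is not taken from supertrainer2000; the function name `pro_ranking_loss` and the idea of passing precomputed per-response scores (for example, length-normalized log-probabilities) are illustrative assumptions.

```python
import math


def pro_ranking_loss(scores):
    """List-wise ranking loss over responses ordered best-first.

    `scores` is a hypothetical list of model scores for candidate
    responses to one prompt, with the most preferred response first.
    For each position k, the k-th response is contrasted against all
    responses ranked at or below it via a softmax over their scores.
    """
    loss = 0.0
    for k in range(len(scores) - 1):
        tail = scores[k:]
        # Numerically stable log-sum-exp over the remaining candidates.
        m = max(tail)
        log_denom = m + math.log(sum(math.exp(s - m) for s in tail))
        # Negative log-probability that response k wins among the tail.
        loss -= scores[k] - log_denom
    return loss


# A chosen/rejected pair: the loss is lower when the preferred
# response already has the higher score.
well_ranked = pro_ranking_loss([2.0, 0.0])
badly_ranked = pro_ranking_loss([0.0, 2.0])
```

With only two responses per prompt, as in a chosen/rejected preference pair, this reduces to a pairwise logistic loss on the score difference. Ranking objectives of this kind are often combined with a supervised term on the best response, which is omitted here.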
This is an experimental model.
Benchmarks coming soon.