This is the collection of synthetic preference data and trained reward models from the paper "Cross-lingual Transfer of Reward Models in Multilingual Alignment".
IQWiki-XFACT (community organization)
Collections: 1
Models: 2
Datasets: 28
iqwiki-kor/Qwen2.5-7B-distill-SFT-DPO-beta0.01-Iter1-v2-Self-seed192-w-score (60.9k rows, 9 downloads)
iqwiki-kor/Qwen2.5-7B-distill-SFT-DPO-beta0.01-Iter1-v2-Self-seed42 (60.9k rows, 10 downloads)
iqwiki-kor/Qwen2.5-7B-distill-SFT-DPO-beta0.01-Iter1-v2-Self-seed192 (60.9k rows, 7 downloads)
iqwiki-kor/wDPO-it-final1 (10k rows, 34 downloads)
iqwiki-kor/wDPO-ko (10k rows, 37 downloads)
iqwiki-kor/uf-g4o_translated-Qwen2.5-7B-distill-SFT-DPO-beta0.1-seed8049 (56.8k rows, 36 downloads)
iqwiki-kor/khs-Qwen2.5-7B-distill-SFT-DPO-beta0.1-seed6247 (10.2k rows, 33 downloads)
iqwiki-kor/khs-Qwen2.5-7B-distill-SFT-DPO-beta0.1-seed1903 (10.2k rows, 31 downloads)
iqwiki-kor/Qwen2.5-7B-distill-SFT-DPO-beta0.1-op-samp4-seed6247 (10.2k rows, 32 downloads)
iqwiki-kor/Q2.5-7B-dist-op-pref-seed2938 (56.8k rows, 35 downloads)