---
license: apache-2.0
---

A DPO LoRA fine-tuned model trained on a preference dataset.

LoRA Experiment

RWKV-5.2-3b-World-DPO is the merged model, i.e. the DPO LoRA weights merged into the base model.

Base Model

RWKV-5-World-3B-v2-20231113-ctx4096
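
For quick testing, the merged checkpoint should run with the standard `rwkv` pip package like any other World-series model. The checkpoint filename and sampling settings below are assumptions, not taken from this repo:

```python
import os
os.environ["RWKV_JIT_ON"] = "1"
os.environ["RWKV_CUDA_ON"] = "0"  # set to "1" to compile the custom CUDA kernel

from rwkv.model import RWKV
from rwkv.utils import PIPELINE, PIPELINE_ARGS

# Path to the merged .pth checkpoint (assumed name; extension omitted,
# following the package's convention).
model = RWKV(model="RWKV-5.2-3b-World-DPO", strategy="cuda fp16")
pipeline = PIPELINE(model, "rwkv_vocab_v20230424")  # World-series tokenizer

prompt = "User: Explain Direct Preference Optimization briefly.\n\nAssistant:"
args = PIPELINE_ARGS(temperature=1.0, top_p=0.5)
print(pipeline.generate(prompt, token_count=200, args=args))
```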

Parameters:
LoRA Rank: 8
LoRA Alpha: 16
Context length: 4096
Epochs: 19
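
For reference, a minimal sketch of how a rank-8, alpha-16 LoRA adapter folds into a base weight matrix at merge time; tensor names and shapes are illustrative, not the exact keys used by the trainer:

```python
import torch

def merge_lora(w: torch.Tensor, lora_A: torch.Tensor, lora_B: torch.Tensor,
               r: int = 8, alpha: int = 16) -> torch.Tensor:
    # Standard LoRA merge: W' = W + (alpha / r) * (B @ A).
    # alpha / r = 16 / 8 = 2.0 for this model's settings.
    return w + (alpha / r) * (lora_B @ lora_A)

# Illustrative shapes for one square projection matrix
# (2560 is the 3B World model's embedding width).
w = torch.randn(2560, 2560)
lora_A = torch.zeros(8, 2560)    # rank-8 down-projection
lora_B = torch.zeros(2560, 8)    # rank-8 up-projection
w_merged = merge_lora(w, lora_A, lora_B)
assert torch.equal(w_merged, w)  # zero-initialized B leaves W unchanged
```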


Dataset
1,000 randomly chosen preference pairs from:
https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized
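
An illustrative way to draw 1,000 random pairs from that dataset with the `datasets` library; the split name follows the dataset card, and the seed is arbitrary (not the one used for this model):

```python
from datasets import load_dataset

# "train_prefs" is the preference-pairs split of ultrafeedback_binarized.
ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")
pairs = ds.shuffle(seed=42).select(range(1000))

example = pairs[0]
print(example["prompt"])        # shared prompt
print(example["chosen"][-1])    # preferred assistant message
print(example["rejected"][-1])  # dispreferred assistant message
```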

Trainer
https://github.com/OpenMOSE/RWKV-LM-RLHF-DPO-LoRA
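
For background, a minimal DPO loss in PyTorch following Rafailov et al. (2023); this is illustrative and not copied from the linked trainer:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    # Implicit rewards are beta-scaled log-prob ratios against the
    # frozen reference model; the loss maximizes the margin between
    # chosen and rejected completions.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```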