---
license: apache-2.0
---
# LoRA Experiment

A DPO LoRA model fine-tuned on a preference dataset.

RWKV-5.2-3b-World-DPO is the merged model, i.e. the base model with the trained DPO LoRA weights merged in.
## Base Model

RWKV-5-World-3B-v2-20231113-ctx4096
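A minimal inference sketch for the merged model, assuming the weights are exported as a single `.pth` checkpoint and loaded with the `rwkv` pip package (the usual runtime for RWKV World models); the checkpoint file name below is hypothetical:

```python
# Sketch only: the checkpoint path is an assumption, not a file this repo
# is known to ship under that exact name.
from rwkv.model import RWKV
from rwkv.utils import PIPELINE

model = RWKV(model="RWKV-5.2-3b-World-DPO.pth", strategy="cuda fp16")
pipeline = PIPELINE(model, "rwkv_vocab_v20230424")  # World-model vocabulary

out = pipeline.generate("User: Hello\n\nAssistant:", token_count=100)
print(out)
```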
## Parameters

- LoRA rank: 8
- LoRA alpha: 16
- Context length: 4096
- Epochs: 19
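The linked trainer implements its own LoRA code rather than the `peft` library; purely as an illustration, the same hyperparameters expressed as a `peft.LoraConfig` would look like the sketch below, where the `target_modules` names are hypothetical for RWKV:

```python
# Illustrative only: training actually used the RWKV-LM-RLHF-DPO-LoRA
# trainer, not `peft`. This just restates the hyperparameters above.
from peft import LoraConfig

lora_config = LoraConfig(
    r=8,             # LoRA rank
    lora_alpha=16,   # LoRA alpha (effective scaling = alpha / rank = 2)
    target_modules=["key", "value", "receptance"],  # hypothetical RWKV projections
    lora_dropout=0.0,
)
```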
## Dataset

1,000 randomly chosen preference pairs from
https://huggingface.co./datasets/HuggingFaceH4/ultrafeedback_binarized
## Trainer

https://github.com/OpenMOSE/RWKV-LM-RLHF-DPO-LoRA
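For reference, a minimal sketch of the DPO objective such a trainer optimizes, assuming per-sequence log-probabilities already summed over tokens; `beta=0.1` is an assumed default, not a value stated here:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Log-ratio of the policy vs. the frozen reference model,
    # for the preferred (chosen) and dispreferred (rejected) answers.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the chosen-vs-rejected margin via -log(sigmoid(margin)).
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```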