---
license: apache-2.0
---

# RWKV-5.2-3b-World-DPO

A DPO LoRA fine-tuned model trained on a preference dataset.

## LoRA Experiment

RWKV-5.2-3b-World-DPO is a merged model: the DPO LoRA adapter has been merged into the base model.

## Base Model

RWKV-5-World-3B-v2-20231113-ctx4096

## Parameters

- LoRA rank: 8
- LoRA alpha: 16
- Context length: 4096
- Epoch: 19

## Dataset

1,000 randomly chosen preference pairs from [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co./datasets/HuggingFaceH4/ultrafeedback_binarized).

## Trainer

https://github.com/OpenMOSE/RWKV-LM-RLHF-DPO-LoRA
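The trainer linked above optimizes a DPO objective. For orientation only, here is a minimal PyTorch sketch of the standard DPO loss; the repository's actual implementation and hyperparameters (e.g. `beta`) may differ:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    # Log-ratio of chosen vs. rejected completions under the trainable policy...
    pi_logratios = policy_chosen_logps - policy_rejected_logps
    # ...and under the frozen reference model (here, the base RWKV weights).
    ref_logratios = ref_chosen_logps - ref_rejected_logps
    # Maximize the policy's margin over the reference, scaled by beta.
    return -F.logsigmoid(beta * (pi_logratios - ref_logratios)).mean()
```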
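For reference, a minimal sketch of how a 1,000-pair subset can be drawn from the dataset with the `datasets` library; the split name follows the dataset's published splits, and the seed is an illustrative assumption, not the one used for this model:

```python
from datasets import load_dataset

# ultrafeedback_binarized ships preference splits with
# prompt / chosen / rejected columns.
ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

# Draw 1,000 pairs at random; the seed here is illustrative only.
pairs = ds.shuffle(seed=42).select(range(1000))
print(pairs[0]["prompt"])
```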
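Since the LoRA weights are already merged, the checkpoint should load like any RWKV-5 World model. A sketch using the `rwkv` pip package (assuming a recent version with RWKV-5 support); the checkpoint filename is a placeholder for wherever you saved the merged weights:

```python
import os
os.environ["RWKV_JIT_ON"] = "1"  # enable the package's JIT kernels

from rwkv.model import RWKV
from rwkv.utils import PIPELINE, PIPELINE_ARGS

# Placeholder path: point this at the downloaded merged checkpoint.
model = RWKV(model="RWKV-5.2-3b-World-DPO.pth", strategy="cuda fp16")
pipeline = PIPELINE(model, "rwkv_vocab_v20230424")  # World-series tokenizer

args = PIPELINE_ARGS(temperature=1.0, top_p=0.7)
print(pipeline.generate("User: What is DPO?\n\nAssistant:",
                        token_count=200, args=args))
```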