Upload README.md
README.md
CHANGED
@@ -1,3 +1,25 @@
|
|
1 |
-
---
|
2 |
license: apache-2.0
|
3 |
-
|
1 |
license: apache-2.0
|
2 |
+
|
3 |
+
A DPO LoRA fine-tuned model trained on a preference dataset.
|
4 |
+
|
5 |
+
LoRA Experiment
|
6 |
+
|
7 |
+
RWKV-5.2-3b-World-DPO is the model produced by merging the LoRA weights into the base model.
|
8 |
+
|
9 |
+
Base Model
|
10 |
+
|
11 |
+
RWKV-5-World-3B-v2-20231113-ctx4096
|
12 |
+
|
13 |
+
Parameters:
|
14 |
+
LoRA rank: 8
|
15 |
+
LoRA alpha: 16
|
16 |
+
Context length: 4096
|
17 |
+
Epochs: 19
|
18 |
+
|
19 |
+
|
20 |
+
Dataset
|
21 |
+
1,000 randomly chosen pairs from:
|
22 |
+
https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized
|
23 |
+
|
24 |
+
|
25 |
+
|
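The README added in this change mentions two mechanical steps: drawing 1,000 random preference pairs and merging a rank-8, alpha-16 LoRA update into the base weights. The following is a minimal sketch of both in plain Python, not the author's actual training or merge script. It assumes the conventional LoRA formulation, in which the merged weight is `W + (alpha / rank) * B @ A`; the card does not state which scaling was used. The function names, the `seed` argument, and the list-of-rows matrix layout are hypothetical and chosen only for illustration.

```python
import random

LORA_RANK = 8      # "Lora Rank 8" from the README
LORA_ALPHA = 16    # "Lora Alpha 16" from the README
NUM_PAIRS = 1000   # "1000 pairs" from the README


def sample_pair_indices(dataset_size, num_pairs=NUM_PAIRS, seed=0):
    """Pick num_pairs distinct row indices uniformly at random.

    The seed is an assumption; the README does not say how the
    random selection was made or whether it was seeded.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(dataset_size), num_pairs))


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def merge_lora(weight, lora_a, lora_b, rank=LORA_RANK, alpha=LORA_ALPHA):
    """Return weight + (alpha / rank) * (lora_b @ lora_a).

    Shapes: weight is (out, in), lora_b is (out, rank), lora_a is (rank, in).
    This is the conventional LoRA merge, assumed here rather than confirmed.
    """
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]
```

With the values listed in the README, the assumed scaling factor is `16 / 8 = 2.0`. Calling `sample_pair_indices(n)` with `n` set to the number of rows in the chosen dataset split returns 1,000 distinct indices that could then be used to select the chosen/rejected pairs.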