Update README.md
README.md CHANGED
@@ -8,7 +8,7 @@ tags:
 - rwkv
 license: apache-2.0
 datasets:
--
+- the_pile
 
 ---
 
@@ -22,9 +22,12 @@ RWKV-4 3B is a L32-D2560 causal language model trained on the Pile. See https://
 
 At this moment you have to use my Github code (https://github.com/BlinkDL/RWKV-LM) to run it.
 
-ctx_len = 1024
-n_layer = 32
-n_embd = 2560
+New checkpoint: RWKV-4-Pile-3B-20221110-ctx4096.pth : Fine-tuned to ctx_len = 4096
+* LAMBADA ppl 5.25, acc 63.96%
+* PIQA acc 74.16%
+* SC2016 acc 70.71%
+* Hellaswag acc_norm 59.89%
+ctx_len = 4096 n_layer = 32 n_embd = 2560
 
 Final checkpoint: RWKV-4-Pile-3B-20221008-8023.pth : Trained on the Pile for 331B tokens.
 * Pile loss 1.9469
@@ -32,3 +35,4 @@ Final checkpoint: RWKV-4-Pile-3B-20221008-8023.pth : Trained on the Pile for 331
 * PIQA acc 73.72%
 * SC2016 acc 70.28%
 * Hellaswag acc_norm 59.63%
+ctx_len = 1024 n_layer = 32 n_embd = 2560
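For reference, a minimal sketch of inspecting the new checkpoint with plain PyTorch, assuming the usual RWKV-4 state-dict key names ("emb.weight", "blocks.<i>....") and a local copy of the .pth file; running the model itself still requires the RWKV-LM code linked above.

```python
# Minimal sketch: read n_layer / n_embd from an RWKV-4 checkpoint.
# Assumptions: the local file path below and the key names "emb.weight" /
# "blocks.<i>...." are not stated in this README. Text generation itself
# still needs https://github.com/BlinkDL/RWKV-LM.
import re
import torch

ckpt_path = "RWKV-4-Pile-3B-20221110-ctx4096.pth"  # illustrative local path
state = torch.load(ckpt_path, map_location="cpu")  # plain state dict of tensors

# Embedding matrix is (vocab_size, n_embd).
vocab_size, n_embd = state["emb.weight"].shape

# Count distinct block indices from keys like "blocks.12.att.key.weight".
layers = {int(m.group(1)) for k in state
          for m in [re.match(r"blocks\.(\d+)\.", k)] if m}
n_layer = len(layers)

print(f"vocab_size={vocab_size}  n_embd={n_embd}  n_layer={n_layer}")
# Expected for this model: n_layer = 32, n_embd = 2560.
```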