---
language:
- en
- zh
- fr
- es
- de
- pt
- ru
- it
- ja
- ko
- vi
- ar
tags:
- pytorch
- text-generation
- causal-lm
- rwkv
license: apache-2.0
datasets:
- cerebras/SlimPajama-627B
- EleutherAI/pile
- bigcode/starcoderdata
- oscar-corpus/OSCAR-2301
---
# RWKV-5 World
Use the rwkv pip package 0.8.22+ for RWKV-5 inference: https://pypi.org/project/rwkv/ (use `pipeline = PIPELINE(model, "rwkv_vocab_v20230424")` for rwkv-world models; a loading sketch follows the links below)
Online 7B Demo: https://huggingface.co./spaces/BlinkDL/RWKV-Gradio-2
Online 1.5B Demo: https://huggingface.co./spaces/BlinkDL/RWKV-Gradio-1
GUI: https://github.com/josStorer/RWKV-Runner (see Releases)
Convert to HF format: https://github.com/BBuf/RWKV-World-HF-Tokenizer
For developers: https://github.com/BlinkDL/ChatRWKV/blob/main/API_DEMO_CHAT.py
https://github.com/BlinkDL/ChatRWKV/blob/main/RWKV_v5_demo.py
How it works: https://twitter.com/BlinkDL_AI/status/1685230712247795713
https://www.rwkv.com/
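A minimal loading sketch with the rwkv pip package, assuming a checkpoint has been downloaded from this repo; the file path and strategy string are placeholders to adjust for your hardware.
```python
import os
os.environ["RWKV_JIT_ON"] = "1"   # enable the JIT kernels (set before importing rwkv)
os.environ["RWKV_CUDA_ON"] = "0"  # "1" compiles the custom CUDA kernel for faster inference

from rwkv.model import RWKV
from rwkv.utils import PIPELINE

# Placeholder path: point it at a .pth checkpoint downloaded from this repo
# (the rwkv package convention is to pass the path without the .pth extension).
model = RWKV(model="/path/to/RWKV-5-World-checkpoint", strategy="cuda fp16")

# rwkv-world models use the world vocab, not the older 20B tokenizer.
pipeline = PIPELINE(model, "rwkv_vocab_v20230424")
```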
## Model Description
RWKV-5 is trained on 100+ world languages (70% English, 15% multilingual, 15% code).
World = Some_Pile + Some_SlimPajama + Some_StarCoder + Some_OSCAR + All_Wikipedia + All_ChatGPT_Data_I_can_find
RWKV-5 training: set `--my_testing "r2r4"` in the latest RWKV-LM v4neo: https://github.com/BlinkDL/RWKV-LM
World v1 = 0.59T tokens
World v2 = 1.12T tokens
Imagine what happens when we use more data :)
Recommended fine-tuning format (use \n for newlines inside each message):
```
User: xxxxxxxxxxxxxxx

Assistant: xxxxxxxxxxxxxxx
xxxxxxxxxxxxxxx
xxxxxxxxxxxxxxx

User: xxxxxxxxxxxxxxx
xxxxxxxxxxxxxxx

Assistant: xxxxxxxxxxxxxxx
xxxxxxxxxxxxxxx
xxxxxxxxxxxxxxx
xxxxxxxxxxxxxxx
```
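A small illustrative helper (the function name is mine, not part of any RWKV tooling) that renders chat turns into the fine-tuning format above: \n inside a message, a blank line (\n\n) only between turns.
```python
# Hypothetical data-prep helper for the fine-tuning format shown above.
def render_conversation(turns):
    parts = []
    for role, text in turns:  # role is "User" or "Assistant"
        text = text.strip().replace("\n\n", "\n")  # no blank lines inside a message
        parts.append(f"{role}: {text}")
    return "\n\n".join(parts) + "\n\n"  # a blank line closes each turn

sample = render_conversation([
    ("User", "What is RWKV?"),
    ("Assistant", "It is a recurrent model.\nIt keeps a fixed-size state during inference."),
])
print(sample)
```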
A good chat prompt (replace any \n\n inside xxx with \n, so that the only blank lines are the ones separating turns):
```
User: hi

Assistant: Hi. I am your assistant and I will provide expert full response in full details. Please feel free to ask any question and I will always answer it.

User: xxx

Assistant:
```
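Continuing from the loading sketch above, a generation example with this chat prompt; the sampling values are illustrative defaults, not tuned settings from this repo.
```python
from rwkv.utils import PIPELINE_ARGS

prompt = (
    "User: hi\n\n"
    "Assistant: Hi. I am your assistant and I will provide expert full response in full details. "
    "Please feel free to ask any question and I will always answer it.\n\n"
    "User: How does a hash map work?\n\n"
    "Assistant:"  # no space after the final colon (see the warning below)
)

args = PIPELINE_ARGS(temperature=1.0, top_p=0.3, alpha_frequency=0.3, alpha_presence=0.3)
output = pipeline.generate(prompt, token_count=256, args=args)

# The model ends its turn with a blank line, so keep only the first reply.
print(output.split("\n\n")[0].strip())
```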
QA prompt (again, replace any \n\n inside xxx with \n):
```
Question: xxx

Answer:
```
and
```
Instruction: xxx

Input: xxx

Response:
```
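A tiny illustrative builder for the instruction prompt above (the function name is hypothetical); it strips blank lines from the fields and ends exactly at "Response:" with nothing after the colon.
```python
def instruct_prompt(instruction, input_text):
    # Hypothetical helper: collapse \n\n inside each field and leave nothing
    # (no space, no newline) after the final colon.
    instruction = instruction.strip().replace("\n\n", "\n")
    input_text = input_text.strip().replace("\n\n", "\n")
    return f"Instruction: {instruction}\n\nInput: {input_text}\n\nResponse:"

prompt = instruct_prompt("Translate the input to French.", "The weather is nice today.")
print(repr(prompt))  # ends with 'Response:' and no trailing whitespace
```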
!!! There must not be any space after your final ":" or you will upset the tokenizer and see non-English responses !!!