---
license: apache-2.0
datasets:
- Norquinal/claude_multiround_chat_30k
- OpenLeecher/Teatime
---
# RWKV 7B World 128k for novel writing
We are proud to announce that, as of 2023-08-10, this is the world's first 128k-context model based on the RWKV architecture.
With the RWKV World tokenizer, multilingual text is tokenized at roughly a 1:1 ratio, about one word per token; see the tokenizer implementation: https://github.com/BlinkDL/ChatRWKV/blob/2a13ddecd81f8fd615b6da3a8f1091a594689e30/tokenizer/rwkv_tokenizer.py#L163
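As a quick sanity check of that ratio, here is a minimal sketch that assumes the linked `rwkv_tokenizer.py` (and the `rwkv_vocab_v20230424.txt` vocabulary it loads) has been downloaded next to the script; the `TRIE_TOKENIZER` class name is taken from that file and should be treated as an assumption.

```python
# Minimal sketch: count tokens with the RWKV World tokenizer.
# Assumes rwkv_tokenizer.py (linked above) and rwkv_vocab_v20230424.txt
# are present in the working directory.
from rwkv_tokenizer import TRIE_TOKENIZER  # class defined in the linked file

tokenizer = TRIE_TOKENIZER("rwkv_vocab_v20230424.txt")

for text in ["Hello world", "你好世界", "江湖夜雨十年灯"]:
    tokens = tokenizer.encode(text)
    # Roughly one token per word with the World vocabulary.
    print(text, "->", len(tokens), "tokens:", tokens)
```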
## How to train an infinite context model?
This model was trained on instruction datasets plus Chinese web novels and traditional wuxia fiction; more training details will be released later.
It has been tested to summarize 85k tokens into 5 key points; the conversation files can be found in the example folder, and more cases are coming. A sketch of how such a long prompt can be prefilled is shown below.
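Because RWKV carries its context in a fixed-size recurrent state, a very long prompt can be fed through the model in chunks without memory growing with prompt length. The sketch below uses the `rwkv` pip package (`pip install rwkv`); the model filename, input file, and chunk size are placeholders, not part of this release.

```python
# Minimal sketch: constant-memory prefill of a very long prompt with the rwkv package.
import os
os.environ["RWKV_JIT_ON"] = "1"  # must be set before importing rwkv.model

from rwkv.model import RWKV
from rwkv.utils import PIPELINE

model = RWKV(model="path/to/RWKV-World-7B-128k.pth", strategy="cuda fp16")  # placeholder path
pipeline = PIPELINE(model, "rwkv_vocab_v20230424")  # World tokenizer shipped with the package

with open("novel_excerpt.txt", encoding="utf-8") as f:  # placeholder input file
    prompt = f.read() + "\n\nPlease summarize the text above into 5 key points:\n"

tokens = pipeline.encode(prompt)

# Feed the prompt chunk by chunk, carrying the recurrent state forward.
state = None
CHUNK = 256
for i in range(0, len(tokens), CHUNK):
    out, state = model.forward(tokens[i:i + CHUNK], state)
```

After the loop, `state` carries the entire prompt context and `out` holds the logits for the next token, so generating the 5 key points can continue from there (see the generation sketch in the testing section below).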
Full fine-tuning to 128k context was done with this repo, taking about 40 hours on 4×A800 GPUs with 1.3B tokens: https://github.com/SynthiaDL/TrainChatGalRWKV/blob/main/train_world.sh
## How to Test?
Use RWKV Runner (https://github.com/josStorer/RWKV-Runner) to test this model; it only needs 16 GB of VRAM to run in fp16, or 8 GB in fp16i8. Use temperature 0.1-0.2 with top_p 0.7 for more precise answers; a temperature between 1 and 2.x gives more creative output. A scripted equivalent is sketched below.
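If you prefer scripting over the GUI, roughly the same settings can be reproduced with the `rwkv` pip package; the model filename below is a placeholder, and the sampling values simply mirror the suggestions above.

```python
# Minimal generation sketch with the rwkv package (pip install rwkv).
# Choose strategy "cuda fp16" (~16 GB VRAM) or "cuda fp16i8" (~8 GB VRAM).
from rwkv.model import RWKV
from rwkv.utils import PIPELINE, PIPELINE_ARGS

model = RWKV(model="path/to/RWKV-World-7B-128k.pth", strategy="cuda fp16i8")  # placeholder path
pipeline = PIPELINE(model, "rwkv_vocab_v20230424")

# Low temperature + top_p 0.7 for precise answers; raise temperature to 1-2 for creative writing.
args = PIPELINE_ARGS(temperature=0.2, top_p=0.7)

prompt = "Question: List three classic wuxia novels.\n\nAnswer:"
print(pipeline.generate(prompt, token_count=200, args=args))
```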