Text Generation · rwkv · causal-lm · ggml
Crataco committed 2cc8958 (parent: ac67118): Update README.md
**Description:**
- The motivation behind these quantizations was that latestissue's quants were missing the 0.1B and 0.4B models. The rest of the models can be found here: [latestissue/rwkv-4-world-ggml-quantized](https://huggingface.co/latestissue/rwkv-4-world-ggml-quantized)

# RAM USAGE

Model | Starting RAM usage (KoboldCpp)
:--:|:--:
RWKV-4-World-0.1B.q4_0.bin | 289.3 MiB
RWKV-4-World-0.1B.q4_1.bin | 294.7 MiB
RWKV-4-World-0.1B.q5_0.bin | 300.2 MiB
RWKV-4-World-0.1B.q5_1.bin | 305.7 MiB
RWKV-4-World-0.1B.q8_0.bin | 333.1 MiB
RWKV-4-World-0.1B.f16.bin | 415.3 MiB
|
RWKV-4-World-0.4B.q4_0.bin | 484.1 MiB
RWKV-4-World-0.4B.q4_1.bin | 503.7 MiB
RWKV-4-World-0.4B.q5_0.bin | 523.1 MiB
RWKV-4-World-0.4B.q5_1.bin | 542.7 MiB
RWKV-4-World-0.4B.q8_0.bin | 640.2 MiB
RWKV-4-World-0.4B.f16.bin | 932.7 MiB
|
RWKV-4-World-1.5B.q4_0.bin | 1.2 GiB
RWKV-4-World-1.5B.q4_1.bin | 1.3 GiB
RWKV-4-World-1.5B.q5_0.bin | 1.4 GiB
RWKV-4-World-1.5B.q5_1.bin | 1.5 GiB
RWKV-4-World-1.5B.q8_0.bin | 1.9 GiB
RWKV-4-World-1.5B.f16.bin | 3.0 GiB
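The size differences between formats track the storage cost per weight. The sketch below estimates bits per weight from ggml's classic block layouts; the byte counts are my reading of ggml's block formats (an assumption, not taken from this repo), and real RAM usage is higher than the weight data alone because the runtime also allocates state and scratch buffers.

```python
# Rough weight-size estimator for the ggml quantization formats in the table.
# Bytes per 32-weight block follow ggml's classic block layouts (assumed, not
# verified against this repo's exact ggml revision).
BLOCK_SIZE = 32  # weights per quantization block

BYTES_PER_BLOCK = {
    "q4_0": 18,  # f16 scale (2 B) + 32 x 4-bit weights (16 B)
    "q4_1": 20,  # f16 scale + f16 min (4 B) + 32 x 4-bit weights (16 B)
    "q5_0": 22,  # f16 scale (2 B) + 32 high bits (4 B) + 32 x 4-bit low nibbles (16 B)
    "q5_1": 24,  # f16 scale + f16 min + high bits + low nibbles
    "q8_0": 34,  # scale + 32 x 8-bit weights (scale width varies by ggml version)
    "f16":  64,  # 32 x 2-byte half-precision weights
}

def bits_per_weight(fmt: str) -> float:
    """Average storage cost of one weight, in bits."""
    return BYTES_PER_BLOCK[fmt] * 8 / BLOCK_SIZE

def weight_file_mib(n_params: int, fmt: str) -> float:
    """Approximate size of the weight data alone, in MiB."""
    return n_params / BLOCK_SIZE * BYTES_PER_BLOCK[fmt] / 1024**2

for fmt in BYTES_PER_BLOCK:
    print(f"{fmt}: {bits_per_weight(fmt):.1f} bits/weight")
```

This is why q8_0 lands roughly halfway between q5_1 and f16 in the table, and why the q4/q5 variants are closely spaced.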
**Notes:**
- rwkv.cpp [[0df970a]](https://github.com/saharNooby/rwkv.cpp/tree/0df970a6adddd4b938795f92e660766d1e2c1c1f) was used for conversion and quantization. The models were first converted to f16 ggml files, then quantized.
- KoboldCpp [[bc841ec]](https://github.com/LostRuins/koboldcpp/tree/bc841ec30232036a1e231c0b057689abc3aa00cf) was used to test the models.
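As a loose illustration of what the quantization step does, the sketch below implements a simplified symmetric 4-bit block quantizer in plain Python: one scale per 32-weight block, weights rounded to 4-bit integers. This is only a sketch of the idea; ggml's actual q4_0 kernel packs nibbles and stores f16 scales, and rwkv.cpp's scripts should be used for real conversions.

```python
import numpy as np

def quantize_q4_blocks(x: np.ndarray, block: int = 32):
    """Simplified 4-bit block quantization (illustrative, not ggml's q4_0):
    one f32 scale per block, weights rounded to integers in [-8, 7]."""
    x = x.reshape(-1, block)
    scale = np.abs(x).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0  # avoid division by zero for all-zero blocks
    q = np.clip(np.round(x / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_q4_blocks(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Reconstruct approximate weights from quantized values and scales."""
    return (q.astype(np.float32) * scale).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
q, s = quantize_q4_blocks(w)
w_hat = dequantize_q4_blocks(q, s)
print("max abs error:", np.abs(w - w_hat).max())
```

The reconstruction error per weight is bounded by half a quantization step, which is why the q4 files in the table are usable at a quarter of the f16 size.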
 
The original models can be found [here](https://huggingface.co/BlinkDL/rwkv-4-world), and the original model card can be found below.