Text Generation · rwkv · causal-lm · ggml
Crataco committed 2cc8958 (parent: ac67118): Update README.md
**Description:**
- The motivation behind these quantizations was that latestissue's quants were missing the 0.1B and 0.4B models. The rest of the models can be found here: [latestissue/rwkv-4-world-ggml-quantized](https://huggingface.co/latestissue/rwkv-4-world-ggml-quantized)

# RAM USAGE

Model | Starting RAM usage (KoboldCpp)
:--:|:--:
RWKV-4-World-0.1B.q4_0.bin | 289.3 MiB
RWKV-4-World-0.1B.q4_1.bin | 294.7 MiB
RWKV-4-World-0.1B.q5_0.bin | 300.2 MiB
RWKV-4-World-0.1B.q5_1.bin | 305.7 MiB
RWKV-4-World-0.1B.q8_0.bin | 333.1 MiB
RWKV-4-World-0.1B.f16.bin | 415.3 MiB
|
RWKV-4-World-0.4B.q4_0.bin | 484.1 MiB
RWKV-4-World-0.4B.q4_1.bin | 503.7 MiB
RWKV-4-World-0.4B.q5_0.bin | 523.1 MiB
RWKV-4-World-0.4B.q5_1.bin | 542.7 MiB
RWKV-4-World-0.4B.q8_0.bin | 640.2 MiB
RWKV-4-World-0.4B.f16.bin | 932.7 MiB
|
RWKV-4-World-1.5B.q4_0.bin | 1.2 GiB
RWKV-4-World-1.5B.q4_1.bin | 1.3 GiB
RWKV-4-World-1.5B.q5_0.bin | 1.4 GiB
RWKV-4-World-1.5B.q5_1.bin | 1.5 GiB
RWKV-4-World-1.5B.q8_0.bin | 1.9 GiB
RWKV-4-World-1.5B.f16.bin | 3.0 GiB
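The size differences between formats track the storage cost per weight. The sketch below estimates bits per weight from ggml's classic block layouts; the byte counts are my reading of ggml's block formats (an assumption, not taken from this repo), and real RAM usage is higher than the weight data alone because the runtime also allocates state and scratch buffers.

```python
# Rough weight-size estimator for the ggml quantization formats in the table.
# Bytes per 32-weight block follow ggml's classic block layouts (assumed, not
# verified against this repo's exact ggml revision).
BLOCK_SIZE = 32  # weights per quantization block

BYTES_PER_BLOCK = {
    "q4_0": 18,  # f16 scale (2 B) + 32 x 4-bit weights (16 B)
    "q4_1": 20,  # f16 scale + f16 min (4 B) + 32 x 4-bit weights (16 B)
    "q5_0": 22,  # f16 scale (2 B) + 32 high bits (4 B) + 32 x 4-bit low nibbles (16 B)
    "q5_1": 24,  # f16 scale + f16 min + high bits + low nibbles
    "q8_0": 34,  # scale + 32 x 8-bit weights (scale width varies by ggml version)
    "f16":  64,  # 32 x 2-byte half-precision weights
}

def bits_per_weight(fmt: str) -> float:
    """Average storage cost of one weight, in bits."""
    return BYTES_PER_BLOCK[fmt] * 8 / BLOCK_SIZE

def weight_file_mib(n_params: int, fmt: str) -> float:
    """Approximate size of the weight data alone, in MiB."""
    return n_params / BLOCK_SIZE * BYTES_PER_BLOCK[fmt] / 1024**2

for fmt in BYTES_PER_BLOCK:
    print(f"{fmt}: {bits_per_weight(fmt):.1f} bits/weight")
```

This is why q8_0 lands roughly halfway between q5_1 and f16 in the table, and why the q4/q5 variants are closely spaced.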
**Notes:**
- rwkv.cpp [[0df970a]](https://github.com/saharNooby/rwkv.cpp/tree/0df970a6adddd4b938795f92e660766d1e2c1c1f) was used for conversion and quantization. The models were first converted to f16 ggml files, then quantized.
- KoboldCpp [[bc841ec]](https://github.com/LostRuins/koboldcpp/tree/bc841ec30232036a1e231c0b057689abc3aa00cf) was used to test the models.
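As a loose illustration of what the quantization step does, the sketch below implements a simplified symmetric 4-bit block quantizer in plain Python: one scale per 32-weight block, weights rounded to 4-bit integers. This is only a sketch of the idea; ggml's actual q4_0 kernel packs nibbles and stores f16 scales, and rwkv.cpp's scripts should be used for real conversions.

```python
import numpy as np

def quantize_q4_blocks(x: np.ndarray, block: int = 32):
    """Simplified 4-bit block quantization (illustrative, not ggml's q4_0):
    one f32 scale per block, weights rounded to integers in [-8, 7]."""
    x = x.reshape(-1, block)
    scale = np.abs(x).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0  # avoid division by zero for all-zero blocks
    q = np.clip(np.round(x / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_q4_blocks(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Reconstruct approximate weights from quantized values and scales."""
    return (q.astype(np.float32) * scale).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
q, s = quantize_q4_blocks(w)
w_hat = dequantize_q4_blocks(q, s)
print("max abs error:", np.abs(w - w_hat).max())
```

The reconstruction error per weight is bounded by half a quantization step, which is why the q4 files in the table are usable at a quarter of the f16 size.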
 
The original models can be found [here](https://huggingface.co/BlinkDL/rwkv-4-world), and the original model card can be found below.