Update README.md
README.md CHANGED
They run at about 3-6 t/sec on CPU only using llama.cpp.
And obviously faster on computers with potent GPUs.

ALL the models were quantized in this way:
```
python llama.cpp/convert_hf_to_gguf.py --outtype f16 model --outfile model.f16.gguf

quantize.exe --allow-requantize --output-tensor-type f16 --token-embedding-type f16 model.f16.gguf model.f16.q5.gguf q5_k
quantize.exe --allow-requantize --output-tensor-type f16 --token-embedding-type f16 model.f16.gguf model.f16.q6.gguf q6_k
quantize.exe --allow-requantize --output-tensor-type f16 --token-embedding-type f16 model.f16.gguf model.f16.q8.gguf q8_0
quantize.exe --allow-requantize --pure model.f16.gguf model.f16.q8_p.gguf q8_0
```
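
For reference, a file produced this way can be loaded directly by llama.cpp; the 3-6 t/sec figure above is from CPU-only runs. A minimal sketch (the CLI binary is named `llama-cli` in recent llama.cpp builds and `main` in older ones; the prompt, token count and thread count below are placeholder values):

```
# CPU-only generation with the q6_k file; -t sets the number of CPU threads
./llama-cli -m model.f16.q6.gguf -p "Hello, how are you?" -n 128 -t 8
```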
* [ZeroWw/Mistral-Nemo-Instruct-2407-GGUF](https://huggingface.co/ZeroWw/Mistral-Nemo-Instruct-2407-GGUF)
* [ZeroWw/L3-8B-Celeste-V1.2-GGUF](https://huggingface.co/ZeroWw/L3-8B-Celeste-V1.2-GGUF)
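
If you would rather download one of the repos listed above than quantize locally, the Hugging Face CLI can fetch it; a sketch (`huggingface-cli download` with `--local-dir` is a standard huggingface_hub option; this pulls the whole repo rather than guessing individual file names):

```
# download every file from the repo into a local folder
huggingface-cli download ZeroWw/Mistral-Nemo-Instruct-2407-GGUF --local-dir Mistral-Nemo-Instruct-2407-GGUF
```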