Update README.md
README.md CHANGED
They run at about 3-6 t/sec on CPU only using llama.cpp.
And obviously faster on computers with potent GPUs.

ALL the models were quantized in this way:
```
python llama.cpp/convert_hf_to_gguf.py --outtype f16 model --outfile model.f16.gguf

quantize.exe --allow-requantize --output-tensor-type f16 --token-embedding-type f16 model.f16.gguf model.f16.q5.gguf q5_k
quantize.exe --allow-requantize --output-tensor-type f16 --token-embedding-type f16 model.f16.gguf model.f16.q6.gguf q6_k
quantize.exe --allow-requantize --output-tensor-type f16 --token-embedding-type f16 model.f16.gguf model.f16.q8.gguf q8_0
quantize.exe --allow-requantize --pure model.f16.gguf model.f16.q8_p.gguf q8_0
```
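
For reference, a file produced this way can be loaded directly by llama.cpp; the 3-6 t/sec figure above is from CPU-only runs. A minimal sketch (the CLI binary is named `llama-cli` in recent llama.cpp builds and `main` in older ones; the prompt, token count and thread count below are placeholder values):

```
# CPU-only generation with the q6_k file; -t sets the number of CPU threads
./llama-cli -m model.f16.q6.gguf -p "Hello, how are you?" -n 128 -t 8
```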
* [ZeroWw/Mistral-Nemo-Instruct-2407-GGUF](https://huggingface.co/ZeroWw/Mistral-Nemo-Instruct-2407-GGUF)
* [ZeroWw/L3-8B-Celeste-V1.2-GGUF](https://huggingface.co/ZeroWw/L3-8B-Celeste-V1.2-GGUF)
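
If you would rather download one of the repos listed above than quantize locally, the Hugging Face CLI can fetch it; a sketch (`huggingface-cli download` with `--local-dir` is a standard huggingface_hub option; this pulls the whole repo rather than guessing individual file names):

```
# download every file from the repo into a local folder
huggingface-cli download ZeroWw/Mistral-Nemo-Instruct-2407-GGUF --local-dir Mistral-Nemo-Instruct-2407-GGUF
```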