ZeroWw committed on
Commit 7c2d3a7 · verified · 1 Parent(s): b92636d

Update README.md

Files changed (1)
  1. README.md +4 -1
README.md CHANGED
@@ -16,11 +16,14 @@ They run at about 3-6 t/sec on CPU only using llama.cpp
  And obviously faster on computers with potent GPUs
 
  ALL the models were quantized in this way:
+ ```
+ python llama.cpp/convert_hf_to_gguf.py --outtype f16 model --outfile model.f16.gguf
+
  quantize.exe --allow-requantize --output-tensor-type f16 --token-embedding-type f16 model.f16.gguf model.f16.q5.gguf q5_k
  quantize.exe --allow-requantize --output-tensor-type f16 --token-embedding-type f16 model.f16.gguf model.f16.q6.gguf q6_k
  quantize.exe --allow-requantize --output-tensor-type f16 --token-embedding-type f16 model.f16.gguf model.f16.q6.gguf q8_0
  quantize.exe --allow-requantize --pure model.f16.gguf model.f16.q8_p.gguf q8_0
- and there is also a pure f16 and a pure q8 in every directory.
+ ```
 
  * [ZeroWw/Mistral-Nemo-Instruct-2407-GGUF](https://huggingface.co/ZeroWw/Mistral-Nemo-Instruct-2407-GGUF)
  * [ZeroWw/L3-8B-Celeste-V1.2-GGUF](https://huggingface.co/ZeroWw/L3-8B-Celeste-V1.2-GGUF)
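For reference, here is a minimal sketch of the same pipeline driven from Python. It assumes llama.cpp is cloned in `./llama.cpp` and its quantize tool has been built (called `quantize.exe` here as in the README; recent llama.cpp builds name it `llama-quantize`). The model directory and output filenames are illustrative; in particular, the sketch writes the q8_0 mixed quant to `model.f16.q8.gguf`, whereas the README reuses the q6 filename for that command, which looks like a typo.

```python
# Sketch of the quantization recipe above, under the assumptions stated in the
# lead-in (paths, binary name, and output filenames are not from the README).
import subprocess
import sys

MODEL_DIR = "model"             # assumed: local Hugging Face model directory
F16_GGUF = "model.f16.gguf"
QUANTIZE = "quantize.exe"       # assumed: ./llama.cpp/llama-quantize on Linux/macOS

def run(cmd):
    """Echo and run a command, stopping on the first failure."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. Convert the Hugging Face checkpoint to an f16 GGUF.
run([sys.executable, "llama.cpp/convert_hf_to_gguf.py",
     "--outtype", "f16", MODEL_DIR, "--outfile", F16_GGUF])

# 2. Mixed quants: quantize most tensors, but keep the output and
#    token-embedding tensors at f16.
for qtype, out in [("q5_k", "model.f16.q5.gguf"),
                   ("q6_k", "model.f16.q6.gguf"),
                   ("q8_0", "model.f16.q8.gguf")]:
    run([QUANTIZE, "--allow-requantize",
         "--output-tensor-type", "f16",
         "--token-embedding-type", "f16",
         F16_GGUF, out, qtype])

# 3. Pure q8_0 quant: every tensor quantized, nothing kept at f16.
run([QUANTIZE, "--allow-requantize", "--pure",
     F16_GGUF, "model.f16.q8_p.gguf", "q8_0"])
```

The `--output-tensor-type f16 --token-embedding-type f16` flags force those two tensor groups to stay at f16 while the rest of the model is quantized; the apparent intent of the recipe is to trade a modest size increase for better output quality than a plain q5_k/q6_k/q8_0 quant, with `--pure` producing the fully quantized q8_0 variant for comparison.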