ZeroWw/microsoft_WizardLM-2-7B-GGUF


ZeroWw committed on Jun 20, 2024

Commit 53a7f89 · verified · 1 Parent(s): a1a5b56

Create README.md

Files changed (1)

README.md +14 -0

README.md ADDED

	@@ -0,0 +1,14 @@

+---
+license: mit
+language:
+- en
+---
+My own quantizations.
+Output and embedding tensors are quantized to f16.
+All other tensors are quantized to q5_k or q6_k.
+The q8_0 version is pure (all tensors are quantized to q8_0) and is included just for reference.
+Result:
+Both f16.q6 and f16.q5 are smaller than the standard q8_0 quantization,
+and they perform as well as the pure f16.
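
The README above does not say which tool or commands produced these files. As a rough sketch only, assuming llama.cpp's `llama-quantize` tool and its `--output-tensor-type`, `--token-embedding-type`, and `--pure` options, the three variants could be described by command lines built like this. The helper name `build_quantize_command` and all file names are hypothetical, chosen for illustration.

```python
import shlex
from typing import List


def build_quantize_command(
    src: str,
    dst: str,
    base_type: str,
    mixed: bool = True,
    binary: str = "llama-quantize",  # assumption: llama.cpp's quantize tool
) -> List[str]:
    """Build an argument list for one quantization run (nothing is executed)."""
    cmd = [binary]
    if mixed:
        # Keep the output and token-embedding tensors at f16;
        # every other tensor falls back to base_type.
        cmd += ["--output-tensor-type", "f16", "--token-embedding-type", "f16"]
    else:
        # Assumption: "pure" in the README maps to the --pure option,
        # which applies base_type uniformly instead of a mixture.
        cmd.append("--pure")
    cmd += [src, dst, base_type]
    return cmd


# Hypothetical file names; the README does not list the actual ones.
variants = [
    ("model.f16.q5.gguf", "Q5_K", True),
    ("model.f16.q6.gguf", "Q6_K", True),
    ("model.q8_0.gguf", "Q8_0", False),
]

for dst, base_type, mixed in variants:
    print(shlex.join(build_quantize_command("model.f16.gguf", dst, base_type, mixed)))
```

The function only assembles arguments, so it can be inspected or logged before anything is run; passing the resulting list to `subprocess.run` would be the natural next step if the tool is installed.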