eachadea/ggml-vicuna-7b-1.1


eachadea committed on Apr 19, 2023

Commit 5db752d • 1 parent: a5a45bd

Update README.md

Files changed (1)

README.md +6 -1

README.md CHANGED

@@ -9,8 +9,13 @@ inference: true
 - Based on version 1.1
 - Used PR "More accurate Q4_0 and Q4_1 quantizations #896" (should be closer in quality to unquantized)
 - Uncensored variant is available, but it's based on version 1.0
 - 13B version of this can be found here: https://huggingface.co/eachadea/ggml-vicuna-13b-1.1
-- Choosing between q4_0 and q4_1, the logic of higher number \= better doesn't apply. If you're confused, stick with q4_0.
 <br>
 <br>

 - Based on version 1.1
 - Used PR "More accurate Q4_0 and Q4_1 quantizations #896" (should be closer in quality to unquantized)
 - Uncensored variant is available, but it's based on version 1.0
+- For q4_2, "Q4_2 ARM #1046" was used. Will update regularly if new changes are made.
+- **Choosing between q4_0, q4_1, and q4_2:**
+  - 4_0 is the fastest. The quality is the poorest.
+  - 4_1 is a lot slower. The quality is noticeably better.
+  - 4_2 is almost as fast as 4_0 and about as good as 4_1 **on Apple Silicon**. On Intel/AMD it's hardly better or faster than 4_1.
 - 13B version of this can be found here: https://huggingface.co/eachadea/ggml-vicuna-13b-1.1
 <br>
 <br>
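
To give a feel for why the README rates q4_1 above q4_0 on quality, here is a minimal Python sketch of the two general ideas: a scale-only 4-bit scheme versus a scale-plus-offset 4-bit scheme. This is a simplified illustration, not ggml's actual code. The function names, the 32-value block, and the sample data are all assumptions made for the example, and q4_2 is not modelled.

```python
# Simplified sketch of two 4-bit block quantization ideas.
# NOT ggml's implementation: names, block size, and data are hypothetical.

def quantize_scale_only(block):
    """q4_0-like idea: x ~= d * q, with signed 4-bit q in [-8, 7]."""
    amax = max(abs(x) for x in block)
    d = amax / 7 if amax else 1.0            # one scale per block
    q = [max(-8, min(7, round(x / d))) for x in block]
    return [d * qi for qi in q]              # dequantized values


def quantize_scale_and_offset(block):
    """q4_1-like idea: x ~= d * q + m, with unsigned 4-bit q in [0, 15]."""
    lo, hi = min(block), max(block)
    d = (hi - lo) / 15 if hi > lo else 1.0   # scale spans only the used range
    q = [max(0, min(15, round((x - lo) / d))) for x in block]
    return [d * qi + lo for qi in q]         # dequantized values


def max_abs_error(original, restored):
    """Largest absolute difference between original and restored values."""
    return max(abs(a - b) for a, b in zip(original, restored))


# Hypothetical block of 32 weights that are all positive (range 1.0 to 2.0).
block = [1.0 + i / 31 for i in range(32)]

err_scale_only = max_abs_error(block, quantize_scale_only(block))
err_scale_offset = max_abs_error(block, quantize_scale_and_offset(block))

print(f"scale only:       max abs error = {err_scale_only:.4f}")
print(f"scale and offset: max abs error = {err_scale_offset:.4f}")
```

On this deliberately one-sided block, the scale-only scheme spends half of its 16 levels on negative values that never occur, so the scale-plus-offset scheme reconstructs the weights more closely. Real model weights are usually closer to symmetric, so the gap in practice is likely smaller than in this toy case.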