Update README.md
README.md CHANGED
@@ -1,11 +1,19 @@
 ---
-license: apache-2.0
 inference: true
 ---
 
+### NOTE:
+The PR [#1405](https://github.com/ggerganov/llama.cpp/pull/1405) brought breaking changes: none of the old models work with the latest build of llama.cpp.
+
+Pre-PR #1405 files have been marked as old but remain accessible for those who need them.
+
+Additionally, `q4_3` and `q4_2` have been completely axed in favor of their 5-bit counterparts (q5_1 and q5_0, respectively).
+
+The new files run inference up to 10% faster without any quality reduction.
+
 
 ### Links
-- [
+- [7B version of this model](https://huggingface.co/eachadea/ggml-vicuna-7b-1.1)
 - [Set up with gpt4all-chat (one-click setup, available in in-app download menu)](https://gpt4all.io/index.html)
 - [Set up with llama.cpp](https://github.com/ggerganov/llama.cpp)
 - [Set up with oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui/blob/main/docs/llama.cpp-models.md)
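To tell at a glance whether a downloaded file predates PR #1405, the container header can be inspected directly. Below is a minimal sketch, not llama.cpp's actual loader: the magic constants follow the ggml/ggmf/ggjt container formats, treating ggjt version 2 as the post-#1405 generation is an assumption, and the filename is hypothetical.

```python
import struct

# Sketch: peek at a GGML-family file header to guess whether it predates
# the llama.cpp PR #1405 format change. The magic constants correspond to
# the 'ggml', 'ggmf', and 'ggjt' containers; treating ggjt version >= 2 as
# post-#1405 is an assumption, and the filename below is illustrative.
MAGICS = {
    0x67676D6C: "ggml (unversioned, very old)",
    0x67676D66: "ggmf (versioned)",
    0x67676A74: "ggjt (versioned, mmap-friendly)",
}

def inspect_header(path: str) -> None:
    with open(path, "rb") as f:
        (magic,) = struct.unpack("<I", f.read(4))
        print(f"magic: {magic:#010x} -> {MAGICS.get(magic, 'unknown')}")
        if magic in (0x67676D66, 0x67676A74):
            (version,) = struct.unpack("<I", f.read(4))
            print(f"version: {version}")
            if magic == 0x67676A74 and version >= 2:
                print("likely a post-#1405 file")
                return
        print("likely pre-#1405; may not load in current llama.cpp builds")

inspect_header("ggml-vic13b-q5_1.bin")  # hypothetical filename
```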
@@ -38,9 +46,6 @@ Model | F16 | Q4_0 | Q4_1 | Q4_2 | Q4_3 | Q5_0 | Q5_1 | Q8_0
 q5_1 or q5_0 are the latest and most performant implementations. The former is slightly more accurate at the cost of a bit of performance. Most users should use one of the two.
 If you encounter any kind of compatibility issues, you might want to try the older q4_x formats.
 
-**NOTE: q4_3 is EOL - avoid using.**
-
-
 ---
 
 # Vicuna Model Card
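To make the q5_0/q5_1 trade-off concrete: both formats store one scale per 32-weight block, but q5_1 additionally stores a per-block offset, which captures blocks whose values are not centered on zero more faithfully. A toy round-trip comparison (simplified arithmetic, not ggml's actual kernels):

```python
import numpy as np

# Toy illustration of why q5_1 tends to be slightly more accurate than
# q5_0: both use one scale per 32-weight block, but q5_1 also stores a
# per-block minimum, so asymmetric value ranges quantize with less error.
# Block size follows ggml's layout; the arithmetic is a simplification.
BLOCK = 32

def q5_0_roundtrip(w):
    d = np.abs(w).max() / 16.0           # symmetric scale, 5-bit signed range
    q = np.clip(np.round(w / d), -16, 15)
    return q * d

def q5_1_roundtrip(w):
    mn, mx = w.min(), w.max()
    d = (mx - mn) / 31.0                 # scale plus offset, 5-bit unsigned range
    q = np.clip(np.round((w - mn) / d), 0, 31)
    return q * d + mn

rng = np.random.default_rng(0)
w = rng.normal(0.1, 0.02, BLOCK)         # skewed block: all-positive weights
for fn in (q5_0_roundtrip, q5_1_roundtrip):
    err = np.abs(fn(w) - w).mean()
    print(f"{fn.__name__}: mean abs error {err:.2e}")
```

On a block of all-positive weights like the one above, the offset format should show a visibly lower round-trip error.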
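As a usage sketch to go with the setup links above, assuming the third-party llama-cpp-python bindings (not mentioned on this card; the bindings must match the file generation per the NOTE), with an illustrative filename and prompt template:

```python
# Sketch using the third-party llama-cpp-python bindings
# (pip install llama-cpp-python); the model filename and the prompt
# template below are illustrative assumptions, not part of this card.
from llama_cpp import Llama

llm = Llama(model_path="./ggml-vic13b-q5_1.bin", n_ctx=2048)

prompt = "USER: Briefly explain what Vicuna is.\nASSISTANT:"
out = llm(prompt, max_tokens=128, stop=["USER:"])
print(out["choices"][0]["text"].strip())
```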