Update README.md
README.md CHANGED
@@ -1,11 +1,19 @@
 ---
-license: apache-2.0
 inference: true
 ---
 
+### NOTE:
+The PR [#1405](https://github.com/ggerganov/llama.cpp/pull/1405) brought breaking changes: none of the old models work with the latest build of llama.cpp.
+
+Pre-PR #1405 files have been marked as old but remain accessible for those who need them.
+
+Additionally, `q4_3` and `q4_2` have been completely axed in favor of their 5-bit counterparts (q5_1 and q5_0, respectively).
+
+The new files run inference up to 10% faster without any quality reduction.
+
 
 ### Links
-- [
+- [7B version of this model](https://huggingface.co/eachadea/ggml-vicuna-7b-1.1)
 - [Set up with gpt4all-chat (one-click setup, available in in-app download menu)](https://gpt4all.io/index.html)
 - [Set up with llama.cpp](https://github.com/ggerganov/llama.cpp)
 - [Set up with oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui/blob/main/docs/llama.cpp-models.md)
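To tell at a glance whether a downloaded file predates PR #1405, the container header can be inspected directly. Below is a minimal sketch, not llama.cpp's actual loader: the magic constants follow the ggml/ggmf/ggjt container formats, treating ggjt version 2 as the post-#1405 generation is an assumption, and the filename is hypothetical.

```python
import struct

# Sketch: peek at a GGML-family file header to guess whether it predates
# the llama.cpp PR #1405 format change. The magic constants correspond to
# the 'ggml', 'ggmf', and 'ggjt' containers; treating ggjt version >= 2 as
# post-#1405 is an assumption, and the filename below is illustrative.
MAGICS = {
    0x67676D6C: "ggml (unversioned, very old)",
    0x67676D66: "ggmf (versioned)",
    0x67676A74: "ggjt (versioned, mmap-friendly)",
}

def inspect_header(path: str) -> None:
    with open(path, "rb") as f:
        (magic,) = struct.unpack("<I", f.read(4))
        print(f"magic: {magic:#010x} -> {MAGICS.get(magic, 'unknown')}")
        if magic in (0x67676D66, 0x67676A74):
            (version,) = struct.unpack("<I", f.read(4))
            print(f"version: {version}")
            if magic == 0x67676A74 and version >= 2:
                print("likely a post-#1405 file")
                return
        print("likely pre-#1405; may not load in current llama.cpp builds")

inspect_header("ggml-vic13b-q5_1.bin")  # hypothetical filename
```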
@@ -38,9 +46,6 @@ Model | F16 | Q4_0 | Q4_1 | Q4_2 | Q4_3 | Q5_0 | Q5_1 | Q8_0
 q5_1 or q5_0 are the latest and most performant implementations. The former is slightly more accurate at the cost of a bit of performance. Most users should use one of the two.
 If you encounter any kind of compatibility issues, you might want to try the older q4_x formats.
 
-**NOTE: q4_3 is EOL - avoid using.**
-
-
 ---
 
 # Vicuna Model Card
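To make the q5_0/q5_1 trade-off concrete: both formats store one scale per 32-weight block, but q5_1 additionally stores a per-block offset, which captures blocks whose values are not centered on zero more faithfully. A toy round-trip comparison (simplified arithmetic, not ggml's actual kernels):

```python
import numpy as np

# Toy illustration of why q5_1 tends to be slightly more accurate than
# q5_0: both use one scale per 32-weight block, but q5_1 also stores a
# per-block minimum, so asymmetric value ranges quantize with less error.
# Block size follows ggml's layout; the arithmetic is a simplification.
BLOCK = 32

def q5_0_roundtrip(w):
    d = np.abs(w).max() / 16.0           # symmetric scale, 5-bit signed range
    q = np.clip(np.round(w / d), -16, 15)
    return q * d

def q5_1_roundtrip(w):
    mn, mx = w.min(), w.max()
    d = (mx - mn) / 31.0                 # scale plus offset, 5-bit unsigned range
    q = np.clip(np.round((w - mn) / d), 0, 31)
    return q * d + mn

rng = np.random.default_rng(0)
w = rng.normal(0.1, 0.02, BLOCK)         # skewed block: all-positive weights
for fn in (q5_0_roundtrip, q5_1_roundtrip):
    err = np.abs(fn(w) - w).mean()
    print(f"{fn.__name__}: mean abs error {err:.2e}")
```

On a block of all-positive weights like the one above, the offset format should show a visibly lower round-trip error.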
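As a usage sketch to go with the setup links above, assuming the third-party llama-cpp-python bindings (not mentioned on this card; the bindings must match the file generation per the NOTE), with an illustrative filename and prompt template:

```python
# Sketch using the third-party llama-cpp-python bindings
# (pip install llama-cpp-python); the model filename and the prompt
# template below are illustrative assumptions, not part of this card.
from llama_cpp import Llama

llm = Llama(model_path="./ggml-vic13b-q5_1.bin", n_ctx=2048)

prompt = "USER: Briefly explain what Vicuna is.\nASSISTANT:"
out = llm(prompt, max_tokens=128, stop=["USER:"])
print(out["choices"][0]["text"].strip())
```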