TheBloke committed
Commit
bc7966b
1 Parent(s): e071d24

Update README.md

Files changed (1): README.md (+8 -6)
README.md CHANGED
@@ -1,6 +1,8 @@
---
datasets:
- gozfarb/ShareGPT_Vicuna_unfiltered
+ license: other
+ inference: false
---

# VicUnlocked-30B-LoRA GGML
@@ -11,7 +13,7 @@ The files in this repo are the result of merging the above LoRA with the original

## Repositories available

- * [4-bit, 5-bit and 8-bit GGML models for CPU inference](https://huggingface.co/TheBloke/VicUnlocked-30B-LoRA-GGML).
+ * [4-bit, 5-bit and 8-bit GGML models for CPU (+CUDA) inference](https://huggingface.co/TheBloke/VicUnlocked-30B-LoRA-GGML).
* [4bit's GPTQ 4-bit model for GPU inference](https://huggingface.co/TheBloke/VicUnlocked-30B-LoRA-GPTQ).
* [float16 HF format model for GPU inference and further conversions](https://huggingface.co/TheBloke/VicUnlocked-30B-LoRA-HF).
@@ -24,11 +26,11 @@ I have quantised the GGML files in this repo with the latest version. Therefore
## Provided files
| Name | Quant method | Bits | Size | RAM required | Use case |
| ---- | ---- | ---- | ---- | ---- | ----- |
- `VicUnlocked-30B-LoRA.ggml.q4_0.bin` | q4_0 | 4bit | 19GB | 21GB | 4-bit. |
- `VicUnlocked-30B-LoRA.ggml.q4_1.bin` | q4_1 | 4bit | 23GB | 25GB | 4-bit. Higher accuracy than q4_0 but not as high as q5_0. However has quicker inference than q5 models. |
- `VicUnlocked-30B-LoRA.ggml.q5_0.bin` | q5_0 | 5bit | 21GB | 23GB | 5-bit. Higher accuracy, higher resource usage and slower inference. |
- `VicUnlocked-30B-LoRA.ggml.q5_1.bin` | q5_1 | 5bit | 23GB | 25GB | 5-bit. Even higher accuracy, and higher resource usage and slower inference. |
- `VicUnlocked-30B-LoRA.ggml.q8_0.bin` | q8_0 | 8bit | 35GB | 37GB | 8-bit. Almost indistinguishable from float16. Huge resource use and slow. Not recommended for normal use. |
+ `VicUnlocked-30B-LoRA.ggml.q4_0.bin` | q4_0 | 4bit | 20.3GB | 23GB | 4-bit. |
+ `VicUnlocked-30B-LoRA.ggml.q4_1.bin` | q4_1 | 4bit | 24.4GB | 27GB | 4-bit. Higher accuracy than q4_0 but not as high as q5_0. However has quicker inference than q5 models. |
+ `VicUnlocked-30B-LoRA.ggml.q5_0.bin` | q5_0 | 5bit | 22.4GB | 25GB | 5-bit. Higher accuracy, higher resource usage and slower inference. |
+ `VicUnlocked-30B-LoRA.ggml.q5_1.bin` | q5_1 | 5bit | 24.4GB | 27GB | 5-bit. Even higher accuracy, and higher resource usage and slower inference. |
+ `VicUnlocked-30B-LoRA.ggml.q8_0.bin` | q8_0 | 8bit | 36.6GB | 39GB | 8-bit. Almost indistinguishable from float16. Huge resource use and slow. Not recommended for normal use. |

## How to run in `llama.cpp`
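
To grab one of the files above, a download step along these lines should work (a minimal sketch using the `huggingface-cli` tool from the `huggingface_hub` Python package; the q5_0 file is picked only as an example, and any other download method is fine):

```bash
# Assumption: a Python environment with pip is available.
pip install huggingface_hub

# Fetch a single quantised file from this repo into the current directory.
huggingface-cli download TheBloke/VicUnlocked-30B-LoRA-GGML \
    VicUnlocked-30B-LoRA.ggml.q5_0.bin --local-dir .
```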
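
Once downloaded, an invocation along these lines is a reasonable starting point (a sketch assuming a GGML-era `llama.cpp` build; the Vicuna-style `### Human:`/`### Assistant:` prompt format is an assumption, so check the model card for the exact template):

```bash
# Run the q5_0 file on CPU; set -t to your number of physical cores.
# If llama.cpp was compiled with CUDA support, add e.g. `-ngl 32` to
# offload layers to the GPU (the layer count is an assumption; tune to fit VRAM).
./main -t 8 -m VicUnlocked-30B-LoRA.ggml.q5_0.bin --color -c 2048 \
    --temp 0.7 --repeat_penalty 1.1 -n -1 \
    -p "### Human: Write a story about llamas. ### Assistant:"
```

`-c 2048` matches the LLaMA context length, and `-n -1` keeps generating until the context fills or an end-of-text token is produced.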