---
license: other
inference: false
---
# Vicuna 7B GPTQ 4-bit 128g

This repository contains the [Vicuna 7B model](https://huggingface.co/lmsys/vicuna-7b-delta-v0) quantised using [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa).

The original Vicuna 7B repository contains deltas rather than weights. Rather than merging the deltas myself, I used the model files from https://huggingface.co/helloollel/vicuna-7b.
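If you prefer to merge the deltas yourself, FastChat provides an `apply_delta` tool for this. The following is a sketch only: the module path and flag names below match later FastChat documentation and may differ for the v0 deltas, so check the FastChat README for your version.

```
# Merge the Vicuna deltas onto original LLaMA weights (illustrative sketch;
# verify the exact flags against your FastChat version)
pip install fschat
python3 -m fastchat.model.apply_delta \
    --base-model-path /path/to/llama-7b \
    --target-model-path ./vicuna-7b \
    --delta-path lmsys/vicuna-7b-delta-v0
```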
## Provided files

Two model files are provided. You don't need both; choose the one you prefer.

Details of the files provided:
* `vicuna-7B-GPTQ-4bit-128g.pt`
  * `pt` format file, created with the latest [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa) code.
  * Command to create:
    * `python3 llama.py vicuna-7B c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save vicuna-7B-GPTQ-4bit-128g.pt`
* `vicuna-7B-GPTQ-4bit-128g.safetensors`
  * newer `safetensors` format, with improved file security, created with the latest [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa) code.
  * Command to create:
    * `python3 llama.py vicuna-7B c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save_safetensors vicuna-7B-GPTQ-4bit-128g.safetensors`
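If you only want one of the two files, you can download it directly over HTTP rather than cloning the whole repo. This sketch assumes the repo id is `TheBloke/vicuna-7B-GPTQ-4bit-128g`; adjust the URL to match the actual repo path.

```
# Download just the safetensors file (swap the filename for the .pt version).
# The repo id in this URL is an assumption - edit it to match this repo.
wget https://huggingface.co/TheBloke/vicuna-7B-GPTQ-4bit-128g/resolve/main/vicuna-7B-GPTQ-4bit-128g.safetensors
```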
## How to run these GPTQ models in `text-generation-webui`

These model files were created with the latest GPTQ code, and require that the latest GPTQ-for-LLaMa is used inside the UI.

Here are the commands I used to clone the Triton branch of GPTQ-for-LLaMa, clone text-generation-webui, and install GPTQ into the UI:
```
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
git clone https://github.com/oobabooga/text-generation-webui
mkdir -p text-generation-webui/repositories
# use an absolute target so the link doesn't point at itself from inside repositories/
ln -s $(pwd)/GPTQ-for-LLaMa text-generation-webui/repositories/GPTQ-for-LLaMa
```
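A quick sanity check before launching: confirm that GPTQ-for-LLaMa's `llama.py` is visible through the symlink.

```
# If the link is correct this lists the file; an error means it is dangling
ls text-generation-webui/repositories/GPTQ-for-LLaMa/llama.py
```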
Then install this model into `text-generation-webui/models` and launch the UI as follows:
```
cd text-generation-webui
python server.py --model vicuna-7B-GPTQ-4bit-128g --wbits 4 --groupsize 128 # add any other command line args you want
```

The above commands assume you have installed all dependencies for GPTQ-for-LLaMa and text-generation-webui. Please see their respective repositories for further information.
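If you haven't installed those dependencies yet, both projects ship a `requirements.txt`. A minimal sketch, assuming a Python environment with a CUDA-enabled PyTorch already set up:

```
# Install each project's Python dependencies
# (see each repo's README for the authoritative steps)
pip install -r text-generation-webui/requirements.txt
pip install -r GPTQ-for-LLaMa/requirements.txt
```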
If you are on Windows, or cannot use the Triton branch of GPTQ for any other reason, you can instead use the CUDA branch:
```
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa -b cuda
cd GPTQ-for-LLaMa
python setup_cuda.py install
```
Then link that into `text-generation-webui/repositories` as described above.
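Concretely, that relink is the same symlink as before, run from the parent directory that holds both checkouts:

```
# From the directory containing both checkouts; remove any previous link first
cd ..
mkdir -p text-generation-webui/repositories
rm -f text-generation-webui/repositories/GPTQ-for-LLaMa
ln -s $(pwd)/GPTQ-for-LLaMa text-generation-webui/repositories/GPTQ-for-LLaMa
```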
# Vicuna Model Card

## Model details

**Model type:**
Vicuna is an open-source chatbot trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT.
It is an auto-regressive language model, based on the transformer architecture.

**Model date:**
Vicuna was trained between March 2023 and April 2023.

**Organizations developing the model:**
The Vicuna team with members from UC Berkeley, CMU, Stanford, and UC San Diego.

**Paper or resources for more information:**
https://vicuna.lmsys.org/

**License:**
Apache License 2.0

**Where to send questions or comments about the model:**
https://github.com/lm-sys/FastChat/issues

## Intended use
**Primary intended uses:**
The primary use of Vicuna is research on large language models and chatbots.

**Primary intended users:**
The primary intended users of the model are researchers and hobbyists in natural language processing, machine learning, and artificial intelligence.

## Training dataset
70K conversations collected from ShareGPT.com.

## Evaluation dataset
A preliminary evaluation of the model quality is conducted by creating a set of 80 diverse questions and utilizing GPT-4 to judge the model outputs. See https://vicuna.lmsys.org/ for more details.