TheBloke/vicuna-7B-v0-GPTQ

 ---
 license: other
+inference: false
 ---
+# Vicuna 7B GPTQ 4-bit 128g
+This repository contains the [Vicuna 7B model](https://huggingface.co/lmsys/vicuna-7b-delta-v0) quantised using [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa).
+The original Vicuna 7B repository contains deltas rather than weights. Rather than merging the deltas myself, I used the model files from https://huggingface.co/helloollel/vicuna-7b.
+## Provided files
+Two model files are provided. You don't need both; choose the one you prefer.
+Details of the files provided:
+* `vicuna-7B-GPTQ-4bit-128g.pt`
+  * pt format file, created with the latest [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa) code.
+  * Command to create:
+    * `python3 llama.py vicuna-7B c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save vicuna-7B-GPTQ-4bit-128g.pt`
+* `vicuna-7B-GPTQ-4bit-128g.safetensors`
+  * newer `safetensors` format, which, unlike pickle-based `.pt` files, cannot run arbitrary code when loaded; created with the latest [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa) code.
+  * Command to create:
+    * `python3 llama.py vicuna-7B c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save_safetensors vicuna-7B-GPTQ-4bit-128g.safetensors`
+## How to run these GPTQ models in `text-generation-webui`
+These model files were created with the latest GPTQ code, and require that the latest GPTQ-for-LLaMa is used inside the UI.
+Here are the commands I used to clone the Triton branch of GPTQ-for-LLaMa, clone text-generation-webui, and install GPTQ into the UI:
+```
+git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
+git clone https://github.com/oobabooga/text-generation-webui
+mkdir -p text-generation-webui/repositories
+ln -s ../../GPTQ-for-LLaMa text-generation-webui/repositories/GPTQ-for-LLaMa  # target is resolved relative to the link's own directory
+```
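A symbolic link with a relative target is resolved from the directory the link lives in, so it is worth confirming that the link resolves before launching the UI. The following is a minimal sketch, assuming both clones sit side by side in the current directory; `check_gptq_link` is a hypothetical helper name, not something either repository provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GPTQ-for-LLaMa or text-generation-webui):
# report whether the GPTQ-for-LLaMa link inside the UI checkout resolves
# to a real directory.
check_gptq_link() {
  local link="${1:-text-generation-webui/repositories/GPTQ-for-LLaMa}"
  if [ -d "$link" ]; then
    # -d follows symlinks, so this only succeeds when the target exists
    echo "ok: $link"
  else
    echo "broken or missing: $link"
    return 1
  fi
}

# Run from the directory that contains both clones.
check_gptq_link || echo "re-create the link with a target that resolves from inside repositories/"
```

If the check reports a broken link, `ls -l text-generation-webui/repositories` shows the target the link actually stores.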
+Then install this model into `text-generation-webui/models` and launch the UI as follows:
+```
+cd text-generation-webui
+python server.py --model vicuna-7B-GPTQ-4bit-128g --wbits 4 --groupsize 128  # add any other command line args you want
+```
+The above commands assume you have installed all dependencies for GPTQ-for-LLaMa and text-generation-webui. Please see their respective repositories for further information.
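Since only one of the two provided files is needed, a quick pre-flight check can confirm that the model directory exists and show which weight files it holds. This is a sketch under assumptions: the directory name is inferred from the `--model` value in the launch command, and `find_model_files` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the quantised weight files (.safetensors or .pt)
# that sit directly inside a model directory.
find_model_files() {
  local dir="$1"
  find "$dir" -maxdepth 1 -type f \( -name '*.safetensors' -o -name '*.pt' \) | sort
}

# Assumed location, derived from `--model vicuna-7B-GPTQ-4bit-128g`.
MODEL_DIR="text-generation-webui/models/vicuna-7B-GPTQ-4bit-128g"

if [ -d "$MODEL_DIR" ]; then
  find_model_files "$MODEL_DIR"
else
  echo "model directory not found: $MODEL_DIR"
fi
```

If both files are listed, either one can be removed, because they hold the same quantised model in two formats.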
+If you are on Windows, or cannot use the Triton branch of GPTQ for any other reason, you can instead use the CUDA branch:
+```
+git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa -b cuda
+cd GPTQ-for-LLaMa
+python setup_cuda.py install
+```
+Then link that into `text-generation-webui/repositories` as described above.
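For the CUDA branch, note that the shell is still inside `GPTQ-for-LLaMa` after the build step, so the link has to be created from the parent directory. One way to do that is sketched below, assuming both clones share a parent directory; `link_gptq` is a hypothetical helper, and it stores an absolute path so the link resolves no matter where it is created from.

```shell
#!/usr/bin/env bash
# Hypothetical helper: link a GPTQ-for-LLaMa checkout into a
# text-generation-webui checkout.
link_gptq() {
  local gptq_dir ui_dir
  # Resolve to an absolute path so the link target does not depend on
  # the directory the link lives in.
  gptq_dir="$(cd "$1" && pwd)" || return 1
  ui_dir="$2"
  mkdir -p "$ui_dir/repositories"
  # -f replaces an existing link; -n avoids descending into an old link
  ln -sfn "$gptq_dir" "$ui_dir/repositories/GPTQ-for-LLaMa"
}

# Run this from the directory that holds both clones (after `cd ..`).
if [ -d GPTQ-for-LLaMa ] && [ -d text-generation-webui ]; then
  link_gptq GPTQ-for-LLaMa text-generation-webui
fi
```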
+# Vicuna Model Card
+## Model details
+**Model type:**
+Vicuna is an open-source chatbot trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT.
+It is an auto-regressive language model, based on the transformer architecture.
+**Model date:**
+Vicuna was trained between March 2023 and April 2023.
+**Organizations developing the model:**
+The Vicuna team with members from UC Berkeley, CMU, Stanford, and UC San Diego.
+**Paper or resources for more information:**
+https://vicuna.lmsys.org/
+**License:**
+Apache License 2.0
+**Where to send questions or comments about the model:**
+https://github.com/lm-sys/FastChat/issues
+## Intended use
+**Primary intended uses:**
+The primary use of Vicuna is research on large language models and chatbots.
+**Primary intended users:**
+The primary intended users of the model are researchers and hobbyists in natural language processing, machine learning, and artificial intelligence.
+## Training dataset
+70K conversations collected from ShareGPT.com.
+## Evaluation dataset
+A preliminary evaluation of the model quality is conducted by creating a set of 80 diverse questions and utilizing GPT-4 to judge the model outputs. See https://vicuna.lmsys.org/ for more details.