|
--- |
|
license: other |
|
inference: false |
|
--- |
|
# Vicuna 7B GPTQ 4-bit 128g |
|
|
|
This repository contains the [Vicuna 7B model](https://huggingface.co/lmsys/vicuna-7b-delta-v0) quantised using [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa). |
|
|
|
The original Vicuna 7B repository contains deltas rather than weights. Rather than merging the deltas myself, I used the model files from https://huggingface.co/helloollel/vicuna-7b. |
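
To illustrate what "deltas rather than weights" means: a delta release stores, for each parameter tensor, the difference between the fine-tuned weights and the base LLaMA weights, so usable weights are recovered by element-wise addition. The sketch below shows that idea with plain Python lists. The `apply_delta` helper and the tensor name are hypothetical, and this is not the actual merge tool, which works on full PyTorch checkpoints.

```python
def apply_delta(base, delta):
    """Recover target weights as base + delta, tensor by tensor.

    Both arguments map a parameter name to a flat list of floats.
    The names and the flat-list layout are illustrative assumptions.
    """
    if base.keys() != delta.keys():
        raise ValueError("base and delta must contain the same parameter names")
    merged = {}
    for name, base_values in base.items():
        delta_values = delta[name]
        if len(base_values) != len(delta_values):
            raise ValueError(f"shape mismatch for {name}")
        # Element-wise addition restores the fine-tuned values.
        merged[name] = [b + d for b, d in zip(base_values, delta_values)]
    return merged


base = {"layer0.weight": [1.0, 2.0, 3.0]}    # stand-in for base LLaMA weights
delta = {"layer0.weight": [0.5, -1.0, 0.0]}  # stand-in for the published delta
merged = apply_delta(base, delta)
print(merged)
```

Using pre-merged files, as this repository does, skips this step entirely.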
|
|
|
## Provided files |
|
|
|
Two model files are provided. You don't need both; choose the one you prefer. |
|
|
|
Details of the files provided: |
|
* `vicuna-7B-GPTQ-4bit-128g.pt` |
|
  * `pt` format file, created with the latest [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa) code. |
|
  * Command to create: |
|
    * `python3 llama.py vicuna-7B c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save vicuna-7B-GPTQ-4bit-128g.pt` |
|
* `vicuna-7B-GPTQ-4bit-128g.safetensors` |
|
  * newer `safetensors` format, with improved file security, created with the latest [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa) code. |
|
  * Command to create: |
|
    * `python3 llama.py vicuna-7B c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save_safetensors vicuna-7B-GPTQ-4bit-128g.safetensors` |
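
For readers unfamiliar with the `--wbits 4` and `--groupsize 128` flags in the commands above, the following is a minimal sketch of what a 4-bit width and a group size of 128 mean for a row of weights. It uses plain round-to-nearest quantization, which is an assumption made for brevity: GPTQ itself chooses the rounded values more carefully to limit the error in each layer's output, and the function names here are hypothetical rather than taken from GPTQ-for-LLaMa.

```python
import math


def quantize_group(values, bits=4):
    """Round-to-nearest quantization of one group to unsigned `bits`-bit codes."""
    levels = (1 << bits) - 1  # largest code value, 15 for 4 bits
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0
    # Clamp to guard against floating-point overshoot at the group's extremes.
    codes = [min(levels, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def dequantize_group(codes, scale, lo):
    """Map integer codes back to approximate floating-point weights."""
    return [lo + c * scale for c in codes]


def quantize_row(row, bits=4, groupsize=128):
    """Split a row into consecutive groups, each with its own scale and offset."""
    return [
        quantize_group(row[start:start + groupsize], bits)
        for start in range(0, len(row), groupsize)
    ]


row = [math.sin(i / 10.0) for i in range(256)]  # stand-in for one weight row
groups = quantize_row(row)
restored = [w for codes, scale, lo in groups for w in dequantize_group(codes, scale, lo)]
worst = max(abs(a - b) for a, b in zip(row, restored))
print(len(groups), "groups, worst-case error", worst)
```

Each group of 128 weights keeps its own scale and offset, so a smaller group size tracks local weight ranges more closely at the cost of storing more scale and offset values.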
|
|
|
## How to run these GPTQ models in `text-generation-webui` |
|
|
|
These model files were created with the latest GPTQ code, and require that the latest GPTQ-for-LLaMa is used inside the UI. |
|
|
|
Here are the commands I used to clone the Triton branch of GPTQ-for-LLaMa, clone text-generation-webui, and install GPTQ into the UI: |
|
``` |
|
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa |
|
git clone https://github.com/oobabooga/text-generation-webui |
|
mkdir -p text-generation-webui/repositories |
|
ln -s ../../GPTQ-for-LLaMa text-generation-webui/repositories/GPTQ-for-LLaMa |
|
``` |
|
|
|
Then install this model into `text-generation-webui/models` and launch the UI as follows: |
|
``` |
|
cd text-generation-webui |
|
python server.py --model vicuna-7B-GPTQ-4bit-128g --wbits 4 --groupsize 128 # add any other command line args you want |
|
``` |
|
|
|
The above commands assume you have installed all dependencies for GPTQ-for-LLaMa and text-generation-webui. Please see their respective repositories for further information. |
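
Before launching, it can help to confirm that the directory layout is what the UI expects. The sketch below is a hypothetical `check_setup` helper, not part of either project. It assumes that GPTQ-for-LLaMa sits at `repositories/GPTQ-for-LLaMa` and that the model files were placed in a folder named after the `--model` argument under `models/`; adjust both assumptions to match your install.

```python
from pathlib import Path


def check_setup(webui_dir, model_name):
    """Return a list of human-readable problems; empty means the layout looks fine.

    Assumed layout (adjust to your install):
      <webui_dir>/repositories/GPTQ-for-LLaMa   directory, or symlink to one
      <webui_dir>/models/<model_name>/          holds a .pt or .safetensors file
    """
    root = Path(webui_dir)
    problems = []

    gptq = root / "repositories" / "GPTQ-for-LLaMa"
    # is_dir() follows symlinks, so a dangling or self-referencing link fails here.
    if not gptq.is_dir():
        problems.append(f"{gptq} is missing or is a broken symlink")

    model_dir = root / "models" / model_name
    if not model_dir.is_dir():
        problems.append(f"{model_dir} does not exist")
    elif not any(p.suffix in {".pt", ".safetensors"} for p in model_dir.iterdir()):
        problems.append(f"no .pt or .safetensors file found in {model_dir}")

    return problems


problems = check_setup("text-generation-webui", "vicuna-7B-GPTQ-4bit-128g")
print(problems or "layout looks fine")
```

Run it from the directory that contains `text-generation-webui`. It only checks paths and does not verify that Python dependencies are installed.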
|
|
|
If you are on Windows, or cannot use the Triton branch of GPTQ for any other reason, you can instead use the CUDA branch: |
|
``` |
|
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa -b cuda |
|
cd GPTQ-for-LLaMa |
|
python setup_cuda.py install |
|
``` |
|
Then link that into `text-generation-webui/repositories` as described above. |
|
|
|
# Vicuna Model Card |
|
|
|
## Model details |
|
|
|
**Model type:** |
|
Vicuna is an open-source chatbot trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT. |
|
It is an auto-regressive language model, based on the transformer architecture. |
|
|
|
**Model date:** |
|
Vicuna was trained between March 2023 and April 2023. |
|
|
|
**Organizations developing the model:** |
|
The Vicuna team with members from UC Berkeley, CMU, Stanford, and UC San Diego. |
|
|
|
**Paper or resources for more information:** |
|
https://vicuna.lmsys.org/ |
|
|
|
**License:** |
|
Apache License 2.0 |
|
|
|
**Where to send questions or comments about the model:** |
|
https://github.com/lm-sys/FastChat/issues |
|
|
|
## Intended use |
|
**Primary intended uses:** |
|
The primary use of Vicuna is research on large language models and chatbots. |
|
|
|
**Primary intended users:** |
|
The primary intended users of the model are researchers and hobbyists in natural language processing, machine learning, and artificial intelligence. |
|
|
|
## Training dataset |
|
70K conversations collected from ShareGPT.com. |
|
|
|
## Evaluation dataset |
|
A preliminary evaluation of model quality was conducted by creating a set of 80 diverse questions and using GPT-4 to judge the model outputs. See https://vicuna.lmsys.org/ for more details. |
|
|