Improve README instructions.

#4
by kalila - opened
Files changed (1)
  1. README.md +46 -52
README.md CHANGED
@@ -37,73 +37,29 @@ It is also expected to be **VERY SLOW**. This is unavoidable at the moment, but

  To use it you will require:

- 1. AutoGPTQ
- 2. `pip install einops`

  You can then use it immediately from Python code - see example code below - or from text-generation-webui.

  ## AutoGPTQ

- Please install AutoGPTQ version 0.2.1 or later: `pip install auto-gptq`
-
- If you have any problems installing AutoGPTQ with CUDA support, you can try compiling manually from source:

  ```
  git clone https://github.com/PanQiWei/AutoGPTQ
  cd AutoGPTQ
- pip install .
  ```

  The manual installation steps will require that you have the [Nvidia CUDA toolkit](https://developer.nvidia.com/cuda-12-0-1-download-archive) installed.

- ## text-generation-webui
-
- There is also provisional AutoGPTQ support in text-generation-webui.
-
- This requires a text-generation-webui as of commit 204731952ae59d79ea3805a425c73dd171d943c3.
-
- So please first update text-genration-webui to the latest version.
-
- ## How to download and use this model in text-generation-webui
-
- 1. Launch text-generation-webui with the following command-line arguments: `--autogptq --trust-remote-code`
- 2. Click the **Model tab**.
- 3. Under **Download custom model or LoRA**, enter `TheBloke/WizardLM-Uncensored-Falcon-40B-GPTQ`.
- 4. Click **Download**.
- 5. Wait until it says it's finished downloading.
- 6. Click the **Refresh** icon next to **Model** in the top left.
- 7. In the **Model drop-down**: choose the model you just downloaded, `WizardLM-Uncensored-Falcon-40B-GPTQ`.
- 8. Once it says it's loaded, click the **Text Generation tab** and enter a prompt!
-
- ## Prompt template
-
- Prompt format is WizardLM.
-
- ```
- What is a falcon? Can I keep one as a pet?
- ### Response:
- ```
-
- ## About `trust-remote-code`
-
- Please be aware that this command line argument causes Python code provided by Falcon to be executed on your machine.
-
- This code is required at the moment because Falcon is too new to be supported by Hugging Face transformers. At some point in the future transformers will support the model natively, and then `trust_remote_code` will no longer be needed.
-
- In this repo you can see two `.py` files - these are the files that get executed. They are copied from the base repo at [Falcon-7B-Instruct](https://huggingface.co/tiiuae/falcon-7b-instruct).
-
  ## Simple Python example code

- To run this code you need to install AutoGPTQ from source:
- ```
- git clone https://github.com/PanQiWei/AutoGPTQ
- cd AutoGPTQ
- pip install . # This step requires CUDA toolkit installed
- ```
- And install einops:
- ```
- pip install einops
- ```

  You can then run this example code:
  ```python
@@ -129,6 +85,25 @@ output = model.generate(input_ids=tokens, max_new_tokens=100, do_sample=True, te
  print(tokenizer.decode(output[0]))
  ```

  ## Provided files

  **gptq_model-4bit--1g.safetensors**
@@ -145,6 +120,25 @@ It was created without group_size to reduce VRAM usage, and with `desc_act` (act
  * Does not work with any version of GPTQ-for-LLaMa
  * Parameters: Groupsize = None. With act-order / desc_act.

  <!-- footer start -->
  ## Discord
 
  To use it you will require:

+ 1. Python 3.10.11
+ 2. AutoGPTQ v0.2.1 (see below)
+ 3. PyTorch Stable with CUDA 11.8 (`pip install torch --index-url https://download.pytorch.org/whl/cu118`)
+ 4. einops (`pip install einops`)

  You can then use it immediately from Python code - see example code below - or from text-generation-webui.

  ## AutoGPTQ

+ You should install AutoGPTQ v0.2.1. To pin that exact version, compile it manually from source:

  ```
  git clone https://github.com/PanQiWei/AutoGPTQ
  cd AutoGPTQ
+ git checkout v0.2.1
+ pip install . --no-cache-dir # This step requires the CUDA toolkit to be installed
  ```

  The manual installation steps will require that you have the [Nvidia CUDA toolkit](https://developer.nvidia.com/cuda-12-0-1-download-archive) installed.
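+
+ Once the prerequisites above are installed, a quick sanity check along these lines can confirm your environment is ready (this is only an illustrative snippet, not part of the original example code):
+
+ ```python
+ # Verify that PyTorch sees a CUDA device and that the required packages are installed.
+ import importlib.metadata
+
+ import torch
+
+ print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
+ for pkg in ("auto-gptq", "einops"):
+     print(pkg + ":", importlib.metadata.version(pkg))
+ ```
+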
  ## Simple Python example code

+ To run this code you need to have the prerequisites listed above installed.

  You can then run this example code:
  ```python

  print(tokenizer.decode(output[0]))
  ```

+ ## text-generation-webui
+
+ There is also provisional AutoGPTQ support in text-generation-webui.
+
+ This requires text-generation-webui at commit `204731952ae59d79ea3805a425c73dd171d943c3` or newer.
+
+ So please first update text-generation-webui to the latest version.
+
+ ### How to download and use this model in text-generation-webui
+
+ 1. Launch text-generation-webui with the following command-line arguments: `--autogptq --trust-remote-code`
+ 2. Click the **Model tab**.
+ 3. Under **Download custom model or LoRA**, enter `TheBloke/WizardLM-Uncensored-Falcon-40B-GPTQ`.
+ 4. Click **Download**.
+ 5. Wait until it says it's finished downloading.
+ 6. Click the **Refresh** icon next to **Model** in the top left.
+ 7. In the **Model drop-down**: choose the model you just downloaded, `WizardLM-Uncensored-Falcon-40B-GPTQ`.
+ 8. Once it says it's loaded, click the **Text Generation tab** and enter a prompt!
+
  ## Provided files

  **gptq_model-4bit--1g.safetensors**

  * Does not work with any version of GPTQ-for-LLaMa
  * Parameters: Groupsize = None. With act-order / desc_act.

+ ## FAQ
+
+ ### Prompt template
+
+ Prompt format is WizardLM.
+
+ ```
+ What is a falcon? Can I keep one as a pet?
+ ### Response:
+ ```
+
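+ If you are building prompts in code, formatting an instruction in this template could look something like this (an illustrative sketch, not code taken from this repo):
+
+ ```python
+ # Wrap a plain instruction in the WizardLM prompt format shown above.
+ def make_prompt(instruction: str) -> str:
+     return f"{instruction}\n### Response:"
+
+ print(make_prompt("What is a falcon? Can I keep one as a pet?"))
+ ```
+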
+ ### About `trust-remote-code`
+
+ Please be aware that this command line argument causes Python code provided by Falcon to be executed on your machine.
+
+ This code is required at the moment because Falcon is too new to be supported by Hugging Face transformers. At some point in the future transformers will support the model natively, and then `trust_remote_code` will no longer be needed.
+
+ In this repo you can see two `.py` files - these are the files that get executed. They are copied from the base repo at [Falcon-7B-Instruct](https://huggingface.co/tiiuae/falcon-7b-instruct).
+
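+ In Python, the flag is passed as `trust_remote_code=True` when loading the tokenizer and model. A minimal sketch using AutoGPTQ's `AutoGPTQForCausalLM.from_quantized` loader (not copied from the repo's example code; adjust the device and parameters to your setup):
+
+ ```python
+ from transformers import AutoTokenizer
+ from auto_gptq import AutoGPTQForCausalLM
+
+ model_name = "TheBloke/WizardLM-Uncensored-Falcon-40B-GPTQ"
+
+ # trust_remote_code=True allows the custom Falcon model code shipped with the repo to run.
+ tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+ model = AutoGPTQForCausalLM.from_quantized(model_name, use_safetensors=True, trust_remote_code=True, device="cuda:0")
+ ```
+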
  <!-- footer start -->
  ## Discord