Want to support me and help pay my cloud computing bill? I also now have a Patreon!

## EXPERIMENTAL

Please note this is an experimental GPTQ model. Support for it is currently quite limited.

It is also expected to be **VERY SLOW**. This is unavoidable at the moment, but is being looked at.

To use it you will require:

1. AutoGPTQ, from the latest `main` branch and compiled with `pip install .`
2. `pip install einops`

You can then use it immediately from Python code - see example code below - or from text-generation-webui.
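
As a rough sketch of that Python usage (this assumes AutoGPTQ's `AutoGPTQForCausalLM.from_quantized()` loader; the exact arguments this repo expects, such as a quantized model basename, may differ):

```
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "TheBloke/falcon-40B-instruct-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

# trust_remote_code is needed because Falcon models use custom modelling code
model = AutoGPTQForCausalLM.from_quantized(
    model_name_or_path,
    device="cuda:0",
    use_safetensors=True,
    trust_remote_code=True,
)

prompt = "Write a story about llamas"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda:0")
output = model.generate(input_ids=input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0]))
```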

## AutoGPTQ

To install AutoGPTQ please follow these instructions:

```
git clone https://github.com/PanQiWei/AutoGPTQ
cd AutoGPTQ
pip install .
```

These steps will require that you have the [Nvidia CUDA toolkit](https://developer.nvidia.com/cuda-12-0-1-download-archive) installed.
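
Once installed, a quick way to sanity-check the build is to confirm the `auto_gptq` package imports cleanly:

```
# Sanity check: confirm the compiled AutoGPTQ package can be imported
import auto_gptq
print("AutoGPTQ imported successfully")
```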

## text-generation-webui

There is also provisional AutoGPTQ support in text-generation-webui.

This requires text-generation-webui as of commit 204731952ae59d79ea3805a425c73dd171d943c3, so please first update text-generation-webui to the latest version.
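
For example, assuming you installed text-generation-webui by cloning its Git repository, updating could look like:

```
cd text-generation-webui
git pull
```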

## How to download and use this model in text-generation-webui

1. Launch text-generation-webui with the following command-line arguments: `--autogptq --trust-remote-code` (an example launch command is sketched after this list)
2. Click the **Model tab**.
3. Under **Download custom model or LoRA**, enter `TheBloke/falcon-40B-instruct-GPTQ`.
4. Click **Download**.
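
As a sketch of step 1, assuming you launch text-generation-webui via its standard `server.py` entry point:

```
python server.py --autogptq --trust-remote-code
```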