Improve README instructions.

#4
by kalila - opened
Files changed (1)
  1. README.md +46 -52
README.md CHANGED
@@ -37,73 +37,29 @@ It is also expected to be **VERY SLOW**. This is unavoidable at the moment, but

  To use it you will require:

- 1. AutoGPTQ
- 2. `pip install einops`

  You can then use it immediately from Python code - see example code below - or from text-generation-webui.

  ## AutoGPTQ

- Please install AutoGPTQ version 0.2.1 or later: `pip install auto-gptq`
-
- If you have any problems installing AutoGPTQ with CUDA support, you can try compiling manually from source:

  ```
  git clone https://github.com/PanQiWei/AutoGPTQ
  cd AutoGPTQ
- pip install .
  ```

  The manual installation steps will require that you have the [Nvidia CUDA toolkit](https://developer.nvidia.com/cuda-12-0-1-download-archive) installed.

- ## text-generation-webui
-
- There is also provisional AutoGPTQ support in text-generation-webui.
-
- This requires a text-generation-webui as of commit 204731952ae59d79ea3805a425c73dd171d943c3.
-
- So please first update text-genration-webui to the latest version.
-
- ## How to download and use this model in text-generation-webui
-
- 1. Launch text-generation-webui with the following command-line arguments: `--autogptq --trust-remote-code`
- 2. Click the **Model tab**.
- 3. Under **Download custom model or LoRA**, enter `TheBloke/WizardLM-Uncensored-Falcon-40B-GPTQ`.
- 4. Click **Download**.
- 5. Wait until it says it's finished downloading.
- 6. Click the **Refresh** icon next to **Model** in the top left.
- 7. In the **Model drop-down**: choose the model you just downloaded, `WizardLM-Uncensored-Falcon-40B-GPTQ`.
- 8. Once it says it's loaded, click the **Text Generation tab** and enter a prompt!
-
- ## Prompt template
-
- Prompt format is WizardLM.
-
- ```
- What is a falcon? Can I keep one as a pet?
- ### Response:
- ```
-
- ## About `trust-remote-code`
-
- Please be aware that this command line argument causes Python code provided by Falcon to be executed on your machine.
-
- This code is required at the moment because Falcon is too new to be supported by Hugging Face transformers. At some point in the future transformers will support the model natively, and then `trust_remote_code` will no longer be needed.
-
- In this repo you can see two `.py` files - these are the files that get executed. They are copied from the base repo at [Falcon-7B-Instruct](https://huggingface.co/tiiuae/falcon-7b-instruct).
-
  ## Simple Python example code

- To run this code you need to install AutoGPTQ from source:
- ```
- git clone https://github.com/PanQiWei/AutoGPTQ
- cd AutoGPTQ
- pip install . # This step requires CUDA toolkit installed
- ```
- And install einops:
- ```
- pip install einops
- ```

  You can then run this example code:
  ```python
@@ -129,6 +85,25 @@ output = model.generate(input_ids=tokens, max_new_tokens=100, do_sample=True, te
  print(tokenizer.decode(output[0]))
  ```

  ## Provided files

  **gptq_model-4bit--1g.safetensors**
@@ -145,6 +120,25 @@ It was created without group_size to reduce VRAM usage, and with `desc_act` (act
  * Does not work with any version of GPTQ-for-LLaMa
  * Parameters: Groupsize = None. With act-order / desc_act.

  <!-- footer start -->
  ## Discord
 
  To use it you will require:

+ 1. Python 3.10.11
+ 2. AutoGPTQ v0.2.1 (see below)
+ 3. PyTorch Stable with CUDA 11.8 (`pip install torch --index-url https://download.pytorch.org/whl/cu118`)
+ 4. einops (`pip install einops`)

  You can then use it immediately from Python code - see example code below - or from text-generation-webui.

  ## AutoGPTQ

+ You should install AutoGPTQ v0.2.1. To pin that exact version, compile it manually from source:

  ```
  git clone https://github.com/PanQiWei/AutoGPTQ
  cd AutoGPTQ
+ git checkout v0.2.1
+ pip install . --no-cache-dir # This step requires the CUDA toolkit to be installed
  ```

  The manual installation steps will require that you have the [Nvidia CUDA toolkit](https://developer.nvidia.com/cuda-12-0-1-download-archive) installed.
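+
+ Once the prerequisites above are installed, a quick sanity check along these lines can confirm your environment is ready (this is only an illustrative snippet, not part of the original example code):
+
+ ```python
+ # Verify that PyTorch sees a CUDA device and that the required packages are installed.
+ import importlib.metadata
+
+ import torch
+
+ print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
+ for pkg in ("auto-gptq", "einops"):
+     print(pkg + ":", importlib.metadata.version(pkg))
+ ```
+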
  ## Simple Python example code

+ To run this code you need to have the prerequisites listed above installed.

  You can then run this example code:
  ```python

  print(tokenizer.decode(output[0]))
  ```

+ ## text-generation-webui
+
+ There is also provisional AutoGPTQ support in text-generation-webui.
+
+ This requires text-generation-webui at commit `204731952ae59d79ea3805a425c73dd171d943c3` or newer.
+
+ So please first update text-generation-webui to the latest version.
+
+ ### How to download and use this model in text-generation-webui
+
+ 1. Launch text-generation-webui with the following command-line arguments: `--autogptq --trust-remote-code`
+ 2. Click the **Model tab**.
+ 3. Under **Download custom model or LoRA**, enter `TheBloke/WizardLM-Uncensored-Falcon-40B-GPTQ`.
+ 4. Click **Download**.
+ 5. Wait until it says it's finished downloading.
+ 6. Click the **Refresh** icon next to **Model** in the top left.
+ 7. In the **Model drop-down**: choose the model you just downloaded, `WizardLM-Uncensored-Falcon-40B-GPTQ`.
+ 8. Once it says it's loaded, click the **Text Generation tab** and enter a prompt!
+
  ## Provided files

  **gptq_model-4bit--1g.safetensors**

  * Does not work with any version of GPTQ-for-LLaMa
  * Parameters: Groupsize = None. With act-order / desc_act.

+ ## FAQ
+
+ ### Prompt template
+
+ Prompt format is WizardLM.
+
+ ```
+ What is a falcon? Can I keep one as a pet?
+ ### Response:
+ ```
+
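+ If you are building prompts in code, formatting an instruction in this template could look something like this (an illustrative sketch, not code taken from this repo):
+
+ ```python
+ # Wrap a plain instruction in the WizardLM prompt format shown above.
+ def make_prompt(instruction: str) -> str:
+     return f"{instruction}\n### Response:"
+
+ print(make_prompt("What is a falcon? Can I keep one as a pet?"))
+ ```
+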
+ ### About `trust-remote-code`
+
+ Please be aware that this command line argument causes Python code provided by Falcon to be executed on your machine.
+
+ This code is required at the moment because Falcon is too new to be supported by Hugging Face transformers. At some point in the future transformers will support the model natively, and then `trust_remote_code` will no longer be needed.
+
+ In this repo you can see two `.py` files - these are the files that get executed. They are copied from the base repo at [Falcon-7B-Instruct](https://huggingface.co/tiiuae/falcon-7b-instruct).
+
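+ In Python, the flag is passed as `trust_remote_code=True` when loading the tokenizer and model. A minimal sketch using AutoGPTQ's `AutoGPTQForCausalLM.from_quantized` loader (not copied from the repo's example code; adjust the device and parameters to your setup):
+
+ ```python
+ from transformers import AutoTokenizer
+ from auto_gptq import AutoGPTQForCausalLM
+
+ model_name = "TheBloke/WizardLM-Uncensored-Falcon-40B-GPTQ"
+
+ # trust_remote_code=True allows the custom Falcon model code shipped with the repo to run.
+ tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+ model = AutoGPTQForCausalLM.from_quantized(model_name, use_safetensors=True, trust_remote_code=True, device="cuda:0")
+ ```
+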
  <!-- footer start -->
  ## Discord