madhavatreplit committed
Commit 8b0fee3
1 Parent(s): 572f532

Update README for 8-bit and 4-bit

Updating the README to add instructions for using the model in 8-bit and 4-bit.

Files changed (1):
  1. README.md +47 -0

README.md CHANGED
@@ -177,6 +177,53 @@ print(generated_code)
 
  Experiment with different decoding methods and parameters to get the best results for your use case.
 
+
+ ### Loading with 8-bit and 4-bit quantization
+
+ #### Loading in 8-bit
+ You can also load the model in 8-bit with the `load_in_8bit=True` kwarg, which uses `bitsandbytes` under the hood.
+
+ First you need to install the following additional dependencies:
+
+ ```bash
+ pip install accelerate bitsandbytes
+ ```
+
+ Then you can load the model in 8-bit as follows:
+
+ ```python
+ from transformers import AutoModelForCausalLM
+
+ model = AutoModelForCausalLM.from_pretrained("replit/replit-code-v1-3b",
+                                              trust_remote_code=True,
+                                              device_map="auto",
+                                              load_in_8bit=True)
+ ```
+
+ The additional kwargs that make this possible are `device_map='auto'` and `load_in_8bit=True`.
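+
+ As a quick sanity check, here is a minimal generation sketch with the 8-bit model. The tokenizer setup follows the usage example earlier in this README; the prompt and decoding parameters are illustrative:
+
+ ```python
+ from transformers import AutoTokenizer
+
+ # Tokenizer for the same checkpoint, as shown earlier in this README.
+ tokenizer = AutoTokenizer.from_pretrained("replit/replit-code-v1-3b", trust_remote_code=True)
+
+ x = tokenizer.encode("def fibonacci(n): ", return_tensors="pt")
+ # With device_map="auto", move the inputs to the device the model was placed on.
+ x = x.to(model.device)
+
+ y = model.generate(x, max_length=100, do_sample=True, top_p=0.95, temperature=0.2,
+                    num_return_sequences=1, eos_token_id=tokenizer.eos_token_id)
+ generated_code = tokenizer.decode(y[0], skip_special_tokens=True, clean_up_tokenization_spaces=False)
+ print(generated_code)
+ ```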
+
+ #### Loading in 4-bit
+
+ For loading in 4-bit, at the time of writing, support for `load_in_4bit` has not been merged into the latest releases of `transformers` and `accelerate`. However, you can use it if you install the dependencies from the `main` branches of the published repos:
+
+ ```bash
+ pip install git+https://github.com/huggingface/accelerate.git
+ pip install git+https://github.com/huggingface/transformers.git
+ ```
+
+ Then load in 4-bit with:
+
+ ```python
+ model = AutoModelForCausalLM.from_pretrained("replit/replit-code-v1-3b",
+                                              trust_remote_code=True,
+                                              device_map="auto",
+                                              load_in_4bit=True)
+ ```
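+
+ To confirm that quantization actually reduced memory usage, you can check the model's footprint. This is a minimal sketch using `get_memory_footprint()`, which `transformers` exposes on loaded models:
+
+ ```python
+ # Memory footprint of the quantized model, reported in GB (the method returns bytes).
+ print(f"Model memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")
+ ```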
+
+ #### References
+ - [Hugging Face's Quantization Doc](https://huggingface.co/docs/transformers/main/main_classes/quantization)
+ - [Blog post introducing 8-bit quantization](https://huggingface.co/blog/hf-bitsandbytes-integration)
+ - [Blog post introducing 4-bit quantization](https://huggingface.co/blog/4bit-transformers-bitsandbytes)
+
+
  ### Post Processing
 
  Note that as with all code generation models, post-processing of the generated code is important. In particular, the following post-processing steps are recommended: