Text Generation · Transformers · Safetensors · English · llama · text-generation-inference · 4-bit precision · gptq
TheBloke committed · Commit 8807957 · 1 Parent(s): 21c8b9a

Update for Transformers GPTQ support

README.md CHANGED
@@ -11,17 +11,20 @@ pipeline_tag: text-generation
 ---
 
 <!-- header start -->
-<div style="width: 100%;">
-<img src="https://i.imgur.com/EBdldam.jpg" alt="TheBlokeAI" style="width: 100%; min-width: 400px; display: block; margin: auto;">
+<!-- 200823 -->
+<div style="width: auto; margin-left: auto; margin-right: auto">
+<img src="https://i.imgur.com/EBdldam.jpg" alt="TheBlokeAI" style="width: 100%; min-width: 400px; display: block; margin: auto;">
 </div>
 <div style="display: flex; justify-content: space-between; width: 100%;">
 <div style="display: flex; flex-direction: column; align-items: flex-start;">
-<p><a href="https://discord.gg/theblokeai">Chat & support: my new Discord server</a></p>
+<p style="margin-top: 0.5em; margin-bottom: 0em;"><a href="https://discord.gg/theblokeai">Chat & support: TheBloke's Discord server</a></p>
 </div>
 <div style="display: flex; flex-direction: column; align-items: flex-end;">
-<p><a href="https://www.patreon.com/TheBlokeAI">Want to contribute? TheBloke's Patreon page</a></p>
+<p style="margin-top: 0.5em; margin-bottom: 0em;"><a href="https://www.patreon.com/TheBlokeAI">Want to contribute? TheBloke's Patreon page</a></p>
 </div>
 </div>
+<div style="text-align:center; margin-top: 0em; margin-bottom: 0em"><p style="margin-top: 0.25em; margin-bottom: 0em;">TheBloke's LLM work is generously supported by a grant from <a href="https://a16z.com">andreessen horowitz (a16z)</a></p></div>
+<hr style="margin-top: 1.0em; margin-bottom: 1.0em;">
 <!-- header end -->
 
 # Pankaj Mathur's Orca Mini v2 7B GPTQ
@@ -182,6 +185,7 @@ The files provided will work with AutoGPTQ (CUDA and Triton modes), GPTQ-for-LLa
 ExLlama works with Llama models in 4-bit. Please see the Provided Files table above for per-file compatibility.
 
 <!-- footer start -->
+<!-- 200823 -->
 ## Discord
 
 For further support, and discussions on these models and AI in general, join us at:
@@ -201,12 +205,15 @@ Donaters will get priority support on any and all AI/LLM/model questions and req
 * Patreon: https://patreon.com/TheBlokeAI
 * Ko-Fi: https://ko-fi.com/TheBlokeAI
 
-**Special thanks to**: Luke from CarbonQuill, Aemon Algiz.
+**Special thanks to**: Aemon Algiz.
 
-**Patreon special mentions**: Space Cruiser, Nikolai Manek, Sam, Chris McCloskey, Rishabh Srivastava, Kalila, Spiking Neurons AB, Khalefa Al-Ahmad, WelcomeToTheClub, Chadd, Lone Striker, Viktor Bowallius, Edmond Seymore, Ai Maven, Chris Smitley, Dave, Alexandros Triantafyllidis, Luke @flexchar, Elle, ya boyyy, Talal Aujan, Alex , Jonathan Leane, Deep Realms, Randy H, subjectnull, Preetika Verma, Joseph William Delisle, Michael Levine, chris gileta, K, Oscar Rangel, LangChain4j, Trenton Dambrowitz, Eugene Pentland, Johann-Peter Hartmann, Femi Adebogun, Illia Dulskyi, senxiiz, Daniel P. Andersen, Sean Connelly, Artur Olbinski, RoA, Mano Prime, Derek Yates, Raven Klaugh, David Flickinger, Willem Michiel, Pieter, Willian Hasse, vamX, Luke Pendergrass, webtim, Ghost , Rainer Wilmers, Nathan LeClaire, Will Dee, Cory Kujawski, John Detwiler, Fred von Graf, biorpg, Iucharbius , Imad Khwaja, Pierre Kircher, terasurfer , Asp the Wyvern, John Villwock, theTransient, zynix , Gabriel Tamborski, Fen Risland, Gabriel Puliatti, Matthew Berman, Pyrater, SuperWojo, Stephen Murray, Karl Bernard, Ajan Kanaga, Greatston Gnanesh, Junyu Yang.
+**Patreon special mentions**: Sam, theTransient, Jonathan Leane, Steven Wood, webtim, Johann-Peter Hartmann, Geoffrey Montalvo, Gabriel Tamborski, Willem Michiel, John Villwock, Derek Yates, Mesiah Bishop, Eugene Pentland, Pieter, Chadd, Stephen Murray, Daniel P. Andersen, terasurfer, Brandon Frisco, Thomas Belote, Sid, Nathan LeClaire, Magnesian, Alps Aficionado, Stanislav Ovsiannikov, Alex, Joseph William Delisle, Nikolai Manek, Michael Davis, Junyu Yang, K, J, Spencer Kim, Stefan Sabev, Olusegun Samson, transmissions 11, Michael Levine, Cory Kujawski, Rainer Wilmers, zynix, Kalila, Luke @flexchar, Ajan Kanaga, Mandus, vamX, Ai Maven, Mano Prime, Matthew Berman, subjectnull, Vitor Caleffi, Clay Pascal, biorpg, alfie_i, 阿明, Jeffrey Morgan, ya boyyy, Raymond Fosdick, knownsqashed, Olakabola, Leonard Tan, ReadyPlayerEmma, Enrico Ros, Dave, Talal Aujan, Illia Dulskyi, Sean Connelly, senxiiz, Artur Olbinski, Elle, Raven Klaugh, Fen Risland, Deep Realms, Imad Khwaja, Fred von Graf, Will Dee, usrbinkat, SuperWojo, Alexandros Triantafyllidis, Swaroop Kallakuri, Dan Guido, John Detwiler, Pedro Madruga, Iucharbius, Viktor Bowallius, Asp the Wyvern, Edmond Seymore, Trenton Dambrowitz, Space Cruiser, Spiking Neurons AB, Pyrater, LangChain4j, Tony Hughes, Kacper Wikieł, Rishabh Srivastava, David Ziegler, Luke Pendergrass, Andrey, Gabriel Puliatti, Lone Striker, Sebastain Graf, Pierre Kircher, Randy H, NimbleBox.ai, Vadim, danny, Deo Leter
+
 
 Thank you to all my generous patrons and donaters!
 
+And thank you again to a16z for their generous grant.
+
 <!-- footer end -->
 
 # Original model card: Pankaj Mathur's Orca Mini v2 7B
@@ -222,7 +229,7 @@ Please note this model has *better code generation capabilities* compare to our
 
 # Evaluation
 
-I evaluated orca_mini_v2_7b on a wide range of tasks using [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) from EleutherAI.
+I evaluated orca_mini_v2_7b on a wide range of tasks using [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) from EleutherAI.
 
 Here are the results on metrics used by [HuggingFaceH4 Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
 
@@ -326,12 +333,12 @@ model = LlamaForCausalLM.from_pretrained(
 
 #generate text function
 def generate_text(system, instruction, input=None):
-
+
     if input:
        prompt = f"### System:\n{system}\n\n### User:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
     else:
        prompt = f"### System:\n{system}\n\n### User:\n{instruction}\n\n### Response:\n"
-
+
     tokens = tokenizer.encode(prompt)
     tokens = torch.LongTensor(tokens).unsqueeze(0)
     tokens = tokens.to('cuda')
@@ -341,14 +348,14 @@ def generate_text(system, instruction, input=None):
     length = len(tokens[0])
     with torch.no_grad():
         rest = model.generate(
-            input_ids=tokens,
-            max_length=length+instance['generate_len'],
-            use_cache=True,
-            do_sample=True,
+            input_ids=tokens,
+            max_length=length+instance['generate_len'],
+            use_cache=True,
+            do_sample=True,
             top_p=instance['top_p'],
             temperature=instance['temperature'],
             top_k=instance['top_k']
-        )
+        )
     output = rest[0][length:]
     string = tokenizer.decode(output, skip_special_tokens=True)
     return f'[!] Response: {string}'
@@ -408,7 +415,7 @@ If you found wizardlm_alpaca_dolly_orca_open_llama_7b useful in your research or
 
 ```
 @misc{mukherjee2023orca,
-      title={Orca: Progressive Learning from Complex Explanation Traces of GPT-4},
+      title={Orca: Progressive Learning from Complex Explanation Traces of GPT-4},
       author={Subhabrata Mukherjee and Arindam Mitra and Ganesh Jawahar and Sahaj Agarwal and Hamid Palangi and Ahmed Awadallah},
       year={2023},
       eprint={2306.02707},
@@ -456,7 +463,7 @@ If you found wizardlm_alpaca_dolly_orca_open_llama_7b useful in your research or
 ```
 ```
 @misc{xu2023wizardlm,
-      title={WizardLM: Empowering Large Language Models to Follow Complex Instructions},
+      title={WizardLM: Empowering Large Language Models to Follow Complex Instructions},
       author={Can Xu and Qingfeng Sun and Kai Zheng and Xiubo Geng and Pu Zhao and Jiazhan Feng and Chongyang Tao and Daxin Jiang},
       year={2023},
       eprint={2304.12244},
config.json CHANGED
@@ -1,24 +1,34 @@
 {
-  "_name_or_path": "huggyllama/llama-7b",
-  "architectures": [
-    "LlamaForCausalLM"
-  ],
-  "bos_token_id": 1,
-  "eos_token_id": 2,
-  "hidden_act": "silu",
-  "hidden_size": 4096,
-  "initializer_range": 0.02,
-  "intermediate_size": 11008,
-  "max_position_embeddings": 2048,
-  "max_sequence_length": 2048,
-  "model_type": "llama",
-  "num_attention_heads": 32,
-  "num_hidden_layers": 32,
-  "pad_token_id": 0,
-  "rms_norm_eps": 1e-06,
-  "tie_word_embeddings": false,
-  "torch_dtype": "float32",
-  "transformers_version": "4.29.1",
-  "use_cache": true,
-  "vocab_size": 32000
+  "_name_or_path": "huggyllama/llama-7b",
+  "architectures": [
+    "LlamaForCausalLM"
+  ],
+  "bos_token_id": 1,
+  "eos_token_id": 2,
+  "hidden_act": "silu",
+  "hidden_size": 4096,
+  "initializer_range": 0.02,
+  "intermediate_size": 11008,
+  "max_position_embeddings": 2048,
+  "max_sequence_length": 2048,
+  "model_type": "llama",
+  "num_attention_heads": 32,
+  "num_hidden_layers": 32,
+  "pad_token_id": 0,
+  "rms_norm_eps": 1e-06,
+  "tie_word_embeddings": false,
+  "torch_dtype": "float32",
+  "transformers_version": "4.29.1",
+  "use_cache": true,
+  "vocab_size": 32000,
+  "quantization_config": {
+    "bits": 4,
+    "group_size": 128,
+    "damp_percent": 0.01,
+    "desc_act": false,
+    "sym": true,
+    "true_sequential": true,
+    "model_file_base_name": "model",
+    "quant_method": "gptq"
+  }
 }
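The change above is the heart of this commit: the GPTQ parameters from quantize_config.json are embedded into config.json under a `quantization_config` key (with `quant_method` added), which is how Transformers' built-in GPTQ support discovers them. A minimal sketch of that merge — the helper name and the toy `config` dict below are illustrative, not from the repo:

```python
import json

def embed_quantization_config(config: dict, quantize_config: dict) -> dict:
    """Return a copy of a model config with GPTQ parameters embedded
    under "quantization_config", as this commit does for config.json."""
    merged = dict(config)
    merged["quantization_config"] = {**quantize_config, "quant_method": "gptq"}
    return merged

# Toy model config (illustrative); the quantize_config values are the
# actual ones from this repo's quantize_config.json.
config = {"model_type": "llama", "vocab_size": 32000}
quantize_config = {
    "bits": 4,
    "group_size": 128,
    "damp_percent": 0.01,
    "desc_act": False,
    "sym": True,
    "true_sequential": True,
    "model_file_base_name": "model",
}
merged = embed_quantization_config(config, quantize_config)
print(json.dumps(merged["quantization_config"], indent=2))
```

The original config dict is left untouched; only the merged copy gains the new key.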
orca-mini-v2_7b-GPTQ-4bit-128g.no-act.order.safetensors → model.safetensors RENAMED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:fd42ac2c47b351c29abd5c5a0c14fd5512b984b443e7034dfd12a89bd46e61cc
-size 4520875496
+oid sha256:41b0305aa46cefa8f0397b68990f08f1f5467c22fe217a67cc31383499c12ee0
+size 4520875552
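Note that the renamed .safetensors entry is a Git LFS pointer file, not the weights themselves: only its `oid` (SHA-256 of the binary) and `size` lines change when the re-quantized file is uploaded. A small sketch of reading such a pointer (the helper name is an assumption for illustration):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file (version / oid / size lines,
    each "key value") into a dict of strings."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

# The new pointer content from this commit.
pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:41b0305aa46cefa8f0397b68990f08f1f5467c22fe217a67cc31383499c12ee0
size 4520875552
"""
info = parse_lfs_pointer(pointer)
print(info["size"])  # prints 4520875552
```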
quantize_config.json CHANGED
@@ -1,8 +1,9 @@
 {
-  "bits": 4,
-  "group_size": 128,
-  "damp_percent": 0.01,
-  "desc_act": false,
-  "sym": true,
-  "true_sequential": true
+  "bits": 4,
+  "group_size": 128,
+  "damp_percent": 0.01,
+  "desc_act": false,
+  "sym": true,
+  "true_sequential": true,
+  "model_file_base_name": "model"
 }
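quantize_config.json keeps a standalone copy of the GPTQ parameters for loaders that read it directly, such as AutoGPTQ. A quick sanity-check sketch — the helper and its exact validity rules are assumptions for illustration, not AutoGPTQ's own validation:

```python
def check_gptq_config(cfg: dict) -> list:
    """Sanity-check a GPTQ quantize_config dict; return a list of
    problems found (empty list means it looks OK)."""
    problems = []
    if cfg.get("bits") not in (2, 3, 4, 8):
        problems.append("bits must be one of 2, 3, 4, 8")
    group_size = cfg.get("group_size", -1)
    if group_size != -1 and group_size <= 0:
        problems.append("group_size must be positive or -1")
    if not 0 < cfg.get("damp_percent", 0.01) < 1:
        problems.append("damp_percent should be in (0, 1)")
    return problems

# The values from this repo's quantize_config.json.
cfg = {
    "bits": 4,
    "group_size": 128,
    "damp_percent": 0.01,
    "desc_act": False,
    "sym": True,
    "true_sequential": True,
    "model_file_base_name": "model",
}
print(check_gptq_config(cfg))  # prints []
```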