TheBloke and mber committed on
Commit 7b2ce68 (parent: 8aba4ed)

Fix documentation for loading the model, since the fused attention module doesn't work here either. (#4)


- Fix documentation for loading the model, since the fused attention module doesn't work here either. (af510ccccb74ef78f8738bacbf4a13d6fc5b2e0a)


Co-authored-by: Moshe Berchansky <[email protected]>

Files changed (1)
  1. README.md +1 -0
README.md CHANGED

@@ -122,6 +122,7 @@ tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
 
 model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
         model_basename=model_basename,
+        inject_fused_attention=False, # Required for TheBloke/FreeWilly2-GPTQ model at this time.
         use_safetensors=True,
         trust_remote_code=False,
         device="cuda:0",
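For reference, the options in the updated README call can be sketched as a plain keyword-argument dict. This is a minimal illustration, not the README itself: the basename value is a placeholder, and the `AutoGPTQForCausalLM.from_quantized` call at the end is shown as a comment because it requires the `auto-gptq` package and a CUDA device.

```python
# Loading options from the updated README snippet, gathered in one place
# so the newly required flag is easy to spot.
load_kwargs = dict(
    model_basename="model",        # placeholder; use the repo's actual model basename
    inject_fused_attention=False,  # required: fused attention doesn't work for this model at this time
    use_safetensors=True,
    trust_remote_code=False,
    device="cuda:0",
)

# With auto-gptq installed, the README's call would then be:
# model = AutoGPTQForCausalLM.from_quantized(model_name_or_path, **load_kwargs)
```

Passing the options as `**load_kwargs` keeps the one model-specific override (`inject_fused_attention=False`) visible next to the generic loading settings.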