Issues running model
Since model_basename is not provided in the original example code, I tried this:
from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
import argparse
model_name_or_path = "TheBloke/starcoderplus-GPTQ"
model_basename = "gptq_model-4bit--1g.safetensors"
use_triton = False
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=True,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)
print("\n\n*** Generate:")
inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to("cuda:0")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
But I always get the following:
FileNotFoundError: could not find model TheBloke/starcoderplus-GPTQ
When I remove the model_basename parameter, it downloads, but I get the following error with generate:
The safetensors archive passed at ~/.cache/huggingface/hub/models--TheBloke--starcoderplus-GPTQ/snapshots/aa67ff4fad65fc88f6281f3a2bcc0d648105ef96/gptq_model-4bit--1g.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
*** Generate:
TypeError: generate() takes 1 positional argument but 2 were given
I am just using the original code provided, with no other alterations. I am able to load other models from your HF repos with AutoGPTQ, but not this one specifically.
Hmm, you shouldn't need model_basename for this. Maybe that's an AutoGPTQ bug.
When it is required, you leave out the .safetensors from the end, so it's model_basename="gptq_model-4bit--1g".
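For example, here is a minimal sketch of a load with an explicit model_basename, reusing the parameters from the snippet above (note there is no .safetensors suffix):

from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "TheBloke/starcoderplus-GPTQ"
# No ".safetensors" extension here; AutoGPTQ resolves the file itself.
model_basename = "gptq_model-4bit--1g"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=True,
        device="cuda:0",
        use_triton=False,
        quantize_config=None)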
Thank you for the insight. Do you have an idea of why the generate() issue is occurring when I remove the model_basename? Here is my code when I do:
from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
import argparse
model_name_or_path = "TheBloke/starcoderplus-GPTQ"
use_triton = False
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        use_safetensors=True,
        trust_remote_code=True,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)
print("\n\n*** Generate:")
inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to("cuda:0")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
As discussed on Discord:
This is caused by this bug: https://github.com/PanQiWei/AutoGPTQ/pull/135
The workaround is to call model.generate(inputs=inputs).
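Concretely, that means passing the prompt tensor by keyword instead of positionally, e.g. with the tokenizer and model from the script above:

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to("cuda:0")
outputs = model.generate(inputs=inputs)  # keyword form avoids the positional-argument bug
print(tokenizer.decode(outputs[0]))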
Fix: I just needed to set inputs=inputs in the generate() call. TheBloke submitted a fix, but his PR has not been accepted yet on the AutoGPTQ GitHub.
Ensure the script uses the version without model_basename that I provided above.