the answer is not terminated correctly

#58
by captainst - opened

I am using the 8-bit quantized version:

import torch
import transformers

model = transformers.AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",
    load_in_8bit=True,
)
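
(The pipe object used below is assumed to be a standard text-generation pipeline, built roughly like this:)

tokenizer = transformers.AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# the model above is already placed with device_map="auto", so no device argument is needed here
pipe = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)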

And inference:

fmt_ex = """

Instruction: Please write a poem with less than 200 words.

Response:

"""

with torch.autocast('cuda', dtype=torch.float16):
    print(
        pipe(fmt_ex,
             max_new_tokens=256,
             do_sample=True,
             use_cache=True))

I can see that the answer ends with a "#", but then some random characters follow:

He will know I am here and I am meant to be by his side\n#d-link dl6000 # dlink dl6000 \n# dl6000\n- #n\n#d-link #dlinkdl6000 # dl60\n#dlink # dlinkdl6000 # dl6000 #dl\n- #dlk\n#d-link dl6\n#dlk\n##dlink dl6\n\n#d-link #dlink #dlinkdl6000 #dl\n#d

It seems to be a problem with the special tokens.

It turns out that I needed to set "stopping_criteria" when building the pipeline. I did not realize this at first, since many Hugging Face models already implement it in their custom code.
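
For anyone hitting the same issue, here is a minimal sketch of passing a stopping criterion to the generation call (the stop strings "<|endoftext|>" and "Instruction:" are assumptions; use whatever terminators match your prompt format):

from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnTokens(StoppingCriteria):
    """Stop generation as soon as the sequence ends with any of the given token-id lists."""
    def __init__(self, stop_token_ids):
        self.stop_token_ids = stop_token_ids

    def __call__(self, input_ids, scores, **kwargs):
        for stop_ids in self.stop_token_ids:
            if input_ids[0, -len(stop_ids):].tolist() == stop_ids:
                return True
        return False

# Token ids of the strings that should terminate the answer (assumed stop strings)
stop_token_ids = [
    tokenizer(s, add_special_tokens=False)["input_ids"]
    for s in ["<|endoftext|>", "Instruction:"]
]
stopping_criteria = StoppingCriteriaList([StopOnTokens(stop_token_ids)])

with torch.autocast('cuda', dtype=torch.float16):
    print(
        pipe(fmt_ex,
             max_new_tokens=256,
             do_sample=True,
             use_cache=True,
             stopping_criteria=stopping_criteria))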

captainst changed discussion status to closed
