google/gemma-2-9b-it · nonsense response when bsz>1

Jun 30, 2024

It seems that both 9b-it and 27b-it generate nonsense responses when running inference with a batch size (bsz) > 1.
Would you kindly look into that?
Thank you!

lysandre

Google org Jul 1, 2024

Hey @OliverNova, thanks for your report! Can you please make sure you're using the latest transformers version (v4.42.3)?

If it still happens, do you mind sharing a reproducible code snippet for us to take a look at?
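One way to check the installed version against the one mentioned above is sketched below. The helper names `parse_version` and `is_recent_enough` and the constant `MIN_VERSION` are made up for illustration and are not part of transformers. The comparison only looks at the leading numeric components, so treat it as a rough check rather than a full version parser.

```python
from importlib.metadata import PackageNotFoundError, version

# Version named in the reply above; adjust if a newer release is suggested.
MIN_VERSION = (4, 42, 3)


def parse_version(text):
    """Turn '4.42.3' or '4.43.0.dev0' into a tuple of its leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        if not digits:
            # Stop at the first non-numeric component such as 'dev0'.
            break
        parts.append(int(digits))
    return tuple(parts)


def is_recent_enough(installed, minimum=MIN_VERSION):
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= minimum


if __name__ == "__main__":
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        print("transformers is not installed")
    else:
        status = "OK" if is_recent_enough(installed) else "older than 4.42.3"
        print(installed, status)
```

If the check reports an older version, upgrading before re-running the batch is the first thing to try.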

prajdabre

Jul 12, 2024

Hi @lysandre here is my code snippet:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-9b")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-9b",
    device_map="auto",
    torch_dtype=torch.bfloat16
)

input_text = ["Write me a poem about Machine Learning.", "I want to eat a"]
# Pad the batch so both prompts fit in one tensor; this returns input_ids and attention_mask.
inputs = tokenizer(input_text, return_tensors="pt", padding=True).to(model.device)

outputs = model.generate(**inputs)
# generate() returns one sequence per prompt, so decode the whole batch.
print(tokenizer.batch_decode(outputs))

With a bsz of 2, the second input doesn't seem to get processed at all. This wasn't the case with Gemma 1.

kbganesh

Aug 6, 2024

Is there any update on this? I am having the same issue.

prajdabre

Aug 6, 2024

Yes. Try loading the model with attn_implementation="eager" before decoding.

https://github.com/huggingface/transformers/issues/31931
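A minimal sketch of that workaround, assuming the same model ID and bfloat16 setup as the snippet earlier in this thread. `attn_implementation` is passed to `from_pretrained`. The helper names `eager_load_kwargs` and `generate_batch` and the `max_new_tokens` default are illustrative choices, not something from the linked issue, and whether this resolves the problem may depend on your transformers version.

```python
def eager_load_kwargs():
    """Keyword arguments for from_pretrained that request eager attention."""
    return {"device_map": "auto", "attn_implementation": "eager"}


def generate_batch(prompts, model_id="google/gemma-2-9b", max_new_tokens=32):
    """Load the model with eager attention and generate for a batch of prompts."""
    # Imported lazily so eager_load_kwargs() works without torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        **eager_load_kwargs(),
    )

    # Pad the prompts into one batch and move it to the model's device.
    inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)


# Example call (downloads the model, so it is left commented out):
# print(generate_batch(["Write me a poem about Machine Learning.", "I want to eat a"]))
```

Comparing the second prompt's output with and without the `attn_implementation` argument should show whether the attention backend is what breaks batched generation in your setup.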

Renu11

Google org Aug 13, 2024

@jinjieni, I hope the issue has been resolved. Please let us know if any further assistance is needed. Thanks!