how to extract model response from the output of tokenizer
#54 by mans-0987 - opened
I can use this model successfully. I am using it in chat mode to create the prompt and get the model result as follows:
chat = [
{ "role": "user", "content": "What is the name of the first moon lander?" },
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = tokenizer.encode(prompt, add_special_tokens=True, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=500)
print(tokenizer.decode(outputs[0]))
When the output is printed, it contains the prompt in chat format as well as the model-generated text. How can I extract just the model-generated text from this output?
Are there any tools in the tokenizer to convert the model.generate() output into a dictionary, JSON, or a similar data structure that can be easily worked with?
I ran into this problem too. I followed https://github.com/ygivenx/google-gemma/blob/main/get_started.py and it works.
You can simply use
tokenizer.decode(outputs[0, inputs.shape[-1]:], skip_special_tokens=True)
to extract the response. Slicing from inputs.shape[-1] skips the prompt tokens, so only the newly generated tokens are decoded.
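Putting it together, here is a minimal sketch of the full flow, assuming the transformers tokenizer and model are loaded as in the original snippet; the checkpoint name and the result dictionary are illustrative, not part of the original post:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-7b-it"  # illustrative checkpoint, substitute the one you use
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

chat = [{"role": "user", "content": "What is the name of the first moon lander?"}]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = tokenizer.encode(prompt, add_special_tokens=True, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=500)

# Decode only the tokens generated after the prompt, dropping special tokens.
response = tokenizer.decode(outputs[0, inputs.shape[-1]:], skip_special_tokens=True)

# Optionally wrap the exchange in a plain dictionary for downstream use.
result = {"question": chat[0]["content"], "answer": response}
print(result)

The slicing works because model.generate() returns the prompt tokens followed by the new tokens, so everything from index inputs.shape[-1] onward is the model's reply.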
Hope this helps. Closing this issue as it seems resolved, thanks @hiyouga!
suryabhupa changed discussion status to closed