`Display_Prompt = no`
My understanding is that this parameter should suppress the prompt in the model's output. It appears to do nothing, or I'm putting it in the wrong place. Does anyone know how to use it?
Can you share more details please?
Basically the prompt is always appearing in the output from the LLM.
This happens with both the 1B and 3B instruct models.
I am not using `pipeline`.
It happens whether or not I use a chat template with the system/user message format.
I can share some code tomorrow if that helps. I'm basically looking for a parameter to just get the response displayed.
@thegamecat just slice the output at the input length and take the rest as the response:
response = tokenizer.decode(output[0][len(inputs[0]):])
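For context, here is a minimal sketch of the full flow with transformers. The model name and prompt are placeholders (I'm assuming one of the 1B/3B instruct checkpoints mentioned above), but the slicing trick is the same:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint; swap in the 1B or 3B instruct model you are using.
model_name = "meta-llama/Llama-3.2-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "What is the capital of France?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=256)

# generate() returns the prompt tokens followed by the new tokens,
# so drop the first len(input_ids) tokens before decoding.
response = tokenizer.decode(
    output[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True,
)
print(response)
```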
The maximum number of tokens is already exceeded by that point.
What do you mean by exceeded? Just increase `max_new_tokens` to a larger value then.
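Something like this (same setup as the sketch above; 1024 is just an example value, pick whatever budget your responses need):

```python
# Larger generation budget so the response is not cut off;
# slicing off the prompt tokens works exactly as before.
output = model.generate(**inputs, max_new_tokens=1024)
response = tokenizer.decode(
    output[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True,
)
```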