Not passing attention_mask in model.generate
Hi, I wonder why there's no need to pass attention_mask (the commented-out line below) to model.generate during inference. Thanks!
outputs = self.model.generate(
    input_ids=model_inputs['input_ids'],
    pixel_values=model_inputs['pixel_values'],
    # attention_mask=model_inputs['attention_mask'],
    max_new_tokens=100,
    early_stopping=False,
    do_sample=False,
)
Hi, Florence-2's language model is an encoder-decoder, and the attention_mask for the inputs is all ones.
But wouldn't we want an attention mask for padded tokens? We don't want to attend over padded tokens in the encoder, or am I misunderstanding?
I've also noticed an issue with attention over padding tokens: model accuracy differed between single-sample and batch inference, because the attention mask for pad tokens is 1 during batch inference. I made a small change to the Florence-2-base code that allows passing the text attention mask to the model: https://huggingface.co./microsoft/Florence-2-base/discussions/17
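A rough sketch of why single-sample and batched results can diverge when pad positions are left unmasked: the softmax in attention spreads some weight onto the pad token, which shifts the weights of the real tokens. The scores below are hypothetical, and `attention_weights` is an illustrative helper, not code from the model.

```python
import math


def attention_weights(scores, mask=None):
    """Softmax over key positions; positions with mask == 0 get zero weight."""
    if mask is None:
        mask = [1] * len(scores)
    kept = [s for s, m in zip(scores, mask) if m]
    peak = max(kept)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.5]  # hypothetical scores; the last position is padding

unmasked = attention_weights(scores)           # pad treated as a real token
masked = attention_weights(scores, [1, 1, 0])  # pad excluded via the mask
single = attention_weights(scores[:2])         # same sample with no padding

print(unmasked)
print(masked)
print(single)
```

With the mask applied, the weights on the real tokens match the unpadded single-sample case; without it, part of the attention goes to the pad position, which is consistent with the batch-versus-single discrepancy described above.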