Is attention mask wrong for batch generation?
#33
by qingsonglv - opened
For batch generation, the attention_mask is set to a single 1; see this line: https://huggingface.co./THUDM/chatglm-6b/blob/main/modeling_chatglm.py#L948
However, for a batch of prompts with varying lengths, the left-padded tokens are not masked in this case.
I guess the position ids have the same problem.
I tried to fix it with this PR: https://huggingface.co./THUDM/chatglm-6b/discussions/35
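For reference, here is a minimal sketch of the kind of padding-aware attention_mask and position_ids described above, in plain PyTorch. This is a generic illustration only (the pad id and example inputs are invented), not ChatGLM's actual masking code:

```python
import torch

# Invented pad id and left-padded batch, purely for illustration.
pad_id = 0
input_ids = torch.tensor([
    [pad_id, pad_id, 11, 12, 13],  # shorter prompt, left-padded
    [21, 22, 23, 24, 25],          # full-length prompt
])

# Padding-aware attention mask: 1 on real tokens, 0 on the left padding.
attention_mask = (input_ids != pad_id).long()

# Position ids that skip the padding: cumulative sum of the mask,
# clamped so the pad positions stay at 0.
position_ids = (attention_mask.cumsum(dim=-1) - 1).clamp(min=0)

print(attention_mask)
# tensor([[0, 0, 1, 1, 1],
#         [1, 1, 1, 1, 1]])
print(position_ids)
# tensor([[0, 0, 0, 1, 2],
#         [0, 1, 2, 3, 4]])
```

As the follow-up below notes, there turned out to be no bug, so this sketch only illustrates what the question was asking for.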
Seems like my fault... there's no bug.
qingsonglv changed discussion status to closed