Is attention mask wrong for batch generation?
#33
by qingsonglv - opened
For batch generation, the attention_mask is set to a single 1; see this line: https://huggingface.co./THUDM/chatglm-6b/blob/main/modeling_chatglm.py#L948
However, for a batch of prompts with varying lengths, the left-padded tokens are not masked in this case.
I guess the position ids have the same problem.
I tried to fix it with this PR: https://huggingface.co./THUDM/chatglm-6b/discussions/35
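For reference, here is a minimal sketch of the kind of padding-aware attention_mask and position_ids described above, in plain PyTorch. This is a generic illustration only (the pad id and example inputs are invented), not ChatGLM's actual masking code:

```python
import torch

# Invented pad id and left-padded batch, purely for illustration.
pad_id = 0
input_ids = torch.tensor([
    [pad_id, pad_id, 11, 12, 13],  # shorter prompt, left-padded
    [21, 22, 23, 24, 25],          # full-length prompt
])

# Padding-aware attention mask: 1 on real tokens, 0 on the left padding.
attention_mask = (input_ids != pad_id).long()

# Position ids that skip the padding: cumulative sum of the mask,
# clamped so the pad positions stay at 0.
position_ids = (attention_mask.cumsum(dim=-1) - 1).clamp(min=0)

print(attention_mask)
# tensor([[0, 0, 1, 1, 1],
#         [1, 1, 1, 1, 1]])
print(position_ids)
# tensor([[0, 0, 0, 1, 2],
#         [0, 1, 2, 3, 4]])
```

As the follow-up below notes, there turned out to be no bug, so this sketch only illustrates what the question was asking for.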
Seems like my fault... there's no bug.
qingsonglv changed discussion status to closed