The current tokenizer implementation does not support left-padding
#2, opened by hiyouga
For batched inference, a left-padded sequence is required, but the tokenizer class does not support left-padding.
According to the source code, the padding_side argument has no effect in the __init__ method:
>>> tok = AutoTokenizer.from_pretrained("Qwen/Qwen-7B", trust_remote_code=True, use_fast=False, padding_side="left")
>>> tok.padding_side
'right'
https://huggingface.co./Qwen/Qwen-7B/blob/main/tokenization_qwen.py#L33
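To make the requirement concrete, here is a minimal sketch of what left-padding a batch looks like. It is independent of the Qwen tokenizer; the helper name `left_pad_batch` and the `pad_id` value are hypothetical and chosen only for illustration. With left-padding, the real tokens of every sequence end at the same position, so a decoder-only model continues generating directly after the prompt instead of after a run of pad tokens.

```python
from typing import List, Tuple


def left_pad_batch(
    sequences: List[List[int]], pad_id: int
) -> Tuple[List[List[int]], List[List[int]]]:
    """Pad token-id sequences on the left to a common length.

    Hypothetical helper for illustration only; not part of any tokenizer API.
    Returns (input_ids, attention_mask), where the mask is 0 on pad positions.
    """
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        # Pad tokens go in front, so the last real token stays at the end.
        input_ids.append([pad_id] * n_pad + list(seq))
        attention_mask.append([0] * n_pad + [1] * len(seq))
    return input_ids, attention_mask


# pad_id=0 is an assumed placeholder, not the real Qwen pad token id.
batch = [[11, 12, 13], [21]]
ids, mask = left_pad_batch(batch, pad_id=0)
print(ids)   # [[11, 12, 13], [0, 0, 21]]
print(mask)  # [[1, 1, 1], [0, 0, 1]]
```

A tokenizer that honors padding_side="left" would be expected to produce the same layout when called with padding enabled on a batch of prompts.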
Spotted an expert in the wild!
Thank you for raising this problem. We have updated the code, and this should be fixed now. Please reopen this discussion if the problem persists.
jklj077 changed discussion status to closed