about pad_token_id
#3 opened by ToughStone
I got an error when loading the model:
size mismatch for model.decoder.embed_positions.weight: copying a param with shape torch.Size([1026, 768]) from checkpoint, the shape in current model is torch.Size([1025, 768]).
When the position embedding layer is created, its size is set to 1024 + pad_token_id + 1. In the Chinese vocabulary, pad_token_id = 0, while in the English one it is 1. Where is the problem?
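For reference, a minimal sketch of the size arithmetic just described (fairseq-style learned positional embeddings; the max position count of 1024 comes from the error message):

```python
# Sketch: fairseq-style learned positional embeddings reserve
# pad_token_id + 1 extra rows in front of the usable positions.
max_positions = 1024

for pad_token_id in (1, 0):  # 1: English BART vocab, 0: Chinese vocab
    rows = max_positions + pad_token_id + 1
    print(f"pad_token_id={pad_token_id} -> embed_positions rows={rows}")

# pad_token_id=1 -> embed_positions rows=1026  (shape in the checkpoint)
# pad_token_id=0 -> embed_positions rows=1025  (shape in the current model)
```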
How did you load the model and tokenizer? They should both be loaded from bart-base-chinese.
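A minimal loading sketch, assuming the checkpoint is the fnlp/bart-base-chinese repo on the Hub and that it uses a BERT-style tokenizer (both assumptions; substitute your actual path and classes):

```python
from transformers import BertTokenizer, BartForConditionalGeneration

# Load tokenizer and model from the SAME checkpoint so that
# pad_token_id matches what the checkpoint was trained with;
# a mismatch changes the expected size of embed_positions.
checkpoint = "fnlp/bart-base-chinese"  # assumed Hub id
tokenizer = BertTokenizer.from_pretrained(checkpoint)
model = BartForConditionalGeneration.from_pretrained(checkpoint)

print(tokenizer.pad_token_id)  # should be 0 for the Chinese vocabulary
```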
yf changed discussion status to closed