about pad_token_id

#3
by ToughStone - opened

I got an error when loading the model:
size mismatch for model.decoder.embed_positions.weight: copying a param with shape torch.Size([1026, 768]) from checkpoint, the shape in current model is torch.Size([1025, 768]).
When creating the position embedding layer, the dimension is set to 1024 + pad_token_id + 1. In the Chinese vocabulary, pad_token_id = 0, while in English it is 1. Where is the problem?
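
For reference, this is the sizing logic I mean (a sketch of the older transformers modeling_bart.py behavior, where the table size depends on padding_idx; newer versions use a fixed offset of 2 instead):

```python
import torch.nn as nn

class LearnedPositionalEmbedding(nn.Embedding):
    def __init__(self, num_embeddings: int, embedding_dim: int, padding_idx: int):
        # The table is enlarged by padding_idx + 1, so its final size
        # depends on pad_token_id: 1024 + 1 + 1 = 1026 with the English
        # pad_token_id = 1, but 1024 + 0 + 1 = 1025 with the Chinese
        # pad_token_id = 0 -- which produces the shape mismatch above.
        num_embeddings += padding_idx + 1
        super().__init__(num_embeddings, embedding_dim, padding_idx)

# Checkpoint saved with pad_token_id = 1 -> torch.Size([1026, 768])
print(LearnedPositionalEmbedding(1024, 768, padding_idx=1).weight.shape)
# Model configured with pad_token_id = 0 -> torch.Size([1025, 768])
print(LearnedPositionalEmbedding(1024, 768, padding_idx=0).weight.shape)
```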

Fudan NLP org

How did you load the model and tokenizer? Both should be loaded from bart-base-chinese.
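
For example (a minimal sketch, assuming the fnlp/bart-base-chinese repo id; the model card pairs BertTokenizer with BartForConditionalGeneration):

```python
from transformers import BertTokenizer, BartForConditionalGeneration

# Load both pieces from the same checkpoint so the config's
# pad_token_id matches the saved position-embedding shape.
tokenizer = BertTokenizer.from_pretrained("fnlp/bart-base-chinese")
model = BartForConditionalGeneration.from_pretrained("fnlp/bart-base-chinese")

inputs = tokenizer("北京是[MASK]的首都", return_tensors="pt")
outputs = model.generate(inputs["input_ids"])
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```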

yf changed discussion status to closed
