Tokenizer not adding BOS
#4, opened by andreasgrv
Hi,
According to the generation config of this model:
{
"_from_model_config": true,
"bos_token_id": 1,
"eos_token_id": 2,
"transformers_version": "4.39.3"
}
There is a BOS token with ID 1. However, with transformers==4.49.0:
from transformers import AutoTokenizer
model = "HuggingFaceFW/ablation-model-fineweb-edu"
tokenizer = AutoTokenizer.from_pretrained(model)
tokenizer.encode('', return_tensors='pt', add_special_tokens=True)
Out[63]: tensor([], size=(1, 0))
Is this expected? I would expect a BOS token to be added. I also checked whether setting use_fast=False changes this, but the result is the same.
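
For reference, here is a minimal sketch of the stopgap I could use in the meantime: prepending BOS by hand. It assumes the model really does expect a leading BOS, which the generation config suggests but does not confirm. `ensure_bos` is a hypothetical helper of my own, not part of transformers, and it works on plain Python lists of token IDs.

```python
from typing import List, Optional, Sequence


def ensure_bos(token_ids: Sequence[int], bos_token_id: Optional[int]) -> List[int]:
    """Return token_ids with bos_token_id at the front, adding it if missing.

    Hypothetical helper (not a transformers API). Assumes the model
    expects a leading BOS, which is not confirmed for this model.
    """
    ids = list(token_ids)
    # No BOS defined: nothing to prepend.
    if bos_token_id is None:
        return ids
    # BOS already present at the start: leave the sequence unchanged.
    if ids and ids[0] == bos_token_id:
        return ids
    return [bos_token_id] + ids


# BOS ID 1 is taken from the generation config quoted above.
print(ensure_bos([], 1))
print(ensure_bos([1, 42, 7], 1))
```

In practice the input would be the list returned by tokenizer.encode(text, add_special_tokens=True) (without return_tensors), and the ID would come from tokenizer.bos_token_id or from the generation config.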