
Questions about the eos_token, bos_token, and pad_token settings

#7
by cl-modelcloud - opened

The three token settings in the tokenizer_config.json file are as follows:

"eos_token": "<|end_of_text|>",
"bos_token": "<|begin_of_text|>",
"pad_token": "<|end_of_text|>",

but in the config.json file they are set as:

"bos_token_id": 0,
"eos_token_id": 11,
"pad_token_id": 0,

These three token IDs decode to the following tokens:
"bos_token_id": ">>TITLE<<",
"eos_token_id": "<|end_of_text|>",
"pad_token_id": ">>TITLE<<",

Which setting is correct?
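For anyone reading along, one way to see the mismatch is to decode the IDs declared in config.json with the tokenizer and compare them with the special tokens declared in tokenizer_config.json. Below is a minimal sketch; the model id used here is an assumption, so substitute the repository this discussion belongs to.

```python
from transformers import AutoConfig, AutoTokenizer

# Assumption: placeholder model id; replace with the actual repository name.
model_id = "tiiuae/falcon-mamba-7b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
config = AutoConfig.from_pretrained(model_id)

# Decode the IDs declared in config.json back into token strings.
for name in ("bos_token_id", "eos_token_id", "pad_token_id"):
    token_id = getattr(config, name, None)
    if token_id is not None:
        print(name, token_id, "->", tokenizer.convert_ids_to_tokens(token_id))

# Compare with the special tokens declared in tokenizer_config.json.
print("tokenizer bos/eos/pad:", tokenizer.bos_token, tokenizer.eos_token, tokenizer.pad_token)
```

If the decoded tokens differ from what the tokenizer reports, the two config files are out of sync, which is exactly the situation described above.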

Hi,
Thanks for spotting this ambiguity. It has been corrected now.

Gkunsch changed discussion status to closed
