Ġ in tokenizer
#20
by
Sm1Ling
- opened
Why are there so many "Ġ" characters in the tokenizer?
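My understanding is that this character marks the beginning of a word. In GPT-2-style byte-level BPE tokenizers, "Ġ" is how a leading space is displayed, so a token that starts with it was preceded by a space in the original text. I think its presence improves the model's behavior around word boundaries.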
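To make this concrete, here is a minimal sketch of where "Ġ" comes from, assuming this tokenizer is a GPT-2-style byte-level BPE (the thread does not say which model it belongs to). It rebuilds the byte-to-character table that such tokenizers use so every byte has a printable stand-in. The helper name `to_display` is made up for illustration and is not part of any library.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2-style byte -> printable character table."""
    # Bytes that are already printable keep their own character.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every other byte (space, control characters, ...) is shifted to 256 + n.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


byte_to_char = bytes_to_unicode()


def to_display(text):
    """Hypothetical helper: show text the way the vocabulary file spells it."""
    return "".join(byte_to_char[b] for b in text.encode("utf-8"))


print(repr(byte_to_char[ord(" ")]))  # the space byte (0x20) maps to U+0120
print(to_display("Hello world"))
```

Under that assumption, the space byte is the 33rd non-printable byte, so it lands on U+0120, which is "Ġ". That is why so many vocabulary entries start with it: most words in running text follow a space.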
I appreciate your answer a lot!
Sm1Ling
changed discussion status to
closed