How many English tokens was the model trained on?
#5
by aslawliet
How many English tokens was the model trained on?
🤔 I would say less than 3T tokens; that's for sure.
It annoyingly mixes in Chinese words at random when I didn't ask for it; maybe the model is better in Chinese, but I don't speak it. It might be due to the GGUF version, though. To make a GGUF quant with an importance matrix you give it some example text for calibration, and I think a poor calibration set can make it worse at some tasks. It might be worth testing whether English-only versus mixed examples make the quant better, and releasing separate versions.
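If anyone wants to try that comparison, here is a rough sketch of how it could be done with llama.cpp's imatrix workflow. The binary names (`llama-imatrix`, `llama-quantize`), the calibration file names, and the base model path are assumptions, and flags can differ between llama.cpp versions, so treat this as illustrative rather than a recipe:

```python
import subprocess
from pathlib import Path

# Hypothetical calibration files -- fill these with your own English-only
# and mixed English/Chinese text before running.
CALIBRATION_SETS = {
    "english-only": Path("calib_english.txt"),
    "mixed-en-zh": Path("calib_mixed.txt"),
}

BASE_MODEL = Path("model-f16.gguf")  # unquantized GGUF export (assumed filename)
QUANT_TYPE = "Q4_K_M"

for name, calib_file in CALIBRATION_SETS.items():
    imatrix_file = Path(f"imatrix-{name}.dat")
    out_model = Path(f"model-{QUANT_TYPE}-{name}.gguf")

    # 1. Build an importance matrix from this calibration text.
    #    Binary and flag names follow recent llama.cpp builds; yours may differ.
    subprocess.run(
        ["llama-imatrix", "-m", str(BASE_MODEL),
         "-f", str(calib_file), "-o", str(imatrix_file)],
        check=True,
    )

    # 2. Quantize the base model using that importance matrix.
    subprocess.run(
        ["llama-quantize", "--imatrix", str(imatrix_file),
         str(BASE_MODEL), str(out_model), QUANT_TYPE],
        check=True,
    )
    print(f"Produced {out_model} calibrated on {name} data")
```

Running the same English prompts against both quants would show whether the calibration data is what causes the random Chinese output, or whether it comes from the base model itself.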