Questions for training

by alibidaran - opened

Did you train your tokenizer on a Persian corpus?
How many training steps did you use for training Llama?

Yes, the steps taken to train the model were:
1- Training the tokenizer on a Persian corpus
2- Training a LoRA adapter on a Persian corpus
3- Instruction tuning on a translated ALPACA dataset and some similar corpora

I have a Persian ALPACA-style dataset, but I don't know how many training steps are required for training LLAMA2. I have trained LLAMA2 on various English datasets, but on Farsi, even with a trained tokenizer, I don't get good results.

Did you train the adapter on a Farsi dataset?
I trained it on 200 million tokens.
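
For anyone trying to turn a token budget like this into a step count, here is a rough sketch of the arithmetic. Only the 200 million token figure comes from this thread; the sequence length, batch size, and gradient accumulation values are assumptions picked for illustration, and the function name is hypothetical. It also assumes sequences are packed to full length, so the real number will be somewhat higher if your examples are padded.

```python
import math


def optimizer_steps(total_tokens, seq_len, per_device_batch,
                    grad_accum=1, num_devices=1, epochs=1):
    """Estimate optimizer steps needed to see `total_tokens` tokens.

    Assumes every sequence is packed to exactly `seq_len` tokens.
    """
    # Tokens consumed by a single optimizer update.
    tokens_per_step = seq_len * per_device_batch * grad_accum * num_devices
    # Round up so the last partial batch still counts as a step.
    return math.ceil(total_tokens * epochs / tokens_per_step)


# 200M tokens is from the thread; the other values are assumed.
steps = optimizer_steps(
    total_tokens=200_000_000,
    seq_len=2048,
    per_device_batch=4,
    grad_accum=8,
)
print(steps)  # 3052
```

With these assumed settings each update consumes 65,536 tokens, so one pass over 200 million tokens is about 3,000 steps. Doubling the effective batch size halves the step count, which is why a step count alone is hard to compare between setups.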

mostafaamiri changed discussion status to closed
