Which model to use for continual pretraining

#1
by ayushkewl - opened

Hi, I plan to use the model for continual pretraining, and I am wondering which model to use and how. Should I start from the decayed LR model with the largest number ba, train with the context extension yaml first, and then use the LR decay yaml for part of the data? Or am I missing something?
