Which model to use for continual pretraining

by ayushkewl - opened 5 days ago

Hi, I plan to use the model for continual pretraining, and I am wondering which model to use and how. Should I start from the LR-decayed model with the largest ba number, train with the context extension YAML first, and then use the LR decay YAML for part of the data? Or am I missing something?
