Training setup
Hi Peter,
Thanks for sharing this. Do you have an example Colab / notebook showing how to set up this type of model for training with the Trainer API? I tried to follow the same process that I used for Longformer, but the training metrics were 0 almost all of the time.
Thank you so much.
Hi Jorge,
Unfortunately, I don't have a notebook that I can share right away (the one I use serves several different purposes and has API tokens etc. in it). Once I get it cleaned up, which might take a while, I am happy to share it.
That said, Patrick von Platen's LED notebook should work okay. The main difference between that and what I use is the addition of deepspeed, which you may want to try out! Btw, it is worth noting that with this large-size model, I can only train with 16384 input tokens on an A100 GPU runtime.
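For illustration only, here is a minimal sketch of what adding deepspeed to a Trainer-based setup could look like. It is not the exact configuration used for this model: the ZeRO stage, the CPU offload setting, the helper names and the file name `ds_config.json` are all assumptions. The `"auto"` values are meant to be filled in from the training arguments by the Hugging Face integration.

```python
import json
from pathlib import Path


def build_deepspeed_config(zero_stage: int = 2, offload_optimizer: bool = True) -> dict:
    """Build a small DeepSpeed config dict (hypothetical helper, assumed settings)."""
    zero = {"stage": zero_stage}
    if offload_optimizer:
        # Assumption: move optimizer state to CPU RAM to free GPU memory.
        zero["offload_optimizer"] = {"device": "cpu", "pin_memory": True}
    return {
        "fp16": {"enabled": "auto"},
        "zero_optimization": zero,
        # "auto" lets the Trainer copy these values from its own arguments.
        "gradient_accumulation_steps": "auto",
        "gradient_clipping": "auto",
        "train_batch_size": "auto",
        "train_micro_batch_size_per_gpu": "auto",
    }


def write_deepspeed_config(path="ds_config.json", **kwargs) -> Path:
    """Write the config as JSON and return the path that was written."""
    path = Path(path)
    path.write_text(json.dumps(build_deepspeed_config(**kwargs), indent=2))
    return path


if __name__ == "__main__":
    cfg_path = write_deepspeed_config()
    print(f"wrote {cfg_path}")
    # In the training script (needs transformers and deepspeed installed),
    # the file path is passed through the training arguments, for example:
    #   from transformers import Seq2SeqTrainingArguments
    #   args = Seq2SeqTrainingArguments(
    #       output_dir="out", fp16=True, deepspeed=str(cfg_path)
    #   )
```

The rest of the notebook (model, tokenizer, dataset, trainer) would stay as it is; only the training arguments gain the `deepspeed` entry. Whether stage 2 with offload is enough for long inputs depends on the GPU, so treat these values as a starting point.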
Hope that helps!
This is definitely helpful. Thank you Peter.