Adding Flash attention support for StableLMEpochForCausalLM

#8
by joel-wj - opened

Thank you for sharing this great model.
I want to continue training it on my own data. I'm trying to extend pretraining with additional data, but it takes a long time because the modeling code has no flash attention. Could you please add an Attention class with flash attention to modeling_stablelm_epoch.py, along with the supporting code so the StableLM model can use it?

@joel-wj flash-attn v2 support has just been added. Thanks for the request, and sorry for the hold-up!
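
For anyone landing here later, something like the following should enable it at load time; the exact kwarg may vary with your transformers version (older releases used `use_flash_attention_2=True`), and the model id below is only a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stabilityai/stablelm-3b-4e1t"  # placeholder; use the actual repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,               # flash-attn requires fp16/bf16
    attn_implementation="flash_attention_2",  # request the flash-attn path
    trust_remote_code=True,
)
```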

jon-tow changed discussion status to closed
