Does gradient checkpointing work with this model?
Thank you for publishing such a wonderful model.
I am experiencing an issue where setting gradient_checkpointing=True in TrainingArguments does not seem to reduce VRAM usage during training.
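For context, here is a minimal sketch of how I am setting the flag (the output path and batch size below are placeholders, not my exact configuration):

```python
from transformers import TrainingArguments

# Illustrative setup only; output_dir and batch size are placeholders.
training_args = TrainingArguments(
    output_dir="./output",
    per_device_train_batch_size=1,
    gradient_checkpointing=True,  # expected to lower VRAM at the cost of extra compute
)
```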
My understanding may be incomplete, but when I compare the source code of modeling_gpt_neox.py with modeling_gpt_neox_japanese.py, it appears that the latter lacks the conditional on self.gradient_checkpointing seen here:
https://github.com/huggingface/transformers/blob/118e9810687dd713b6be07af79e80eeb1d916908/src/transformers/models/gpt_neox/modeling_gpt_neox.py#L546
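For reference, this is a minimal, self-contained sketch of the pattern that conditional implements; the class names are made up for illustration and this is not the library code:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class ToyBlock(nn.Module):
    """Stand-in for a decoder layer (hypothetical, for illustration only)."""
    def __init__(self, dim):
        super().__init__()
        self.ff = nn.Linear(dim, dim)

    def forward(self, hidden_states):
        return torch.relu(self.ff(hidden_states))

class ToyModel(nn.Module):
    def __init__(self, dim=64, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList([ToyBlock(dim) for _ in range(n_layers)])
        self.gradient_checkpointing = True

    def forward(self, hidden_states):
        for layer in self.layers:
            if self.gradient_checkpointing and self.training:
                # Recompute this layer's activations during backward instead of
                # storing them, which is what reduces VRAM usage.
                hidden_states = checkpoint(layer, hidden_states)
            else:
                hidden_states = layer(hidden_states)
        return hidden_states

model = ToyModel().train()
out = model(torch.randn(2, 64, requires_grad=True))
out.sum().backward()
```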
Is this an intentional modification or perhaps an oversight? I would appreciate any insights you might have regarding this.
transformers v4.29.1
Thanks for looking at all the details and asking the question.
The difference regarding gradient_checkpointing is not intentional. At the time we submitted our pull request, GPT NeoX had the same implementation; gradient checkpointing was later fixed for GPT NeoX in the following commit, which is why the two models now differ.
https://github.com/huggingface/transformers/commit/225c36fbe5ae2bdb1880da52e093c7e53596a7d1
Thank you for your response! I now understand the situation.
It might be helpful if you could add support for gradient_checkpointing, or at least emit a warning when the flag is set to True.
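Just to illustrate the kind of guard I mean (the class and method names here are hypothetical stand-ins, not the actual library internals):

```python
import warnings

class GPTNeoXJapaneseModelSketch:
    """Hypothetical stand-in used only to illustrate the suggested warning."""

    def __init__(self):
        self.gradient_checkpointing = False

    def gradient_checkpointing_enable(self):
        # Until checkpointing is actually wired into forward(), warn instead of
        # silently accepting the flag.
        warnings.warn(
            "Gradient checkpointing is not implemented for this model; "
            "enabling it will not reduce memory usage."
        )

model = GPTNeoXJapaneseModelSketch()
model.gradient_checkpointing_enable()  # emits the warning instead of doing nothing
```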
We cannot promise a completion date, but we have started preparing a PR. Thank you for reminding us of this update opportunity!