Differences between modeling_qwen.py in nekomata-14b and Qwen-14B Repositories

#1
by shoey-ucci - opened

There appears to be a difference between the modeling_qwen.py file in the nekomata-14b repository and the one in the qwen-14b repository. You can find them at the following links:
https://huggingface.co./Qwen/Qwen-14B/blob/main/modeling_qwen.py#L522-L525
https://huggingface.co./rinna/nekomata-14b/blob/main/modeling_qwen.py#L522-L527

This discrepancy may be what breaks LoRA fine-tuning of nekomata-14b with the implementation in the latest https://github.com/QwenLM/Qwen repository under PyTorch 2.
When I attempted this, I encountered the following error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation.
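The thread does not say which lines differ, so the following is only a generic PyTorch illustration of what this error means, not the actual change in modeling_qwen.py. Autograd raises it when a tensor that an operation saved for its backward pass is later overwritten in place. Here torch.sigmoid saves its output, so mutating that output with mul_ invalidates it, while the out-of-place y * 2.0 leaves it intact. The function names are hypothetical.

```python
import torch


def backward_with_inplace(x: torch.Tensor) -> None:
    """Overwrite, in place, a tensor autograd saved for backward."""
    y = torch.sigmoid(x)  # sigmoid saves its output y for the backward pass
    y.mul_(2.0)           # in-place write bumps y's version counter
    y.sum().backward()    # autograd sees the stale saved tensor and raises


def backward_out_of_place(x: torch.Tensor) -> torch.Tensor:
    """Same computation, but allocate a new tensor instead of mutating y."""
    y = torch.sigmoid(x)
    y = y * 2.0           # out-of-place: the saved sigmoid output is untouched
    y.sum().backward()
    return x.grad


if __name__ == "__main__":
    try:
        backward_with_inplace(torch.zeros(4, requires_grad=True))
    except RuntimeError as err:
        print("in-place version failed:", err)
    print("out-of-place grad:", backward_out_of_place(torch.zeros(4, requires_grad=True)))
```

When the offending operation is buried inside a large model, wrapping the training step in torch.autograd.set_detect_anomaly(True) makes the error report point at the forward-pass operation whose saved tensor was modified.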

Hi @shoey-ucci, thank you for pointing this out.
I have just synced the modeling code with the latest official code.

tianyuz changed discussion status to closed
