Training Mistral-7B-v0.1 with Sliding Window = Null
Hello,
I noticed that in the recent release of Mistral-7B-Instruct-v0.2, the sliding_window parameter is set to null. I'm curious to know if it's possible to apply the same setting when training the Mistral-7B-v0.1 model.
Could you please provide some guidance on this? Is there any specific reason why sliding_window is set to null in the newer version, and what would be the implications if we apply the same setting to the older version?
Thank you in advance for your help.
Hi @yardenhoch
Thanks for the issue! You can manually set a new value for sliding_window in the model's config. If you have cloned the repo locally, you can edit the config file directly; otherwise you can set model.config.sliding_window = xxx before launching training.
@ybelkada Thank you for your response.
To clarify, does this mean that even if my prompt is longer than 4096 tokens, all the words will be processed together when sliding_window is set to null? I'm trying to understand the implications of this setting on longer prompts.
Hi @yardenhoch
To clarify, does this mean that even if my prompt is longer than 4096 tokens, all the words will be processed together when sliding_window is set to null?
I think so, yes! All tokens will be processed together if sliding_window is set to null.
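To make the difference concrete, here is a small pure-Python sketch of which positions a token can attend to with and without a window. It is a toy illustration, not the transformers implementation: the function names are made up, and it assumes the window counts the current token (the exact off-by-one convention can differ between implementations).

```python
from typing import List, Optional


def attention_mask(seq_len: int, sliding_window: Optional[int] = None) -> List[List[bool]]:
    """Return mask[i][j] == True when query position i may attend to key position j."""
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            causal = j <= i  # never attend to future tokens
            # Assumption: a window of W keeps the current token plus the W - 1 before it.
            in_window = sliding_window is None or (i - j) < sliding_window
            row.append(causal and in_window)
        mask.append(row)
    return mask


def visible_positions(mask: List[List[bool]], query: int) -> List[int]:
    """List the key positions that the given query position can see."""
    return [j for j, allowed in enumerate(mask[query]) if allowed]


# Last token of a 6-token sequence, with a window of 3 and with no window.
print(visible_positions(attention_mask(6, sliding_window=3), 5))
print(visible_positions(attention_mask(6, sliding_window=None), 5))
```

With a window, the last token only sees its most recent neighbours within a single layer; with sliding_window set to null, it sees every earlier token, so memory and compute for attention grow with the full prompt length.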