About Training Detail #4
by XinC6
Hello,
While reproducing OpenR1-Qwen-7B using your dataset OpenR1-Math-220k (default config), I encountered an issue: the sequences in the dataset are very long (some exceed 25,000 tokens), leading to excessive VRAM consumption.
Would you be able to share more training details or suggest possible solutions to this issue? I would greatly appreciate your response!
Best regards.
The config we use on 8xH100 is here: https://github.com/huggingface/open-r1/blob/main/recipes/OpenR1-Qwen-7B/sft/config.yaml
If you hit an OOM error (and you are already at batch size 1), you can try an 8-bit optimizer or offloading optimizer states to the CPU.
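For reference, here is a minimal sketch (not the official recipe) of how those memory savers could be set in a TRL `SFTConfig`. The `output_dir`, accumulation steps, and sequence-length cap are placeholder assumptions, and it assumes `bitsandbytes` is installed for the 8-bit optimizer:

```python
# Sketch: memory-saving options for SFT when long sequences cause OOM.
# Values below are illustrative, not the settings from the linked recipe.
from trl import SFTConfig

training_args = SFTConfig(
    output_dir="openr1-qwen-7b-sft",   # placeholder path
    per_device_train_batch_size=1,     # already at bs=1 per the thread
    gradient_accumulation_steps=16,    # keep the effective batch size up
    gradient_checkpointing=True,       # trade recompute for activation memory
    optim="adamw_bnb_8bit",            # 8-bit AdamW from bitsandbytes
    bf16=True,
    max_seq_length=16384,              # cap very long samples; named max_length
                                       # in newer TRL versions
)
```

If that is still not enough, DeepSpeed ZeRO-3 with CPU offload (a DeepSpeed JSON config passed via the `deepspeed` training argument) moves optimizer states and parameters off the GPU, at the cost of throughput.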