About Training Details

#4 by XinC6

Hello,

While reproducing OpenR1-Qwen-7B with your dataset OpenR1-Math-220k (default subset), I ran into an issue: some sequences in the dataset are very long (some exceed 25,000 tokens), which leads to excessive VRAM consumption during training.
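For reference, here is a minimal sketch of how I measured the sequence lengths (the dataset and base-model names match the public repos; the `messages` column and the use of the chat template are my assumptions):

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Load the default subset and the tokenizer of the base model.
dataset = load_dataset("open-r1/OpenR1-Math-220k", "default", split="train")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Math-7B-Instruct")

def token_length(example):
    # Render the conversation with the chat template, then count tokens.
    text = tokenizer.apply_chat_template(example["messages"], tokenize=False)
    return {"num_tokens": len(tokenizer(text).input_ids)}

lengths = dataset.map(token_length, num_proc=8)
print(max(lengths["num_tokens"]))  # some samples come out above 25,000 tokens
```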

Would you be able to share more training details or suggest any possible solutions to this issue? I would greatly appreciate your response!

Best regards.

Open R1 org

The config we use on 8xH100 is here: https://github.com/huggingface/open-r1/blob/main/recipes/OpenR1-Qwen-7B/sft/config.yaml
If you hit an OOM error (and you already have batch size 1), you can try an 8-bit optimizer or offloading the optimizer states to CPU.
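For example, a minimal sketch of both options on top of the linked config (the exact values here are illustrative, not the recipe's settings; `adamw_bnb_8bit` requires `bitsandbytes`, and the DeepSpeed dict enables ZeRO-2 with optimizer offload):

```python
from transformers import TrainingArguments

# Option 1: 8-bit optimizer (needs `pip install bitsandbytes`).
args_8bit = TrainingArguments(
    output_dir="OpenR1-Qwen-7B-sft",
    per_device_train_batch_size=1,
    gradient_checkpointing=True,   # also reduces activation memory
    bf16=True,
    optim="adamw_bnb_8bit",        # 8-bit AdamW from bitsandbytes
)

# Option 2: offload optimizer states to CPU via DeepSpeed ZeRO-2.
ds_config = {
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
    "bf16": {"enabled": True},
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": "auto",
}
args_offload = TrainingArguments(
    output_dir="OpenR1-Qwen-7B-sft",
    per_device_train_batch_size=1,
    bf16=True,
    deepspeed=ds_config,           # accepts a dict or a path to a JSON file
)
```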
