About Training Detail #4
by XinC6
Hello,
While reproducing OpenR1-Qwen-7B using your dataset OpenR1-Math-220k (default config), I encountered an issue: the sequences in the dataset are very long (some exceed 25,000 tokens), leading to excessive VRAM consumption.
Would you be able to share more training details or suggest possible solutions to this issue? I would greatly appreciate your response!
Best regards.
The config we use on 8xH100 is here: https://github.com/huggingface/open-r1/blob/main/recipes/OpenR1-Qwen-7B/sft/config.yaml
If you hit an OOM error (and you are already at batch size 1), you can try an 8-bit optimizer or offloading optimizer states to the CPU.
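For reference, here is a minimal sketch (not the official recipe) of how those memory savers could be set in a TRL `SFTConfig`. The `output_dir`, accumulation steps, and sequence-length cap are placeholder assumptions, and it assumes `bitsandbytes` is installed for the 8-bit optimizer:

```python
# Sketch: memory-saving options for SFT when long sequences cause OOM.
# Values below are illustrative, not the settings from the linked recipe.
from trl import SFTConfig

training_args = SFTConfig(
    output_dir="openr1-qwen-7b-sft",   # placeholder path
    per_device_train_batch_size=1,     # already at bs=1 per the thread
    gradient_accumulation_steps=16,    # keep the effective batch size up
    gradient_checkpointing=True,       # trade recompute for activation memory
    optim="adamw_bnb_8bit",            # 8-bit AdamW from bitsandbytes
    bf16=True,
    max_seq_length=16384,              # cap very long samples; named max_length
                                       # in newer TRL versions
)
```

If that is still not enough, DeepSpeed ZeRO-3 with CPU offload (a DeepSpeed JSON config passed via the `deepspeed` training argument) moves optimizer states and parameters off the GPU, at the cost of throughput.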