How was the reward model (RM) trained?

by NickyNicky - opened Jan 23

Hello, thank you for the model.
I'd like to know more about how this model was trained.
I trained the 'Qwen2-0.5B-GRPO' model for 21 steps and saw that it uses the Gemma model for rewards.

https://huggingface.co./NickyNicky/Qwen2-0.5B-GRPO_final

How did you train it, and what criteria did you use to score the prompts?
Could you share the Colab notebook you used for training?
Could you also share the datasets?

Thank you so much.