Thank you very much for this model, I have questions
#1, opened by NickyNicky
I would like to know how the fine-tuning was done.
Did you use the Hugging Face TRL GRPO library?
Could you share the libraries used for training?
Thank you so much
We used LLaMA-Factory. Code coming soon at https://github.com/open-thoughts/open-thoughts
Thank you for your model! Did you use only SFT, or other methods as well (like DPO, KTO, or PPO)?
We used only SFT for this model.
It is important to emphasize here: we used ONLY SFT for training on the data (114k samples of reasoning traces from R1). There is no RL loss involved in training.
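For anyone wondering what SFT-only training looks like in practice, here is a minimal sketch using Hugging Face TRL's `SFTTrainer`. To be clear, this is not the actual training setup (that used LLaMA-Factory, with code pending in the repo above); the model and dataset names below are stand-ins for illustration.

```python
# Minimal SFT-only sketch with TRL. NOT the actual training setup
# (which used LLaMA-Factory); model and dataset are stand-ins.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Stand-in chat dataset; the real run trained on 114k R1 reasoning traces.
dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",             # stand-in base model
    args=SFTConfig(output_dir="sft-out"),  # plain supervised training config
    train_dataset=dataset,
)
# Standard next-token cross-entropy on the traces; no PPO/DPO/GRPO stage.
trainer.train()
```

The point of the reply above is visible here: there is a single supervised loss, and no reward model or RL objective anywhere in the loop.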