Basic question: How would you reproduce the training of this model?

#1
by treehugg3 - opened

Hi there, I have a somewhat basic question: I am interested in training my own reward model (probably just on top of Llama 3.1 due to resource constraints) on your INF-ORM-Preference-Magnitude-80K dataset with some additional conversations that are more domain-specific.

Would you mind providing a pointer either to your training code or to a library that would enable me to replicate this work? For example, would I be able to get similar results with RewardTrainer from trl, or would I need some other custom training library?

infly-ai org

Hi, we used Megatron-LM for training and modified it to speed up training. This framework is private and internal to the company, but you can still use an open-source training framework; I believe this will not affect the final performance.
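For reference, here is a minimal sketch of what such an open-source replication could look like with trl's RewardTrainer, not the authors' internal Megatron-LM pipeline. The base model ID, hyperparameters, and the assumption that the dataset exposes `chosen`/`rejected` pairs in a format RewardTrainer can tokenize directly are all placeholders to verify against your setup.

```python
# Minimal reward-model training sketch using trl's RewardTrainer.
# NOT the authors' internal pipeline; model ID, dataset column layout,
# and hyperparameters below are assumptions to adapt.
import torch
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from trl import RewardConfig, RewardTrainer

model_name = "meta-llama/Llama-3.1-8B-Instruct"  # assumed base model

tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token

# A reward model is a sequence classifier with a single scalar output head.
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=1, torch_dtype=torch.bfloat16
)
model.config.pad_token_id = tokenizer.pad_token_id

# Assumes the dataset provides "chosen"/"rejected" pairs that recent trl
# versions can tokenize themselves (applying the chat template for
# conversational data); older trl versions may require pre-tokenizing.
dataset = load_dataset("infly/INF-ORM-Preference-Magnitude-80K", split="train")

training_args = RewardConfig(
    output_dir="llama31-reward-model",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    learning_rate=1e-5,
    num_train_epochs=1,
    max_length=4096,
    bf16=True,
    logging_steps=10,
)

trainer = RewardTrainer(
    model=model,
    args=training_args,
    processing_class=tokenizer,  # older trl versions take tokenizer= instead
    train_dataset=dataset,
)
trainer.train()
```

If you want to make use of the magnitude scores in the dataset, note that trl's RewardTrainer also honors an optional `margin` column (the margin-based preference loss from the Llama 2 paper), which you could populate from the magnitudes; whether that matches the loss used for the original model is an open question.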

Thank you for the response!

treehugg3 changed discussion status to closed
