Basic question: How would you reproduce the training of this model?
Hi there, I have a somewhat basic question: I am interested in training my own reward model (probably just on top of Llama 3.1 due to resource constraints) on your INF-ORM-Preference-Magnitude-80K dataset with some additional conversations that are more domain-specific.
Would you mind providing a pointer either to your training code or to a library that would enable me to replicate this work? For example, could I get similar results with the RewardTrainer from trl, or would I need some other custom training library?
Hi, we used Megatron-LM for training, with internal modifications to speed it up. That modified framework is private to the company, but you can still use an open-source training framework; I believe this will not affect the final performance.
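For reference, here is a minimal sketch of how such a run could be set up with trl's RewardTrainer. The repository id, dataset column handling, and hyperparameters below are illustrative assumptions rather than the recipe used for the released model, and any preference-magnitude column in the dataset would simply be ignored by the default pairwise (Bradley–Terry) loss here.

```python
# Minimal sketch, assuming the dataset exposes "chosen"/"rejected" columns in a
# format RewardTrainer can consume; names and hyperparameters are illustrative.
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from trl import RewardConfig, RewardTrainer

base_model = "meta-llama/Llama-3.1-8B-Instruct"  # any Llama 3.1 checkpoint

tokenizer = AutoTokenizer.from_pretrained(base_model)
# A reward model is a sequence classifier with a single scalar output head.
model = AutoModelForSequenceClassification.from_pretrained(base_model, num_labels=1)

# Llama tokenizers ship without a pad token; reward training needs one.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = tokenizer.pad_token_id

# Preference data; repo id is assumed here. Domain-specific conversations in the
# same format could be mixed in via datasets.concatenate_datasets.
dataset = load_dataset("infly/INF-ORM-Preference-Magnitude-80K", split="train")

training_args = RewardConfig(
    output_dir="llama3.1-reward-model",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    learning_rate=1e-5,
    max_length=4096,
    bf16=True,
)

trainer = RewardTrainer(
    model=model,
    args=training_args,
    processing_class=tokenizer,  # older trl versions use tokenizer= instead
    train_dataset=dataset,
)
trainer.train()
```

Depending on your trl version and the dataset's exact schema, you may need to remap or preprocess the columns before passing them to the trainer.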
Thank you for the response!