Question about training data construction

#8
by zzzzz2023 - opened

Hello, I have looked at your model code. I would like to know how the labels are constructed for training, and how best to compute the loss from the process reward. @Zhenru Thank you for your answer.

The loss in the model code is calculated as loss_fct(logits.view(-1, self.num_labels), labels.view(-1)). However, the logits here are the per-token probabilities for the assistant response, so how should the labels be constructed so that cross entropy can be computed directly against these logits?
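To make the question concrete, here is a minimal sketch of how I currently understand that call. It is not taken from the model's training code. The assumptions are mine: num_labels = 2 (0 = incorrect step, 1 = correct step), labels placed only on hypothetical step-end token positions, and every other position set to -100 so that PyTorch's CrossEntropyLoss skips it (its default ignore_index is -100).

```python
import torch
import torch.nn.functional as F
from torch.nn import CrossEntropyLoss

num_labels = 2            # assumed: 0 = incorrect step, 1 = correct step
batch, seq_len = 1, 6     # toy sizes, for illustration only

torch.manual_seed(0)
# Stand-in for the model output: one score vector per token.
logits = torch.randn(batch, seq_len, num_labels)

# Hypothetical labeling: only step-end tokens (positions 2 and 5 here)
# carry a process label; all other tokens are ignored via -100.
labels = torch.full((batch, seq_len), -100, dtype=torch.long)
labels[0, 2] = 1          # first step judged correct
labels[0, 5] = 0          # second step judged incorrect

# Same call shape as in the model code; ignore_index defaults to -100.
loss_fct = CrossEntropyLoss()
loss = loss_fct(logits.view(-1, num_labels), labels.view(-1))

# Cross-check: the result equals the mean cross entropy over the
# labeled positions only.
flat_logits = logits.view(-1, num_labels)
flat_labels = labels.view(-1)
mask = flat_labels != -100
manual = F.cross_entropy(flat_logits[mask], flat_labels[mask])
```

If this reading is right, the labels tensor has the same (batch, seq_len) shape as the input ids, and the open question is which positions should carry a label and what values they should take.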
