Ask questions about training data construction
#8, opened by zzzzz2023
Hello, I have read your model code. I would like to know how the labels are constructed for training, and how best to compute the loss from the process reward. @Zhenru Thank you for your answer.
The loss in the model code is calculated as loss_fct(logits.view(-1, self.num_labels), labels.view(-1)). But here the logits are the probabilities of the tokens in the assistant response, so how should the labels be constructed so that the cross-entropy can be computed directly against the logits?
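To make the question concrete, here is a minimal pure-Python sketch of what a call of that shape computes. It assumes loss_fct is a standard mean cross-entropy that skips positions labeled -100. The thread does not confirm this, and the batch, the labels, and the choice of which positions are supervised are all hypothetical, not the authors' actual data construction.

```python
import math

IGNORE_INDEX = -100  # assumption: positions with this label contribute no loss


def cross_entropy(flat_logits, flat_labels, ignore_index=IGNORE_INDEX):
    """Mean cross-entropy over rows of logits, skipping ignored labels."""
    total, count = 0.0, 0
    for row, label in zip(flat_logits, flat_labels):
        if label == ignore_index:
            continue
        # Numerically stable log-sum-exp over the class dimension.
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[label]
        count += 1
    # Returning 0.0 when nothing is supervised is a choice made for this sketch.
    return total / count if count else 0.0


# Hypothetical batch: 1 sequence, 4 token positions, num_labels = 2.
num_labels = 2
logits = [[[0.0, 0.0], [2.0, -1.0], [0.5, 0.5], [-1.0, 3.0]]]
# Hypothetical labels: only positions 1 and 3 are supervised, the rest ignored.
labels = [[IGNORE_INDEX, 0, IGNORE_INDEX, 1]]

# Flatten as logits.view(-1, num_labels) and labels.view(-1) would.
flat_logits = [row for seq in logits for row in seq]
flat_labels = [lab for seq in labels for lab in seq]

loss = cross_entropy(flat_logits, flat_labels)
print(f"loss = {loss:.6f}")
```

Under this reading, labels must have one integer class index per logit row after flattening, so the open question is which token positions receive a real class and which are masked out.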