trl-lib/Qwen2-0.5B-Reward-Math-Sheperd


The model repeats the answer.

#1

by Imran1 - opened 17 days ago

Imran1

17 days ago

I think the model needs to be trained for at least 1 epoch. Anyhow, great work.
