Does this model apply SFT or SFT+RL during post-training?
I would also like to know.