Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model • Paper • 2501.02790 • Published 22 days ago