yyqoni's Collections

DenseRewardRLHF-PPO

This repository contains the released models for our paper "Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model".