DenseRewardRLHF-PPO
Collection
This repository contains the released models for our paper "Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model".
18 items