Preference-grounded Token-level Guidance for Language Model Fine-tuning Paper • 2306.00398 • Published Jun 1, 2023
A Dense Reward View on Aligning Text-to-Image Diffusion with Preference Paper • 2402.08265 • Published Feb 13, 2024
Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model Paper • 2501.02790 • Published Jan 6, 2025
DenseRewardRLHF-PPO Collection This repository contains the released models for our paper Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model. • 18 items