Regularized-Preference-Optimization
A collection of the 4 models trained in https://github.com/YSLIU627/Regularized-Preference-Optimization.
These models were trained with the OpenRLHF codebase using the average loss and the Regularized-Preference-Optimization (RPO) method, with an SFT loss coefficient of 0.2. The relevant paper is https://arxiv.org/abs/2405.16436.
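
For reference, below is a minimal PyTorch sketch of the RPO objective: a standard DPO loss plus an SFT (negative log-likelihood) regularizer on the chosen responses, weighted by the 0.2 coefficient noted above. The function and argument names are illustrative and are not the OpenRLHF API; `beta=0.1` is a common DPO default, assumed here rather than taken from these training runs.

```python
import torch
import torch.nn.functional as F


def rpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps,
             beta=0.1, sft_coef=0.2):
    """RPO loss: DPO loss plus an SFT (NLL) term on the chosen responses.

    Inputs are per-sequence log-probabilities of shape (batch,); with the
    "average loss" setting, these would be length-averaged over tokens.
    Names are illustrative, not OpenRLHF's actual interface.
    """
    # DPO term: -log sigmoid(beta * (policy log-ratio - reference log-ratio))
    policy_margin = policy_chosen_logps - policy_rejected_logps
    ref_margin = ref_chosen_logps - ref_rejected_logps
    dpo_loss = -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()

    # SFT regularizer: NLL of the chosen responses, weighted by 0.2
    sft_loss = -policy_chosen_logps.mean()
    return dpo_loss + sft_coef * sft_loss


# Example with dummy log-probabilities for a batch of 2
policy_chosen = torch.tensor([-10.0, -12.0])
policy_rejected = torch.tensor([-15.0, -14.0])
ref_chosen = torch.tensor([-11.0, -12.5])
ref_rejected = torch.tensor([-14.0, -13.0])
loss = rpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected)
```

The SFT term keeps the policy anchored to the preferred responses, which is the regularization against reward overoptimization analyzed in the paper.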