Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer
Paper: arXiv:2405.16436
Models trained with the code at https://github.com/YSLIU627/Regularized-Preference-Optimization