Models trained with the code in https://github.com/YSLIU627/Regularized-Preference-Optimization.
- Paper: Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer (arXiv:2405.16436)
- ZHLiu627/zephyr-7b-gemma-rpo-avg
- ZHLiu627/zephyr-gemma-rpo (text generation)
- ZHLiu627/beta_ultra_rdpo_full_eta0.005_beta0.01_no_decay_new
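
As the paper title suggests, the method (Regularized Preference Optimization) mitigates reward overoptimization by adding the SFT loss on the preferred response to the DPO objective as a regularizer. Below is a minimal PyTorch sketch of such a combined loss; the function name `rpo_loss`, the SFT weight `eta`, and the DPO temperature `beta` are illustrative assumptions (the default values echo the hyperparameters in the model names above), not the repository's actual API.

```python
import torch
import torch.nn.functional as F

def rpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps,
             beta=0.01, eta=0.005):
    """Sketch: DPO loss plus an SFT regularizer on the chosen response.

    Inputs are summed log-probabilities of full responses under the policy
    and the frozen reference model, one entry per preference pair. `beta`
    and `eta` are hypothetical defaults taken from the model names above.
    """
    # Standard DPO term: negative log-sigmoid of the scaled margin between
    # the implicit rewards of the chosen and rejected responses.
    margin = (policy_chosen_logps - ref_chosen_logps) - (
        policy_rejected_logps - ref_rejected_logps
    )
    dpo_term = -F.logsigmoid(beta * margin)

    # SFT regularizer: negative log-likelihood of the preferred response,
    # which the paper argues acts as an implicit adversarial regularizer.
    sft_term = -policy_chosen_logps

    return (dpo_term + eta * sft_term).mean()

# Toy usage with random stand-ins for a batch of 4 preference pairs.
lp = lambda: torch.randn(4)
loss = rpo_loss(lp(), lp(), lp(), lp())
```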