Regularized-Preference-Optimization - a ZHLiu627 Collection


Regularized-Preference-Optimization

Updated 4 days ago

Models trained with the code in https://github.com/YSLIU627/Regularized-Preference-Optimization.

Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer

Paper • 2405.16436 • Published May 26, 2024 • 1
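The paper title describes the SFT loss as a regularizer on preference optimization. The sketch below makes that concrete under stated assumptions. It assumes the regularized objective is the standard DPO loss plus `eta` times the SFT negative log-likelihood of the chosen response, computed from per-sequence log-probabilities. The paper's exact weighting and its choice of SFT target may differ. The function names are hypothetical, and reading `eta` and `beta` in the checkpoint names as these two coefficients is also an assumption.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float) -> float:
    """Per-example DPO loss from sequence log-probs under the policy and the reference model."""
    # Implicit reward margin: how much more the policy (relative to the reference)
    # prefers the chosen response over the rejected one, scaled by beta.
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    return -log_sigmoid(margin)


def rpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float, eta: float) -> float:
    """Hypothetical regularized loss: DPO plus eta * SFT NLL on the chosen response (assumed form)."""
    sft_nll = -policy_chosen  # negative log-likelihood of the chosen response under the policy
    return dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta) + eta * sft_nll


if __name__ == "__main__":
    # Policy identical to the reference: DPO term is log(2); SFT term adds eta * 10.
    print(rpo_loss(-10.0, -10.0, -10.0, -10.0, beta=0.01, eta=0.005))
```

In this form, `eta` trades off fitting the preference margin against keeping the chosen responses likely under the policy. Setting `eta = 0` recovers plain DPO.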
ZHLiu627/zephyr-7b-gemma-rpo-avg

Updated 6 days ago • 31
ZHLiu627/zephyr-gemma-rpo

Text Generation • Updated Aug 1, 2024 • 6
ZHLiu627/beta_ultra_rdpo_full_eta0.005_beta0.01_no_decay_new

Updated 7 days ago • 5