S4nto/lora-dpo-finetuned-model-beta-0.1-rate-1e5-stage2-iter40000-sft • Text Generation • Updated May 16 • 13
S4nto/lora-dpo-finetuned-model-beta-0.5-rate-2e6-stage2-iter40000-sft • Text Generation • Updated May 15 • 12
S4nto/lora-dpo-finetuned-model-beta-0.1-rate-1e6-stage2-iter40000-sft • Text Generation • Updated May 15 • 10
S4nto/lora-dpo-finetuned-model-beta-0.5-rate-1e6-stage2-iter40000-sft • Text Generation • Updated May 15 • 10