zephyr-7b-dpo-full-gpt_consistent-reward-scale-1-rpo / model-00001-of-00003.safetensors

Commit History

Training in progress, step 400
374e156
verified

sfulay committed on

Training in progress, step 300
2c7d48d
verified

sfulay committed on

Training in progress, step 200
133d893
verified

sfulay committed on

Training in progress, step 100
242749d
verified

sfulay committed on