metadata
base_model: rasyosef/phi-1_5-sft
library_name: peft
license: mit
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: phi-1_5-dpo
results: []
datasets:
- HuggingFaceH4/ultrafeedback_binarized
- argilla/distilabel-intel-orca-dpo-pairs
- jondurbin/py-dpo-v0.1
- argilla/distilabel-math-preference-dpo
phi-1_5-dpo
This model is a fine-tuned version of rasyosef/phi-1_5-sft, trained with DPO (via TRL) on the datasets listed in the metadata above. It achieves the following results on the evaluation set:
- Loss: 0.5013
- Rewards/chosen: -1.0250
- Rewards/rejected: -2.3893
- Rewards/accuracies: 0.7283
- Rewards/margins: 1.3643
- Logps/rejected: -162.0916
- Logps/chosen: -128.1033
- Logits/rejected: 5.3082
- Logits/chosen: 5.1890
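The reported margin should equal the chosen reward minus the rejected reward. A minimal arithmetic check on the final evaluation values above; the dictionary keys are illustrative names chosen here, not identifiers from any library:

```python
# Final evaluation metrics copied from the list above.
# Key names are illustrative assumptions, not TRL identifiers.
final_eval = {
    "rewards_chosen": -1.0250,
    "rewards_rejected": -2.3893,
    "rewards_margins": 1.3643,
    "rewards_accuracies": 0.7283,
}

# Margin = mean chosen reward - mean rejected reward.
margin = final_eval["rewards_chosen"] - final_eval["rewards_rejected"]

# Allow for rounding in the logged four-decimal values.
margin_matches = abs(margin - final_eval["rewards_margins"]) < 1e-4

print(f"recomputed margin: {margin:.4f}, matches logged value: {margin_matches}")
```

Both rewards are negative. Under the usual DPO definition of the implicit reward (a positive scale times the policy-to-reference log-probability difference), that would suggest the policy assigns lower likelihood than the reference to both responses, with the rejected responses pushed down further. The reward accuracy of 0.7283 is the fraction of evaluation pairs in which the chosen response received the higher reward.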
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 300
- num_epochs: 3
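The batch-size and scheduler settings above can be sanity-checked with a short sketch. It assumes a single device (consistent with 8 × 2 = 16) and the common linear-warmup-then-cosine-decay shape. `TOTAL_STEPS` is an estimate inferred from the results table (step 138 at epoch 0.1241 gives about 1112 steps per epoch), not a value stated in this card, and the function names are hypothetical:

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 2e-5
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 2
WARMUP_STEPS = 300
NUM_EPOCHS = 3

# ASSUMPTION: estimated from the results table (138 steps ~ 0.1241 epoch),
# i.e. roughly 1112 optimizer steps per epoch. Not stated in the card.
STEPS_PER_EPOCH = round(138 / 0.1241)
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS


def effective_batch_size(per_device: int, accum: int, num_devices: int = 1) -> int:
    """Examples consumed per optimizer step."""
    return per_device * accum * num_devices


def lr_at(step: int, peak: float = LEARNING_RATE,
          warmup: int = WARMUP_STEPS, total: int = TOTAL_STEPS) -> float:
    """Linear warmup to `peak`, then half-cosine decay to zero at `total`."""
    if step < warmup:
        return peak * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))


print("effective batch size:", effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
for s in (0, 150, 300, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(f"step {s:>4}: lr = {lr_at(s):.2e}")
```

With 300 warmup steps out of an estimated 3336, the learning rate would reach its peak a little over a quarter of the way through the first epoch.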
Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
0.6899 | 0.1241 | 138 | 0.6769 | -0.0153 | -0.0504 | 0.625 | 0.0351 | -138.7025 | -118.0066 | 4.5710 | 4.4532 |
0.6309 | 0.2482 | 276 | 0.6035 | -0.2012 | -0.5586 | 0.7120 | 0.3575 | -143.7850 | -119.8655 | 4.5167 | 4.3940 |
0.5756 | 0.3723 | 414 | 0.5669 | -0.3693 | -0.9842 | 0.7174 | 0.6149 | -148.0405 | -121.5467 | 4.6242 | 4.5060 |
0.5715 | 0.4964 | 552 | 0.5446 | -0.4109 | -1.1855 | 0.7283 | 0.7745 | -150.0534 | -121.9633 | 4.7324 | 4.6143 |
0.5449 | 0.6205 | 690 | 0.5331 | -0.4666 | -1.3090 | 0.7446 | 0.8424 | -151.2884 | -122.5196 | 4.8229 | 4.7080 |
0.5536 | 0.7446 | 828 | 0.5136 | -0.4885 | -1.3825 | 0.7446 | 0.8940 | -152.0234 | -122.7389 | 4.8867 | 4.7737 |
0.5253 | 0.8687 | 966 | 0.5057 | -0.5613 | -1.5446 | 0.7554 | 0.9832 | -153.6442 | -123.4672 | 4.9287 | 4.8080 |
0.5249 | 0.9928 | 1104 | 0.5054 | -0.5101 | -1.4656 | 0.75 | 0.9555 | -152.8544 | -122.9549 | 4.8704 | 4.7521 |
0.4631 | 1.1169 | 1242 | 0.5067 | -0.6889 | -1.7678 | 0.75 | 1.0789 | -155.8768 | -124.7426 | 4.8470 | 4.7276 |
0.4524 | 1.2410 | 1380 | 0.5006 | -0.7467 | -1.9049 | 0.7446 | 1.1582 | -157.2474 | -125.3205 | 4.9447 | 4.8239 |
0.424 | 1.3651 | 1518 | 0.5036 | -0.7638 | -2.0144 | 0.7337 | 1.2505 | -158.3425 | -125.4923 | 4.9235 | 4.8002 |
0.4428 | 1.4892 | 1656 | 0.5004 | -0.7790 | -2.0132 | 0.7446 | 1.2342 | -158.3307 | -125.6437 | 4.9576 | 4.8375 |
0.4424 | 1.6133 | 1794 | 0.4944 | -0.8220 | -2.0517 | 0.7391 | 1.2297 | -158.7152 | -126.0739 | 4.9736 | 4.8553 |
0.4358 | 1.7374 | 1932 | 0.5022 | -0.8091 | -1.9993 | 0.7228 | 1.1902 | -158.1918 | -125.9447 | 5.0894 | 4.9702 |
0.4426 | 1.8615 | 2070 | 0.4992 | -0.8254 | -2.0308 | 0.7228 | 1.2054 | -158.5065 | -126.1077 | 5.0943 | 4.9780 |
0.4226 | 1.9856 | 2208 | 0.4971 | -0.8701 | -2.1434 | 0.7283 | 1.2733 | -159.6329 | -126.5553 | 5.1222 | 5.0011 |
0.3684 | 2.1097 | 2346 | 0.5032 | -0.9201 | -2.2281 | 0.7228 | 1.3081 | -160.4799 | -127.0545 | 5.2209 | 5.1031 |
0.3695 | 2.2338 | 2484 | 0.5022 | -0.9332 | -2.2651 | 0.7228 | 1.3319 | -160.8495 | -127.1860 | 5.2170 | 5.0977 |
0.3693 | 2.3579 | 2622 | 0.5022 | -0.9418 | -2.2839 | 0.7283 | 1.3421 | -161.0379 | -127.2717 | 5.2390 | 5.1169 |
0.3659 | 2.4820 | 2760 | 0.5037 | -0.9820 | -2.3392 | 0.7228 | 1.3572 | -161.5908 | -127.6742 | 5.2392 | 5.1148 |
0.3557 | 2.6061 | 2898 | 0.5031 | -1.0001 | -2.3531 | 0.7228 | 1.3529 | -161.7294 | -127.8552 | 5.2704 | 5.1488 |
0.3491 | 2.7302 | 3036 | 0.5053 | -1.0242 | -2.3803 | 0.7228 | 1.3562 | -162.0017 | -128.0954 | 5.2880 | 5.1693 |
0.3512 | 2.8543 | 3174 | 0.5036 | -1.0265 | -2.3833 | 0.7174 | 1.3568 | -162.0320 | -128.1190 | 5.2965 | 5.1768 |
0.3458 | 2.9784 | 3312 | 0.5013 | -1.0250 | -2.3893 | 0.7283 | 1.3643 | -162.0916 | -128.1033 | 5.3082 | 5.1890 |
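To make the trend in the table easier to read, the sketch below copies four columns (step, training loss, validation loss, reward accuracy) and locates the evaluation step with the lowest validation loss and the one with the highest reward accuracy. The `EvalRow` type and variable names are illustrative assumptions:

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    step: int
    train_loss: float
    val_loss: float
    reward_acc: float


# (step, training loss, validation loss, rewards/accuracies) from the table above.
ROWS = [EvalRow(*r) for r in [
    (138, 0.6899, 0.6769, 0.625),   (276, 0.6309, 0.6035, 0.7120),
    (414, 0.5756, 0.5669, 0.7174),  (552, 0.5715, 0.5446, 0.7283),
    (690, 0.5449, 0.5331, 0.7446),  (828, 0.5536, 0.5136, 0.7446),
    (966, 0.5253, 0.5057, 0.7554),  (1104, 0.5249, 0.5054, 0.75),
    (1242, 0.4631, 0.5067, 0.75),   (1380, 0.4524, 0.5006, 0.7446),
    (1518, 0.424, 0.5036, 0.7337),  (1656, 0.4428, 0.5004, 0.7446),
    (1794, 0.4424, 0.4944, 0.7391), (1932, 0.4358, 0.5022, 0.7228),
    (2070, 0.4426, 0.4992, 0.7228), (2208, 0.4226, 0.4971, 0.7283),
    (2346, 0.3684, 0.5032, 0.7228), (2484, 0.3695, 0.5022, 0.7228),
    (2622, 0.3693, 0.5022, 0.7283), (2760, 0.3659, 0.5037, 0.7228),
    (2898, 0.3557, 0.5031, 0.7228), (3036, 0.3491, 0.5053, 0.7228),
    (3174, 0.3512, 0.5036, 0.7174), (3312, 0.3458, 0.5013, 0.7283),
]]

best_by_loss = min(ROWS, key=lambda r: r.val_loss)
best_by_acc = max(ROWS, key=lambda r: r.reward_acc)
final = ROWS[-1]

# Gap between validation and training loss at the last logged step.
final_gap = final.val_loss - final.train_loss

print(f"lowest validation loss: {best_by_loss.val_loss} at step {best_by_loss.step}")
print(f"highest reward accuracy: {best_by_acc.reward_acc} at step {best_by_acc.step}")
print(f"final validation/training loss gap: {final_gap:.4f}")
```

On these logged values, validation loss stays close to 0.50 from the end of the first epoch onward while training loss keeps falling, which may indicate mild overfitting in the later epochs. The headline metrics in this card match the last row of the table.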
Framework versions
- PEFT 0.11.1
- Transformers 4.42.4
- PyTorch 2.3.1+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
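When trying to reproduce the run, it can help to compare a local environment against the versions listed above. A minimal sketch using only the standard library; the mapping of each framework to its pip distribution name (for example `torch` for PyTorch) is an assumption, and `compare_versions` is a hypothetical helper:

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above.
# ASSUMPTION: keys are the usual pip distribution names for each framework.
CARD_VERSIONS = {
    "peft": "0.11.1",
    "transformers": "4.42.4",
    "torch": "2.3.1+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def compare_versions(expected=CARD_VERSIONS):
    """Return {package: (expected, installed_or_None, exact_match)}."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (wanted, installed, installed == wanted)
    return report


for package, (wanted, installed, ok) in compare_versions().items():
    status = "ok" if ok else "differs"
    print(f"{package}: card={wanted} installed={installed} ({status})")
```

An exact match is not necessarily required to load or use the model; the comparison only flags where a local setup differs from the one recorded here.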