zephyr-7b-dpo-qlora
This model is a fine-tuned version of dball/zephyr-7b-sft-qlora on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:
- Loss: 0.5058
- Rewards/chosen: -2.0144
- Rewards/rejected: -3.0238
- Rewards/accuracies: 0.7350
- Rewards/margins: 1.0093
- Logps/rejected: -550.9584
- Logps/chosen: -469.9345
- Logits/rejected: 1.9679
- Logits/chosen: 1.2121
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 1
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
Training results
Training Loss | Epoch | Step | Logits/chosen | Logits/rejected | Logps/chosen | Logps/rejected | Validation Loss | Rewards/accuracies | Rewards/chosen | Rewards/margins | Rewards/rejected |
---|---|---|---|---|---|---|---|---|---|---|---|
0.6934 | 0.01 | 100 | -2.5261 | -2.4383 | -268.4692 | -248.5731 | 0.6931 | 0.5105 | 0.0002 | 0.0001 | 0.0001 |
0.6924 | 0.03 | 200 | -2.5247 | -2.4368 | -268.3451 | -248.5511 | 0.6926 | 0.5605 | 0.0014 | 0.0011 | 0.0003 |
0.691 | 0.04 | 300 | -2.5253 | -2.4378 | -267.5839 | -248.1753 | 0.6907 | 0.6440 | 0.0091 | 0.0050 | 0.0041 |
0.6876 | 0.05 | 400 | -2.5230 | -2.4351 | -264.4353 | -246.3089 | 0.6845 | 0.6580 | 0.0405 | 0.0178 | 0.0227 |
0.6799 | 0.07 | 500 | -2.4660 | -2.3755 | -264.9495 | -249.9276 | 0.6707 | 0.6815 | 0.0354 | 0.0489 | -0.0135 |
0.6577 | 0.08 | 600 | -2.3601 | -2.2541 | -280.7885 | -272.3604 | 0.6462 | 0.6750 | -0.1230 | 0.1148 | -0.2378 |
0.6365 | 0.09 | 700 | -2.3136 | -2.2013 | -277.0453 | -272.2037 | 0.6345 | 0.6860 | -0.0856 | 0.1507 | -0.2362 |
0.6519 | 0.1 | 800 | -2.1835 | -2.0482 | -317.9223 | -320.8872 | 0.6240 | 0.6630 | -0.4943 | 0.2287 | -0.7231 |
0.6547 | 0.12 | 900 | -2.2184 | -2.0783 | -325.8177 | -331.4542 | 0.6203 | 0.6695 | -0.5733 | 0.2555 | -0.8287 |
0.5841 | 0.13 | 1000 | -2.2086 | -2.0689 | -322.0998 | -334.5816 | 0.6071 | 0.6820 | -0.5361 | 0.3239 | -0.8600 |
0.5877 | 0.14 | 1100 | -1.3836 | -1.1053 | -383.4380 | -410.8678 | 0.5947 | 0.6855 | -1.1495 | 0.4734 | -1.6229 |
0.5552 | 0.16 | 1200 | -0.7372 | -0.3614 | -411.0459 | -437.9200 | 0.5909 | 0.6880 | -1.4256 | 0.4678 | -1.8934 |
0.5492 | 0.17 | 1300 | -0.5949 | -0.1933 | -414.6323 | -446.2910 | 0.5791 | 0.6935 | -1.4614 | 0.5157 | -1.9771 |
0.5789 | 0.18 | 1400 | -0.5846 | -0.1908 | -356.4832 | -384.9109 | 0.5771 | 0.7035 | -0.8799 | 0.4834 | -1.3633 |
0.5456 | 0.2 | 1500 | -0.1574 | 0.3098 | -386.9436 | -427.7158 | 0.5646 | 0.7035 | -1.1845 | 0.6068 | -1.7913 |
0.4722 | 0.21 | 1600 | 0.0346 | 0.5395 | -400.9113 | -442.8174 | 0.5598 | 0.7075 | -1.3242 | 0.6181 | -1.9424 |
0.5072 | 0.22 | 1700 | 0.4657 | 1.0411 | -418.8860 | -465.2537 | 0.5574 | 0.7060 | -1.5040 | 0.6628 | -2.1667 |
0.5284 | 0.24 | 1800 | 0.6528 | 1.2404 | -423.3542 | -469.1293 | 0.5534 | 0.7070 | -1.5486 | 0.6568 | -2.2055 |
0.5623 | 0.25 | 1900 | 0.3058 | 0.7808 | -439.5539 | -491.0526 | 0.5625 | 0.7055 | -1.7106 | 0.7141 | -2.4247 |
0.6092 | 0.26 | 2000 | 0.0079 | 0.5199 | -370.0728 | -413.7089 | 0.5501 | 0.7085 | -1.0158 | 0.6354 | -1.6513 |
0.5726 | 0.27 | 2100 | 0.4405 | 0.9981 | -415.4569 | -464.3842 | 0.5433 | 0.7150 | -1.4697 | 0.6884 | -2.1580 |
0.5323 | 0.29 | 2200 | 0.7445 | 1.3533 | -400.2244 | -457.4451 | 0.5483 | 0.7150 | -1.3173 | 0.7713 | -2.0886 |
0.5148 | 0.3 | 2300 | 0.5107 | 1.1454 | -400.4308 | -450.4646 | 0.5387 | 0.7275 | -1.3194 | 0.6994 | -2.0188 |
0.4112 | 0.31 | 2400 | 0.6648 | 1.2866 | -430.5040 | -490.7723 | 0.5401 | 0.7200 | -1.6201 | 0.8018 | -2.4219 |
0.5246 | 0.33 | 2500 | 1.0914 | 1.7388 | -481.2729 | -538.2222 | 0.5413 | 0.7220 | -2.1278 | 0.7686 | -2.8964 |
0.5657 | 0.34 | 2600 | 0.9886 | 1.6571 | -437.1172 | -495.0003 | 0.5373 | 0.7200 | -1.6863 | 0.7779 | -2.4642 |
0.5216 | 0.35 | 2700 | 1.1290 | 1.7936 | -467.4365 | -522.5278 | 0.5357 | 0.7260 | -1.9895 | 0.7500 | -2.7395 |
0.5865 | 0.37 | 2800 | 1.1019 | 1.7565 | -478.5605 | -529.6149 | 0.5351 | 0.7260 | -2.1007 | 0.7096 | -2.8103 |
0.5252 | 0.38 | 2900 | 0.9108 | 1.5686 | -426.6496 | -492.7397 | 0.5376 | 0.7205 | -1.5816 | 0.8600 | -2.4416 |
0.5381 | 0.39 | 3000 | 1.0233 | 1.7206 | -422.6485 | -485.7741 | 0.5306 | 0.7230 | -1.5416 | 0.8303 | -2.3719 |
0.4587 | 0.41 | 3100 | 1.1221 | 1.8445 | -413.6005 | -467.0778 | 0.5222 | 0.7260 | -1.4511 | 0.7339 | -2.1850 |
0.5173 | 0.42 | 3200 | 0.8981 | 1.6186 | -403.9989 | -462.4095 | 0.5277 | 0.7260 | -1.3551 | 0.7832 | -2.1383 |
0.5851 | 0.43 | 3300 | 1.2860 | 2.0344 | -437.1258 | -498.6931 | 0.5181 | 0.7325 | -1.6864 | 0.8148 | -2.5011 |
0.5811 | 0.44 | 3400 | 1.0162 | 1.7238 | -428.5590 | -492.4408 | 0.5166 | 0.7335 | -1.6007 | 0.8379 | -2.4386 |
0.4892 | 0.46 | 3500 | 1.3014 | 2.0709 | -415.6104 | -480.9519 | 0.5257 | 0.7280 | -1.4712 | 0.8525 | -2.3237 |
0.5438 | 0.47 | 3600 | 1.4150 | 2.2020 | -428.1592 | -493.0664 | 0.5252 | 0.7275 | -1.5967 | 0.8482 | -2.4449 |
0.5677 | 0.48 | 3700 | 1.6843 | 2.4678 | -465.7504 | -529.8630 | 0.5152 | 0.7275 | -1.9726 | 0.8402 | -2.8128 |
0.5471 | 0.5 | 3800 | 1.4352 | 2.2022 | -475.7978 | -551.5833 | 0.5240 | 0.7255 | -2.0731 | 0.9569 | -3.0300 |
0.5193 | 0.51 | 3900 | 1.3990 | 2.1469 | -485.6194 | -559.7596 | 0.5185 | 0.7340 | -2.1713 | 0.9405 | -3.1118 |
0.5764 | 0.52 | 4000 | 1.1192 | 1.8653 | -469.0576 | -545.9298 | 0.5177 | 0.7310 | -2.0057 | 0.9678 | -2.9735 |
0.504 | 0.54 | 4100 | 1.0344 | 1.7948 | -450.8565 | -523.1135 | 0.5180 | 0.7270 | -1.8237 | 0.9217 | -2.7453 |
0.4846 | 0.55 | 4200 | 1.3329 | 2.1064 | -480.6317 | -553.0635 | 0.5168 | 0.7260 | -2.1214 | 0.9234 | -3.0448 |
0.426 | 0.56 | 4300 | 1.2900 | 2.0377 | -469.9074 | -543.4855 | 0.5096 | 0.7325 | -2.0142 | 0.9349 | -2.9490 |
0.5289 | 0.58 | 4400 | 1.0286 | 1.7669 | -464.7332 | -542.2659 | 0.5143 | 0.7260 | -1.9624 | 0.9744 | -2.9368 |
0.4542 | 0.59 | 4500 | 1.1395 | 1.8775 | -464.9223 | -541.3861 | 0.5102 | 0.7335 | -1.9643 | 0.9637 | -2.9280 |
0.4839 | 0.6 | 4600 | 1.1472 | 1.8858 | -468.8564 | -546.4150 | 0.5094 | 0.7305 | -2.0037 | 0.9747 | -2.9783 |
0.5562 | 0.62 | 4700 | 1.1999 | 1.9384 | -471.0873 | -546.7677 | 0.5076 | 0.7340 | -2.0260 | 0.9559 | -2.9819 |
0.4964 | 0.63 | 4800 | 1.3968 | 2.1538 | -485.7305 | -561.4290 | 0.5078 | 0.7335 | -2.1724 | 0.9561 | -3.1285 |
0.4879 | 0.64 | 4900 | 1.3802 | 2.1324 | -489.5623 | -571.5599 | 0.5125 | 0.7310 | -2.2107 | 1.0191 | -3.2298 |
0.4916 | 0.65 | 5000 | 1.3780 | 2.1161 | -478.1451 | -558.6430 | 0.5087 | 0.7300 | -2.0966 | 1.0041 | -3.1006 |
0.5806 | 0.67 | 5100 | 1.3595 | 2.0897 | -491.2838 | -572.3604 | 0.5089 | 0.7305 | -2.2279 | 1.0099 | -3.2378 |
0.5027 | 0.68 | 5200 | 1.0714 | 1.8014 | -458.1095 | -531.8434 | 0.5038 | 0.7375 | -1.8962 | 0.9364 | -2.8326 |
0.4554 | 0.69 | 5300 | 1.1555 | 1.8905 | -463.9870 | -540.6600 | 0.5052 | 0.7330 | -1.9550 | 0.9658 | -2.9208 |
0.4521 | 0.71 | 5400 | 1.1076 | 1.8437 | -467.6124 | -543.2982 | 0.5039 | 0.7370 | -1.9912 | 0.9559 | -2.9472 |
0.5869 | 0.72 | 5500 | 1.1574 | 1.8865 | -485.5281 | -564.9521 | 0.5054 | 0.7360 | -2.1704 | 0.9933 | -3.1637 |
0.5924 | 0.73 | 5600 | 0.8215 | 1.5325 | -450.2935 | -527.0139 | 0.5064 | 0.7320 | -1.8180 | 0.9663 | -2.7843 |
0.4275 | 0.75 | 5700 | 0.9960 | 1.7229 | -469.1932 | -549.8819 | 0.5055 | 0.7340 | -2.0070 | 1.0060 | -3.0130 |
0.4746 | 0.76 | 5800 | 1.1168 | 1.8507 | -489.1825 | -573.2806 | 0.5072 | 0.7300 | -2.2069 | 1.0401 | -3.2470 |
0.5033 | 0.77 | 5900 | 0.9675 | 1.7071 | -458.1062 | -536.0162 | 0.5061 | 0.7275 | -1.8962 | 0.9782 | -2.8744 |
0.4517 | 0.79 | 6000 | 0.8156 | 1.5613 | -441.7279 | -516.7132 | 0.5105 | 0.7265 | -1.7324 | 0.9489 | -2.6813 |
0.5071 | 0.8 | 6100 | 0.9370 | 1.6895 | -454.8272 | -534.7506 | 0.5116 | 0.7275 | -1.8634 | 0.9983 | -2.8617 |
0.6455 | 0.81 | 6200 | 0.9542 | 1.7120 | -456.4508 | -536.0126 | 0.5110 | 0.7250 | -1.8796 | 0.9947 | -2.8743 |
0.4796 | 0.82 | 6300 | 1.0203 | 1.7784 | -460.9879 | -543.0519 | 0.5112 | 0.7260 | -1.9250 | 1.0197 | -2.9447 |
0.5568 | 0.84 | 6400 | 1.1152 | 1.8764 | -463.8810 | -545.5328 | 0.5086 | 0.7275 | -1.9539 | 1.0156 | -2.9695 |
0.4335 | 0.85 | 6500 | 1.1822 | 1.9425 | -468.9681 | -550.4982 | 0.5067 | 0.7295 | -2.0048 | 1.0144 | -3.0192 |
0.5263 | 0.86 | 6600 | 1.1806 | 1.9390 | -465.3099 | -546.2759 | 0.5066 | 0.7310 | -1.9682 | 1.0087 | -2.9769 |
0.5263 | 0.88 | 6700 | 1.1794 | 1.9366 | -465.6784 | -546.6119 | 0.5066 | 0.7320 | -1.9719 | 1.0084 | -2.9803 |
0.4939 | 0.89 | 6800 | 1.2238 | 1.9795 | -470.5374 | -551.8629 | 0.5063 | 0.7325 | -2.0205 | 1.0123 | -3.0328 |
0.5763 | 0.9 | 6900 | 1.2027 | 1.9579 | -469.4713 | -550.4863 | 0.5060 | 0.7330 | -2.0098 | 1.0092 | -3.0191 |
0.5062 | 0.92 | 7000 | 1.2018 | 1.9574 | -468.7946 | -549.6514 | 0.5059 | 0.7320 | -2.0030 | 1.0077 | -3.0107 |
0.4432 | 0.93 | 7100 | 1.2115 | 1.9675 | -469.8141 | -550.7594 | 0.5059 | 0.7330 | -2.0132 | 1.0085 | -3.0218 |
0.5294 | 0.94 | 7200 | 1.2123 | 1.9679 | -469.9014 | -550.8820 | 0.5059 | 0.7315 | -2.0141 | 1.0089 | -3.0230 |
0.4488 | 0.96 | 7300 | 1.2130 | 1.9688 | -469.9289 | -550.9682 | 0.5058 | 0.7320 | -2.0144 | 1.0095 | -3.0239 |
0.4747 | 0.97 | 7400 | 1.2122 | 1.9679 | -469.9052 | -550.9178 | 0.5057 | 0.7325 | -2.0142 | 1.0092 | -3.0234 |
0.4494 | 0.98 | 7500 | 1.2121 | 1.9679 | -469.9345 | -550.9584 | 0.5058 | 0.7350 | -2.0144 | 1.0093 | -3.0238 |
0.5319 | 0.99 | 7600 | 1.2121 | 1.9679 | -469.9345 | -550.9584 | 0.5058 | 0.7350 | -2.0144 | 1.0093 | -3.0238 |
Framework versions
- PEFT 0.7.1
- Transformers 4.36.2
- Pytorch 2.1.2
- Datasets 2.14.6
- Tokenizers 0.15.0
- Downloads last month
- 15
Model tree for dball/zephyr-7b-dpo-qlora
Base model
mistralai/Mistral-7B-v0.1