---
license: apache-2.0
library_name: peft
tags:
- trl
- dpo
- alignment-handbook
- generated_from_trainer
base_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
model-index:
- name: tinyllama-1.1b-sum-dpo-qlora
  results: []
---

# tinyllama-1.1b-sum-dpo-qlora

This model is a fine-tuned version of [TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T) on an unknown dataset.
It achieves the following results on the evaluation set:
- Logits/chosen: -3.0239
- Logits/rejected: -3.0176
- Logps/chosen: -166.7881
- Logps/rejected: -187.0472
- Loss: 0.6482
- Rewards/accuracies: 0.6171
- Rewards/chosen: -0.9538
- Rewards/margins: 0.1656
- Rewards/rejected: -1.1194

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
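For reference, a run with these hyperparameters could be reproduced along the following lines with TRL's `DPOTrainer`. This is a minimal sketch, not the original training script: the dataset, `beta`, and LoRA settings below are assumptions that this card does not record.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import DPOTrainer

base = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"

# QLoRA: load the frozen base model in 4-bit NF4 and train LoRA adapters on top.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(base, quantization_config=bnb)
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token

# Assumed LoRA settings; the card does not record r, alpha, or target modules.
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Hyperparameters from the list above: per-device batch 4, gradient accumulation 4,
# cosine schedule with 10% warmup, one epoch, seed 42.
args = TrainingArguments(
    output_dir="tinyllama-1.1b-sum-dpo-qlora",
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,
    learning_rate=5e-6,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    seed=42,
    bf16=True,
)

# Placeholder: any preference dataset with prompt/chosen/rejected columns.
dataset = load_dataset("your-org/preference-dataset")

trainer = DPOTrainer(
    model,
    ref_model=None,   # with a PEFT adapter, the frozen base serves as the reference model
    args=args,
    beta=0.1,         # assumed; the DPO beta is not recorded in this card
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```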
### Training results

| Training Loss | Epoch | Step | Logits/chosen | Logits/rejected | Logps/chosen | Logps/rejected | Validation Loss | Rewards/accuracies | Rewards/chosen | Rewards/margins | Rewards/rejected |
|:-------------:|:-----:|:----:|:-------------:|:---------------:|:------------:|:--------------:|:---------------:|:------------------:|:--------------:|:---------------:|:----------------:|
| 0.6926 | 0.02 | 100 | -3.4980 | -3.4962 | -70.9186 | -74.6392 | 0.6930 | 0.5193 | 0.0049 | 0.0002 | 0.0047 |
| 0.6919 | 0.03 | 200 | -3.4925 | -3.4908 | -69.9505 | -73.7540 | 0.6926 | 0.5678 | 0.0146 | 0.0011 | 0.0135 |
| 0.6888 | 0.05 | 300 | -3.4861 | -3.4843 | -67.8994 | -72.0238 | 0.6911 | 0.5748 | 0.0351 | 0.0043 | 0.0308 |
| 0.6864 | 0.07 | 400 | -3.4827 | -3.4809 | -69.7504 | -74.3218 | 0.6890 | 0.5627 | 0.0166 | 0.0087 | 0.0079 |
| 0.6864 | 0.09 | 500 | -3.4687 | -3.4669 | -69.0559 | -74.2092 | 0.6864 | 0.5716 | 0.0235 | 0.0146 | 0.0090 |
| 0.6729 | 0.1 | 600 | -3.4506 | -3.4489 | -71.3562 | -77.1629 | 0.6837 | 0.5869 | 0.0005 | 0.0211 | -0.0206 |
| 0.6745 | 0.12 | 700 | -3.4487 | -3.4467 | -78.9372 | -85.9956 | 0.6786 | 0.5955 | -0.0753 | 0.0336 | -0.1089 |
| 0.6681 | 0.14 | 800 | -3.4169 | -3.4151 | -90.1915 | -98.6570 | 0.6738 | 0.5955 | -0.1878 | 0.0477 | -0.2355 |
| 0.6661 | 0.16 | 900 | -3.3755 | -3.3740 | -88.5994 | -97.6376 | 0.6715 | 0.5922 | -0.1719 | 0.0534 | -0.2253 |
| 0.6686 | 0.17 | 1000 | -3.3483 | -3.3467 | -111.1606 | -121.9167 | 0.6681 | 0.5936 | -0.3975 | 0.0706 | -0.4681 |
| 0.665 | 0.19 | 1100 | -3.3477 | -3.3463 | -92.1750 | -101.6747 | 0.6708 | 0.5950 | -0.2076 | 0.0580 | -0.2657 |
| 0.6549 | 0.21 | 1200 | -3.3173 | -3.3159 | -107.3321 | -119.3906 | 0.6631 | 0.5974 | -0.3592 | 0.0836 | -0.4428 |
| 0.6536 | 0.22 | 1300 | -3.2737 | -3.2722 | -121.8111 | -135.5439 | 0.6591 | 0.5978 | -0.5040 | 0.1004 | -0.6044 |
| 0.6303 | 0.24 | 1400 | -3.2790 | -3.2775 | -111.6529 | -124.7296 | 0.6593 | 0.6055 | -0.4024 | 0.0938 | -0.4962 |
| 0.6611 | 0.26 | 1500 | -3.2472 | -3.2454 | -132.2458 | -148.1280 | 0.6527 | 0.6138 | -0.6084 | 0.1219 | -0.7302 |
| 0.6395 | 0.28 | 1600 | -3.2525 | -3.2505 | -126.2706 | -141.6170 | 0.6536 | 0.6155 | -0.5486 | 0.1165 | -0.6651 |
| 0.678 | 0.29 | 1700 | -3.2125 | -3.2107 | -117.8728 | -131.2285 | 0.6587 | 0.6169 | -0.4646 | 0.0966 | -0.5612 |
| 0.629 | 0.31 | 1800 | -3.1113 | -3.1087 | -146.8860 | -164.9026 | 0.6489 | 0.6187 | -0.7548 | 0.1432 | -0.8980 |
| 0.6622 | 0.33 | 1900 | -3.1419 | -3.1399 | -125.9992 | -140.6700 | 0.6555 | 0.6069 | -0.5459 | 0.1097 | -0.6556 |
| 0.64 | 0.34 | 2000 | -3.1847 | -3.1824 | -140.1714 | -156.3843 | 0.6523 | 0.6101 | -0.6876 | 0.1252 | -0.8128 |
| 0.6479 | 0.36 | 2100 | -3.1160 | -3.1130 | -150.8988 | -167.6336 | 0.6537 | 0.6104 | -0.7949 | 0.1304 | -0.9253 |
| 0.6023 | 0.38 | 2200 | -3.1479 | -3.1449 | -137.7163 | -153.7927 | 0.6536 | 0.6034 | -0.6631 | 0.1238 | -0.7869 |
| 0.5962 | 0.4 | 2300 | -3.1012 | -3.0975 | -159.4141 | -177.2301 | 0.6523 | 0.6078 | -0.8800 | 0.1412 | -1.0212 |
| 0.6176 | 0.41 | 2400 | -3.0320 | -3.0265 | -172.7089 | -192.7748 | 0.6506 | 0.6027 | -1.0130 | 0.1637 | -1.1767 |
| 0.6255 | 0.43 | 2500 | -3.0629 | -3.0584 | -156.9642 | -175.3398 | 0.6507 | 0.6101 | -0.8555 | 0.1468 | -1.0023 |
| 0.6075 | 0.45 | 2600 | -3.0877 | -3.0839 | -146.0736 | -162.3147 | 0.6547 | 0.6046 | -0.7466 | 0.1254 | -0.8721 |
| 0.6282 | 0.47 | 2700 | -3.1221 | -3.1185 | -140.7325 | -157.2624 | 0.6531 | 0.6101 | -0.6932 | 0.1283 | -0.8216 |
| 0.6495 | 0.48 | 2800 | -3.0926 | -3.0887 | -148.7372 | -166.3009 | 0.6517 | 0.6080 | -0.7733 | 0.1387 | -0.9119 |
| 0.6202 | 0.5 | 2900 | -3.0787 | -3.0744 | -152.9659 | -170.9832 | 0.6512 | 0.6048 | -0.8156 | 0.1432 | -0.9588 |
| 0.6252 | 0.52 | 3000 | -3.0824 | -3.0782 | -148.4267 | -166.3868 | 0.6505 | 0.6055 | -0.7702 | 0.1426 | -0.9128 |
| 0.6082 | 0.53 | 3100 | -3.0723 | -3.0678 | -149.2047 | -167.4548 | 0.6500 | 0.6115 | -0.7779 | 0.1455 | -0.9235 |
| 0.6072 | 0.55 | 3200 | -3.0863 | -3.0819 | -147.0810 | -164.9669 | 0.6499 | 0.6090 | -0.7567 | 0.1419 | -0.8986 |
| 0.6142 | 0.57 | 3300 | -3.0087 | -3.0026 | -179.2665 | -200.5992 | 0.6468 | 0.6176 | -1.0786 | 0.1764 | -1.2549 |
| 0.602 | 0.59 | 3400 | -3.0674 | -3.0624 | -150.3082 | -168.4087 | 0.6504 | 0.6136 | -0.7890 | 0.1440 | -0.9330 |
| 0.605 | 0.6 | 3500 | -3.0590 | -3.0538 | -154.1790 | -172.9109 | 0.6497 | 0.6122 | -0.8277 | 0.1503 | -0.9780 |
| 0.6263 | 0.62 | 3600 | -3.0721 | -3.0672 | -149.9757 | -168.0735 | 0.6508 | 0.6043 | -0.7857 | 0.1440 | -0.9297 |
| 0.5961 | 0.64 | 3700 | -3.0151 | -3.0090 | -169.4567 | -189.3689 | 0.6492 | 0.6136 | -0.9805 | 0.1622 | -1.1426 |
| 0.6273 | 0.65 | 3800 | -3.0117 | -3.0057 | -167.9805 | -187.6573 | 0.6494 | 0.6141 | -0.9657 | 0.1598 | -1.1255 |
| 0.6183 | 0.67 | 3900 | -3.0137 | -3.0077 | -167.4417 | -187.2734 | 0.6488 | 0.6166 | -0.9603 | 0.1613 | -1.1217 |
| 0.6051 | 0.69 | 4000 | -2.9974 | -2.9908 | -176.3739 | -197.1255 | 0.6482 | 0.6178 | -1.0496 | 0.1705 | -1.2202 |
| 0.5867 | 0.71 | 4100 | -3.0151 | -3.0088 | -169.1084 | -189.3998 | 0.6484 | 0.6125 | -0.9770 | 0.1659 | -1.1429 |
| 0.6554 | 0.72 | 4200 | -3.0270 | -3.0209 | -164.2755 | -184.0126 | 0.6489 | 0.6176 | -0.9287 | 0.1604 | -1.0891 |
| 0.6053 | 0.74 | 4300 | -3.0362 | -3.0303 | -159.9774 | -179.4446 | 0.6489 | 0.6097 | -0.8857 | 0.1577 | -1.0434 |
| 0.6153 | 0.76 | 4400 | -3.0351 | -3.0292 | -160.5470 | -180.1235 | 0.6489 | 0.6120 | -0.8914 | 0.1588 | -1.0502 |
| 0.6145 | 0.78 | 4500 | -3.0378 | -3.0319 | -160.1720 | -179.6728 | 0.6490 | 0.6113 | -0.8876 | 0.1580 | -1.0457 |
| 0.5798 | 0.79 | 4600 | -3.0308 | -3.0247 | -162.6813 | -182.4701 | 0.6488 | 0.6148 | -0.9127 | 0.1609 | -1.0736 |
| 0.6218 | 0.81 | 4700 | -3.0307 | -3.0246 | -163.0493 | -182.9482 | 0.6486 | 0.6152 | -0.9164 | 0.1620 | -1.0784 |
| 0.6102 | 0.83 | 4800 | -3.0259 | -3.0197 | -164.8939 | -184.9769 | 0.6484 | 0.6150 | -0.9348 | 0.1639 | -1.0987 |
| 0.6176 | 0.84 | 4900 | -3.0273 | -3.0211 | -165.7554 | -185.9428 | 0.6483 | 0.6157 | -0.9435 | 0.1649 | -1.1084 |
| 0.5907 | 0.86 | 5000 | -3.0259 | -3.0196 | -167.1301 | -187.4627 | 0.6482 | 0.6164 | -0.9572 | 0.1664 | -1.1236 |
| 0.6534 | 0.88 | 5100 | -3.0211 | -3.0148 | -167.2241 | -187.5712 | 0.6481 | 0.6155 | -0.9581 | 0.1665 | -1.1246 |
| 0.5973 | 0.9 | 5200 | -3.0194 | -3.0130 | -166.8823 | -187.1679 | 0.6483 | 0.6169 | -0.9547 | 0.1659 | -1.1206 |
| 0.5975 | 0.91 | 5300 | -3.0248 | -3.0185 | -166.6118 | -186.8759 | 0.6482 | 0.6162 | -0.9520 | 0.1657 | -1.1177 |
| 0.5986 | 0.93 | 5400 | -3.0249 | -3.0186 | -166.6502 | -186.8928 | 0.6483 | 0.6190 | -0.9524 | 0.1655 | -1.1179 |
| 0.6025 | 0.95 | 5500 | -3.0252 | -3.0189 | -166.7467 | -186.9980 | 0.6483 | 0.6169 | -0.9534 | 0.1655 | -1.1189 |
| 0.6149 | 0.96 | 5600 | -3.0244 | -3.0181 | -166.7859 | -187.1137 | 0.6480 | 0.6155 | -0.9538 | 0.1663 | -1.1201 |
| 0.6275 | 0.98 | 5700 | -3.0245 | -3.0182 | -166.6791 | -186.9484 | 0.6482 | 0.6178 | -0.9527 | 0.1657 | -1.1184 |
| 0.5876 | 1.0 | 5800 | -3.0239 | -3.0176 | -166.7881 | -187.0472 | 0.6482 | 0.6171 | -0.9538 | 0.1656 | -1.1194 |
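As a reading aid (these definitions come from TRL's DPO implementation, not from the original card): `Rewards/chosen` and `Rewards/rejected` are the mean implicit DPO rewards

$$
r_\theta(x, y) = \beta \, \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
$$

`Rewards/margins` is the mean of $r_\theta(x, y_w) - r_\theta(x, y_l)$ over preference pairs, and `Rewards/accuracies` is the fraction of pairs whose chosen reward exceeds the rejected reward.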
### Framework versions

- PEFT 0.7.1
- Transformers 4.39.3
- Pytorch 2.1.2
- Datasets 2.18.0
- Tokenizers 0.15.2
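This repository contains only a LoRA adapter, so inference requires applying the adapter on top of the TinyLlama base model. A minimal sketch with `peft` (the adapter id below is a placeholder for wherever this adapter is hosted):

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Placeholder repo id; point this at the actual location of the adapter.
adapter_id = "your-org/tinyllama-1.1b-sum-dpo-qlora"

# Loads the base model recorded in the adapter config and applies the LoRA weights.
model = AutoPeftModelForCausalLM.from_pretrained(adapter_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T")

prompt = "Summarize the following post:\n..."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```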