tinyllama-1.1b-sum-dpo-full_LR3e-8_BS32_3epochs_old

This model is a fine-tuned version of martimfasantos/tinyllama-1.1b-sum-sft-full_old on the openai/summarize_from_feedback dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6867
  • Rewards/chosen: -0.0478
  • Rewards/rejected: -0.0620
  • Rewards/accuracies: 0.5936
  • Rewards/margins: 0.0142
  • Logps/rejected: -69.3779
  • Logps/chosen: -63.4876
  • Logits/rejected: -3.0580
  • Logits/chosen: -3.0637
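To make the reward columns concrete: in DPO, "rewards" are β-scaled log-probability ratios between the policy and the reference (SFT) model, and "margins" is the chosen-minus-rejected gap. A minimal sketch of the per-pair sigmoid DPO loss follows; the β value is an assumption (it is not listed in this card's hyperparameters).

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Sigmoid DPO loss for one preference pair.

    The 'Rewards/*' metrics above are the beta-scaled log-prob ratios
    against the reference model; 'Rewards/margins' is their difference.
    beta=0.1 is a hypothetical default, not taken from this card.
    """
    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = reward_chosen - reward_rejected
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))
    return loss, reward_chosen, reward_rejected, margin
```

At a margin of zero the loss is log 2 ≈ 0.693, which matches the values in the first rows of the training table below before the policy drifts from the reference.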

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 3e-08
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
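The totals implied by these hyperparameters can be checked with a few lines of arithmetic. The step count (8700) is read off the results table below; the world size of 1 is an assumption consistent with 8 × 4 = 32:

```python
# Derive the schedule implied by the hyperparameters and results table.
total_steps = 8700                 # last step logged in the results table
total_train_batch_size = 8 * 4     # per-device batch * grad accumulation steps
warmup_ratio = 0.1
num_epochs = 3

warmup_steps = int(total_steps * warmup_ratio)                    # cosine warmup steps
steps_per_epoch = total_steps // num_epochs
pairs_seen_per_epoch = steps_per_epoch * total_train_batch_size   # preference pairs
```

This gives 870 warmup steps, 2900 optimizer steps per epoch, and roughly 92,800 preference pairs per epoch, consistent with the size of the openai/summarize_from_feedback training split.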

Training results

Training Loss Epoch Step Validation Loss Rewards/chosen Rewards/rejected Rewards/accuracies Rewards/margins Logps/rejected Logps/chosen Logits/rejected Logits/chosen
0.6931 0.0345 100 0.6932 0.0001 0.0001 0.4930 -0.0000 -63.1672 -58.7024 -3.1577 -3.1633
0.6931 0.0689 200 0.6932 0.0001 0.0001 0.4888 -0.0001 -63.1661 -58.7066 -3.1577 -3.1634
0.6931 0.1034 300 0.6932 0.0000 0.0001 0.4933 -0.0001 -63.1693 -58.7071 -3.1578 -3.1634
0.6931 0.1378 400 0.6932 0.0001 0.0001 0.4809 -0.0000 -63.1727 -58.7061 -3.1575 -3.1632
0.6931 0.1723 500 0.6931 0.0002 0.0002 0.5098 0.0000 -63.1633 -58.6928 -3.1577 -3.1634
0.6931 0.2068 600 0.6932 0.0002 0.0002 0.4937 -0.0000 -63.1596 -58.6920 -3.1574 -3.1630
0.6929 0.2412 700 0.6931 0.0003 0.0002 0.4905 0.0001 -63.1582 -58.6817 -3.1572 -3.1629
0.6929 0.2757 800 0.6931 0.0004 0.0003 0.5237 0.0001 -63.1485 -58.6703 -3.1566 -3.1622
0.6927 0.3101 900 0.6931 0.0006 0.0004 0.5186 0.0001 -63.1378 -58.6559 -3.1564 -3.1620
0.6925 0.3446 1000 0.6930 0.0008 0.0004 0.5279 0.0003 -63.1375 -58.6361 -3.1554 -3.1610
0.6924 0.3790 1100 0.6930 0.0009 0.0005 0.5560 0.0004 -63.1285 -58.6220 -3.1548 -3.1604
0.692 0.4135 1200 0.6929 0.0011 0.0006 0.5407 0.0005 -63.1206 -58.5973 -3.1539 -3.1595
0.6914 0.4480 1300 0.6928 0.0013 0.0007 0.5383 0.0006 -63.1120 -58.5819 -3.1528 -3.1584
0.6917 0.4824 1400 0.6927 0.0016 0.0006 0.5648 0.0009 -63.1160 -58.5533 -3.1518 -3.1574
0.6914 0.5169 1500 0.6926 0.0016 0.0006 0.5574 0.0010 -63.1243 -58.5539 -3.1505 -3.1561
0.6916 0.5513 1600 0.6926 0.0018 0.0007 0.5576 0.0012 -63.1145 -58.5288 -3.1493 -3.1549
0.6906 0.5858 1700 0.6925 0.0019 0.0004 0.5625 0.0014 -63.1358 -58.5250 -3.1471 -3.1527
0.6908 0.6203 1800 0.6923 0.0019 0.0002 0.5551 0.0017 -63.1602 -58.5198 -3.1456 -3.1513
0.6903 0.6547 1900 0.6922 0.0019 -0.0001 0.5720 0.0020 -63.1895 -58.5253 -3.1437 -3.1493
0.6895 0.6892 2000 0.6920 0.0016 -0.0007 0.5795 0.0023 -63.2502 -58.5471 -3.1418 -3.1475
0.6891 0.7236 2100 0.6919 0.0017 -0.0009 0.5818 0.0026 -63.2700 -58.5423 -3.1394 -3.1450
0.6906 0.7581 2200 0.6918 0.0013 -0.0016 0.5737 0.0028 -63.3380 -58.5865 -3.1376 -3.1432
0.6893 0.7926 2300 0.6917 0.0011 -0.0020 0.5730 0.0031 -63.3761 -58.6009 -3.1358 -3.1414
0.6899 0.8270 2400 0.6915 0.0006 -0.0028 0.5764 0.0034 -63.4591 -58.6538 -3.1338 -3.1394
0.6894 0.8615 2500 0.6914 0.0002 -0.0034 0.5743 0.0036 -63.5245 -58.6934 -3.1315 -3.1372
0.6883 0.8959 2600 0.6912 -0.0003 -0.0043 0.5764 0.0040 -63.6123 -58.7457 -3.1297 -3.1354
0.6875 0.9304 2700 0.6911 -0.0010 -0.0053 0.5781 0.0043 -63.7097 -58.8142 -3.1282 -3.1338
0.6871 0.9649 2800 0.6910 -0.0016 -0.0061 0.5760 0.0045 -63.7868 -58.8701 -3.1261 -3.1317
0.6871 0.9993 2900 0.6909 -0.0024 -0.0072 0.5762 0.0048 -63.8972 -58.9496 -3.1231 -3.1287
0.6874 1.0338 3000 0.6907 -0.0032 -0.0084 0.5834 0.0051 -64.0164 -59.0348 -3.1212 -3.1268
0.6859 1.0682 3100 0.6906 -0.0042 -0.0096 0.5806 0.0054 -64.1398 -59.1344 -3.1190 -3.1247
0.6842 1.1027 3200 0.6904 -0.0051 -0.0109 0.5839 0.0058 -64.2725 -59.2256 -3.1161 -3.1218
0.6884 1.1371 3300 0.6903 -0.0066 -0.0127 0.5874 0.0061 -64.4506 -59.3731 -3.1139 -3.1196
0.6858 1.1716 3400 0.6902 -0.0080 -0.0142 0.5785 0.0062 -64.5965 -59.5071 -3.1116 -3.1173
0.6859 1.2061 3500 0.6900 -0.0099 -0.0166 0.5832 0.0066 -64.8362 -59.7041 -3.1101 -3.1158
0.685 1.2405 3600 0.6899 -0.0115 -0.0185 0.5783 0.0069 -65.0265 -59.8637 -3.1069 -3.1126
0.6839 1.2750 3700 0.6898 -0.0129 -0.0202 0.5820 0.0072 -65.1978 -60.0064 -3.1049 -3.1106
0.6824 1.3094 3800 0.6896 -0.0145 -0.0220 0.5832 0.0076 -65.3850 -60.1580 -3.1023 -3.1080
0.6847 1.3439 3900 0.6895 -0.0161 -0.0240 0.5834 0.0078 -65.5760 -60.3265 -3.1007 -3.1064
0.6865 1.3784 4000 0.6894 -0.0179 -0.0261 0.5876 0.0081 -65.7873 -60.5061 -3.0990 -3.1047
0.6826 1.4128 4100 0.6892 -0.0197 -0.0282 0.5899 0.0085 -65.9972 -60.6782 -3.0968 -3.1025
0.6801 1.4473 4200 0.6890 -0.0209 -0.0299 0.5922 0.0090 -66.1658 -60.8002 -3.0952 -3.1009
0.6814 1.4817 4300 0.6890 -0.0227 -0.0318 0.5878 0.0091 -66.3577 -60.9789 -3.0926 -3.0983
0.683 1.5162 4400 0.6888 -0.0239 -0.0334 0.5913 0.0094 -66.5158 -61.1062 -3.0910 -3.0967
0.679 1.5507 4500 0.6887 -0.0255 -0.0352 0.5948 0.0097 -66.7038 -61.2636 -3.0892 -3.0949
0.6834 1.5851 4600 0.6886 -0.0275 -0.0375 0.5934 0.0100 -66.9283 -61.4618 -3.0871 -3.0928
0.685 1.6196 4700 0.6884 -0.0284 -0.0387 0.5929 0.0103 -67.0469 -61.5498 -3.0853 -3.0910
0.683 1.6540 4800 0.6883 -0.0294 -0.0400 0.5960 0.0106 -67.1815 -61.6491 -3.0831 -3.0889
0.6781 1.6885 4900 0.6882 -0.0307 -0.0416 0.5950 0.0109 -67.3424 -61.7858 -3.0820 -3.0877
0.6813 1.7229 5000 0.6881 -0.0317 -0.0426 0.5943 0.0110 -67.4448 -61.8785 -3.0805 -3.0863
0.6823 1.7574 5100 0.6880 -0.0328 -0.0440 0.5950 0.0112 -67.5799 -61.9921 -3.0789 -3.0846
0.6798 1.7919 5200 0.6879 -0.0341 -0.0457 0.5987 0.0116 -67.7483 -62.1205 -3.0772 -3.0829
0.6798 1.8263 5300 0.6877 -0.0353 -0.0472 0.5953 0.0119 -67.8958 -62.2422 -3.0757 -3.0814
0.6784 1.8608 5400 0.6876 -0.0368 -0.0489 0.5969 0.0122 -68.0724 -62.3875 -3.0742 -3.0798
0.6853 1.8952 5500 0.6876 -0.0377 -0.0500 0.5946 0.0123 -68.1765 -62.4820 -3.0735 -3.0792
0.6769 1.9297 5600 0.6875 -0.0392 -0.0517 0.5941 0.0125 -68.3471 -62.6278 -3.0713 -3.0771
0.6788 1.9642 5700 0.6874 -0.0399 -0.0526 0.5941 0.0127 -68.4439 -62.7029 -3.0701 -3.0759
0.6798 1.9986 5800 0.6873 -0.0410 -0.0538 0.5925 0.0128 -68.5632 -62.8140 -3.0694 -3.0752
0.683 2.0331 5900 0.6872 -0.0418 -0.0549 0.5934 0.0131 -68.6699 -62.8917 -3.0677 -3.0735
0.6766 2.0675 6000 0.6872 -0.0425 -0.0555 0.5918 0.0130 -68.7314 -62.9600 -3.0675 -3.0732
0.6756 2.1020 6100 0.6871 -0.0428 -0.0561 0.5922 0.0133 -68.7950 -62.9959 -3.0660 -3.0717
0.6805 2.1365 6200 0.6871 -0.0435 -0.0568 0.5904 0.0133 -68.8622 -63.0611 -3.0654 -3.0711
0.6797 2.1709 6300 0.6871 -0.0443 -0.0577 0.5929 0.0134 -68.9493 -63.1378 -3.0645 -3.0703
0.6802 2.2054 6400 0.6870 -0.0442 -0.0577 0.5913 0.0135 -68.9530 -63.1312 -3.0641 -3.0698
0.6802 2.2398 6500 0.6870 -0.0445 -0.0581 0.5934 0.0136 -68.9891 -63.1579 -3.0633 -3.0690
0.6806 2.2743 6600 0.6870 -0.0448 -0.0585 0.5925 0.0136 -69.0289 -63.1964 -3.0624 -3.0682
0.6755 2.3088 6700 0.6869 -0.0453 -0.0590 0.5918 0.0137 -69.0814 -63.2383 -3.0618 -3.0675
0.6826 2.3432 6800 0.6869 -0.0455 -0.0593 0.5962 0.0138 -69.1095 -63.2637 -3.0612 -3.0669
0.6786 2.3777 6900 0.6869 -0.0459 -0.0598 0.5892 0.0139 -69.1580 -63.3046 -3.0607 -3.0664
0.6798 2.4121 7000 0.6868 -0.0463 -0.0602 0.5934 0.0139 -69.2011 -63.3391 -3.0601 -3.0658
0.6762 2.4466 7100 0.6868 -0.0466 -0.0606 0.5936 0.0140 -69.2414 -63.3699 -3.0598 -3.0656
0.6782 2.4810 7200 0.6868 -0.0470 -0.0611 0.5918 0.0141 -69.2927 -63.4167 -3.0595 -3.0652
0.6821 2.5155 7300 0.6868 -0.0472 -0.0612 0.5943 0.0140 -69.3050 -63.4345 -3.0589 -3.0647
0.6806 2.5500 7400 0.6868 -0.0473 -0.0614 0.5908 0.0141 -69.3214 -63.4432 -3.0588 -3.0646
0.6824 2.5844 7500 0.6867 -0.0475 -0.0616 0.5918 0.0142 -69.3426 -63.4585 -3.0589 -3.0647
0.6789 2.6189 7600 0.6868 -0.0477 -0.0618 0.5915 0.0141 -69.3578 -63.4788 -3.0584 -3.0642
0.6768 2.6533 7700 0.6867 -0.0475 -0.0618 0.5946 0.0144 -69.3650 -63.4617 -3.0582 -3.0640
0.6808 2.6878 7800 0.6867 -0.0477 -0.0619 0.5918 0.0142 -69.3712 -63.4863 -3.0584 -3.0642
0.6782 2.7223 7900 0.6867 -0.0478 -0.0621 0.5925 0.0143 -69.3874 -63.4902 -3.0581 -3.0639
0.6794 2.7567 8000 0.6867 -0.0479 -0.0621 0.5897 0.0142 -69.3922 -63.5035 -3.0580 -3.0638
0.674 2.7912 8100 0.6867 -0.0479 -0.0621 0.5911 0.0142 -69.3883 -63.4992 -3.0580 -3.0638
0.6766 2.8256 8200 0.6866 -0.0478 -0.0622 0.5899 0.0144 -69.4003 -63.4938 -3.0581 -3.0639
0.6821 2.8601 8300 0.6867 -0.0479 -0.0622 0.5890 0.0143 -69.3970 -63.4998 -3.0579 -3.0637
0.6795 2.8946 8400 0.6867 -0.0478 -0.0621 0.5904 0.0142 -69.3868 -63.4954 -3.0580 -3.0637
0.679 2.9290 8500 0.6867 -0.0479 -0.0622 0.5925 0.0143 -69.3981 -63.4995 -3.0579 -3.0637
0.6816 2.9635 8600 0.6867 -0.0478 -0.0621 0.5922 0.0144 -69.3946 -63.4907 -3.0579 -3.0637
0.6751 2.9979 8700 0.6867 -0.0478 -0.0620 0.5936 0.0142 -69.3779 -63.4876 -3.0580 -3.0637

Framework versions

  • Transformers 4.41.2
  • Pytorch 2.1.2
  • Datasets 2.19.2
  • Tokenizers 0.19.1
