UTI2_M2_1000steps_1e5rate_01beta_CSFTDPO

This model is a fine-tuned version of tsavage68/UTI_M2_1000steps_1e7rate_SFT on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.9857
  • Rewards/chosen: -5.2195
  • Rewards/rejected: -3.4974
  • Rewards/accuracies: 0.0400
  • Rewards/margins: -1.7222
  • Logps/rejected: -74.3298
  • Logps/chosen: -72.1167
  • Logits/rejected: 1.1725
  • Logits/chosen: 1.1724
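
These reward metrics follow the usual DPO bookkeeping: the implicit reward of a completion is beta times the policy-vs-reference log-probability gap, and the margin is the chosen reward minus the rejected reward. The card does not show the loss implementation, but the numbers above can be sanity-checked with a minimal sketch, assuming the standard DPO loss (beta = 0.1 is inferred from the "01beta" suffix in the model name, not stated in the card):

```python
import math

# Standard DPO relations (assumed, not confirmed by the card):
#   reward(y) = beta * (log pi_theta(y|x) - log pi_ref(y|x))
#   margin    = reward(chosen) - reward(rejected)
#   loss      = mean over pairs of -log sigmoid(margin)
rewards_chosen = -5.2195
rewards_rejected = -3.4974

margin = rewards_chosen - rewards_rejected
print(margin)  # -1.7221, matching Rewards/margins above up to rounding

# -log sigmoid of the *average* margin; the reported eval loss (1.9857)
# averages the per-pair terms instead, so the two need not coincide exactly.
print(-math.log(1.0 / (1.0 + math.exp(-margin))))  # ~1.89
```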

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 2
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
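
The training script itself is not included in the card. As a non-authoritative sketch, a run with these hyperparameters could be reproduced with TRL's DPOTrainer roughly as follows; the use of TRL, the preference dataset, and beta = 0.1 (read off the "01beta" suffix in the model name) are all assumptions.

```python
# Illustrative sketch only: the card does not say TRL was used, and the
# training dataset is unknown. Hyperparameters mirror the list above.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "tsavage68/UTI_M2_1000steps_1e7rate_SFT"  # the SFT checkpoint this model starts from
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Placeholder: the actual preference dataset is not documented in the card.
train_dataset = load_dataset("your/preference-dataset", split="train")

args = DPOConfig(
    output_dir="UTI2_M2_1000steps_1e5rate_01beta_CSFTDPO",
    learning_rate=1e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # total effective train batch size of 4
    seed=42,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    beta=0.1,  # assumed from the "01beta" suffix in the model name
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # with None, TRL clones the policy as the frozen reference
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```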

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.8718 | 0.3333 | 25 | 0.8484 | -2.9302 | -14.2948 | 0.8400 | 11.3646 | -182.3043 | -49.2239 | -2.7108 | -2.7153 |
| 1.8764 | 0.6667 | 50 | 2.0267 | -3.9079 | -2.1496 | 0.0500 | -1.7583 | -60.8517 | -59.0005 | -0.0119 | -0.0121 |
| 2.368 | 1.0 | 75 | 2.1981 | -4.0485 | -2.1677 | 0.1300 | -1.8808 | -61.0330 | -60.4067 | -0.7578 | -0.7578 |
| 1.802 | 1.3333 | 100 | 2.2809 | -4.0920 | -2.1613 | 0.1600 | -1.9306 | -60.9696 | -60.8411 | -0.8665 | -0.8665 |
| 1.8302 | 1.6667 | 125 | 2.1253 | -4.1468 | -2.3140 | 0.1000 | -1.8328 | -62.4957 | -61.3891 | -0.6683 | -0.6683 |
| 2.109 | 2.0 | 150 | 2.0797 | -4.2257 | -2.4259 | 0.1000 | -1.7999 | -63.6147 | -62.1788 | -0.5669 | -0.5669 |
| 1.7801 | 2.3333 | 175 | 2.0029 | -4.2312 | -2.4934 | 0.0500 | -1.7378 | -64.2898 | -62.2331 | -0.6146 | -0.6146 |
| 2.0161 | 2.6667 | 200 | 2.1079 | -4.1571 | -2.3364 | 0.1000 | -1.8207 | -62.7205 | -61.4927 | -0.6148 | -0.6148 |
| 2.1333 | 3.0 | 225 | 2.0488 | -4.3309 | -2.5546 | 0.0700 | -1.7763 | -64.9022 | -63.2307 | -0.4279 | -0.4279 |
| 1.9667 | 3.3333 | 250 | 2.0994 | -4.1512 | -2.3367 | 0.0900 | -1.8144 | -62.7236 | -61.4335 | -0.6099 | -0.6099 |
| 1.975 | 3.6667 | 275 | 2.0435 | -4.3243 | -2.5523 | 0.0600 | -1.7720 | -64.8788 | -63.1645 | -0.4185 | -0.4184 |
| 1.8051 | 4.0 | 300 | 1.9829 | -4.3085 | -2.5886 | 0.0400 | -1.7199 | -65.2420 | -63.0064 | -0.4027 | -0.4027 |
| 1.953 | 4.3333 | 325 | 2.0072 | -4.3371 | -2.5954 | 0.0500 | -1.7417 | -65.3105 | -63.2929 | -0.4070 | -0.4070 |
| 2.2799 | 4.6667 | 350 | 2.1923 | -7.3999 | -5.5246 | 0.1300 | -1.8754 | -94.6021 | -93.9210 | -3.4528 | -3.4531 |
| 1.921 | 5.0 | 375 | 2.2218 | -5.5567 | -3.6593 | 0.1300 | -1.8974 | -75.9492 | -75.4888 | -1.5346 | -1.5339 |
| 1.8429 | 5.3333 | 400 | 1.9854 | -7.6870 | -5.9651 | 0.0400 | -1.7218 | -99.0076 | -96.7912 | -3.1616 | -3.1613 |
| 1.8022 | 5.6667 | 425 | 1.9533 | -4.2767 | -2.5861 | 0.0200 | -1.6907 | -65.2171 | -62.6890 | 0.9412 | 0.9412 |
| 2.3129 | 6.0 | 450 | 1.9431 | -4.4284 | -2.7482 | 0.0200 | -1.6803 | -66.8379 | -64.2059 | 0.4988 | 0.4988 |
| 1.906 | 6.3333 | 475 | 2.0904 | -7.0674 | -5.2585 | 0.0900 | -1.8088 | -91.9414 | -90.5951 | -3.6276 | -3.6276 |
| 1.6599 | 6.6667 | 500 | 2.3257 | -4.5302 | -2.5743 | 0.1600 | -1.9559 | -65.0988 | -65.2237 | 0.2828 | 0.2828 |
| 2.1192 | 7.0 | 525 | 2.4249 | -4.6675 | -2.6590 | 0.1900 | -2.0086 | -65.9460 | -66.5970 | 0.4401 | 0.4401 |
| 1.734 | 7.3333 | 550 | 2.4649 | -4.6820 | -2.6533 | 0.2100 | -2.0287 | -65.8892 | -66.7413 | 0.4168 | 0.4168 |
| 2.0797 | 7.6667 | 575 | 1.9457 | -5.0708 | -3.3879 | 0.0200 | -1.6829 | -73.2348 | -70.6292 | 1.0740 | 1.0740 |
| 1.9905 | 8.0 | 600 | 1.8612 | -5.3637 | -3.7940 | 0.0600 | -1.5697 | -77.2963 | -73.5585 | 1.4106 | 1.4106 |
| 1.9525 | 8.3333 | 625 | 1.9808 | -5.1006 | -3.3827 | 0.0400 | -1.7179 | -73.1830 | -70.9278 | 1.1564 | 1.1564 |
| 2.0246 | 8.6667 | 650 | 2.0176 | -5.0560 | -3.3053 | 0.0500 | -1.7507 | -72.4090 | -70.4813 | 1.0910 | 1.0910 |
| 1.9163 | 9.0 | 675 | 1.9146 | -5.2114 | -3.5636 | 0.0600 | -1.6478 | -74.9921 | -72.0358 | 1.2619 | 1.2618 |
| 1.9831 | 9.3333 | 700 | 2.1370 | -4.9749 | -3.1338 | 0.1100 | -1.8411 | -70.6938 | -69.6701 | 0.9305 | 0.9305 |
| 2.1009 | 9.6667 | 725 | 2.0270 | -5.0976 | -3.3389 | 0.0500 | -1.7587 | -72.7453 | -70.8974 | 1.0811 | 1.0810 |
| 1.8532 | 10.0 | 750 | 1.9858 | -5.1569 | -3.4344 | 0.0400 | -1.7226 | -73.6998 | -71.4908 | 1.1467 | 1.1467 |
| 1.8101 | 10.3333 | 775 | 1.9913 | -5.1561 | -3.4284 | 0.0400 | -1.7277 | -73.6404 | -71.4823 | 1.1431 | 1.1431 |
| 1.7788 | 10.6667 | 800 | 1.9572 | -5.2409 | -3.5461 | 0.0200 | -1.6948 | -74.8174 | -72.3310 | 1.2172 | 1.2171 |
| 1.9172 | 11.0 | 825 | 1.9851 | -5.1923 | -3.4705 | 0.0400 | -1.7218 | -74.0612 | -71.8445 | 1.1654 | 1.1654 |
| 1.9927 | 11.3333 | 850 | 1.9926 | -5.1865 | -3.4579 | 0.0400 | -1.7287 | -73.9347 | -71.7869 | 1.1538 | 1.1538 |
| 1.7894 | 11.6667 | 875 | 1.9762 | -5.2363 | -3.5228 | 0.0300 | -1.7135 | -74.5845 | -72.2844 | 1.1749 | 1.1749 |
| 1.7495 | 12.0 | 900 | 1.9855 | -5.2126 | -3.4905 | 0.0400 | -1.7220 | -74.2616 | -72.0471 | 1.1714 | 1.1713 |
| 1.8748 | 12.3333 | 925 | 1.9857 | -5.2150 | -3.4928 | 0.0400 | -1.7222 | -74.2844 | -72.0716 | 1.1714 | 1.1713 |
| 1.8576 | 12.6667 | 950 | 1.9853 | -5.2202 | -3.4983 | 0.0400 | -1.7218 | -74.3394 | -72.1231 | 1.1732 | 1.1732 |
| 1.9874 | 13.0 | 975 | 1.9855 | -5.2193 | -3.4973 | 0.0400 | -1.7219 | -74.3294 | -72.1140 | 1.1725 | 1.1724 |
| 1.8102 | 13.3333 | 1000 | 1.9857 | -5.2195 | -3.4974 | 0.0400 | -1.7222 | -74.3298 | -72.1167 | 1.1725 | 1.1724 |

Framework versions

  • Transformers 4.41.2
  • Pytorch 2.0.0+cu117
  • Datasets 2.19.2
  • Tokenizers 0.19.1
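
The card does not include a usage snippet. A minimal loading sketch with the standard transformers causal-LM interface (the published weights are FP16; the prompt below is a placeholder, since the intended prompting format is not documented):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "tsavage68/UTI2_M2_1000steps_1e5rate_01beta_CSFTDPO"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Example prompt"  # placeholder; expected format is undocumented
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```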