UTI2_M2_1000steps_1e6rate_01beta_CSFTDPO

This model is a fine-tuned version of tsavage68/UTI_M2_1000steps_1e7rate_SFT on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2525
  • Rewards/chosen: -4.6992
  • Rewards/rejected: -18.9442
  • Rewards/accuracies: 0.8800
  • Rewards/margins: 14.2450
  • Logps/rejected: -228.7986
  • Logps/chosen: -66.9138
  • Logits/rejected: -1.9681
  • Logits/chosen: -2.0652
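
The reward columns above follow the convention used by DPO-style trainers such as TRL's DPOTrainer (an assumption; the card does not name the training framework). With the SFT checkpoint as the reference policy π_ref and β = 0.1 (inferred from the `01beta` suffix in the model name), the implicit reward of a completion y for prompt x would be

$$r(x, y) = \beta \,\big[\log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x)\big]$$

Under that reading, Rewards/margins is the mean of r(x, y_chosen) − r(x, y_rejected) over the evaluation pairs, and Rewards/accuracies is the fraction of pairs in which the chosen completion receives the higher reward.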

Model description

More information needed

Intended uses & limitations

More information needed
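
No intended uses are documented. As a starting point, the checkpoint can presumably be loaded like any standard Hugging Face causal language model; the sketch below assumes the stock transformers API, and the prompt string is a placeholder since the prompt format used during training is not described here.

```python
# Minimal inference sketch. Assumptions: this is a standard causal-LM checkpoint
# loadable with transformers; the prompt below is a placeholder, not the format
# used during SFT/DPO training.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/UTI2_M2_1000steps_1e6rate_01beta_CSFTDPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # load in half precision
    device_map="auto",
)

prompt = "Example prompt text"  # placeholder; the training prompt format is undocumented
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```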

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-06
  • train_batch_size: 2
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
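
These settings, together with the reward metrics reported above, are consistent with TRL's DPOTrainer. As a rough illustration only (the card does not state the framework, the preference dataset is not documented, and β = 0.1 is inferred from the `01beta` model-name suffix), the configuration would map onto a DPOConfig roughly as follows:

```python
# Sketch of the listed hyperparameters expressed as a TRL DPOConfig.
# Assumptions: TRL's DPOTrainer was used (not stated in the card), beta=0.1 comes
# from the "01beta" model-name suffix, and the preference data shown is a dummy
# placeholder for the undocumented training set.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "tsavage68/UTI_M2_1000steps_1e7rate_SFT"  # SFT checkpoint this model starts from

model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Placeholder preference pairs; the real dataset is not documented in the card.
train_dataset = Dataset.from_dict({
    "prompt": ["example prompt"],
    "chosen": ["preferred response"],
    "rejected": ["dispreferred response"],
})

config = DPOConfig(
    output_dir="UTI2_M2_1000steps_1e6rate_01beta_CSFTDPO",
    beta=0.1,                        # assumed from the model name
    learning_rate=1e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,   # effective train batch size of 4
    max_steps=1000,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,
    tokenizer=tokenizer,             # renamed to processing_class in newer TRL releases
)
trainer.train()
```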

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.1981 | 0.3333 | 25 | 0.1707 | 0.4309 | -7.1069 | 0.8800 | 7.5378 | -110.4247 | -15.6125 | -2.4808 | -2.4822 |
| 0.5206 | 0.6667 | 50 | 0.0694 | 0.4032 | -9.1284 | 0.9000 | 9.5316 | -130.6399 | -15.8895 | -2.4773 | -2.4842 |
| 0.0347 | 1.0 | 75 | 0.3666 | -3.1187 | -8.7205 | 0.7800 | 5.6018 | -126.5613 | -51.1087 | -2.4899 | -2.4995 |
| 0.0788 | 1.3333 | 100 | 0.4184 | 0.4123 | -15.2113 | 0.8800 | 15.6236 | -191.4688 | -15.7981 | -2.4522 | -2.4834 |
| 0.052 | 1.6667 | 125 | 0.4180 | 0.4111 | -15.2137 | 0.8800 | 15.6248 | -191.4934 | -15.8105 | -2.4523 | -2.4835 |
| 0.1213 | 2.0 | 150 | 0.3560 | -3.5267 | -16.7763 | 0.8800 | 13.2496 | -207.1189 | -55.1884 | -1.9818 | -2.0298 |
| 0.2125 | 2.3333 | 175 | 0.1987 | -5.3431 | -19.4504 | 0.8900 | 14.1074 | -233.8604 | -73.3521 | -2.0857 | -2.1454 |
| 0.5195 | 2.6667 | 200 | 0.2487 | -5.1181 | -17.0835 | 0.8700 | 11.9655 | -210.1916 | -71.1022 | -1.9512 | -2.0125 |
| 0.0347 | 3.0 | 225 | 0.5257 | -4.6932 | -16.4184 | 0.8700 | 11.7251 | -203.5397 | -66.8538 | -1.9297 | -1.9821 |
| 0.0347 | 3.3333 | 250 | 0.5252 | -4.6368 | -16.5979 | 0.8700 | 11.9611 | -205.3350 | -66.2896 | -1.9309 | -1.9836 |
| 0.0693 | 3.6667 | 275 | 0.5277 | -4.6207 | -16.6952 | 0.8700 | 12.0744 | -206.3079 | -66.1288 | -1.9307 | -1.9833 |
| 0.3528 | 4.0 | 300 | 0.2783 | -5.2353 | -17.9334 | 0.8800 | 12.6980 | -218.6897 | -72.2747 | -2.1007 | -2.2018 |
| 0.0866 | 4.3333 | 325 | 0.5393 | -5.1111 | -18.7048 | 0.8700 | 13.5937 | -226.4044 | -71.0329 | -2.0012 | -2.0945 |
| 0.0347 | 4.6667 | 350 | 0.5419 | -5.1092 | -18.7024 | 0.8700 | 13.5933 | -226.3804 | -71.0133 | -2.0016 | -2.0948 |
| 0.0693 | 5.0 | 375 | 0.2439 | -4.7227 | -18.8694 | 0.8900 | 14.1466 | -228.0500 | -67.1490 | -1.9662 | -2.0633 |
| 0.0693 | 5.3333 | 400 | 0.2601 | -4.7346 | -18.8758 | 0.8800 | 14.1412 | -228.1138 | -67.2672 | -1.9665 | -2.0635 |
| 0.052 | 5.6667 | 425 | 0.2475 | -4.7163 | -18.8815 | 0.8800 | 14.1653 | -228.1716 | -67.0841 | -1.9664 | -2.0634 |
| 0.0866 | 6.0 | 450 | 0.2467 | -4.7194 | -18.8925 | 0.8800 | 14.1731 | -228.2814 | -67.1153 | -1.9668 | -2.0638 |
| 0.0173 | 6.3333 | 475 | 0.2504 | -4.7284 | -18.8991 | 0.8800 | 14.1708 | -228.3476 | -67.2055 | -1.9666 | -2.0636 |
| 0.1386 | 6.6667 | 500 | 0.2482 | -4.7124 | -18.9127 | 0.8800 | 14.2002 | -228.4828 | -67.0460 | -1.9668 | -2.0640 |
| 0.0173 | 7.0 | 525 | 0.2503 | -4.7152 | -18.9086 | 0.8800 | 14.1934 | -228.4421 | -67.0739 | -1.9667 | -2.0637 |
| 0.0866 | 7.3333 | 550 | 0.2469 | -4.7026 | -18.9134 | 0.8800 | 14.2107 | -228.4901 | -66.9480 | -1.9667 | -2.0638 |
| 0.0347 | 7.6667 | 575 | 0.2506 | -4.7014 | -18.9296 | 0.8800 | 14.2282 | -228.6524 | -66.9354 | -1.9673 | -2.0644 |
| 0.0866 | 8.0 | 600 | 0.2593 | -4.7150 | -18.9165 | 0.8800 | 14.2016 | -228.5215 | -67.0713 | -1.9673 | -2.0643 |
| 0.104 | 8.3333 | 625 | 0.2524 | -4.7020 | -18.9396 | 0.8800 | 14.2376 | -228.7525 | -66.9418 | -1.9674 | -2.0645 |
| 0.0173 | 8.6667 | 650 | 0.2503 | -4.7003 | -18.9479 | 0.8800 | 14.2476 | -228.8348 | -66.9245 | -1.9674 | -2.0645 |
| 0.0693 | 9.0 | 675 | 0.2511 | -4.7024 | -18.9458 | 0.8800 | 14.2433 | -228.8138 | -66.9459 | -1.9682 | -2.0653 |
| 0.0693 | 9.3333 | 700 | 0.2497 | -4.7013 | -18.9371 | 0.8800 | 14.2358 | -228.7275 | -66.9345 | -1.9677 | -2.0647 |
| 0.0693 | 9.6667 | 725 | 0.2438 | -4.6908 | -18.9453 | 0.8900 | 14.2545 | -228.8092 | -66.8294 | -1.9677 | -2.0648 |
| 0.0866 | 10.0 | 750 | 0.2475 | -4.6928 | -18.9485 | 0.8800 | 14.2557 | -228.8416 | -66.8499 | -1.9679 | -2.0651 |
| 0.0866 | 10.3333 | 775 | 0.2441 | -4.7016 | -18.9488 | 0.8900 | 14.2472 | -228.8439 | -66.9371 | -1.9684 | -2.0656 |
| 0.0866 | 10.6667 | 800 | 0.2484 | -4.7051 | -18.9371 | 0.8800 | 14.2320 | -228.7267 | -66.9721 | -1.9685 | -2.0656 |
| 0.0693 | 11.0 | 825 | 0.2421 | -4.6951 | -18.9478 | 0.8900 | 14.2526 | -228.8337 | -66.8728 | -1.9685 | -2.0656 |
| 0.052 | 11.3333 | 850 | 0.2506 | -4.7064 | -18.9338 | 0.8800 | 14.2275 | -228.6942 | -66.9851 | -1.9684 | -2.0655 |
| 0.0693 | 11.6667 | 875 | 0.2504 | -4.6954 | -18.9498 | 0.8800 | 14.2544 | -228.8539 | -66.8751 | -1.9685 | -2.0657 |
| 0.0693 | 12.0 | 900 | 0.2470 | -4.6944 | -18.9471 | 0.8800 | 14.2527 | -228.8271 | -66.8655 | -1.9680 | -2.0651 |
| 0.0347 | 12.3333 | 925 | 0.2470 | -4.6947 | -18.9470 | 0.8800 | 14.2523 | -228.8264 | -66.8687 | -1.9680 | -2.0651 |
| 0.0693 | 12.6667 | 950 | 0.2525 | -4.6997 | -18.9435 | 0.8800 | 14.2438 | -228.7915 | -66.9187 | -1.9681 | -2.0652 |
| 0.052 | 13.0 | 975 | 0.2525 | -4.6991 | -18.9443 | 0.8800 | 14.2452 | -228.7995 | -66.9127 | -1.9681 | -2.0652 |
| 0.0693 | 13.3333 | 1000 | 0.2525 | -4.6992 | -18.9442 | 0.8800 | 14.2450 | -228.7986 | -66.9138 | -1.9681 | -2.0652 |

Framework versions

  • Transformers 4.41.2
  • PyTorch 2.0.0+cu117
  • Datasets 2.19.2
  • Tokenizers 0.19.1