UTI2_M2_1000steps_1e5rate_05beta_CSFTDPO

This model is a fine-tuned version of tsavage68/UTI_M2_1000steps_1e7rate_SFT on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5498
  • Rewards/chosen: -13.0511
  • Rewards/rejected: -16.6516
  • Rewards/accuracies: 0.2100
  • Rewards/margins: 3.6004
  • Logps/rejected: -42.6771
  • Logps/chosen: -30.6447
  • Logits/rejected: 0.0583
  • Logits/chosen: 0.0501
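
The card does not define these metrics, but the names match the standard DPO bookkeeping, and the "05beta" in the model name suggests β = 0.5. Under the usual DPO parameterization (a reading of the metric names, not something stated on the card), each reward is the β-scaled log-probability ratio of the policy to the frozen reference, and the margin is the chosen-minus-rejected gap:

$$ r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}, \qquad \text{margin} = r_\theta(x, y_w) - r_\theta(x, y_l), $$

with per-pair loss

$$ \mathcal{L}_{\mathrm{DPO}} = -\log \sigma\big(r_\theta(x, y_w) - r_\theta(x, y_l)\big). $$

Rewards/accuracies is the fraction of evaluation pairs whose chosen reward exceeds the rejected one. Note that the mean margin (3.60) is positive even though only 21% of pairs are ordered correctly, so a minority of pairs with very large positive margins dominates the average.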

Model description

More information needed

Intended uses & limitations

More information needed
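
Since usage is undocumented, the snippet below is only a minimal, hypothetical loading sketch: it assumes a standard causal-LM checkpoint loadable with the Transformers version pinned at the bottom of this card, and uses a placeholder prompt because the task format is not described.

```python
# Minimal, hypothetical usage sketch -- the card does not document a prompt
# format, so this just runs plain generation on a placeholder prompt.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "tsavage68/UTI2_M2_1000steps_1e5rate_05beta_CSFTDPO"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype=torch.float16,  # weights are published in FP16
    device_map="auto",          # requires `accelerate`
)

prompt = "..."  # placeholder; the intended task and prompt format are undocumented
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```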

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 2
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
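
The card does not name the training stack. As a reproduction sketch only, here is how the listed settings would map onto trl's DPOTrainer (assuming a trl 0.8-era API, contemporary with the Transformers 4.41.2 pinned below); the preference dataset is undocumented and left as a placeholder.

```python
# Hypothetical reproduction sketch; the actual training code is not published
# on this card. Maps the hyperparameters listed above onto TrainingArguments.
import torch
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "tsavage68/UTI_M2_1000steps_1e7rate_SFT"
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(base)

args = TrainingArguments(
    output_dir="UTI2_M2_1000steps_1e5rate_05beta_CSFTDPO",
    learning_rate=1e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # 2 x 2 = total train batch size of 4
    max_steps=1000,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    seed=42,
    # Adam with betas=(0.9, 0.999) and eps=1e-8 matches the optimizer defaults.
)

# Placeholder preference pairs -- the card lists the dataset as unknown.
train_dataset = Dataset.from_dict(
    {"prompt": ["..."], "chosen": ["..."], "rejected": ["..."]}
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,      # trl clones the policy as the frozen DPO reference
    args=args,
    beta=0.5,            # "05beta" in the model name
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```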

Training results

| Training Loss | Epoch   | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 2.0624 | 0.3333 | 25 | 2.5226 | -1.8449 | -0.4090 | 0.0700 | -1.4358 | -10.1920 | -8.2322 | -2.5575 | -2.5577 |
| 3.5044 | 0.6667 | 50 | 4.5890 | -5.8971 | -3.0050 | 0.0800 | -2.8921 | -15.3840 | -16.3367 | -1.8026 | -1.8027 |
| 3.7907 | 1.0 | 75 | 2.9383 | -6.0760 | -3.7525 | 0.0300 | -2.3235 | -16.8790 | -16.6946 | -3.0195 | -3.0194 |
| 2.0779 | 1.3333 | 100 | 3.1415 | -5.1412 | -2.7057 | 0.0400 | -2.4355 | -14.7854 | -14.8249 | -2.3865 | -2.3865 |
| 2.1244 | 1.6667 | 125 | 3.0972 | -4.8789 | -2.4628 | 0.0300 | -2.4161 | -14.2995 | -14.3002 | -1.8462 | -1.8462 |
| 2.7225 | 2.0 | 150 | 3.0294 | -4.9954 | -2.6129 | 0.0300 | -2.3825 | -14.5998 | -14.5333 | -1.9868 | -1.9868 |
| 1.5732 | 2.3333 | 175 | 2.9701 | -7.2452 | -4.8894 | 0.0300 | -2.3558 | -19.1528 | -19.0329 | -2.3919 | -2.3919 |
| 4.1424 | 2.6667 | 200 | 3.0335 | -4.8641 | -2.4777 | 0.0300 | -2.3864 | -14.3295 | -14.2707 | -1.7217 | -1.7217 |
| 1.0287 | 3.0 | 225 | 3.1359 | -4.7700 | -2.3364 | 0.0400 | -2.4335 | -14.0468 | -14.0824 | -1.6971 | -1.6971 |
| 1.6456 | 3.3333 | 250 | 3.2455 | -4.7269 | -2.2495 | 0.0500 | -2.4774 | -13.8730 | -13.9962 | -1.3352 | -1.3352 |
| 2.0659 | 3.6667 | 275 | 2.7595 | -7.9375 | -6.1451 | 0.0900 | -1.7924 | -21.6641 | -20.4174 | 0.7532 | 0.7530 |
| 2.1423 | 4.0 | 300 | 3.1729 | -6.0067 | -3.5583 | 0.0500 | -2.4484 | -16.4906 | -16.5559 | -1.2394 | -1.2394 |
| 2.4459 | 4.3333 | 325 | 3.2141 | -8.9788 | -6.5133 | 0.0500 | -2.4655 | -22.4005 | -22.5000 | -3.0378 | -3.0378 |
| 3.0794 | 4.6667 | 350 | 3.1602 | -14.0839 | -11.6405 | 0.0400 | -2.4434 | -32.6550 | -32.7103 | -2.5895 | -2.5895 |
| 3.9364 | 5.0 | 375 | 3.0988 | -14.3023 | -11.8867 | 0.0300 | -2.4156 | -33.1474 | -33.1471 | -2.5474 | -2.5474 |
| 1.9883 | 5.3333 | 400 | 3.1292 | -14.9159 | -12.4872 | 0.0400 | -2.4286 | -34.3484 | -34.3742 | -2.5028 | -2.5028 |
| 2.4843 | 5.6667 | 425 | 2.9658 | -16.9208 | -14.5704 | 0.0300 | -2.3504 | -38.5147 | -38.3840 | -2.2091 | -2.2091 |
| 3.7325 | 6.0 | 450 | 3.0443 | -18.2416 | -15.8503 | 0.0300 | -2.3913 | -41.0746 | -41.0257 | -2.0492 | -2.0492 |
| 1.9807 | 6.3333 | 475 | 3.2023 | -10.0704 | -7.6142 | 0.0500 | -2.4562 | -24.6023 | -24.6833 | -2.7758 | -2.7758 |
| 2.4126 | 6.6667 | 500 | 2.9411 | -11.3553 | -9.1152 | 0.0300 | -2.2401 | -27.6043 | -27.2530 | -2.9774 | -2.9775 |
| 3.7081 | 7.0 | 525 | 2.6725 | -16.3050 | -14.2905 | 0.0300 | -2.0145 | -37.9550 | -37.1525 | 1.0618 | 1.0616 |
| 2.9707 | 7.3333 | 550 | 2.2326 | -12.0067 | -11.0286 | 0.0600 | -0.9781 | -31.4312 | -28.5558 | 1.7090 | 1.7071 |
| 1.1998 | 7.6667 | 575 | 2.6023 | -6.3024 | -4.7549 | 0.0600 | -1.5476 | -18.8837 | -17.1473 | 0.5970 | 0.5966 |
| 1.0342 | 8.0 | 600 | 0.5746 | -11.7381 | -22.2043 | 0.2000 | 10.4662 | -53.7825 | -28.0186 | 1.7281 | 1.7098 |
| 0.5377 | 8.3333 | 625 | 1.7320 | -11.1869 | -19.3957 | 0.1900 | 8.2088 | -48.1653 | -26.9162 | 0.6525 | 0.6424 |
| 0.9725 | 8.6667 | 650 | 1.1219 | -8.1698 | -11.3167 | 0.1900 | 3.1468 | -32.0073 | -20.8821 | 0.3043 | 0.2998 |
| 1.6977 | 9.0 | 675 | 0.6161 | -9.9974 | -12.2635 | 0.1800 | 2.2661 | -33.9009 | -24.5372 | 0.0440 | 0.0415 |
| 0.5025 | 9.3333 | 700 | 0.6105 | -9.9405 | -12.8803 | 0.1900 | 2.9398 | -35.1346 | -24.4235 | 0.0064 | 0.0013 |
| 0.5025 | 9.6667 | 725 | 0.6058 | -10.3113 | -13.4162 | 0.1900 | 3.1049 | -36.2063 | -25.1651 | -0.9500 | -0.9577 |
| 0.5908 | 10.0 | 750 | 0.5619 | -12.0967 | -14.9911 | 0.2000 | 2.8944 | -39.3562 | -28.7358 | 0.0605 | 0.0537 |
| 0.5199 | 10.3333 | 775 | 0.5780 | -12.4712 | -15.8265 | 0.2000 | 3.3553 | -41.0270 | -29.4849 | 0.0794 | 0.0714 |
| 0.5199 | 10.6667 | 800 | 0.5527 | -12.6098 | -16.1000 | 0.2100 | 3.4901 | -41.5739 | -29.7621 | 0.0901 | 0.0819 |
| 0.5372 | 11.0 | 825 | 0.5500 | -13.0464 | -16.6460 | 0.2100 | 3.5996 | -42.6660 | -30.6353 | 0.0593 | 0.0511 |
| 0.5199 | 11.3333 | 850 | 0.5498 | -13.0532 | -16.6508 | 0.2100 | 3.5976 | -42.6755 | -30.6489 | 0.0585 | 0.0503 |
| 0.6065 | 11.6667 | 875 | 0.5497 | -13.0545 | -16.6501 | 0.2100 | 3.5956 | -42.6742 | -30.6515 | 0.0585 | 0.0503 |
| 0.5719 | 12.0 | 900 | 0.5498 | -13.0510 | -16.6497 | 0.2100 | 3.5986 | -42.6733 | -30.6445 | 0.0587 | 0.0505 |
| 0.4159 | 12.3333 | 925 | 0.5495 | -13.0537 | -16.6558 | 0.2100 | 3.6020 | -42.6855 | -30.6499 | 0.0585 | 0.0503 |
| 0.6238 | 12.6667 | 950 | 0.5499 | -13.0443 | -16.6468 | 0.2100 | 3.6025 | -42.6676 | -30.6311 | 0.0586 | 0.0504 |
| 0.6065 | 13.0 | 975 | 0.5498 | -13.0511 | -16.6516 | 0.2100 | 3.6004 | -42.6771 | -30.6447 | 0.0583 | 0.0501 |
| 0.5025 | 13.3333 | 1000 | 0.5498 | -13.0511 | -16.6516 | 0.2100 | 3.6004 | -42.6771 | -30.6447 | 0.0583 | 0.0501 |

Framework versions

  • Transformers 4.41.2
  • Pytorch 2.0.0+cu117
  • Datasets 2.19.2
  • Tokenizers 0.19.1
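
A quick, hypothetical sanity check (not part of the original card) that a local environment matches these pins:

```python
# Compare installed package versions against the pins listed above.
expected = {
    "transformers": "4.41.2",
    "torch": "2.0.0+cu117",
    "datasets": "2.19.2",
    "tokenizers": "0.19.1",
}
for name, want in expected.items():
    have = __import__(name).__version__
    print(f"{name}: want {want}, have {have}" + ("" if have == want else "  <-- mismatch"))
```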