UTI2_M2_1000steps_1e5rate_03beta_CSFTDPO

This model is a fine-tuned version of tsavage68/UTI_M2_1000steps_1e7rate_SFT on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.0951
  • Rewards/chosen: -5.6444
  • Rewards/rejected: -4.1898
  • Rewards/accuracies: 0.0300
  • Rewards/margins: -1.4546
  • Logps/rejected: -23.3400
  • Logps/chosen: -23.3572
  • Logits/rejected: -3.4660
  • Logits/chosen: -3.4660
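
The reward figures above are the implicit DPO rewards: β times the gap between the policy's and the reference model's log-probability of a completion. A minimal sketch of how these metrics relate to one another, in plain PyTorch with illustrative tensor names, and with β = 0.3 assumed from the "03beta" suffix of the model name:

```python
import torch
import torch.nn.functional as F

beta = 0.3  # assumption: inferred from "03beta" in the model name

def dpo_metrics(policy_logps_chosen, policy_logps_rejected,
                ref_logps_chosen, ref_logps_rejected, beta=beta):
    # Inputs: one summed log-probability per sequence in the batch, scored
    # under the fine-tuned policy and the frozen reference (SFT) model.
    # Implicit DPO reward: beta * (log pi(y|x) - log pi_ref(y|x))
    rewards_chosen = beta * (policy_logps_chosen - ref_logps_chosen)
    rewards_rejected = beta * (policy_logps_rejected - ref_logps_rejected)
    margins = rewards_chosen - rewards_rejected           # Rewards/margins
    accuracies = (margins > 0).float().mean()             # Rewards/accuracies
    # DPO loss: -log sigmoid of the reward margin
    loss = -F.logsigmoid(margins).mean()
    return (loss, rewards_chosen.mean(), rewards_rejected.mean(),
            margins.mean(), accuracies)
```

Read this way, the negative Rewards/margins (-1.4546) and the Rewards/accuracies of 0.0300 indicate that, on the evaluation set, the policy ends up assigning higher implicit reward to the rejected completion in 97% of pairs.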

Model description

More information needed

Intended uses & limitations

More information needed
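
Pending more information, the checkpoint can presumably be used as an ordinary causal language model with transformers. A minimal loading sketch; the repository id is real, but the prompt is purely illustrative and the expected prompt format is an assumption:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/UTI2_M2_1000steps_1e5rate_03beta_CSFTDPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Illustrative prompt only; the card does not document a prompt template.
prompt = "A patient presents with symptoms of a urinary tract infection."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```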

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged trl reconstruction follows the list):

  • learning_rate: 1e-05
  • train_batch_size: 2
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
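
A hedged reconstruction of this configuration with trl's DPOTrainer is sketched below. The preference dataset is unknown (its path is a placeholder), β = 0.3 is inferred from the model name rather than stated in the card, and the exact DPOTrainer keyword arguments vary across trl releases:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Policy is initialized from the SFT checkpoint named at the top of the card.
base_id = "tsavage68/UTI_M2_1000steps_1e7rate_SFT"
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

dataset = load_dataset("path/to/preference_dataset")  # placeholder: dataset unknown

config = DPOConfig(
    output_dir="UTI2_M2_1000steps_1e5rate_03beta_CSFTDPO",
    beta=0.3,                       # assumption: inferred from the model name
    learning_rate=1e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # total train batch size 4
    max_steps=1000,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # trl clones the policy as the frozen reference when None
    args=config,
    train_dataset=dataset["train"],
    tokenizer=tokenizer,  # renamed to processing_class in newer trl releases
)
trainer.train()
```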

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 1.5384 | 0.3333 | 25 | 0.5865 | -0.9393 | -4.6336 | 0.2000 | 3.6943 | -24.8193 | -7.6735 | -2.6286 | -2.6335 |
| 1.9313 | 0.6667 | 50 | 3.3244 | -8.8014 | -7.0309 | 0.0800 | -1.7705 | -32.8104 | -33.8805 | -2.2793 | -2.2791 |
| 2.2334 | 1.0 | 75 | 2.1234 | -7.8647 | -6.3982 | 0.0400 | -1.4665 | -30.7014 | -30.7581 | -2.3195 | -2.3195 |
| 1.5885 | 1.3333 | 100 | 2.2933 | -7.5972 | -6.0666 | 0.0600 | -1.5306 | -29.5961 | -29.8666 | -2.4436 | -2.4436 |
| 1.5616 | 1.6667 | 125 | 2.1780 | -7.3269 | -5.8384 | 0.0500 | -1.4885 | -28.8354 | -28.9655 | -2.4142 | -2.4142 |
| 1.9139 | 2.0 | 150 | 2.0829 | -7.6465 | -6.1969 | 0.0300 | -1.4496 | -30.0302 | -30.0307 | -2.3439 | -2.3439 |
| 1.2991 | 2.3333 | 175 | 2.1037 | -7.8995 | -6.4415 | 0.0300 | -1.4580 | -30.8457 | -30.8743 | -2.2606 | -2.2606 |
| 2.6811 | 2.6667 | 200 | 2.0701 | -8.1244 | -6.6820 | 0.0300 | -1.4424 | -31.6472 | -31.6237 | -2.1655 | -2.1655 |
| 0.8733 | 3.0 | 225 | 2.1232 | -8.1779 | -6.7120 | 0.0400 | -1.4660 | -31.7472 | -31.8022 | -2.1965 | -2.1965 |
| 1.2195 | 3.3333 | 250 | 2.1795 | -8.2279 | -6.7396 | 0.0500 | -1.4883 | -31.8392 | -31.9687 | -2.2261 | -2.2261 |
| 1.4853 | 3.6667 | 275 | 1.9880 | -2.8100 | -1.4100 | 0.0300 | -1.4000 | -14.0739 | -13.9091 | -0.4257 | -0.4257 |
| 1.1822 | 4.0 | 300 | 2.0289 | -2.9362 | -1.5140 | 0.0300 | -1.4222 | -14.4206 | -14.3298 | 0.3805 | 0.3805 |
| 1.7017 | 4.3333 | 325 | 3.1499 | -4.3610 | -2.5853 | 0.0800 | -1.7757 | -17.9917 | -19.0793 | -0.3043 | -0.3043 |
| 2.0845 | 4.6667 | 350 | 2.1225 | -4.8636 | -3.3976 | 0.0400 | -1.4660 | -20.6994 | -20.7544 | -3.2196 | -3.2196 |
| 2.604 | 5.0 | 375 | 2.1166 | -4.5986 | -3.1359 | 0.0400 | -1.4627 | -19.8270 | -19.8711 | -3.0726 | -3.0726 |
| 1.421 | 5.3333 | 400 | 2.1357 | -4.3907 | -2.9196 | 0.0500 | -1.4712 | -19.1058 | -19.1783 | -3.0071 | -3.0071 |
| 1.7226 | 5.6667 | 425 | 2.0272 | -3.5428 | -2.1216 | 0.0300 | -1.4213 | -16.4459 | -16.3519 | -2.2781 | -2.2781 |
| 2.4447 | 6.0 | 450 | 2.0689 | -3.6109 | -2.1692 | 0.0300 | -1.4417 | -16.6047 | -16.5789 | -2.3727 | -2.3727 |
| 1.411 | 6.3333 | 475 | 2.2094 | -4.2537 | -2.7538 | 0.0500 | -1.5000 | -18.5531 | -18.7215 | -2.8542 | -2.8542 |
| 2.0897 | 6.6667 | 500 | 2.0544 | -2.9041 | -1.4694 | 0.0300 | -1.4347 | -14.2719 | -14.2228 | 1.1911 | 1.1911 |
| 2.1201 | 7.0 | 525 | 1.9272 | -2.9728 | -1.6125 | 0.0100 | -1.3604 | -14.7488 | -14.4519 | 0.5166 | 0.5166 |
| 2.0408 | 7.3333 | 550 | 1.9687 | -3.0107 | -1.6220 | 0.0200 | -1.3887 | -14.7806 | -14.5782 | 0.5712 | 0.5712 |
| 0.9684 | 7.6667 | 575 | 2.1804 | -5.6469 | -4.1579 | 0.0500 | -1.4890 | -23.2336 | -23.3653 | -3.1061 | -3.1061 |
| 2.1055 | 8.0 | 600 | 2.0889 | -5.5077 | -4.0567 | 0.0300 | -1.4510 | -22.8961 | -22.9014 | -3.1369 | -3.1369 |
| 1.9687 | 8.3333 | 625 | 2.1354 | -5.4589 | -3.9878 | 0.0500 | -1.4711 | -22.6667 | -22.7388 | -3.2956 | -3.2956 |
| 2.5668 | 8.6667 | 650 | 2.1562 | -5.5048 | -4.0251 | 0.0500 | -1.4798 | -22.7908 | -22.8919 | -3.4770 | -3.4770 |
| 1.5897 | 9.0 | 675 | 2.1046 | -5.4925 | -4.0344 | 0.0300 | -1.4581 | -22.8218 | -22.8507 | -3.3988 | -3.3988 |
| 1.9547 | 9.3333 | 700 | 2.1229 | -5.5252 | -4.0594 | 0.0400 | -1.4658 | -22.9052 | -22.9596 | -3.4297 | -3.4297 |
| 1.9507 | 9.6667 | 725 | 2.1021 | -5.5176 | -4.0607 | 0.0300 | -1.4569 | -22.9095 | -22.9343 | -3.3848 | -3.3848 |
| 1.6239 | 10.0 | 750 | 2.1085 | -5.5157 | -4.0561 | 0.0400 | -1.4596 | -22.8942 | -22.9280 | -3.3875 | -3.3875 |
| 1.8544 | 10.3333 | 775 | 2.1030 | -5.5459 | -4.0885 | 0.0300 | -1.4574 | -23.0023 | -23.0289 | -3.4001 | -3.4001 |
| 2.4013 | 10.6667 | 800 | 2.0931 | -5.6326 | -4.1789 | 0.0300 | -1.4537 | -23.3036 | -23.3178 | -3.4385 | -3.4385 |
| 1.2873 | 11.0 | 825 | 2.0988 | -5.6290 | -4.1728 | 0.0300 | -1.4562 | -23.2832 | -23.3059 | -3.4535 | -3.4535 |
| 1.8211 | 11.3333 | 850 | 2.0989 | -5.6433 | -4.1865 | 0.0300 | -1.4568 | -23.3289 | -23.3535 | -3.4376 | -3.4376 |
| 1.6314 | 11.6667 | 875 | 2.0969 | -5.6415 | -4.1859 | 0.0300 | -1.4556 | -23.3269 | -23.3475 | -3.4609 | -3.4609 |
| 1.5862 | 12.0 | 900 | 2.0963 | -5.6447 | -4.1892 | 0.0300 | -1.4555 | -23.3379 | -23.3581 | -3.4652 | -3.4652 |
| 2.6075 | 12.3333 | 925 | 2.0963 | -5.6449 | -4.1894 | 0.0300 | -1.4555 | -23.3386 | -23.3587 | -3.4643 | -3.4643 |
| 1.0943 | 12.6667 | 950 | 2.0954 | -5.6448 | -4.1898 | 0.0300 | -1.4550 | -23.3400 | -23.3584 | -3.4660 | -3.4660 |
| 1.2314 | 13.0 | 975 | 2.0953 | -5.6446 | -4.1898 | 0.0300 | -1.4548 | -23.3399 | -23.3577 | -3.4660 | -3.4660 |
| 2.0533 | 13.3333 | 1000 | 2.0951 | -5.6444 | -4.1898 | 0.0300 | -1.4546 | -23.3400 | -23.3572 | -3.4660 | -3.4660 |

Framework versions

  • Transformers 4.41.2
  • Pytorch 2.0.0+cu117
  • Datasets 2.19.2
  • Tokenizers 0.19.1