UTI2_M2_1000steps_1e7rate_05beta_CSFTDPO

This model is a fine-tuned version of tsavage68/UTI_M2_1000steps_1e7rate_SFT on an unknown dataset. It achieves the following results on the evaluation set (see the note after this list for how to read the reward metrics):

  • Loss: 0.5476
  • Rewards/chosen: 0.0699
  • Rewards/rejected: -3.0830
  • Rewards/accuracies: 0.2100
  • Rewards/margins: 3.1530
  • Logps/rejected: -15.5400
  • Logps/chosen: -4.4026
  • Logits/rejected: -2.6403
  • Logits/chosen: -2.6398
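
These are the standard DPO bookkeeping quantities. As a point of reference (the objective is not stated in the card itself), DPO defines an implicit reward as a β-scaled log-probability ratio between the policy and the frozen reference model, and minimizes the negative log-sigmoid of the chosen-versus-rejected margin; the "05beta" in the model name suggests β = 0.5, though the card does not confirm it:

```latex
% DPO objective (Rafailov et al., 2023). beta = 0.5 is inferred from the
% model name ("05beta") and is an assumption, not documented in this card.
r_\theta(x, y) = \beta \, \log \frac{\pi_\theta(y \mid x)}{\pi_{\text{ref}}(y \mid x)},
\qquad
\mathcal{L}_{\text{DPO}}(\theta)
  = -\, \mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
    \Big[ \log \sigma \big( r_\theta(x, y_w) - r_\theta(x, y_l) \big) \Big]
```

Under that reading, Rewards/chosen and Rewards/rejected are the mean implicit rewards of the preferred and rejected completions, and Rewards/margins is their difference: 0.0699 - (-3.0830) = 3.1529, matching the reported 3.1530 up to rounding.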

Model description

More information needed

Intended uses & limitations

More information needed
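
In the absence of documented guidance, the snippet below is a minimal inference sketch, assuming the checkpoint is public on the Hugging Face Hub; the prompt string and generation settings are illustrative placeholders, since the expected prompt format for this model is not described.

```python
# Minimal inference sketch. Assumptions: public checkpoint, a GPU with enough
# memory for a 7.24B-parameter model in FP16, and `accelerate` installed so
# that device_map="auto" works.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/UTI2_M2_1000steps_1e7rate_05beta_CSFTDPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # matches the FP16 tensor type noted below
    device_map="auto",
)

prompt = "Example prompt"  # placeholder: the intended prompt format is undocumented
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```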

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 1e-07
  • train_batch_size: 2
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
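
The exact training script is not published; the sketch below shows one way these hyperparameters could map onto TRL's DPOTrainer (using the argument layout of TRL 0.8.x-era releases, which pair with the Transformers 4.41.2 pin below). The dataset variables and β = 0.5 are assumptions, the latter inferred from the model name.

```python
# Hypothetical reconstruction of the training setup with TRL's DPOTrainer.
# Everything not listed under "Training hyperparameters" is an assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "tsavage68/UTI_M2_1000steps_1e7rate_SFT"  # base checkpoint from this card
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

args = TrainingArguments(
    output_dir="UTI2_M2_1000steps_1e7rate_05beta_CSFTDPO",
    learning_rate=1e-7,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # total train batch size 4
    max_steps=1000,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    seed=42,  # the Adam betas/epsilon listed above are the defaults
)

# Placeholders: the card describes the preference data only as "unknown".
# A DPO dataset needs "prompt", "chosen", and "rejected" columns.
train_dataset = ...
eval_dataset = ...

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # TRL clones the model as the frozen reference
    beta=0.5,        # assumption, inferred from "05beta" in the model name
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```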

Training results

| Training Loss | Epoch   | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.7009        | 0.3333  | 25   | 0.6667          | 0.0016         | -0.0631          | 0.1900             | 0.0647          | -9.5002        | -4.5393      | -2.7063         | -2.7055       |
| 0.5794        | 0.6667  | 50   | 0.5690          | 0.0605         | -0.5674          | 0.2100             | 0.6279          | -10.5088       | -4.4215      | -2.6950         | -2.6942       |
| 0.5772        | 1.0     | 75   | 0.5638          | -0.0374        | -1.8778          | 0.2000             | 1.8404          | -13.1295       | -4.6173      | -2.6653         | -2.6647       |
| 0.5715        | 1.3333  | 100  | 0.5485          | 0.0321         | -2.2707          | 0.2100             | 2.3028          | -13.9154       | -4.4783      | -2.6560         | -2.6555       |
| 0.5545        | 1.6667  | 125  | 0.5476          | 0.1013         | -2.5349          | 0.2100             | 2.6363          | -14.4438       | -4.3398      | -2.6499         | -2.6494       |
| 0.5545        | 2.0     | 150  | 0.5476          | 0.0902         | -2.9376          | 0.2100             | 3.0278          | -15.2492       | -4.3621      | -2.6442         | -2.6437       |
| 0.5545        | 2.3333  | 175  | 0.5476          | 0.0846         | -2.9244          | 0.2100             | 3.0090          | -15.2229       | -4.3733      | -2.6424         | -2.6419       |
| 0.4852        | 2.6667  | 200  | 0.5476          | 0.0848         | -2.9648          | 0.2100             | 3.0495          | -15.3035       | -4.3729      | -2.6423         | -2.6417       |
| 0.6412        | 3.0     | 225  | 0.5476          | 0.0853         | -2.9694          | 0.2100             | 3.0547          | -15.3127       | -4.3718      | -2.6421         | -2.6415       |
| 0.5545        | 3.3333  | 250  | 0.5476          | 0.0892         | -2.9671          | 0.2100             | 3.0563          | -15.3081       | -4.3640      | -2.6429         | -2.6424       |
| 0.5372        | 3.6667  | 275  | 0.5476          | 0.0803         | -2.9507          | 0.2100             | 3.0310          | -15.2754       | -4.3819      | -2.6416         | -2.6410       |
| 0.5892        | 4.0     | 300  | 0.5476          | 0.0791         | -3.0080          | 0.2100             | 3.0871          | -15.3899       | -4.3842      | -2.6421         | -2.6415       |
| 0.4679        | 4.3333  | 325  | 0.5476          | 0.0770         | -3.0043          | 0.2100             | 3.0814          | -15.3826       | -4.3884      | -2.6420         | -2.6415       |
| 0.5718        | 4.6667  | 350  | 0.5476          | 0.0767         | -3.0040          | 0.2100             | 3.0808          | -15.3820       | -4.3890      | -2.6414         | -2.6409       |
| 0.5199        | 5.0     | 375  | 0.5476          | 0.0830         | -3.0444          | 0.2100             | 3.1274          | -15.4628       | -4.3765      | -2.6415         | -2.6410       |
| 0.5025        | 5.3333  | 400  | 0.5476          | 0.0784         | -3.0520          | 0.2100             | 3.1304          | -15.4779       | -4.3857      | -2.6406         | -2.6401       |
| 0.5199        | 5.6667  | 425  | 0.5476          | 0.0772         | -3.0417          | 0.2100             | 3.1189          | -15.4575       | -4.3882      | -2.6418         | -2.6412       |
| 0.5025        | 6.0     | 450  | 0.5476          | 0.0775         | -3.0690          | 0.2100             | 3.1465          | -15.5119       | -4.3875      | -2.6403         | -2.6398       |
| 0.5718        | 6.3333  | 475  | 0.5476          | 0.0722         | -3.0608          | 0.2100             | 3.1330          | -15.4956       | -4.3980      | -2.6403         | -2.6398       |
| 0.5718        | 6.6667  | 500  | 0.5476          | 0.0733         | -3.0661          | 0.2100             | 3.1394          | -15.5061       | -4.3958      | -2.6403         | -2.6397       |
| 0.5025        | 7.0     | 525  | 0.5476          | 0.0687         | -3.0692          | 0.2100             | 3.1379          | -15.5123       | -4.4051      | -2.6407         | -2.6402       |
| 0.5199        | 7.3333  | 550  | 0.5476          | 0.0691         | -3.0762          | 0.2100             | 3.1454          | -15.5265       | -4.4042      | -2.6401         | -2.6396       |
| 0.5372        | 7.6667  | 575  | 0.5476          | 0.0728         | -3.0945          | 0.2100             | 3.1672          | -15.5629       | -4.3970      | -2.6414         | -2.6409       |
| 0.5718        | 8.0     | 600  | 0.5476          | 0.0736         | -3.0806          | 0.2100             | 3.1541          | -15.5351       | -4.3953      | -2.6405         | -2.6400       |
| 0.5372        | 8.3333  | 625  | 0.5476          | 0.0806         | -3.0954          | 0.2100             | 3.1759          | -15.5647       | -4.3813      | -2.6410         | -2.6405       |
| 0.4332        | 8.6667  | 650  | 0.5476          | 0.0762         | -3.0922          | 0.2100             | 3.1684          | -15.5583       | -4.3900      | -2.6412         | -2.6407       |
| 0.5372        | 9.0     | 675  | 0.5476          | 0.0738         | -3.0924          | 0.2100             | 3.1662          | -15.5587       | -4.3948      | -2.6408         | -2.6403       |
| 0.5025        | 9.3333  | 700  | 0.5476          | 0.0702         | -3.0892          | 0.2100             | 3.1594          | -15.5524       | -4.4020      | -2.6405         | -2.6400       |
| 0.5025        | 9.6667  | 725  | 0.5476          | 0.0641         | -3.0956          | 0.2100             | 3.1597          | -15.5651       | -4.4142      | -2.6410         | -2.6405       |
| 0.5892        | 10.0    | 750  | 0.5476          | 0.0696         | -3.0933          | 0.2100             | 3.1630          | -15.5606       | -4.4032      | -2.6403         | -2.6398       |
| 0.5199        | 10.3333 | 775  | 0.5476          | 0.0764         | -3.0810          | 0.2100             | 3.1574          | -15.5361       | -4.3897      | -2.6404         | -2.6399       |
| 0.5199        | 10.6667 | 800  | 0.5476          | 0.0750         | -3.0945          | 0.2100             | 3.1695          | -15.5629       | -4.3925      | -2.6399         | -2.6394       |
| 0.5372        | 11.0    | 825  | 0.5477          | 0.0727         | -3.0777          | 0.2100             | 3.1504          | -15.5293       | -4.3970      | -2.6405         | -2.6399       |
| 0.5199        | 11.3333 | 850  | 0.5477          | 0.0760         | -3.0775          | 0.2100             | 3.1534          | -15.5289       | -4.3905      | -2.6402         | -2.6397       |
| 0.6065        | 11.6667 | 875  | 0.5476          | 0.0737         | -3.0877          | 0.2100             | 3.1615          | -15.5495       | -4.3950      | -2.6404         | -2.6398       |
| 0.5718        | 12.0    | 900  | 0.5476          | 0.0713         | -3.0915          | 0.2100             | 3.1628          | -15.5570       | -4.3999      | -2.6403         | -2.6398       |
| 0.4159        | 12.3333 | 925  | 0.5476          | 0.0687         | -3.0820          | 0.2100             | 3.1507          | -15.5379       | -4.4051      | -2.6403         | -2.6398       |
| 0.6238        | 12.6667 | 950  | 0.5476          | 0.0699         | -3.0830          | 0.2100             | 3.1530          | -15.5400       | -4.4026      | -2.6403         | -2.6398       |
| 0.6065        | 13.0    | 975  | 0.5476          | 0.0699         | -3.0830          | 0.2100             | 3.1530          | -15.5400       | -4.4026      | -2.6403         | -2.6398       |
| 0.5025        | 13.3333 | 1000 | 0.5476          | 0.0699         | -3.0830          | 0.2100             | 3.1530          | -15.5400       | -4.4026      | -2.6403         | -2.6398       |

Framework versions

  • Transformers 4.41.2
  • Pytorch 2.0.0+cu117
  • Datasets 2.19.2
  • Tokenizers 0.19.1
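
As a convenience (not part of the original card), the snippet below checks an environment against these pins:

```python
# Environment check against the versions pinned above (convenience sketch).
import datasets, tokenizers, torch, transformers

for mod, pinned in [(transformers, "4.41.2"), (torch, "2.0.0+cu117"),
                    (datasets, "2.19.2"), (tokenizers, "0.19.1")]:
    status = "OK" if mod.__version__ == pinned else "MISMATCH"
    print(f"{mod.__name__:<12} installed={mod.__version__:<14} pinned={pinned} {status}")
```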
Model details

  • Model size: 7.24B parameters
  • Tensor type: FP16 (Safetensors)