UTI2_M2_1000steps_1e8rate_01beta_CSFTDPO

This model is a fine-tuned version of tsavage68/UTI_M2_1000steps_1e7rate_SFT on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6638
  • Rewards/chosen: 0.0017
  • Rewards/rejected: -0.0583
  • Rewards/accuracies: 0.8700
  • Rewards/margins: 0.0600
  • Logps/rejected: -39.9390
  • Logps/chosen: -19.9040
  • Logits/rejected: -2.6810
  • Logits/chosen: -2.6784
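
The reward metrics above follow the standard DPO definitions: each reward is β times the log-probability gap between the policy and the reference (SFT) model, and the loss is the negative log-sigmoid of the chosen-minus-rejected margin. The sketch below is generic, not the training code, and β = 0.1 is an assumption read off the `01beta` suffix in the model name:

```python
import math

def dpo_metrics(policy_logp_chosen, ref_logp_chosen,
                policy_logp_rejected, ref_logp_rejected, beta=0.1):
    """Standard DPO reward/loss terms (beta=0.1 assumed from the model name)."""
    reward_chosen = beta * (policy_logp_chosen - ref_logp_chosen)
    reward_rejected = beta * (policy_logp_rejected - ref_logp_rejected)
    margin = reward_chosen - reward_rejected
    # DPO loss: -log sigmoid(margin)
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
    return reward_chosen, reward_rejected, margin, loss

# Before any update the policy equals the reference, so both rewards are 0
# and the loss is ln(2) ≈ 0.6931 -- the value seen at the start of training.
print(dpo_metrics(-19.9, -19.9, -39.4, -39.4))
```

This also explains why a loss near 0.69 is the natural starting point for any DPO run: a zero margin gives exactly ln(2).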

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-08
  • train_batch_size: 2
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
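
With `lr_scheduler_type: cosine` and 100 warmup steps, the learning rate ramps linearly to the 1e-08 peak and then decays along a half-cosine to zero at step 1000. A minimal sketch of that schedule shape (the usual `cosine` scheduler behavior, not the exact trainer internals):

```python
import math

PEAK_LR = 1e-8       # learning_rate
WARMUP_STEPS = 100   # lr_scheduler_warmup_steps
TOTAL_STEPS = 1000   # training_steps

def lr_at(step):
    """Linear warmup to PEAK_LR, then cosine decay to 0 over the remaining steps."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(100))   # peak learning rate
print(lr_at(1000))  # fully decayed
```

Note the effective batch size of 4 comes from `train_batch_size (2) × gradient_accumulation_steps (2)`, matching `total_train_batch_size` above.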

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6931 | 0.3333 | 25 | 0.6929 | 0.0006 | 0.0000 | 0.1900 | 0.0006 | -39.3560 | -19.9153 | -2.6832 | -2.6806 |
| 0.6898 | 0.6667 | 50 | 0.6940 | -0.0024 | -0.0009 | 0.4700 | -0.0015 | -39.3651 | -19.9455 | -2.6823 | -2.6798 |
| 0.6943 | 1.0 | 75 | 0.6915 | 0.0010 | -0.0025 | 0.5200 | 0.0035 | -39.3809 | -19.9114 | -2.6828 | -2.6803 |
| 0.6889 | 1.3333 | 100 | 0.6916 | 0.0004 | -0.0029 | 0.5100 | 0.0033 | -39.3855 | -19.9178 | -2.6827 | -2.6801 |
| 0.6891 | 1.6667 | 125 | 0.6877 | 0.0008 | -0.0102 | 0.6000 | 0.0110 | -39.4581 | -19.9132 | -2.6824 | -2.6798 |
| 0.6876 | 2.0 | 150 | 0.6854 | 0.0015 | -0.0143 | 0.6900 | 0.0158 | -39.4992 | -19.9067 | -2.6812 | -2.6786 |
| 0.6810 | 2.3333 | 175 | 0.6793 | 0.0011 | -0.0270 | 0.7900 | 0.0280 | -39.6258 | -19.9108 | -2.6817 | -2.6791 |
| 0.6782 | 2.6667 | 200 | 0.6775 | 0.0056 | -0.0263 | 0.8000 | 0.0319 | -39.6187 | -19.8653 | -2.6821 | -2.6795 |
| 0.6763 | 3.0 | 225 | 0.6739 | 0.0035 | -0.0355 | 0.8200 | 0.0390 | -39.7113 | -19.8864 | -2.6810 | -2.6784 |
| 0.6716 | 3.3333 | 250 | 0.6697 | 0.0024 | -0.0454 | 0.8600 | 0.0478 | -39.8106 | -19.8975 | -2.6814 | -2.6788 |
| 0.6724 | 3.6667 | 275 | 0.6698 | 0.0055 | -0.0420 | 0.8600 | 0.0475 | -39.7765 | -19.8664 | -2.6809 | -2.6783 |
| 0.6706 | 4.0 | 300 | 0.6686 | 0.0042 | -0.0458 | 0.8600 | 0.0500 | -39.8142 | -19.8793 | -2.6812 | -2.6786 |
| 0.6644 | 4.3333 | 325 | 0.6649 | 0.0063 | -0.0515 | 0.8600 | 0.0578 | -39.8714 | -19.8588 | -2.6808 | -2.6782 |
| 0.6626 | 4.6667 | 350 | 0.6650 | 0.0053 | -0.0523 | 0.8300 | 0.0575 | -39.8789 | -19.8689 | -2.6814 | -2.6788 |
| 0.6656 | 5.0 | 375 | 0.6636 | 0.0063 | -0.0541 | 0.8800 | 0.0604 | -39.8971 | -19.8587 | -2.6809 | -2.6783 |
| 0.6672 | 5.3333 | 400 | 0.6643 | 0.0062 | -0.0527 | 0.8600 | 0.0589 | -39.8830 | -19.8595 | -2.6801 | -2.6775 |
| 0.6627 | 5.6667 | 425 | 0.6650 | 0.0019 | -0.0557 | 0.8300 | 0.0575 | -39.9129 | -19.9029 | -2.6806 | -2.6780 |
| 0.6641 | 6.0 | 450 | 0.6646 | 0.0073 | -0.0510 | 0.8500 | 0.0583 | -39.8660 | -19.8482 | -2.6807 | -2.6781 |
| 0.6608 | 6.3333 | 475 | 0.6632 | 0.0072 | -0.0541 | 0.8700 | 0.0612 | -39.8970 | -19.8500 | -2.6801 | -2.6776 |
| 0.6733 | 6.6667 | 500 | 0.6626 | 0.0067 | -0.0559 | 0.8400 | 0.0626 | -39.9148 | -19.8543 | -2.6808 | -2.6782 |
| 0.6596 | 7.0 | 525 | 0.6628 | 0.0064 | -0.0558 | 0.8500 | 0.0622 | -39.9138 | -19.8576 | -2.6813 | -2.6787 |
| 0.6612 | 7.3333 | 550 | 0.6627 | 0.0063 | -0.0561 | 0.8700 | 0.0624 | -39.9173 | -19.8583 | -2.6810 | -2.6784 |
| 0.6650 | 7.6667 | 575 | 0.6627 | 0.0054 | -0.0570 | 0.8700 | 0.0624 | -39.9263 | -19.8680 | -2.6814 | -2.6788 |
| 0.6647 | 8.0 | 600 | 0.6641 | 0.0050 | -0.0545 | 0.8400 | 0.0595 | -39.9014 | -19.8714 | -2.6805 | -2.6779 |
| 0.6644 | 8.3333 | 625 | 0.6627 | 0.0046 | -0.0577 | 0.8700 | 0.0623 | -39.9335 | -19.8757 | -2.6811 | -2.6786 |
| 0.6589 | 8.6667 | 650 | 0.6634 | 0.0071 | -0.0537 | 0.8600 | 0.0608 | -39.8933 | -19.8503 | -2.6819 | -2.6793 |
| 0.6626 | 9.0 | 675 | 0.6628 | 0.0073 | -0.0547 | 0.8800 | 0.0620 | -39.9033 | -19.8483 | -2.6816 | -2.6789 |
| 0.6608 | 9.3333 | 700 | 0.6649 | 0.0049 | -0.0528 | 0.8600 | 0.0577 | -39.8841 | -19.8726 | -2.6813 | -2.6787 |
| 0.6695 | 9.6667 | 725 | 0.6628 | 0.0040 | -0.0581 | 0.9000 | 0.0622 | -39.9376 | -19.8814 | -2.6812 | -2.6786 |
| 0.6683 | 10.0 | 750 | 0.6635 | 0.0033 | -0.0573 | 0.8600 | 0.0606 | -39.9296 | -19.8888 | -2.6811 | -2.6785 |
| 0.6657 | 10.3333 | 775 | 0.6627 | 0.0039 | -0.0584 | 0.8700 | 0.0623 | -39.9402 | -19.8824 | -2.6810 | -2.6784 |
| 0.6668 | 10.6667 | 800 | 0.6636 | 0.0029 | -0.0575 | 0.8700 | 0.0604 | -39.9309 | -19.8927 | -2.6810 | -2.6784 |
| 0.6605 | 11.0 | 825 | 0.6640 | 0.0015 | -0.0580 | 0.8700 | 0.0595 | -39.9363 | -19.9063 | -2.6810 | -2.6784 |
| 0.6634 | 11.3333 | 850 | 0.6638 | 0.0017 | -0.0583 | 0.8700 | 0.0600 | -39.9390 | -19.9040 | -2.6810 | -2.6784 |
| 0.6696 | 11.6667 | 875 | 0.6638 | 0.0017 | -0.0583 | 0.8700 | 0.0600 | -39.9390 | -19.9040 | -2.6810 | -2.6784 |
| 0.6609 | 12.0 | 900 | 0.6638 | 0.0017 | -0.0583 | 0.8700 | 0.0600 | -39.9390 | -19.9040 | -2.6810 | -2.6784 |
| 0.6652 | 12.3333 | 925 | 0.6638 | 0.0017 | -0.0583 | 0.8700 | 0.0600 | -39.9390 | -19.9040 | -2.6810 | -2.6784 |
| 0.6614 | 12.6667 | 950 | 0.6638 | 0.0017 | -0.0583 | 0.8700 | 0.0600 | -39.9390 | -19.9040 | -2.6810 | -2.6784 |
| 0.6639 | 13.0 | 975 | 0.6638 | 0.0017 | -0.0583 | 0.8700 | 0.0600 | -39.9390 | -19.9040 | -2.6810 | -2.6784 |
| 0.6632 | 13.3333 | 1000 | 0.6638 | 0.0017 | -0.0583 | 0.8700 | 0.0600 | -39.9390 | -19.9040 | -2.6810 | -2.6784 |
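
The Rewards/accuracies column is the fraction of evaluation pairs for which the chosen response's reward exceeds the rejected response's reward. A minimal sketch of that computation (the reward values here are illustrative, not the actual eval set):

```python
def reward_accuracy(chosen_rewards, rejected_rewards):
    """Fraction of preference pairs where the chosen response out-scores the rejected one."""
    wins = sum(c > r for c, r in zip(chosen_rewards, rejected_rewards))
    return wins / len(chosen_rewards)

# Illustrative values only: 3 of 4 pairs are ranked correctly.
print(reward_accuracy([0.01, 0.02, -0.01, 0.03],
                      [-0.05, 0.04, -0.06, -0.02]))  # → 0.75
```

By this measure the run converges to 0.87: roughly 87 of every 100 eval pairs end up correctly ranked, even though the absolute reward margins stay small at this very low learning rate.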

Framework versions

  • Transformers 4.41.2
  • Pytorch 2.0.0+cu117
  • Datasets 2.19.2
  • Tokenizers 0.19.1

Model details

  • Model size: 7.24B params (Safetensors)
  • Tensor type: FP16