UTI2_M2_1000steps_1e7rate_03beta_CSFTDPO

This model is a fine-tuned version of tsavage68/UTI_M2_1000steps_1e7rate_SFT on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5476
  • Rewards/chosen: 0.0650
  • Rewards/rejected: -3.0224
  • Rewards/accuracies: 0.2100
  • Rewards/margins: 3.0873
  • Logps/rejected: -19.4485
  • Logps/chosen: -4.3259
  • Logits/rejected: -2.6124
  • Logits/chosen: -2.6122
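
These are the evaluation metrics logged by TRL's DPOTrainer. As a minimal sketch of what they mean (assuming TRL's conventions, with β = 0.3 inferred from the "03beta" in the model name), the implicit DPO rewards are scaled log-probability ratios between the policy and the frozen reference model:

```python
import torch

# Minimal sketch of how TRL's DPOTrainer derives the rewards/* metrics
# reported above. beta = 0.3 is inferred from "03beta" in the model name;
# the four tensors are hypothetical per-example sequence log-probabilities.
def dpo_reward_metrics(policy_chosen_logps: torch.Tensor,
                       policy_rejected_logps: torch.Tensor,
                       ref_chosen_logps: torch.Tensor,
                       ref_rejected_logps: torch.Tensor,
                       beta: float = 0.3) -> dict:
    # Implicit DPO reward: beta * (log pi_theta(y|x) - log pi_ref(y|x))
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    return {
        "rewards/chosen": chosen_rewards.mean().item(),
        "rewards/rejected": rejected_rewards.mean().item(),
        # Margin: how far chosen rewards sit above rejected rewards.
        "rewards/margins": (chosen_rewards - rejected_rewards).mean().item(),
        # Accuracy: fraction of pairs where the chosen response outscores the rejected one.
        "rewards/accuracies": (chosen_rewards > rejected_rewards).float().mean().item(),
    }
```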

Model description

A 7.24B-parameter causal language model stored as FP16 safetensors, DPO-tuned (β = 0.3, per the model name) from tsavage68/UTI_M2_1000steps_1e7rate_SFT. No further details are documented.

Intended uses & limitations

More information needed
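
Usage is undocumented. As a generic sketch, the checkpoint should load like any Hugging Face causal LM, using the model id from this card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Generic loading sketch; the model id is taken from this card, and the
# prompt is a hypothetical placeholder (no prompt format is documented).
model_id = "tsavage68/UTI2_M2_1000steps_1e7rate_03beta_CSFTDPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

inputs = tokenizer("Example prompt", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```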

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-07
  • train_batch_size: 2
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
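
The card does not include the training script, but these hyperparameters map directly onto TRL's DPOConfig. A hedged reconstruction follows (the dataset is a placeholder, β = 0.3 is inferred from the model name, and argument names vary slightly across TRL versions):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Hedged reconstruction of the run from the hyperparameters above.
# beta = 0.3 is inferred from the model name; the dataset is a placeholder
# (the card does not name it).
base = "tsavage68/UTI_M2_1000steps_1e7rate_SFT"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

train_dataset = ...  # placeholder: preference pairs with prompt/chosen/rejected columns

config = DPOConfig(
    output_dir="UTI2_M2_1000steps_1e7rate_03beta_CSFTDPO",
    learning_rate=1e-7,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # total train batch size 2 * 2 = 4
    max_steps=1000,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    seed=42,
    beta=0.3,
)

trainer = DPOTrainer(
    model=model,          # the reference model is created automatically when omitted
    args=config,
    train_dataset=train_dataset,
    tokenizer=tokenizer,  # renamed to processing_class in newer TRL releases
)
trainer.train()
```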

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 0.6972 | 0.3333 | 25 | 0.6778 | -0.0051 | -0.0396 | 0.1700 | 0.0345 | -9.5061 | -4.5596 | -2.7055 | -2.7048 |
| 0.6009 | 0.6667 | 50 | 0.5879 | 0.0346 | -0.3529 | 0.2100 | 0.3875 | -10.5502 | -4.4271 | -2.6953 | -2.6946 |
| 0.5832 | 1.0 | 75 | 0.5855 | -0.0014 | -1.5604 | 0.1900 | 1.5590 | -14.5753 | -4.5473 | -2.6506 | -2.6501 |
| 0.6192 | 1.3333 | 100 | 0.5646 | -0.0198 | -2.1667 | 0.1900 | 2.1469 | -16.5963 | -4.6084 | -2.6315 | -2.6312 |
| 0.5545 | 1.6667 | 125 | 0.5481 | 0.0095 | -2.4683 | 0.2100 | 2.4778 | -17.6015 | -4.5108 | -2.6275 | -2.6272 |
| 0.5545 | 2.0 | 150 | 0.5667 | 0.0007 | -2.7764 | 0.2000 | 2.7771 | -18.6286 | -4.5402 | -2.6199 | -2.6196 |
| 0.5545 | 2.3333 | 175 | 0.5761 | -0.0123 | -2.7676 | 0.2000 | 2.7554 | -18.5994 | -4.5834 | -2.6198 | -2.6195 |
| 0.4852 | 2.6667 | 200 | 0.5476 | 0.0753 | -2.8789 | 0.2100 | 2.9542 | -18.9703 | -4.2915 | -2.6142 | -2.6139 |
| 0.6412 | 3.0 | 225 | 0.5476 | 0.0753 | -2.8789 | 0.2100 | 2.9542 | -18.9703 | -4.2915 | -2.6142 | -2.6139 |
| 0.5545 | 3.3333 | 250 | 0.5476 | 0.0755 | -2.8942 | 0.2100 | 2.9697 | -19.0213 | -4.2908 | -2.6143 | -2.6140 |
| 0.5372 | 3.6667 | 275 | 0.5476 | 0.0793 | -2.8876 | 0.2100 | 2.9669 | -18.9993 | -4.2780 | -2.6151 | -2.6149 |
| 0.5892 | 4.0 | 300 | 0.5476 | 0.0771 | -2.9194 | 0.2100 | 2.9965 | -19.1054 | -4.2856 | -2.6143 | -2.6140 |
| 0.4679 | 4.3333 | 325 | 0.5476 | 0.0771 | -2.9195 | 0.2100 | 2.9965 | -19.1055 | -4.2856 | -2.6143 | -2.6140 |
| 0.5718 | 4.6667 | 350 | 0.5476 | 0.0724 | -2.9271 | 0.2100 | 2.9995 | -19.1309 | -4.3011 | -2.6145 | -2.6142 |
| 0.5199 | 5.0 | 375 | 0.5476 | 0.0715 | -2.9612 | 0.2100 | 3.0327 | -19.2446 | -4.3040 | -2.6121 | -2.6119 |
| 0.5025 | 5.3333 | 400 | 0.5476 | 0.0695 | -2.9553 | 0.2100 | 3.0248 | -19.2251 | -4.3110 | -2.6134 | -2.6132 |
| 0.5199 | 5.6667 | 425 | 0.5476 | 0.0667 | -2.9587 | 0.2100 | 3.0253 | -19.2362 | -4.3202 | -2.6138 | -2.6135 |
| 0.5025 | 6.0 | 450 | 0.5476 | 0.0733 | -2.9890 | 0.2100 | 3.0623 | -19.3371 | -4.2980 | -2.6130 | -2.6128 |
| 0.5718 | 6.3333 | 475 | 0.5476 | 0.0738 | -2.9747 | 0.2100 | 3.0485 | -19.2896 | -4.2964 | -2.6133 | -2.6130 |
| 0.5718 | 6.6667 | 500 | 0.5476 | 0.0629 | -2.9781 | 0.2100 | 3.0410 | -19.3011 | -4.3329 | -2.6138 | -2.6136 |
| 0.5025 | 7.0 | 525 | 0.5476 | 0.0646 | -3.0038 | 0.2100 | 3.0685 | -19.3868 | -4.3270 | -2.6132 | -2.6130 |
| 0.5199 | 7.3333 | 550 | 0.5476 | 0.0729 | -2.9967 | 0.2100 | 3.0696 | -19.3629 | -4.2994 | -2.6125 | -2.6123 |
| 0.5372 | 7.6667 | 575 | 0.5476 | 0.0722 | -3.0204 | 0.2100 | 3.0926 | -19.4418 | -4.3018 | -2.6128 | -2.6125 |
| 0.5718 | 8.0 | 600 | 0.5476 | 0.0668 | -3.0048 | 0.2100 | 3.0716 | -19.3899 | -4.3198 | -2.6129 | -2.6127 |
| 0.5372 | 8.3333 | 625 | 0.5476 | 0.0640 | -3.0120 | 0.2100 | 3.0760 | -19.4140 | -4.3291 | -2.6127 | -2.6125 |
| 0.4332 | 8.6667 | 650 | 0.5476 | 0.0701 | -3.0149 | 0.2100 | 3.0851 | -19.4237 | -4.3087 | -2.6119 | -2.6117 |
| 0.5372 | 9.0 | 675 | 0.5476 | 0.0744 | -3.0163 | 0.2100 | 3.0906 | -19.4282 | -4.2946 | -2.6121 | -2.6119 |
| 0.5025 | 9.3333 | 700 | 0.5476 | 0.0677 | -3.0179 | 0.2100 | 3.0856 | -19.4337 | -4.3167 | -2.6122 | -2.6119 |
| 0.5025 | 9.6667 | 725 | 0.5476 | 0.0653 | -3.0228 | 0.2100 | 3.0881 | -19.4499 | -4.3247 | -2.6125 | -2.6123 |
| 0.5892 | 10.0 | 750 | 0.5476 | 0.0635 | -3.0214 | 0.2100 | 3.0848 | -19.4451 | -4.3309 | -2.6120 | -2.6117 |
| 0.5199 | 10.3333 | 775 | 0.5476 | 0.0649 | -3.0176 | 0.2100 | 3.0825 | -19.4326 | -4.3261 | -2.6123 | -2.6120 |
| 0.5199 | 10.6667 | 800 | 0.5476 | 0.0649 | -3.0260 | 0.2100 | 3.0908 | -19.4605 | -4.3262 | -2.6118 | -2.6116 |
| 0.5372 | 11.0 | 825 | 0.5476 | 0.0654 | -3.0196 | 0.2100 | 3.0850 | -19.4392 | -4.3243 | -2.6122 | -2.6119 |
| 0.5199 | 11.3333 | 850 | 0.5476 | 0.0645 | -3.0236 | 0.2100 | 3.0881 | -19.4525 | -4.3275 | -2.6122 | -2.6120 |
| 0.6065 | 11.6667 | 875 | 0.5476 | 0.0651 | -3.0232 | 0.2100 | 3.0883 | -19.4513 | -4.3254 | -2.6124 | -2.6122 |
| 0.5718 | 12.0 | 900 | 0.5476 | 0.0648 | -3.0221 | 0.2100 | 3.0869 | -19.4478 | -4.3265 | -2.6124 | -2.6121 |
| 0.4159 | 12.3333 | 925 | 0.5476 | 0.0650 | -3.0224 | 0.2100 | 3.0873 | -19.4485 | -4.3259 | -2.6124 | -2.6122 |
| 0.6238 | 12.6667 | 950 | 0.5476 | 0.0650 | -3.0224 | 0.2100 | 3.0873 | -19.4485 | -4.3259 | -2.6124 | -2.6122 |
| 0.6065 | 13.0 | 975 | 0.5476 | 0.0650 | -3.0224 | 0.2100 | 3.0873 | -19.4485 | -4.3259 | -2.6124 | -2.6122 |
| 0.5025 | 13.3333 | 1000 | 0.5476 | 0.0650 | -3.0224 | 0.2100 | 3.0873 | -19.4485 | -4.3259 | -2.6124 | -2.6122 |

Framework versions

  • Transformers 4.41.2
  • Pytorch 2.0.0+cu117
  • Datasets 2.19.2
  • Tokenizers 0.19.1