# UTI2_M2_1000steps_1e7rate_03beta_CSFTDPO
This model is a fine-tuned version of tsavage68/UTI_M2_1000steps_1e7rate_SFT on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.5476
- Rewards/chosen: 0.0650
- Rewards/rejected: -3.0224
- Rewards/accuracies: 0.2100
- Rewards/margins: 3.0873
- Logps/rejected: -19.4485
- Logps/chosen: -4.3259
- Logits/rejected: -2.6124
- Logits/chosen: -2.6122
## Model description

This model was trained with Direct Preference Optimization (DPO) at β = 0.3 (the "03beta" in the model name) on top of tsavage68/UTI_M2_1000steps_1e7rate_SFT, which is itself a supervised fine-tune of mistralai/Mistral-7B-Instruct-v0.2. The preference dataset used for the DPO stage is not documented.
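
The snippet below is a minimal usage sketch, assuming the standard `transformers` text-generation workflow and the Mistral-Instruct chat template inherited from the base model; the prompt is only a placeholder, not taken from the (undocumented) training data.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/UTI2_M2_1000steps_1e7rate_03beta_CSFTDPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" requires the accelerate package; drop it to load on a single device.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Placeholder prompt; the chat template itself comes from the tokenizer config.
messages = [{"role": "user", "content": "Summarize the key points of my last visit."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```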
## Intended uses & limitations

More information needed
## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-07
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
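
The reward metrics and the model name ("03beta", "DPO") indicate Direct Preference Optimization with β = 0.3. The sketch below shows how a comparable run could be set up with TRL's `DPOTrainer` (older TRL API, where `beta` is a trainer argument); the preference dataset path and column layout are assumptions, since the actual data is not documented.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_id = "tsavage68/UTI_M2_1000steps_1e7rate_SFT"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Hypothetical preference data with "prompt", "chosen", and "rejected" columns.
train_dataset = load_dataset("json", data_files="preferences.json", split="train")

args = TrainingArguments(
    output_dir="UTI2_M2_1000steps_1e7rate_03beta_CSFTDPO",
    per_device_train_batch_size=2,   # train_batch_size
    per_device_eval_batch_size=1,    # eval_batch_size
    gradient_accumulation_steps=2,   # effective total train batch size of 4
    learning_rate=1e-7,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
    remove_unused_columns=False,     # keep the preference columns for DPOTrainer
)

trainer = DPOTrainer(
    model=model,          # policy model; a frozen copy serves as the implicit reference
    args=args,
    beta=0.3,             # DPO temperature, from "03beta" in the model name
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```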
### Training results

Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
0.6972 | 0.3333 | 25 | 0.6778 | -0.0051 | -0.0396 | 0.1700 | 0.0345 | -9.5061 | -4.5596 | -2.7055 | -2.7048 |
0.6009 | 0.6667 | 50 | 0.5879 | 0.0346 | -0.3529 | 0.2100 | 0.3875 | -10.5502 | -4.4271 | -2.6953 | -2.6946 |
0.5832 | 1.0 | 75 | 0.5855 | -0.0014 | -1.5604 | 0.1900 | 1.5590 | -14.5753 | -4.5473 | -2.6506 | -2.6501 |
0.6192 | 1.3333 | 100 | 0.5646 | -0.0198 | -2.1667 | 0.1900 | 2.1469 | -16.5963 | -4.6084 | -2.6315 | -2.6312 |
0.5545 | 1.6667 | 125 | 0.5481 | 0.0095 | -2.4683 | 0.2100 | 2.4778 | -17.6015 | -4.5108 | -2.6275 | -2.6272 |
0.5545 | 2.0 | 150 | 0.5667 | 0.0007 | -2.7764 | 0.2000 | 2.7771 | -18.6286 | -4.5402 | -2.6199 | -2.6196 |
0.5545 | 2.3333 | 175 | 0.5761 | -0.0123 | -2.7676 | 0.2000 | 2.7554 | -18.5994 | -4.5834 | -2.6198 | -2.6195 |
0.4852 | 2.6667 | 200 | 0.5476 | 0.0753 | -2.8789 | 0.2100 | 2.9542 | -18.9703 | -4.2915 | -2.6142 | -2.6139 |
0.6412 | 3.0 | 225 | 0.5476 | 0.0753 | -2.8789 | 0.2100 | 2.9542 | -18.9703 | -4.2915 | -2.6142 | -2.6139 |
0.5545 | 3.3333 | 250 | 0.5476 | 0.0755 | -2.8942 | 0.2100 | 2.9697 | -19.0213 | -4.2908 | -2.6143 | -2.6140 |
0.5372 | 3.6667 | 275 | 0.5476 | 0.0793 | -2.8876 | 0.2100 | 2.9669 | -18.9993 | -4.2780 | -2.6151 | -2.6149 |
0.5892 | 4.0 | 300 | 0.5476 | 0.0771 | -2.9194 | 0.2100 | 2.9965 | -19.1054 | -4.2856 | -2.6143 | -2.6140 |
0.4679 | 4.3333 | 325 | 0.5476 | 0.0771 | -2.9195 | 0.2100 | 2.9965 | -19.1055 | -4.2856 | -2.6143 | -2.6140 |
0.5718 | 4.6667 | 350 | 0.5476 | 0.0724 | -2.9271 | 0.2100 | 2.9995 | -19.1309 | -4.3011 | -2.6145 | -2.6142 |
0.5199 | 5.0 | 375 | 0.5476 | 0.0715 | -2.9612 | 0.2100 | 3.0327 | -19.2446 | -4.3040 | -2.6121 | -2.6119 |
0.5025 | 5.3333 | 400 | 0.5476 | 0.0695 | -2.9553 | 0.2100 | 3.0248 | -19.2251 | -4.3110 | -2.6134 | -2.6132 |
0.5199 | 5.6667 | 425 | 0.5476 | 0.0667 | -2.9587 | 0.2100 | 3.0253 | -19.2362 | -4.3202 | -2.6138 | -2.6135 |
0.5025 | 6.0 | 450 | 0.5476 | 0.0733 | -2.9890 | 0.2100 | 3.0623 | -19.3371 | -4.2980 | -2.6130 | -2.6128 |
0.5718 | 6.3333 | 475 | 0.5476 | 0.0738 | -2.9747 | 0.2100 | 3.0485 | -19.2896 | -4.2964 | -2.6133 | -2.6130 |
0.5718 | 6.6667 | 500 | 0.5476 | 0.0629 | -2.9781 | 0.2100 | 3.0410 | -19.3011 | -4.3329 | -2.6138 | -2.6136 |
0.5025 | 7.0 | 525 | 0.5476 | 0.0646 | -3.0038 | 0.2100 | 3.0685 | -19.3868 | -4.3270 | -2.6132 | -2.6130 |
0.5199 | 7.3333 | 550 | 0.5476 | 0.0729 | -2.9967 | 0.2100 | 3.0696 | -19.3629 | -4.2994 | -2.6125 | -2.6123 |
0.5372 | 7.6667 | 575 | 0.5476 | 0.0722 | -3.0204 | 0.2100 | 3.0926 | -19.4418 | -4.3018 | -2.6128 | -2.6125 |
0.5718 | 8.0 | 600 | 0.5476 | 0.0668 | -3.0048 | 0.2100 | 3.0716 | -19.3899 | -4.3198 | -2.6129 | -2.6127 |
0.5372 | 8.3333 | 625 | 0.5476 | 0.0640 | -3.0120 | 0.2100 | 3.0760 | -19.4140 | -4.3291 | -2.6127 | -2.6125 |
0.4332 | 8.6667 | 650 | 0.5476 | 0.0701 | -3.0149 | 0.2100 | 3.0851 | -19.4237 | -4.3087 | -2.6119 | -2.6117 |
0.5372 | 9.0 | 675 | 0.5476 | 0.0744 | -3.0163 | 0.2100 | 3.0906 | -19.4282 | -4.2946 | -2.6121 | -2.6119 |
0.5025 | 9.3333 | 700 | 0.5476 | 0.0677 | -3.0179 | 0.2100 | 3.0856 | -19.4337 | -4.3167 | -2.6122 | -2.6119 |
0.5025 | 9.6667 | 725 | 0.5476 | 0.0653 | -3.0228 | 0.2100 | 3.0881 | -19.4499 | -4.3247 | -2.6125 | -2.6123 |
0.5892 | 10.0 | 750 | 0.5476 | 0.0635 | -3.0214 | 0.2100 | 3.0848 | -19.4451 | -4.3309 | -2.6120 | -2.6117 |
0.5199 | 10.3333 | 775 | 0.5476 | 0.0649 | -3.0176 | 0.2100 | 3.0825 | -19.4326 | -4.3261 | -2.6123 | -2.6120 |
0.5199 | 10.6667 | 800 | 0.5476 | 0.0649 | -3.0260 | 0.2100 | 3.0908 | -19.4605 | -4.3262 | -2.6118 | -2.6116 |
0.5372 | 11.0 | 825 | 0.5476 | 0.0654 | -3.0196 | 0.2100 | 3.0850 | -19.4392 | -4.3243 | -2.6122 | -2.6119 |
0.5199 | 11.3333 | 850 | 0.5476 | 0.0645 | -3.0236 | 0.2100 | 3.0881 | -19.4525 | -4.3275 | -2.6122 | -2.6120 |
0.6065 | 11.6667 | 875 | 0.5476 | 0.0651 | -3.0232 | 0.2100 | 3.0883 | -19.4513 | -4.3254 | -2.6124 | -2.6122 |
0.5718 | 12.0 | 900 | 0.5476 | 0.0648 | -3.0221 | 0.2100 | 3.0869 | -19.4478 | -4.3265 | -2.6124 | -2.6121 |
0.4159 | 12.3333 | 925 | 0.5476 | 0.0650 | -3.0224 | 0.2100 | 3.0873 | -19.4485 | -4.3259 | -2.6124 | -2.6122 |
0.6238 | 12.6667 | 950 | 0.5476 | 0.0650 | -3.0224 | 0.2100 | 3.0873 | -19.4485 | -4.3259 | -2.6124 | -2.6122 |
0.6065 | 13.0 | 975 | 0.5476 | 0.0650 | -3.0224 | 0.2100 | 3.0873 | -19.4485 | -4.3259 | -2.6124 | -2.6122 |
0.5025 | 13.3333 | 1000 | 0.5476 | 0.0650 | -3.0224 | 0.2100 | 3.0873 | -19.4485 | -4.3259 | -2.6124 | -2.6122 |
### Framework versions

- Transformers 4.41.2
- Pytorch 2.0.0+cu117
- Datasets 2.19.2
- Tokenizers 0.19.1