# UTI2_M2_1000steps_1e5rate_03beta_CSFTDPO
This model is a fine-tuned version of tsavage68/UTI_M2_1000steps_1e7rate_SFT on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 2.0951
- Rewards/chosen: -5.6444
- Rewards/rejected: -4.1898
- Rewards/accuracies: 0.0300
- Rewards/margins: -1.4546
- Logps/rejected: -23.3400
- Logps/chosen: -23.3572
- Logits/rejected: -3.4660
- Logits/chosen: -3.4660
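These metrics follow the usual DPO conventions: Rewards/chosen and Rewards/rejected are the implicit rewards, i.e. beta times the difference between the policy and reference-model log-probabilities of the chosen and rejected completions; Rewards/margins is chosen minus rejected; and Rewards/accuracies is the fraction of evaluation pairs whose chosen reward exceeds the rejected reward. As a sketch of the underlying objective (assuming beta = 0.3, as the "03beta" in the model name suggests):

$$
\mathcal{L}_{\text{DPO}}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}\right)\right]
$$

The negative margin reported here means the implicit reward favors the rejected completions on average, which is consistent with the evaluation loss sitting well above ln 2 ≈ 0.693, the value at zero margin.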
## Model description
More information needed
## Intended uses & limitations
More information needed
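No usage guidance accompanies the card. The snippet below is a minimal, illustrative sketch of loading the checkpoint for inference with Transformers; the example question and generation settings are assumptions, not recommendations from the model authors.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/UTI2_M2_1000steps_1e5rate_03beta_CSFTDPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Mistral-Instruct-style chat formatting; the question is purely illustrative.
messages = [
    {"role": "user", "content": "What are common symptoms of a urinary tract infection?"}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```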
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
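A minimal sketch of how these hyperparameters could map onto TRL's `DPOTrainer`, starting from the SFT checkpoint named above. The preference dataset path is a placeholder (the actual dataset is not documented), beta = 0.3 is inferred from the model name, and the exact trainer signature depends on the TRL version (newer releases move `beta` into a `DPOConfig`).

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

sft_checkpoint = "tsavage68/UTI_M2_1000steps_1e7rate_SFT"  # starting point for DPO
tokenizer = AutoTokenizer.from_pretrained(sft_checkpoint)
model = AutoModelForCausalLM.from_pretrained(sft_checkpoint)

# Hypothetical preference dataset with "prompt", "chosen", and "rejected" columns.
dataset = load_dataset("path/to/preference_dataset")

args = TrainingArguments(
    output_dir="UTI2_M2_1000steps_1e5rate_03beta_CSFTDPO",
    learning_rate=1e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # total train batch size 2 * 2 = 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    eval_strategy="steps",
    eval_steps=25,                  # matches the 25-step evaluation cadence in the table
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,        # TRL keeps a frozen copy of the model as the DPO reference
    args=args,
    beta=0.3,              # assumed from "03beta" in the model name
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
)
trainer.train()
```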
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
1.5384 | 0.3333 | 25 | 0.5865 | -0.9393 | -4.6336 | 0.2000 | 3.6943 | -24.8193 | -7.6735 | -2.6286 | -2.6335 |
1.9313 | 0.6667 | 50 | 3.3244 | -8.8014 | -7.0309 | 0.0800 | -1.7705 | -32.8104 | -33.8805 | -2.2793 | -2.2791 |
2.2334 | 1.0 | 75 | 2.1234 | -7.8647 | -6.3982 | 0.0400 | -1.4665 | -30.7014 | -30.7581 | -2.3195 | -2.3195 |
1.5885 | 1.3333 | 100 | 2.2933 | -7.5972 | -6.0666 | 0.0600 | -1.5306 | -29.5961 | -29.8666 | -2.4436 | -2.4436 |
1.5616 | 1.6667 | 125 | 2.1780 | -7.3269 | -5.8384 | 0.0500 | -1.4885 | -28.8354 | -28.9655 | -2.4142 | -2.4142 |
1.9139 | 2.0 | 150 | 2.0829 | -7.6465 | -6.1969 | 0.0300 | -1.4496 | -30.0302 | -30.0307 | -2.3439 | -2.3439 |
1.2991 | 2.3333 | 175 | 2.1037 | -7.8995 | -6.4415 | 0.0300 | -1.4580 | -30.8457 | -30.8743 | -2.2606 | -2.2606 |
2.6811 | 2.6667 | 200 | 2.0701 | -8.1244 | -6.6820 | 0.0300 | -1.4424 | -31.6472 | -31.6237 | -2.1655 | -2.1655 |
0.8733 | 3.0 | 225 | 2.1232 | -8.1779 | -6.7120 | 0.0400 | -1.4660 | -31.7472 | -31.8022 | -2.1965 | -2.1965 |
1.2195 | 3.3333 | 250 | 2.1795 | -8.2279 | -6.7396 | 0.0500 | -1.4883 | -31.8392 | -31.9687 | -2.2261 | -2.2261 |
1.4853 | 3.6667 | 275 | 1.9880 | -2.8100 | -1.4100 | 0.0300 | -1.4000 | -14.0739 | -13.9091 | -0.4257 | -0.4257 |
1.1822 | 4.0 | 300 | 2.0289 | -2.9362 | -1.5140 | 0.0300 | -1.4222 | -14.4206 | -14.3298 | 0.3805 | 0.3805 |
1.7017 | 4.3333 | 325 | 3.1499 | -4.3610 | -2.5853 | 0.0800 | -1.7757 | -17.9917 | -19.0793 | -0.3043 | -0.3043 |
2.0845 | 4.6667 | 350 | 2.1225 | -4.8636 | -3.3976 | 0.0400 | -1.4660 | -20.6994 | -20.7544 | -3.2196 | -3.2196 |
2.604 | 5.0 | 375 | 2.1166 | -4.5986 | -3.1359 | 0.0400 | -1.4627 | -19.8270 | -19.8711 | -3.0726 | -3.0726 |
1.421 | 5.3333 | 400 | 2.1357 | -4.3907 | -2.9196 | 0.0500 | -1.4712 | -19.1058 | -19.1783 | -3.0071 | -3.0071 |
1.7226 | 5.6667 | 425 | 2.0272 | -3.5428 | -2.1216 | 0.0300 | -1.4213 | -16.4459 | -16.3519 | -2.2781 | -2.2781 |
2.4447 | 6.0 | 450 | 2.0689 | -3.6109 | -2.1692 | 0.0300 | -1.4417 | -16.6047 | -16.5789 | -2.3727 | -2.3727 |
1.411 | 6.3333 | 475 | 2.2094 | -4.2537 | -2.7538 | 0.0500 | -1.5000 | -18.5531 | -18.7215 | -2.8542 | -2.8542 |
2.0897 | 6.6667 | 500 | 2.0544 | -2.9041 | -1.4694 | 0.0300 | -1.4347 | -14.2719 | -14.2228 | 1.1911 | 1.1911 |
2.1201 | 7.0 | 525 | 1.9272 | -2.9728 | -1.6125 | 0.0100 | -1.3604 | -14.7488 | -14.4519 | 0.5166 | 0.5166 |
2.0408 | 7.3333 | 550 | 1.9687 | -3.0107 | -1.6220 | 0.0200 | -1.3887 | -14.7806 | -14.5782 | 0.5712 | 0.5712 |
0.9684 | 7.6667 | 575 | 2.1804 | -5.6469 | -4.1579 | 0.0500 | -1.4890 | -23.2336 | -23.3653 | -3.1061 | -3.1061 |
2.1055 | 8.0 | 600 | 2.0889 | -5.5077 | -4.0567 | 0.0300 | -1.4510 | -22.8961 | -22.9014 | -3.1369 | -3.1369 |
1.9687 | 8.3333 | 625 | 2.1354 | -5.4589 | -3.9878 | 0.0500 | -1.4711 | -22.6667 | -22.7388 | -3.2956 | -3.2956 |
2.5668 | 8.6667 | 650 | 2.1562 | -5.5048 | -4.0251 | 0.0500 | -1.4798 | -22.7908 | -22.8919 | -3.4770 | -3.4770 |
1.5897 | 9.0 | 675 | 2.1046 | -5.4925 | -4.0344 | 0.0300 | -1.4581 | -22.8218 | -22.8507 | -3.3988 | -3.3988 |
1.9547 | 9.3333 | 700 | 2.1229 | -5.5252 | -4.0594 | 0.0400 | -1.4658 | -22.9052 | -22.9596 | -3.4297 | -3.4297 |
1.9507 | 9.6667 | 725 | 2.1021 | -5.5176 | -4.0607 | 0.0300 | -1.4569 | -22.9095 | -22.9343 | -3.3848 | -3.3848 |
1.6239 | 10.0 | 750 | 2.1085 | -5.5157 | -4.0561 | 0.0400 | -1.4596 | -22.8942 | -22.9280 | -3.3875 | -3.3875 |
1.8544 | 10.3333 | 775 | 2.1030 | -5.5459 | -4.0885 | 0.0300 | -1.4574 | -23.0023 | -23.0289 | -3.4001 | -3.4001 |
2.4013 | 10.6667 | 800 | 2.0931 | -5.6326 | -4.1789 | 0.0300 | -1.4537 | -23.3036 | -23.3178 | -3.4385 | -3.4385 |
1.2873 | 11.0 | 825 | 2.0988 | -5.6290 | -4.1728 | 0.0300 | -1.4562 | -23.2832 | -23.3059 | -3.4535 | -3.4535 |
1.8211 | 11.3333 | 850 | 2.0989 | -5.6433 | -4.1865 | 0.0300 | -1.4568 | -23.3289 | -23.3535 | -3.4376 | -3.4376 |
1.6314 | 11.6667 | 875 | 2.0969 | -5.6415 | -4.1859 | 0.0300 | -1.4556 | -23.3269 | -23.3475 | -3.4609 | -3.4609 |
1.5862 | 12.0 | 900 | 2.0963 | -5.6447 | -4.1892 | 0.0300 | -1.4555 | -23.3379 | -23.3581 | -3.4652 | -3.4652 |
2.6075 | 12.3333 | 925 | 2.0963 | -5.6449 | -4.1894 | 0.0300 | -1.4555 | -23.3386 | -23.3587 | -3.4643 | -3.4643 |
1.0943 | 12.6667 | 950 | 2.0954 | -5.6448 | -4.1898 | 0.0300 | -1.4550 | -23.3400 | -23.3584 | -3.4660 | -3.4660 |
1.2314 | 13.0 | 975 | 2.0953 | -5.6446 | -4.1898 | 0.0300 | -1.4548 | -23.3399 | -23.3577 | -3.4660 | -3.4660 |
2.0533 | 13.3333 | 1000 | 2.0951 | -5.6444 | -4.1898 | 0.0300 | -1.4546 | -23.3400 | -23.3572 | -3.4660 | -3.4660 |
### Framework versions
- Transformers 4.41.2
- Pytorch 2.0.0+cu117
- Datasets 2.19.2
- Tokenizers 0.19.1
## Model tree for tsavage68/UTI2_M2_1000steps_1e5rate_03beta_CSFTDPO

- Base model: mistralai/Mistral-7B-Instruct-v0.2
- Fine-tuned from: tsavage68/UTI_M2_1000steps_1e7rate_SFT