# UTI2_M2_1000steps_1e5rate_01beta_CSFTDPO

This model is a fine-tuned version of [tsavage68/UTI_M2_1000steps_1e7rate_SFT](https://huggingface.co/tsavage68/UTI_M2_1000steps_1e7rate_SFT) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.9857
- Rewards/chosen: -5.2195
- Rewards/rejected: -3.4974
- Rewards/accuracies: 0.0400
- Rewards/margins: -1.7222
- Logps/rejected: -74.3298
- Logps/chosen: -72.1167
- Logits/rejected: 1.1725
- Logits/chosen: 1.1724
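These metrics follow the usual DPO conventions (as implemented in TRL): the implicit reward for a completion is the β-scaled log-probability ratio between the policy and the frozen reference model, and Rewards/accuracies is the fraction of evaluation pairs where the chosen completion earns the higher implicit reward, so 0.0400 means the chosen response wins on only 4% of pairs, and the negative final margin (-1.7222) means rejected completions score higher on average. A minimal sketch of these computations, assuming β = 0.1 from the "01beta" suffix in the model name:

```python
import torch
import torch.nn.functional as F

def dpo_eval_metrics(policy_chosen_logps: torch.Tensor,
                     policy_rejected_logps: torch.Tensor,
                     ref_chosen_logps: torch.Tensor,
                     ref_rejected_logps: torch.Tensor,
                     beta: float = 0.1):
    """Sketch of how the rewards/* metrics above are derived (TRL convention).

    Each argument is a 1-D tensor of summed per-sequence log-probs;
    beta = 0.1 is assumed from the "01beta" model-name suffix.
    """
    # Implicit DPO rewards: beta-scaled log-prob ratio vs. the reference model
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)        # rewards/chosen
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)  # rewards/rejected
    margins = chosen_rewards - rejected_rewards                             # rewards/margins
    accuracy = (margins > 0).float().mean()                                 # rewards/accuracies
    # DPO loss: negative log-sigmoid of the per-pair reward margin
    loss = -F.logsigmoid(margins).mean()
    return loss, chosen_rewards.mean(), rejected_rewards.mean(), margins.mean(), accuracy
```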
## Model description
More information needed
## Intended uses & limitations
More information needed
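Absent documented usage guidance, the model loads like any Mistral-Instruct-derived causal LM. The snippet below is a generic sketch, not author-provided usage: the chat-template prompt format is assumed from the mistralai/Mistral-7B-Instruct-v0.2 base, and the user message is a placeholder.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/UTI2_M2_1000steps_1e5rate_01beta_CSFTDPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Prompt format assumed from the Mistral-Instruct base model's chat template
messages = [{"role": "user", "content": "..."}]  # placeholder prompt
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

out = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```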
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
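For reference, a minimal sketch of a matching training script using TRL's `DPOTrainer`. The preference dataset is not documented above, so `your_preference_dataset` is a placeholder; β = 0.1 is assumed from the model name; and the exact trainer signature varies across TRL releases.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "tsavage68/UTI_M2_1000steps_1e7rate_SFT"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Placeholder: the card does not name the preference dataset. It must
# provide prompt / chosen / rejected columns in the format TRL expects.
train_dataset = load_dataset("your_preference_dataset", split="train")

args = TrainingArguments(
    output_dir="UTI2_M2_1000steps_1e5rate_01beta_CSFTDPO",
    learning_rate=1e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # effective train batch size of 4
    max_steps=1000,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,      # TRL clones the policy as the frozen reference
    args=args,
    beta=0.1,            # assumed from the "01beta" suffix
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```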
### Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
0.8718 | 0.3333 | 25 | 0.8484 | -2.9302 | -14.2948 | 0.8400 | 11.3646 | -182.3043 | -49.2239 | -2.7108 | -2.7153 |
1.8764 | 0.6667 | 50 | 2.0267 | -3.9079 | -2.1496 | 0.0500 | -1.7583 | -60.8517 | -59.0005 | -0.0119 | -0.0121 |
2.368 | 1.0 | 75 | 2.1981 | -4.0485 | -2.1677 | 0.1300 | -1.8808 | -61.0330 | -60.4067 | -0.7578 | -0.7578 |
1.802 | 1.3333 | 100 | 2.2809 | -4.0920 | -2.1613 | 0.1600 | -1.9306 | -60.9696 | -60.8411 | -0.8665 | -0.8665 |
1.8302 | 1.6667 | 125 | 2.1253 | -4.1468 | -2.3140 | 0.1000 | -1.8328 | -62.4957 | -61.3891 | -0.6683 | -0.6683 |
2.109 | 2.0 | 150 | 2.0797 | -4.2257 | -2.4259 | 0.1000 | -1.7999 | -63.6147 | -62.1788 | -0.5669 | -0.5669 |
1.7801 | 2.3333 | 175 | 2.0029 | -4.2312 | -2.4934 | 0.0500 | -1.7378 | -64.2898 | -62.2331 | -0.6146 | -0.6146 |
2.0161 | 2.6667 | 200 | 2.1079 | -4.1571 | -2.3364 | 0.1000 | -1.8207 | -62.7205 | -61.4927 | -0.6148 | -0.6148 |
2.1333 | 3.0 | 225 | 2.0488 | -4.3309 | -2.5546 | 0.0700 | -1.7763 | -64.9022 | -63.2307 | -0.4279 | -0.4279 |
1.9667 | 3.3333 | 250 | 2.0994 | -4.1512 | -2.3367 | 0.0900 | -1.8144 | -62.7236 | -61.4335 | -0.6099 | -0.6099 |
1.975 | 3.6667 | 275 | 2.0435 | -4.3243 | -2.5523 | 0.0600 | -1.7720 | -64.8788 | -63.1645 | -0.4185 | -0.4184 |
1.8051 | 4.0 | 300 | 1.9829 | -4.3085 | -2.5886 | 0.0400 | -1.7199 | -65.2420 | -63.0064 | -0.4027 | -0.4027 |
1.953 | 4.3333 | 325 | 2.0072 | -4.3371 | -2.5954 | 0.0500 | -1.7417 | -65.3105 | -63.2929 | -0.4070 | -0.4070 |
2.2799 | 4.6667 | 350 | 2.1923 | -7.3999 | -5.5246 | 0.1300 | -1.8754 | -94.6021 | -93.9210 | -3.4528 | -3.4531 |
1.921 | 5.0 | 375 | 2.2218 | -5.5567 | -3.6593 | 0.1300 | -1.8974 | -75.9492 | -75.4888 | -1.5346 | -1.5339 |
1.8429 | 5.3333 | 400 | 1.9854 | -7.6870 | -5.9651 | 0.0400 | -1.7218 | -99.0076 | -96.7912 | -3.1616 | -3.1613 |
1.8022 | 5.6667 | 425 | 1.9533 | -4.2767 | -2.5861 | 0.0200 | -1.6907 | -65.2171 | -62.6890 | 0.9412 | 0.9412 |
2.3129 | 6.0 | 450 | 1.9431 | -4.4284 | -2.7482 | 0.0200 | -1.6803 | -66.8379 | -64.2059 | 0.4988 | 0.4988 |
1.906 | 6.3333 | 475 | 2.0904 | -7.0674 | -5.2585 | 0.0900 | -1.8088 | -91.9414 | -90.5951 | -3.6276 | -3.6276 |
1.6599 | 6.6667 | 500 | 2.3257 | -4.5302 | -2.5743 | 0.1600 | -1.9559 | -65.0988 | -65.2237 | 0.2828 | 0.2828 |
2.1192 | 7.0 | 525 | 2.4249 | -4.6675 | -2.6590 | 0.1900 | -2.0086 | -65.9460 | -66.5970 | 0.4401 | 0.4401 |
1.734 | 7.3333 | 550 | 2.4649 | -4.6820 | -2.6533 | 0.2100 | -2.0287 | -65.8892 | -66.7413 | 0.4168 | 0.4168 |
2.0797 | 7.6667 | 575 | 1.9457 | -5.0708 | -3.3879 | 0.0200 | -1.6829 | -73.2348 | -70.6292 | 1.0740 | 1.0740 |
1.9905 | 8.0 | 600 | 1.8612 | -5.3637 | -3.7940 | 0.0600 | -1.5697 | -77.2963 | -73.5585 | 1.4106 | 1.4106 |
1.9525 | 8.3333 | 625 | 1.9808 | -5.1006 | -3.3827 | 0.0400 | -1.7179 | -73.1830 | -70.9278 | 1.1564 | 1.1564 |
2.0246 | 8.6667 | 650 | 2.0176 | -5.0560 | -3.3053 | 0.0500 | -1.7507 | -72.4090 | -70.4813 | 1.0910 | 1.0910 |
1.9163 | 9.0 | 675 | 1.9146 | -5.2114 | -3.5636 | 0.0600 | -1.6478 | -74.9921 | -72.0358 | 1.2619 | 1.2618 |
1.9831 | 9.3333 | 700 | 2.1370 | -4.9749 | -3.1338 | 0.1100 | -1.8411 | -70.6938 | -69.6701 | 0.9305 | 0.9305 |
2.1009 | 9.6667 | 725 | 2.0270 | -5.0976 | -3.3389 | 0.0500 | -1.7587 | -72.7453 | -70.8974 | 1.0811 | 1.0810 |
1.8532 | 10.0 | 750 | 1.9858 | -5.1569 | -3.4344 | 0.0400 | -1.7226 | -73.6998 | -71.4908 | 1.1467 | 1.1467 |
1.8101 | 10.3333 | 775 | 1.9913 | -5.1561 | -3.4284 | 0.0400 | -1.7277 | -73.6404 | -71.4823 | 1.1431 | 1.1431 |
1.7788 | 10.6667 | 800 | 1.9572 | -5.2409 | -3.5461 | 0.0200 | -1.6948 | -74.8174 | -72.3310 | 1.2172 | 1.2171 |
1.9172 | 11.0 | 825 | 1.9851 | -5.1923 | -3.4705 | 0.0400 | -1.7218 | -74.0612 | -71.8445 | 1.1654 | 1.1654 |
1.9927 | 11.3333 | 850 | 1.9926 | -5.1865 | -3.4579 | 0.0400 | -1.7287 | -73.9347 | -71.7869 | 1.1538 | 1.1538 |
1.7894 | 11.6667 | 875 | 1.9762 | -5.2363 | -3.5228 | 0.0300 | -1.7135 | -74.5845 | -72.2844 | 1.1749 | 1.1749 |
1.7495 | 12.0 | 900 | 1.9855 | -5.2126 | -3.4905 | 0.0400 | -1.7220 | -74.2616 | -72.0471 | 1.1714 | 1.1713 |
1.8748 | 12.3333 | 925 | 1.9857 | -5.2150 | -3.4928 | 0.0400 | -1.7222 | -74.2844 | -72.0716 | 1.1714 | 1.1713 |
1.8576 | 12.6667 | 950 | 1.9853 | -5.2202 | -3.4983 | 0.0400 | -1.7218 | -74.3394 | -72.1231 | 1.1732 | 1.1732 |
1.9874 | 13.0 | 975 | 1.9855 | -5.2193 | -3.4973 | 0.0400 | -1.7219 | -74.3294 | -72.1140 | 1.1725 | 1.1724 |
1.8102 | 13.3333 | 1000 | 1.9857 | -5.2195 | -3.4974 | 0.0400 | -1.7222 | -74.3298 | -72.1167 | 1.1725 | 1.1724 |
### Framework versions
- Transformers 4.41.2
- Pytorch 2.0.0+cu117
- Datasets 2.19.2
- Tokenizers 0.19.1
## Model tree

- Base model: [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)
- Fine-tuned from: [tsavage68/UTI_M2_1000steps_1e7rate_SFT](https://huggingface.co/tsavage68/UTI_M2_1000steps_1e7rate_SFT)