# UTI2_M2_1000steps_1e6rate_01beta_CSFTDPO
This model is a fine-tuned version of tsavage68/UTI_M2_1000steps_1e7rate_SFT on an unknown dataset. It achieves the following results on the evaluation set (a note on how the reward metrics are defined follows the list):
- Loss: 0.2525
- Rewards/chosen: -4.6992
- Rewards/rejected: -18.9442
- Rewards/accuracies: 0.8800
- Rewards/margins: 14.2450
- Logps/rejected: -228.7986
- Logps/chosen: -66.9138
- Logits/rejected: -1.9681
- Logits/chosen: -2.0652
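These metrics follow the conventions of TRL's `DPOTrainer` (an assumption, inferred from the DPO naming): each response's reward is the β-scaled log-probability ratio between the policy and the frozen reference model, and the margin is the chosen-minus-rejected gap:

$$
r(x, y) = \beta\left[\log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x)\right],
\qquad
\text{margins} = r_{\text{chosen}} - r_{\text{rejected}}.
$$

With β = 0.1 (the `01beta` in the model name), the reported final margin is consistent: −4.6992 − (−18.9442) = 14.2450.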
## Model description
More information needed
## Intended uses & limitations
More information needed
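Until the card is filled in, the snippet below is a minimal, hedged loading sketch. It assumes only standard Transformers causal-LM usage and the chat template inherited from the mistralai/Mistral-7B-Instruct-v0.2 base; the prompt is a placeholder.

```python
# Minimal loading/generation sketch; standard Transformers usage assumed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/UTI2_M2_1000steps_1e6rate_01beta_CSFTDPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Your prompt here."}]  # placeholder
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```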
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a reproduction sketch follows the list):
- learning_rate: 1e-06
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
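As a rough reproduction sketch, the configuration above maps onto TRL's `DPOTrainer` as shown below. This is an assumption-laden outline, not the author's actual script: the preference dataset is undocumented (the `load_dataset` path is a placeholder), β = 0.1 is inferred from the model name, and a trl version that ships `DPOConfig` (≥ 0.9) is assumed.

```python
# Hedged reproduction sketch (not the author's actual training script).
# Assumes trl >= 0.9 (DPOConfig) alongside Transformers 4.41.2.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "tsavage68/UTI_M2_1000steps_1e7rate_SFT"  # SFT checkpoint this model starts from
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Placeholder: the actual preference dataset is not documented in this card.
# It must provide "prompt", "chosen", and "rejected" columns.
dataset = load_dataset("your/preference-dataset")

config = DPOConfig(
    output_dir="UTI2_M2_1000steps_1e6rate_01beta_CSFTDPO",
    beta=0.1,                       # inferred from "01beta" in the model name
    learning_rate=1e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # 2 x 2 = total train batch size of 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
    # Adam betas (0.9, 0.999) and epsilon 1e-8 are the Transformers defaults.
)

trainer = DPOTrainer(
    model,
    ref_model=None,  # TRL clones the policy as the frozen reference when None
    args=config,
    train_dataset=dataset["train"],
    tokenizer=tokenizer,
)
trainer.train()
```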
### Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
0.1981 | 0.3333 | 25 | 0.1707 | 0.4309 | -7.1069 | 0.8800 | 7.5378 | -110.4247 | -15.6125 | -2.4808 | -2.4822 |
0.5206 | 0.6667 | 50 | 0.0694 | 0.4032 | -9.1284 | 0.9000 | 9.5316 | -130.6399 | -15.8895 | -2.4773 | -2.4842 |
0.0347 | 1.0 | 75 | 0.3666 | -3.1187 | -8.7205 | 0.7800 | 5.6018 | -126.5613 | -51.1087 | -2.4899 | -2.4995 |
0.0788 | 1.3333 | 100 | 0.4184 | 0.4123 | -15.2113 | 0.8800 | 15.6236 | -191.4688 | -15.7981 | -2.4522 | -2.4834 |
0.052 | 1.6667 | 125 | 0.4180 | 0.4111 | -15.2137 | 0.8800 | 15.6248 | -191.4934 | -15.8105 | -2.4523 | -2.4835 |
0.1213 | 2.0 | 150 | 0.3560 | -3.5267 | -16.7763 | 0.8800 | 13.2496 | -207.1189 | -55.1884 | -1.9818 | -2.0298 |
0.2125 | 2.3333 | 175 | 0.1987 | -5.3431 | -19.4504 | 0.8900 | 14.1074 | -233.8604 | -73.3521 | -2.0857 | -2.1454 |
0.5195 | 2.6667 | 200 | 0.2487 | -5.1181 | -17.0835 | 0.8700 | 11.9655 | -210.1916 | -71.1022 | -1.9512 | -2.0125 |
0.0347 | 3.0 | 225 | 0.5257 | -4.6932 | -16.4184 | 0.8700 | 11.7251 | -203.5397 | -66.8538 | -1.9297 | -1.9821 |
0.0347 | 3.3333 | 250 | 0.5252 | -4.6368 | -16.5979 | 0.8700 | 11.9611 | -205.3350 | -66.2896 | -1.9309 | -1.9836 |
0.0693 | 3.6667 | 275 | 0.5277 | -4.6207 | -16.6952 | 0.8700 | 12.0744 | -206.3079 | -66.1288 | -1.9307 | -1.9833 |
0.3528 | 4.0 | 300 | 0.2783 | -5.2353 | -17.9334 | 0.8800 | 12.6980 | -218.6897 | -72.2747 | -2.1007 | -2.2018 |
0.0866 | 4.3333 | 325 | 0.5393 | -5.1111 | -18.7048 | 0.8700 | 13.5937 | -226.4044 | -71.0329 | -2.0012 | -2.0945 |
0.0347 | 4.6667 | 350 | 0.5419 | -5.1092 | -18.7024 | 0.8700 | 13.5933 | -226.3804 | -71.0133 | -2.0016 | -2.0948 |
0.0693 | 5.0 | 375 | 0.2439 | -4.7227 | -18.8694 | 0.8900 | 14.1466 | -228.0500 | -67.1490 | -1.9662 | -2.0633 |
0.0693 | 5.3333 | 400 | 0.2601 | -4.7346 | -18.8758 | 0.8800 | 14.1412 | -228.1138 | -67.2672 | -1.9665 | -2.0635 |
0.052 | 5.6667 | 425 | 0.2475 | -4.7163 | -18.8815 | 0.8800 | 14.1653 | -228.1716 | -67.0841 | -1.9664 | -2.0634 |
0.0866 | 6.0 | 450 | 0.2467 | -4.7194 | -18.8925 | 0.8800 | 14.1731 | -228.2814 | -67.1153 | -1.9668 | -2.0638 |
0.0173 | 6.3333 | 475 | 0.2504 | -4.7284 | -18.8991 | 0.8800 | 14.1708 | -228.3476 | -67.2055 | -1.9666 | -2.0636 |
0.1386 | 6.6667 | 500 | 0.2482 | -4.7124 | -18.9127 | 0.8800 | 14.2002 | -228.4828 | -67.0460 | -1.9668 | -2.0640 |
0.0173 | 7.0 | 525 | 0.2503 | -4.7152 | -18.9086 | 0.8800 | 14.1934 | -228.4421 | -67.0739 | -1.9667 | -2.0637 |
0.0866 | 7.3333 | 550 | 0.2469 | -4.7026 | -18.9134 | 0.8800 | 14.2107 | -228.4901 | -66.9480 | -1.9667 | -2.0638 |
0.0347 | 7.6667 | 575 | 0.2506 | -4.7014 | -18.9296 | 0.8800 | 14.2282 | -228.6524 | -66.9354 | -1.9673 | -2.0644 |
0.0866 | 8.0 | 600 | 0.2593 | -4.7150 | -18.9165 | 0.8800 | 14.2016 | -228.5215 | -67.0713 | -1.9673 | -2.0643 |
0.104 | 8.3333 | 625 | 0.2524 | -4.7020 | -18.9396 | 0.8800 | 14.2376 | -228.7525 | -66.9418 | -1.9674 | -2.0645 |
0.0173 | 8.6667 | 650 | 0.2503 | -4.7003 | -18.9479 | 0.8800 | 14.2476 | -228.8348 | -66.9245 | -1.9674 | -2.0645 |
0.0693 | 9.0 | 675 | 0.2511 | -4.7024 | -18.9458 | 0.8800 | 14.2433 | -228.8138 | -66.9459 | -1.9682 | -2.0653 |
0.0693 | 9.3333 | 700 | 0.2497 | -4.7013 | -18.9371 | 0.8800 | 14.2358 | -228.7275 | -66.9345 | -1.9677 | -2.0647 |
0.0693 | 9.6667 | 725 | 0.2438 | -4.6908 | -18.9453 | 0.8900 | 14.2545 | -228.8092 | -66.8294 | -1.9677 | -2.0648 |
0.0866 | 10.0 | 750 | 0.2475 | -4.6928 | -18.9485 | 0.8800 | 14.2557 | -228.8416 | -66.8499 | -1.9679 | -2.0651 |
0.0866 | 10.3333 | 775 | 0.2441 | -4.7016 | -18.9488 | 0.8900 | 14.2472 | -228.8439 | -66.9371 | -1.9684 | -2.0656 |
0.0866 | 10.6667 | 800 | 0.2484 | -4.7051 | -18.9371 | 0.8800 | 14.2320 | -228.7267 | -66.9721 | -1.9685 | -2.0656 |
0.0693 | 11.0 | 825 | 0.2421 | -4.6951 | -18.9478 | 0.8900 | 14.2526 | -228.8337 | -66.8728 | -1.9685 | -2.0656 |
0.052 | 11.3333 | 850 | 0.2506 | -4.7064 | -18.9338 | 0.8800 | 14.2275 | -228.6942 | -66.9851 | -1.9684 | -2.0655 |
0.0693 | 11.6667 | 875 | 0.2504 | -4.6954 | -18.9498 | 0.8800 | 14.2544 | -228.8539 | -66.8751 | -1.9685 | -2.0657 |
0.0693 | 12.0 | 900 | 0.2470 | -4.6944 | -18.9471 | 0.8800 | 14.2527 | -228.8271 | -66.8655 | -1.9680 | -2.0651 |
0.0347 | 12.3333 | 925 | 0.2470 | -4.6947 | -18.9470 | 0.8800 | 14.2523 | -228.8264 | -66.8687 | -1.9680 | -2.0651 |
0.0693 | 12.6667 | 950 | 0.2525 | -4.6997 | -18.9435 | 0.8800 | 14.2438 | -228.7915 | -66.9187 | -1.9681 | -2.0652 |
0.052 | 13.0 | 975 | 0.2525 | -4.6991 | -18.9443 | 0.8800 | 14.2452 | -228.7995 | -66.9127 | -1.9681 | -2.0652 |
0.0693 | 13.3333 | 1000 | 0.2525 | -4.6992 | -18.9442 | 0.8800 | 14.2450 | -228.7986 | -66.9138 | -1.9681 | -2.0652 |
### Framework versions
- Transformers 4.41.2
- Pytorch 2.0.0+cu117
- Datasets 2.19.2
- Tokenizers 0.19.1
## Model tree for tsavage68/UTI2_M2_1000steps_1e6rate_01beta_CSFTDPO

- Base model: mistralai/Mistral-7B-Instruct-v0.2
- Fine-tuned from: tsavage68/UTI_M2_1000steps_1e7rate_SFT