# UTI2_M2_1000steps_1e7rate_05beta_CSFTDPO
This model is a fine-tuned version of [tsavage68/UTI_M2_1000steps_1e7rate_SFT](https://huggingface.co/tsavage68/UTI_M2_1000steps_1e7rate_SFT) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.5476
- Rewards/chosen: 0.0699
- Rewards/rejected: -3.0830
- Rewards/accuracies: 0.2100
- Rewards/margins: 3.1530
- Logps/rejected: -15.5400
- Logps/chosen: -4.4026
- Logits/rejected: -2.6403
- Logits/chosen: -2.6398
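
Note that the reward margin is the chosen reward minus the rejected reward: 0.0699 - (-3.0830) ≈ 3.1530, matching the reported Rewards/margins. As a minimal inference sketch for this checkpoint (assuming it retains the standard Mistral instruct chat template inherited from the base model; the prompt text is illustrative only):

```python
# Minimal inference sketch; assumes the checkpoint keeps the standard
# Mistral instruct chat template from its base model. The prompt below
# is illustrative only. Requires `accelerate` for device_map="auto".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/UTI2_M2_1000steps_1e7rate_05beta_CSFTDPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

messages = [
    {"role": "user", "content": "A patient reports dysuria and urinary frequency. What should be considered?"}
]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```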
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-07
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
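
For reference, a hypothetical reconstruction of this setup with TRL's `DPOTrainer` might look as follows. The `beta=0.5` value is inferred from the "05beta" in the model name rather than stated in this card, the preference dataset is unknown (placeholder rows below), and exact argument names vary across TRL releases:

```python
# Hypothetical DPO training sketch mirroring the hyperparameters above.
# Assumptions: beta=0.5 (inferred from "05beta" in the model name), a
# recent TRL release exposing DPOConfig, and placeholder preference data
# standing in for the undocumented training set.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "tsavage68/UTI_M2_1000steps_1e7rate_SFT"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Placeholder preference pairs; the real dataset is not documented.
train_dataset = Dataset.from_dict({
    "prompt": ["<illustrative prompt>"],
    "chosen": ["<preferred completion>"],
    "rejected": ["<dispreferred completion>"],
})

args = DPOConfig(
    output_dir="UTI2_M2_1000steps_1e7rate_05beta_CSFTDPO",
    beta=0.5,                       # assumption: from "05beta" in the name
    learning_rate=1e-7,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # total train batch size: 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
)

trainer = DPOTrainer(
    model=model,   # the reference model is cloned internally when omitted
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,   # newer TRL releases name this `processing_class`
)
trainer.train()
```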
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|
0.7009 | 0.3333 | 25 | 0.6667 | 0.0016 | -0.0631 | 0.1900 | 0.0647 | -9.5002 | -4.5393 | -2.7063 | -2.7055 |
0.5794 | 0.6667 | 50 | 0.5690 | 0.0605 | -0.5674 | 0.2100 | 0.6279 | -10.5088 | -4.4215 | -2.6950 | -2.6942 |
0.5772 | 1.0 | 75 | 0.5638 | -0.0374 | -1.8778 | 0.2000 | 1.8404 | -13.1295 | -4.6173 | -2.6653 | -2.6647 |
0.5715 | 1.3333 | 100 | 0.5485 | 0.0321 | -2.2707 | 0.2100 | 2.3028 | -13.9154 | -4.4783 | -2.6560 | -2.6555 |
0.5545 | 1.6667 | 125 | 0.5476 | 0.1013 | -2.5349 | 0.2100 | 2.6363 | -14.4438 | -4.3398 | -2.6499 | -2.6494 |
0.5545 | 2.0 | 150 | 0.5476 | 0.0902 | -2.9376 | 0.2100 | 3.0278 | -15.2492 | -4.3621 | -2.6442 | -2.6437 |
0.5545 | 2.3333 | 175 | 0.5476 | 0.0846 | -2.9244 | 0.2100 | 3.0090 | -15.2229 | -4.3733 | -2.6424 | -2.6419 |
0.4852 | 2.6667 | 200 | 0.5476 | 0.0848 | -2.9648 | 0.2100 | 3.0495 | -15.3035 | -4.3729 | -2.6423 | -2.6417 |
0.6412 | 3.0 | 225 | 0.5476 | 0.0853 | -2.9694 | 0.2100 | 3.0547 | -15.3127 | -4.3718 | -2.6421 | -2.6415 |
0.5545 | 3.3333 | 250 | 0.5476 | 0.0892 | -2.9671 | 0.2100 | 3.0563 | -15.3081 | -4.3640 | -2.6429 | -2.6424 |
0.5372 | 3.6667 | 275 | 0.5476 | 0.0803 | -2.9507 | 0.2100 | 3.0310 | -15.2754 | -4.3819 | -2.6416 | -2.6410 |
0.5892 | 4.0 | 300 | 0.5476 | 0.0791 | -3.0080 | 0.2100 | 3.0871 | -15.3899 | -4.3842 | -2.6421 | -2.6415 |
0.4679 | 4.3333 | 325 | 0.5476 | 0.0770 | -3.0043 | 0.2100 | 3.0814 | -15.3826 | -4.3884 | -2.6420 | -2.6415 |
0.5718 | 4.6667 | 350 | 0.5476 | 0.0767 | -3.0040 | 0.2100 | 3.0808 | -15.3820 | -4.3890 | -2.6414 | -2.6409 |
0.5199 | 5.0 | 375 | 0.5476 | 0.0830 | -3.0444 | 0.2100 | 3.1274 | -15.4628 | -4.3765 | -2.6415 | -2.6410 |
0.5025 | 5.3333 | 400 | 0.5476 | 0.0784 | -3.0520 | 0.2100 | 3.1304 | -15.4779 | -4.3857 | -2.6406 | -2.6401 |
0.5199 | 5.6667 | 425 | 0.5476 | 0.0772 | -3.0417 | 0.2100 | 3.1189 | -15.4575 | -4.3882 | -2.6418 | -2.6412 |
0.5025 | 6.0 | 450 | 0.5476 | 0.0775 | -3.0690 | 0.2100 | 3.1465 | -15.5119 | -4.3875 | -2.6403 | -2.6398 |
0.5718 | 6.3333 | 475 | 0.5476 | 0.0722 | -3.0608 | 0.2100 | 3.1330 | -15.4956 | -4.3980 | -2.6403 | -2.6398 |
0.5718 | 6.6667 | 500 | 0.5476 | 0.0733 | -3.0661 | 0.2100 | 3.1394 | -15.5061 | -4.3958 | -2.6403 | -2.6397 |
0.5025 | 7.0 | 525 | 0.5476 | 0.0687 | -3.0692 | 0.2100 | 3.1379 | -15.5123 | -4.4051 | -2.6407 | -2.6402 |
0.5199 | 7.3333 | 550 | 0.5476 | 0.0691 | -3.0762 | 0.2100 | 3.1454 | -15.5265 | -4.4042 | -2.6401 | -2.6396 |
0.5372 | 7.6667 | 575 | 0.5476 | 0.0728 | -3.0945 | 0.2100 | 3.1672 | -15.5629 | -4.3970 | -2.6414 | -2.6409 |
0.5718 | 8.0 | 600 | 0.5476 | 0.0736 | -3.0806 | 0.2100 | 3.1541 | -15.5351 | -4.3953 | -2.6405 | -2.6400 |
0.5372 | 8.3333 | 625 | 0.5476 | 0.0806 | -3.0954 | 0.2100 | 3.1759 | -15.5647 | -4.3813 | -2.6410 | -2.6405 |
0.4332 | 8.6667 | 650 | 0.5476 | 0.0762 | -3.0922 | 0.2100 | 3.1684 | -15.5583 | -4.3900 | -2.6412 | -2.6407 |
0.5372 | 9.0 | 675 | 0.5476 | 0.0738 | -3.0924 | 0.2100 | 3.1662 | -15.5587 | -4.3948 | -2.6408 | -2.6403 |
0.5025 | 9.3333 | 700 | 0.5476 | 0.0702 | -3.0892 | 0.2100 | 3.1594 | -15.5524 | -4.4020 | -2.6405 | -2.6400 |
0.5025 | 9.6667 | 725 | 0.5476 | 0.0641 | -3.0956 | 0.2100 | 3.1597 | -15.5651 | -4.4142 | -2.6410 | -2.6405 |
0.5892 | 10.0 | 750 | 0.5476 | 0.0696 | -3.0933 | 0.2100 | 3.1630 | -15.5606 | -4.4032 | -2.6403 | -2.6398 |
0.5199 | 10.3333 | 775 | 0.5476 | 0.0764 | -3.0810 | 0.2100 | 3.1574 | -15.5361 | -4.3897 | -2.6404 | -2.6399 |
0.5199 | 10.6667 | 800 | 0.5476 | 0.0750 | -3.0945 | 0.2100 | 3.1695 | -15.5629 | -4.3925 | -2.6399 | -2.6394 |
0.5372 | 11.0 | 825 | 0.5477 | 0.0727 | -3.0777 | 0.2100 | 3.1504 | -15.5293 | -4.3970 | -2.6405 | -2.6399 |
0.5199 | 11.3333 | 850 | 0.5477 | 0.0760 | -3.0775 | 0.2100 | 3.1534 | -15.5289 | -4.3905 | -2.6402 | -2.6397 |
0.6065 | 11.6667 | 875 | 0.5476 | 0.0737 | -3.0877 | 0.2100 | 3.1615 | -15.5495 | -4.3950 | -2.6404 | -2.6398 |
0.5718 | 12.0 | 900 | 0.5476 | 0.0713 | -3.0915 | 0.2100 | 3.1628 | -15.5570 | -4.3999 | -2.6403 | -2.6398 |
0.4159 | 12.3333 | 925 | 0.5476 | 0.0687 | -3.0820 | 0.2100 | 3.1507 | -15.5379 | -4.4051 | -2.6403 | -2.6398 |
0.6238 | 12.6667 | 950 | 0.5476 | 0.0699 | -3.0830 | 0.2100 | 3.1530 | -15.5400 | -4.4026 | -2.6403 | -2.6398 |
0.6065 | 13.0 | 975 | 0.5476 | 0.0699 | -3.0830 | 0.2100 | 3.1530 | -15.5400 | -4.4026 | -2.6403 | -2.6398 |
0.5025 | 13.3333 | 1000 | 0.5476 | 0.0699 | -3.0830 | 0.2100 | 3.1530 | -15.5400 | -4.4026 | -2.6403 | -2.6398 |
### Framework versions
- Transformers 4.41.2
- Pytorch 2.0.0+cu117
- Datasets 2.19.2
- Tokenizers 0.19.1
## Model tree

- Base model: [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)
- Fine-tuned from: [tsavage68/UTI_M2_1000steps_1e7rate_SFT](https://huggingface.co/tsavage68/UTI_M2_1000steps_1e7rate_SFT)