# UTI2_M2_1000steps_1e5rate_05beta_CSFTDPO
This model is a fine-tuned version of [tsavage68/UTI_M2_1000steps_1e7rate_SFT](https://huggingface.co/tsavage68/UTI_M2_1000steps_1e7rate_SFT) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.5498
- Rewards/chosen: -13.0511
- Rewards/rejected: -16.6516
- Rewards/accuracies: 0.2100
- Rewards/margins: 3.6004
- Logps/rejected: -42.6771
- Logps/chosen: -30.6447
- Logits/rejected: 0.0583
- Logits/chosen: 0.0501
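These are the reward statistics logged during DPO-style preference training against the frozen SFT reference model. As a reminder of the usual DPO formulation (a sketch of the standard definitions; the "05beta" in the model name suggests beta = 0.5, though the card does not state it), the implicit reward of a completion is the beta-scaled log-ratio between the policy and the reference:

$$
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}
$$

Rewards/margins is the mean difference between the chosen and rejected rewards over the evaluation pairs, and Rewards/accuracies is the fraction of pairs for which the chosen reward exceeds the rejected one.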
## Model description
More information needed
## Intended uses & limitations
More information needed
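The card leaves intended uses open. As a starting point, here is a minimal inference sketch, assuming the standard transformers text-generation API and a Mistral-Instruct-style chat template (the prompt and generation settings are illustrative only):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/UTI2_M2_1000steps_1e5rate_05beta_CSFTDPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Format the conversation with the tokenizer's built-in chat template.
messages = [{"role": "user", "content": "Your prompt here"}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```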
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
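For concreteness, the following is a hypothetical reconstruction of this configuration with trl's `DPOTrainer`; the actual training script is not included in this card, the dataset is unknown, the `beta=0.5` value is inferred from "05beta" in the model name, and the sketch assumes a trl version in which `DPOTrainer` still accepts `beta` and `tokenizer` directly:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "tsavage68/UTI_M2_1000steps_1e7rate_SFT"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Unknown preference dataset: DPOTrainer expects "prompt", "chosen", "rejected" columns.
train_dataset = ...

args = TrainingArguments(
    output_dir="UTI2_M2_1000steps_1e5rate_05beta_CSFTDPO",
    learning_rate=1e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # total train batch size 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,   # trl clones the policy as the frozen reference model
    args=args,
    beta=0.5,         # inferred from the model name, not stated in the card
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```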
### Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
2.0624 | 0.3333 | 25 | 2.5226 | -1.8449 | -0.4090 | 0.0700 | -1.4358 | -10.1920 | -8.2322 | -2.5575 | -2.5577 |
3.5044 | 0.6667 | 50 | 4.5890 | -5.8971 | -3.0050 | 0.0800 | -2.8921 | -15.3840 | -16.3367 | -1.8026 | -1.8027 |
3.7907 | 1.0 | 75 | 2.9383 | -6.0760 | -3.7525 | 0.0300 | -2.3235 | -16.8790 | -16.6946 | -3.0195 | -3.0194 |
2.0779 | 1.3333 | 100 | 3.1415 | -5.1412 | -2.7057 | 0.0400 | -2.4355 | -14.7854 | -14.8249 | -2.3865 | -2.3865 |
2.1244 | 1.6667 | 125 | 3.0972 | -4.8789 | -2.4628 | 0.0300 | -2.4161 | -14.2995 | -14.3002 | -1.8462 | -1.8462 |
2.7225 | 2.0 | 150 | 3.0294 | -4.9954 | -2.6129 | 0.0300 | -2.3825 | -14.5998 | -14.5333 | -1.9868 | -1.9868 |
1.5732 | 2.3333 | 175 | 2.9701 | -7.2452 | -4.8894 | 0.0300 | -2.3558 | -19.1528 | -19.0329 | -2.3919 | -2.3919 |
4.1424 | 2.6667 | 200 | 3.0335 | -4.8641 | -2.4777 | 0.0300 | -2.3864 | -14.3295 | -14.2707 | -1.7217 | -1.7217 |
1.0287 | 3.0 | 225 | 3.1359 | -4.7700 | -2.3364 | 0.0400 | -2.4335 | -14.0468 | -14.0824 | -1.6971 | -1.6971 |
1.6456 | 3.3333 | 250 | 3.2455 | -4.7269 | -2.2495 | 0.0500 | -2.4774 | -13.8730 | -13.9962 | -1.3352 | -1.3352 |
2.0659 | 3.6667 | 275 | 2.7595 | -7.9375 | -6.1451 | 0.0900 | -1.7924 | -21.6641 | -20.4174 | 0.7532 | 0.7530 |
2.1423 | 4.0 | 300 | 3.1729 | -6.0067 | -3.5583 | 0.0500 | -2.4484 | -16.4906 | -16.5559 | -1.2394 | -1.2394 |
2.4459 | 4.3333 | 325 | 3.2141 | -8.9788 | -6.5133 | 0.0500 | -2.4655 | -22.4005 | -22.5000 | -3.0378 | -3.0378 |
3.0794 | 4.6667 | 350 | 3.1602 | -14.0839 | -11.6405 | 0.0400 | -2.4434 | -32.6550 | -32.7103 | -2.5895 | -2.5895 |
3.9364 | 5.0 | 375 | 3.0988 | -14.3023 | -11.8867 | 0.0300 | -2.4156 | -33.1474 | -33.1471 | -2.5474 | -2.5474 |
1.9883 | 5.3333 | 400 | 3.1292 | -14.9159 | -12.4872 | 0.0400 | -2.4286 | -34.3484 | -34.3742 | -2.5028 | -2.5028 |
2.4843 | 5.6667 | 425 | 2.9658 | -16.9208 | -14.5704 | 0.0300 | -2.3504 | -38.5147 | -38.3840 | -2.2091 | -2.2091 |
3.7325 | 6.0 | 450 | 3.0443 | -18.2416 | -15.8503 | 0.0300 | -2.3913 | -41.0746 | -41.0257 | -2.0492 | -2.0492 |
1.9807 | 6.3333 | 475 | 3.2023 | -10.0704 | -7.6142 | 0.0500 | -2.4562 | -24.6023 | -24.6833 | -2.7758 | -2.7758 |
2.4126 | 6.6667 | 500 | 2.9411 | -11.3553 | -9.1152 | 0.0300 | -2.2401 | -27.6043 | -27.2530 | -2.9774 | -2.9775 |
3.7081 | 7.0 | 525 | 2.6725 | -16.3050 | -14.2905 | 0.0300 | -2.0145 | -37.9550 | -37.1525 | 1.0618 | 1.0616 |
2.9707 | 7.3333 | 550 | 2.2326 | -12.0067 | -11.0286 | 0.0600 | -0.9781 | -31.4312 | -28.5558 | 1.7090 | 1.7071 |
1.1998 | 7.6667 | 575 | 2.6023 | -6.3024 | -4.7549 | 0.0600 | -1.5476 | -18.8837 | -17.1473 | 0.5970 | 0.5966 |
1.0342 | 8.0 | 600 | 0.5746 | -11.7381 | -22.2043 | 0.2000 | 10.4662 | -53.7825 | -28.0186 | 1.7281 | 1.7098 |
0.5377 | 8.3333 | 625 | 1.7320 | -11.1869 | -19.3957 | 0.1900 | 8.2088 | -48.1653 | -26.9162 | 0.6525 | 0.6424 |
0.9725 | 8.6667 | 650 | 1.1219 | -8.1698 | -11.3167 | 0.1900 | 3.1468 | -32.0073 | -20.8821 | 0.3043 | 0.2998 |
1.6977 | 9.0 | 675 | 0.6161 | -9.9974 | -12.2635 | 0.1800 | 2.2661 | -33.9009 | -24.5372 | 0.0440 | 0.0415 |
0.5025 | 9.3333 | 700 | 0.6105 | -9.9405 | -12.8803 | 0.1900 | 2.9398 | -35.1346 | -24.4235 | 0.0064 | 0.0013 |
0.5025 | 9.6667 | 725 | 0.6058 | -10.3113 | -13.4162 | 0.1900 | 3.1049 | -36.2063 | -25.1651 | -0.9500 | -0.9577 |
0.5908 | 10.0 | 750 | 0.5619 | -12.0967 | -14.9911 | 0.2000 | 2.8944 | -39.3562 | -28.7358 | 0.0605 | 0.0537 |
0.5199 | 10.3333 | 775 | 0.5780 | -12.4712 | -15.8265 | 0.2000 | 3.3553 | -41.0270 | -29.4849 | 0.0794 | 0.0714 |
0.5199 | 10.6667 | 800 | 0.5527 | -12.6098 | -16.1000 | 0.2100 | 3.4901 | -41.5739 | -29.7621 | 0.0901 | 0.0819 |
0.5372 | 11.0 | 825 | 0.5500 | -13.0464 | -16.6460 | 0.2100 | 3.5996 | -42.6660 | -30.6353 | 0.0593 | 0.0511 |
0.5199 | 11.3333 | 850 | 0.5498 | -13.0532 | -16.6508 | 0.2100 | 3.5976 | -42.6755 | -30.6489 | 0.0585 | 0.0503 |
0.6065 | 11.6667 | 875 | 0.5497 | -13.0545 | -16.6501 | 0.2100 | 3.5956 | -42.6742 | -30.6515 | 0.0585 | 0.0503 |
0.5719 | 12.0 | 900 | 0.5498 | -13.0510 | -16.6497 | 0.2100 | 3.5986 | -42.6733 | -30.6445 | 0.0587 | 0.0505 |
0.4159 | 12.3333 | 925 | 0.5495 | -13.0537 | -16.6558 | 0.2100 | 3.6020 | -42.6855 | -30.6499 | 0.0585 | 0.0503 |
0.6238 | 12.6667 | 950 | 0.5499 | -13.0443 | -16.6468 | 0.2100 | 3.6025 | -42.6676 | -30.6311 | 0.0586 | 0.0504 |
0.6065 | 13.0 | 975 | 0.5498 | -13.0511 | -16.6516 | 0.2100 | 3.6004 | -42.6771 | -30.6447 | 0.0583 | 0.0501 |
0.5025 | 13.3333 | 1000 | 0.5498 | -13.0511 | -16.6516 | 0.2100 | 3.6004 | -42.6771 | -30.6447 | 0.0583 | 0.0501 |
### Framework versions
- Transformers 4.41.2
- Pytorch 2.0.0+cu117
- Datasets 2.19.2
- Tokenizers 0.19.1
## Model tree

- Base model: [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)
- Fine-tuned from: [tsavage68/UTI_M2_1000steps_1e7rate_SFT](https://huggingface.co/tsavage68/UTI_M2_1000steps_1e7rate_SFT)