# UTI2_M2_300steps_1e8rate_03beta_CSFTDPO
This model is a fine-tuned version of tsavage68/UTI_M2_1000steps_1e7rate_SFT on an unknown dataset. It achieves the following results on the evaluation set (the DPO reward metrics are defined in the note after this list):
- Loss: 0.6666
- Rewards/chosen: 0.0060
- Rewards/rejected: -0.0495
- Rewards/accuracies: 0.7100
- Rewards/margins: 0.0555
- Logps/rejected: -39.5212
- Logps/chosen: -19.9016
- Logits/rejected: -2.6824
- Logits/chosen: -2.6798
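
The Rewards/* columns follow the usual TRL DPO logging convention: the reward assigned to a completion is the β-scaled log-probability ratio between the policy and the frozen reference (SFT) model, Rewards/margins is the mean gap between chosen and rejected rewards, and Rewards/accuracies is the fraction of evaluation pairs where the chosen completion out-scores the rejected one. As a sketch, assuming β = 0.3 (inferred from the "03beta" tag in the model name; β is not listed among the hyperparameters below), the objective minimized during DPO training is:

$$
\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} \;-\; \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$

where the two β-scaled log-ratios correspond to the Rewards/chosen and Rewards/rejected values reported above.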

## Model description
More information needed

## Intended uses & limitations
More information needed
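
As a minimal usage sketch (not guidance from the model author), the checkpoint loads like any other Mistral-7B-Instruct-v0.2 derivative with Transformers; `device_map="auto"` additionally requires `accelerate`, and the example prompt is only a placeholder:

```python
# Minimal inference sketch; the prompt below is a placeholder, not a validated
# query for this model's intended domain.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/UTI2_M2_300steps_1e8rate_03beta_CSFTDPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

# Mistral-Instruct checkpoints expect the chat template, so build the prompt through it.
messages = [{"role": "user", "content": "Example question goes here."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```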

## Training and evaluation data
More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a hedged reconstruction of the corresponding TRL configuration follows the list):
- learning_rate: 1e-08
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 300
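
These values line up with a standard TRL DPOTrainer run. The sketch below is a reconstruction under that assumption (TRL is not listed among the framework versions, so a TRL ≈0.9-style API is assumed); the preference dataset path, output directory, and β = 0.3 are assumptions rather than documented settings:

```python
# Hedged reconstruction of the training setup (TRL ~0.9-style API); the dataset,
# beta value, and output path are assumptions, not taken from the original run.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

sft_id = "tsavage68/UTI_M2_1000steps_1e7rate_SFT"
tokenizer = AutoTokenizer.from_pretrained(sft_id)
model = AutoModelForCausalLM.from_pretrained(sft_id)

# Placeholder: the actual preference dataset is not documented ("unknown dataset").
train_dataset = load_dataset("json", data_files="uti_preferences.jsonl", split="train")

args = DPOConfig(
    output_dir="UTI2_M2_300steps_1e8rate_03beta_CSFTDPO",
    learning_rate=1e-8,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,   # effective train batch size 4
    max_steps=300,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    seed=42,
    beta=0.3,                        # inferred from the "03beta" tag in the model name
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,   # with ref_model=None, TRL creates a frozen copy as the reference
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```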

### Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
0.6931 | 0.3333 | 25 | 0.6924 | 0.0019 | 0.0000 | 0.1900 | 0.0018 | -39.3560 | -19.9153 | -2.6832 | -2.6806 |
0.6873 | 0.6667 | 50 | 0.6859 | 0.0055 | -0.0103 | 0.5400 | 0.0158 | -39.3904 | -19.9033 | -2.6823 | -2.6798 |
0.6944 | 1.0 | 75 | 0.6937 | -0.0057 | -0.0058 | 0.4500 | 0.0001 | -39.3756 | -19.9405 | -2.6824 | -2.6798 |
0.6899 | 1.3333 | 100 | 0.6855 | -0.0113 | -0.0281 | 0.5600 | 0.0168 | -39.4498 | -19.9591 | -2.6820 | -2.6794 |
0.6858 | 1.6667 | 125 | 0.6752 | 0.0071 | -0.0308 | 0.6400 | 0.0379 | -39.4588 | -19.8979 | -2.6822 | -2.6796 |
0.6767 | 2.0 | 150 | 0.6734 | 0.0063 | -0.0351 | 0.6600 | 0.0415 | -39.4733 | -19.9004 | -2.6827 | -2.6802 |
0.6625 | 2.3333 | 175 | 0.6598 | 0.0094 | -0.0599 | 0.7600 | 0.0693 | -39.5558 | -19.8902 | -2.6809 | -2.6783 |
0.6658 | 2.6667 | 200 | 0.6644 | 0.0077 | -0.0525 | 0.6900 | 0.0602 | -39.5310 | -19.8958 | -2.6823 | -2.6797 |
0.6793 | 3.0 | 225 | 0.6654 | 0.0092 | -0.0489 | 0.7200 | 0.0581 | -39.5192 | -19.8907 | -2.6819 | -2.6793 |
0.6836 | 3.3333 | 250 | 0.6662 | 0.0062 | -0.0499 | 0.7300 | 0.0561 | -39.5225 | -19.9009 | -2.6824 | -2.6798 |
0.6704 | 3.6667 | 275 | 0.6666 | 0.0060 | -0.0495 | 0.7100 | 0.0555 | -39.5212 | -19.9016 | -2.6824 | -2.6798 |
0.6726 | 4.0 | 300 | 0.6666 | 0.0060 | -0.0495 | 0.7100 | 0.0555 | -39.5212 | -19.9016 | -2.6824 | -2.6798 |

### Framework versions
- Transformers 4.41.2
- Pytorch 2.0.0+cu117
- Datasets 2.19.2
- Tokenizers 0.19.1
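
A quick way to compare a local environment against these versions (a convenience sketch, not part of the original card):

```python
# Print installed versions to compare against the ones listed above.
import datasets
import tokenizers
import torch
import transformers

print("transformers", transformers.__version__)  # card lists 4.41.2
print("torch", torch.__version__)                # card lists 2.0.0+cu117
print("datasets", datasets.__version__)          # card lists 2.19.2
print("tokenizers", tokenizers.__version__)      # card lists 0.19.1
```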

## Model tree for tsavage68/UTI2_M2_300steps_1e8rate_03beta_CSFTDPO

- Base model: mistralai/Mistral-7B-Instruct-v0.2
- Fine-tuned from: tsavage68/UTI_M2_1000steps_1e7rate_SFT