---
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.1
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: v1_1000_STEPS_1e5_rate_05_beta_DPO
  results: []
---
# v1_1000_STEPS_1e5_rate_05_beta_DPO
This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 4.8688
- Rewards/chosen: -27.6674
- Rewards/rejected: -27.1162
- Rewards/accuracies: 0.4330
- Rewards/margins: -0.5512
- Logps/rejected: -71.1119
- Logps/chosen: -70.5878
- Logits/rejected: -5.9442
- Logits/chosen: -5.9442
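The reward metrics above follow the usual DPO bookkeeping. The sketch below assumes TRL's default sigmoid DPO loss and a β of 0.5. β is inferred only from the "05_beta" suffix in the model name, because the card does not list it. Each reward is β times the gap between the policy and reference log-probabilities of a completion. The margin is chosen minus rejected, and accuracy is the fraction of pairs with a positive margin. Margins are averaged linearly, so Rewards/margins equals Rewards/chosen minus Rewards/rejected: -27.6674 - (-27.1162) = -0.5512. The loss is different. It is a per-pair mean of -log σ(margin), so it cannot be recovered from the averaged margin. For example, -log σ(-0.5512) ≈ 1.0, which is well below the reported 4.8688. The helper `dpo_metrics` is hypothetical and is not TRL's API.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_metrics(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.5):
    """Hypothetical helper: DPO metrics from per-example summed log-probs.

    beta=0.5 is an assumption taken from the model name, not from the card.
    Inputs are equal-length, non-empty lists of floats (one entry per pair).
    """
    # Implicit rewards: beta * (policy log-prob - reference log-prob).
    rewards_chosen = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rewards_rejected = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    margins = [c - r for c, r in zip(rewards_chosen, rewards_rejected)]
    # Sigmoid DPO loss per pair: -log sigmoid(margin).
    losses = [-log_sigmoid(m) for m in margins]

    def mean(xs):
        return sum(xs) / len(xs)

    return {
        "loss": mean(losses),
        "rewards/chosen": mean(rewards_chosen),
        "rewards/rejected": mean(rewards_rejected),
        "rewards/accuracies": mean([1.0 if m > 0 else 0.0 for m in margins]),
        "rewards/margins": mean(margins),
    }


# Toy inputs for illustration only; these are not values from this model.
print(dpo_metrics([-10.0], [-12.0], [-11.0], [-11.0]))
```

When the policy still equals the reference, every reward and margin is zero and the loss is log 2. A loss far above that, as here, means many pairs have strongly negative margins.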
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
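With a per-device batch size of 2 and 2 gradient-accumulation steps, each optimizer step sees 2 × 2 = 4 examples. That matches total_train_batch_size if training ran on one device, but the device count is an assumption because the card does not record it. The cosine schedule with 100 warmup steps can be sketched as follows. The formula mirrors the linear warmup and half-cosine decay of `transformers.get_cosine_schedule_with_warmup` at its default `num_cycles=0.5`. The function names here are hypothetical.

```python
import math


def total_train_batch_size(per_device=2, grad_accum=2, num_devices=1):
    """Examples contributing to one optimizer step (num_devices=1 is assumed)."""
    return per_device * grad_accum * num_devices


def cosine_lr_with_warmup(step, base_lr=1e-5, warmup_steps=100, total_steps=1000):
    """Hypothetical helper: learning rate at a given optimizer step."""
    if step < warmup_steps:
        # Linear warmup from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Half-cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for s in (0, 50, 100, 550, 1000):
    print(s, cosine_lr_with_warmup(s))
```

Under these assumptions the learning rate peaks at 1e-05 at step 100, drops to half that at step 550, and reaches zero at step 1000.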
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 1.5553 | 0.05 | 50 | 1.8706 | -4.7825 | -4.7649 | 0.4286 | -0.0176 | -26.4094 | -24.8181 | -3.5109 | -3.5109 |
| 5.8188 | 0.1 | 100 | 5.0281 | -26.6571 | -26.6181 | 0.4308 | -0.0390 | -70.1157 | -68.5673 | -1.3923 | -1.3923 |
| 5.8033 | 0.15 | 150 | 7.1546 | -40.4235 | -40.6296 | 0.4593 | 0.2060 | -98.1387 | -96.1001 | -3.5667 | -3.5667 |
| 7.8696 | 0.2 | 200 | 5.5313 | -29.1486 | -29.0376 | 0.4505 | -0.1109 | -74.9547 | -73.5501 | -3.4414 | -3.4414 |
| 4.4882 | 0.24 | 250 | 5.1766 | -27.5527 | -27.1630 | 0.4308 | -0.3897 | -71.2056 | -70.3585 | -4.9735 | -4.9735 |
| 6.4403 | 0.29 | 300 | 5.1323 | -27.5513 | -27.0082 | 0.4440 | -0.5431 | -70.8959 | -70.3556 | -5.3879 | -5.3879 |
| 5.2094 | 0.34 | 350 | 5.0288 | -27.1714 | -26.6651 | 0.4418 | -0.5063 | -70.2098 | -69.5959 | -5.6729 | -5.6729 |
| 9.8925 | 0.39 | 400 | 4.8892 | -27.3549 | -26.8568 | 0.4462 | -0.4981 | -70.5932 | -69.9629 | -5.8703 | -5.8703 |
| 8.279 | 0.44 | 450 | 4.8903 | -27.7693 | -27.3098 | 0.4374 | -0.4595 | -71.4991 | -70.7916 | -5.9049 | -5.9049 |
| 6.9741 | 0.49 | 500 | 4.9634 | -27.7246 | -27.2569 | 0.4484 | -0.4677 | -71.3933 | -70.7022 | -5.9114 | -5.9114 |
| 7.5287 | 0.54 | 550 | 4.9185 | -27.7575 | -27.2719 | 0.4505 | -0.4857 | -71.4233 | -70.7681 | -5.9444 | -5.9444 |
| 4.1175 | 0.59 | 600 | 4.9414 | -27.6038 | -27.0763 | 0.4418 | -0.5275 | -71.0321 | -70.4606 | -5.9236 | -5.9236 |
| 7.6353 | 0.64 | 650 | 4.8901 | -27.4506 | -26.8656 | 0.4308 | -0.5850 | -70.6107 | -70.1542 | -5.9567 | -5.9567 |
| 6.5311 | 0.68 | 700 | 4.8640 | -27.4782 | -26.9239 | 0.4242 | -0.5543 | -70.7274 | -70.2095 | -5.8651 | -5.8651 |
| 3.8896 | 0.73 | 750 | 4.8727 | -27.6349 | -27.0700 | 0.4374 | -0.5649 | -71.0195 | -70.5229 | -5.9781 | -5.9781 |
| 2.4094 | 0.78 | 800 | 4.8792 | -27.7076 | -27.1530 | 0.4352 | -0.5546 | -71.1855 | -70.6682 | -5.9983 | -5.9983 |
| 8.463 | 0.83 | 850 | 4.8683 | -27.6713 | -27.1213 | 0.4308 | -0.5500 | -71.1221 | -70.5956 | -5.9384 | -5.9384 |
| 5.1159 | 0.88 | 900 | 4.8691 | -27.6713 | -27.1222 | 0.4352 | -0.5491 | -71.1239 | -70.5956 | -5.9441 | -5.9441 |
| 7.8796 | 0.93 | 950 | 4.8688 | -27.6673 | -27.1163 | 0.4330 | -0.5510 | -71.1121 | -70.5876 | -5.9442 | -5.9442 |
| 6.2745 | 0.98 | 1000 | 4.8688 | -27.6674 | -27.1162 | 0.4330 | -0.5512 | -71.1119 | -70.5878 | -5.9442 | -5.9442 |
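As a quick sanity check on the table, Rewards/margins should equal Rewards/chosen minus Rewards/rejected up to four-decimal rounding. The snippet below copies a few rows from the table and measures the largest discrepancy. `ROWS` and `max_margin_discrepancy` are illustrative names and do not come from any library.

```python
# (step, rewards/chosen, rewards/rejected, rewards/margins), copied from the table.
ROWS = [
    (50, -4.7825, -4.7649, -0.0176),
    (150, -40.4235, -40.6296, 0.2060),
    (500, -27.7246, -27.2569, -0.4677),
    (1000, -27.6674, -27.1162, -0.5512),
]


def max_margin_discrepancy(rows):
    """Largest |(chosen - rejected) - reported margin| across the given rows."""
    return max(abs((chosen - rejected) - margin) for _, chosen, rejected, margin in rows)


print(max_margin_discrepancy(ROWS))
```

Each value is rounded to four decimals, so differences of up to about 1.5e-4 are expected from rounding alone.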
### Framework versions
- Transformers 4.39.1
- Pytorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2