---
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.1
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: v1_1000_STEPS_1e5_rate_01_beta_DPO
results: []
---
# v1_1000_STEPS_1e5_rate_01_beta_DPO
This model is a version of [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) fine-tuned with DPO on an undocumented preference dataset.
It achieves the following results on the evaluation set (a note after this list explains how TRL defines the reward columns):
- Loss: 1.1843
- Rewards/chosen: -4.7599
- Rewards/rejected: -4.6623
- Rewards/accuracies: 0.4484
- Rewards/margins: -0.0976
- Logps/rejected: -63.5028
- Logps/chosen: -62.8524
- Logits/rejected: -5.1435
- Logits/chosen: -5.1435
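These columns follow TRL's DPO conventions rather than anything specific to this run: the per-completion "reward" is the beta-scaled log-probability ratio between the policy and the frozen reference model, and the margin is the chosen-minus-rejected difference:

$$
r_\theta(x, y) = \beta \,\log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
\qquad
\text{margin} = r_\theta(x, y_{\text{chosen}}) - r_\theta(x, y_{\text{rejected}})
$$

A negative final margin (-0.0976) together with an accuracy below 0.5 means this checkpoint ranks the rejected completion above the chosen one slightly more often than not on this evaluation set.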
## Model description
More information needed
## Intended uses & limitations
More information needed
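The intended use is not documented. As a minimal inference sketch, the checkpoint loads like any other `transformers` causal LM, using the chat template inherited from the Mistral-Instruct base (the repo id below is inferred from this card, not confirmed):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/v1_1000_STEPS_1e5_rate_01_beta_DPO"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Mistral-Instruct checkpoints ship a chat template, so this should yield
# the expected [INST] ... [/INST] prompt formatting.
messages = [{"role": "user", "content": "Summarize what DPO fine-tuning does."}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```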
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a sketch of how they could map onto a TRL `DPOTrainer` call follows the list):
- learning_rate: 1e-05
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
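The training script itself is not included in the card. The sketch below shows one way these settings could fit a contemporaneous TRL `DPOTrainer` call (TRL releases matching Transformers 4.39 take `beta` directly as a constructor argument). The dataset is a placeholder, and `beta=0.1` is an assumption read off the `01_beta` suffix in the model name:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "mistralai/Mistral-7B-Instruct-v0.1"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Placeholder: the actual preference dataset is not documented in this card.
# DPOTrainer expects "prompt", "chosen", and "rejected" columns.
dataset = load_dataset("your/preference_dataset")

args = TrainingArguments(
    output_dir="v1_1000_STEPS_1e5_rate_01_beta_DPO",
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,   # total train batch size 4
    learning_rate=1e-5,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
    evaluation_strategy="steps",
    eval_steps=50,                   # matches the 50-step eval cadence in the results table
)

trainer = DPOTrainer(
    model,
    ref_model=None,          # TRL clones the policy as the frozen DPO reference
    args=args,
    beta=0.1,                # assumed from the "01_beta" model name
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
)
trainer.train()
```

With `ref_model=None`, TRL makes a frozen copy of the policy to serve as the DPO reference model, which matches the single-model setup implied by this card.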
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 1.0273 | 0.05 | 50 | 1.0978 | -4.1663 | -4.0603 | 0.3868 | -0.1060 | -57.4823 | -56.9158 | -3.3835 | -3.3835 |
| 1.4867 | 0.1 | 100 | 1.6188 | -8.0229 | -8.1216 | 0.4659 | 0.0987 | -98.0960 | -95.4825 | -3.5244 | -3.5244 |
| 1.3148 | 0.15 | 150 | 1.2591 | -5.2353 | -5.1881 | 0.4637 | -0.0472 | -68.7606 | -67.6062 | -3.9106 | -3.9106 |
| 1.5023 | 0.2 | 200 | 1.2648 | -5.2617 | -5.1757 | 0.4462 | -0.0860 | -68.6362 | -67.8701 | -3.6954 | -3.6954 |
| 1.1555 | 0.24 | 250 | 1.2170 | -4.8688 | -4.7930 | 0.4505 | -0.0758 | -64.8100 | -63.9413 | -3.9459 | -3.9459 |
| 1.4516 | 0.29 | 300 | 1.2267 | -5.0560 | -4.9621 | 0.4286 | -0.0940 | -66.5001 | -65.8132 | -4.3050 | -4.3050 |
| 1.2594 | 0.34 | 350 | 1.2134 | -4.8394 | -4.7583 | 0.4440 | -0.0811 | -64.4627 | -63.6475 | -4.6928 | -4.6928 |
| 2.0063 | 0.39 | 400 | 1.1957 | -4.7217 | -4.6265 | 0.4462 | -0.0952 | -63.1444 | -62.4697 | -5.2439 | -5.2439 |
| 1.8188 | 0.44 | 450 | 1.2070 | -4.9169 | -4.8208 | 0.4462 | -0.0962 | -65.0873 | -64.4224 | -5.5926 | -5.5926 |
| 1.6531 | 0.49 | 500 | 1.2595 | -5.3935 | -5.3580 | 0.4374 | -0.0356 | -70.4593 | -69.1884 | -2.4458 | -2.4457 |
| 1.6375 | 0.54 | 550 | 1.2036 | -4.7802 | -4.6765 | 0.4418 | -0.1036 | -63.6449 | -63.0546 | -5.4670 | -5.4670 |
| 1.0633 | 0.59 | 600 | 1.2013 | -4.7613 | -4.6562 | 0.4396 | -0.1051 | -63.4414 | -62.8664 | -5.4292 | -5.4292 |
| 1.7188 | 0.64 | 650 | 1.1996 | -4.7873 | -4.6740 | 0.4484 | -0.1133 | -63.6198 | -63.1264 | -5.5429 | -5.5429 |
| 1.5469 | 0.68 | 700 | 1.1910 | -4.7298 | -4.6299 | 0.4484 | -0.1000 | -63.1784 | -62.5515 | -5.2089 | -5.2089 |
| 1.0102 | 0.73 | 750 | 1.1953 | -4.7801 | -4.6716 | 0.4462 | -0.1085 | -63.5956 | -63.0540 | -5.6196 | -5.6196 |
| 0.8289 | 0.78 | 800 | 1.1935 | -4.7729 | -4.6677 | 0.4484 | -0.1051 | -63.5568 | -62.9817 | -5.5697 | -5.5697 |
| 1.8281 | 0.83 | 850 | 1.1860 | -4.7551 | -4.6562 | 0.4484 | -0.0989 | -63.4419 | -62.8043 | -5.1995 | -5.1995 |
| 1.193 | 0.88 | 900 | 1.1845 | -4.7609 | -4.6632 | 0.4484 | -0.0977 | -63.5115 | -62.8620 | -5.1522 | -5.1522 |
| 1.6672 | 0.93 | 950 | 1.1844 | -4.7599 | -4.6622 | 0.4484 | -0.0977 | -63.5018 | -62.8523 | -5.1417 | -5.1417 |
| 1.4906 | 0.98 | 1000 | 1.1843 | -4.7599 | -4.6623 | 0.4484 | -0.0976 | -63.5028 | -62.8524 | -5.1435 | -5.1435 |
### Framework versions
- Transformers 4.39.1
- PyTorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2