---
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.1
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: v1_1000_STEPS_1e5_rate_01_beta_DPO
  results: []
---

# v1_1000_STEPS_1e5_rate_01_beta_DPO

This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) on an unknown dataset, tagged as a DPO run (`trl`, `dpo`).
It achieves the following results on the evaluation set at the final checkpoint (step 1000):
- Loss: 1.1843
- Rewards/chosen: -4.7599
- Rewards/rejected: -4.6623
- Rewards/accuracies: 0.4484
- Rewards/margins: -0.0976
- Logps/rejected: -63.5028
- Logps/chosen: -62.8524
- Logits/rejected: -5.1435
- Logits/chosen: -5.1435
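
The reward metrics above are related by simple arithmetic, which the following minimal sketch checks. It assumes the usual TRL convention that `Rewards/margins` is the mean of chosen minus rejected rewards and that `Rewards/accuracies` is the fraction of pairs where the chosen reward is higher; the card itself does not define these metrics, and the variable names are illustrative.

```python
# Final evaluation metrics copied from the list above.
rewards_chosen = -4.7599
rewards_rejected = -4.6623
reported_margin = -0.0976
reward_accuracy = 0.4484

# Assumption: the margin is the mean of (chosen - rejected), so it should
# match the difference of the two reported means up to rounding.
margin = rewards_chosen - rewards_rejected
print(f"recomputed margin: {margin:.4f}")
assert abs(margin - reported_margin) < 1e-4

# A positive margin together with accuracy above 0.5 would indicate that the
# policy ranks chosen responses above rejected ones on average.
prefers_chosen = margin > 0 and reward_accuracy > 0.5
print(f"policy prefers chosen responses on average: {prefers_chosen}")
```

Under these assumptions, the negative margin and the accuracy below 0.5 both indicate that, on the evaluation set, this checkpoint does not rank chosen responses above rejected ones on average.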

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
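
To make the schedule and batch-size settings above concrete, here is a small sketch of how the learning rate would evolve over the run. It assumes the common shape of a cosine schedule with warmup (linear ramp to the peak, then a half-cosine decay to zero), and it assumes a single device, inferred only from 2 × 2 = 4; neither detail is stated in the card. The function and constant names are illustrative.

```python
import math

LEARNING_RATE = 1e-05   # learning_rate
WARMUP_STEPS = 100      # lr_scheduler_warmup_steps
TRAINING_STEPS = 1000   # training_steps


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


# Effective batch size = per-device batch * accumulation steps * devices.
TRAIN_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 2
NUM_DEVICES = 1  # assumption: inferred from 2 * 2 = 4, not stated in the card
total_train_batch_size = TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * NUM_DEVICES

for step in (0, 50, 100, 550, 1000):
    print(step, f"{lr_at(step):.2e}")
print("total_train_batch_size:", total_train_batch_size)
```

Under these assumptions the peak rate of 1e-05 is reached at step 100, the same step at which the results table shows the largest jump in validation loss.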

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 1.0273 | 0.05 | 50 | 1.0978 | -4.1663 | -4.0603 | 0.3868 | -0.1060 | -57.4823 | -56.9158 | -3.3835 | -3.3835 |
| 1.4867 | 0.1 | 100 | 1.6188 | -8.0229 | -8.1216 | 0.4659 | 0.0987 | -98.0960 | -95.4825 | -3.5244 | -3.5244 |
| 1.3148 | 0.15 | 150 | 1.2591 | -5.2353 | -5.1881 | 0.4637 | -0.0472 | -68.7606 | -67.6062 | -3.9106 | -3.9106 |
| 1.5023 | 0.2 | 200 | 1.2648 | -5.2617 | -5.1757 | 0.4462 | -0.0860 | -68.6362 | -67.8701 | -3.6954 | -3.6954 |
| 1.1555 | 0.24 | 250 | 1.2170 | -4.8688 | -4.7930 | 0.4505 | -0.0758 | -64.8100 | -63.9413 | -3.9459 | -3.9459 |
| 1.4516 | 0.29 | 300 | 1.2267 | -5.0560 | -4.9621 | 0.4286 | -0.0940 | -66.5001 | -65.8132 | -4.3050 | -4.3050 |
| 1.2594 | 0.34 | 350 | 1.2134 | -4.8394 | -4.7583 | 0.4440 | -0.0811 | -64.4627 | -63.6475 | -4.6928 | -4.6928 |
| 2.0063 | 0.39 | 400 | 1.1957 | -4.7217 | -4.6265 | 0.4462 | -0.0952 | -63.1444 | -62.4697 | -5.2439 | -5.2439 |
| 1.8188 | 0.44 | 450 | 1.2070 | -4.9169 | -4.8208 | 0.4462 | -0.0962 | -65.0873 | -64.4224 | -5.5926 | -5.5926 |
| 1.6531 | 0.49 | 500 | 1.2595 | -5.3935 | -5.3580 | 0.4374 | -0.0356 | -70.4593 | -69.1884 | -2.4458 | -2.4457 |
| 1.6375 | 0.54 | 550 | 1.2036 | -4.7802 | -4.6765 | 0.4418 | -0.1036 | -63.6449 | -63.0546 | -5.4670 | -5.4670 |
| 1.0633 | 0.59 | 600 | 1.2013 | -4.7613 | -4.6562 | 0.4396 | -0.1051 | -63.4414 | -62.8664 | -5.4292 | -5.4292 |
| 1.7188 | 0.64 | 650 | 1.1996 | -4.7873 | -4.6740 | 0.4484 | -0.1133 | -63.6198 | -63.1264 | -5.5429 | -5.5429 |
| 1.5469 | 0.68 | 700 | 1.1910 | -4.7298 | -4.6299 | 0.4484 | -0.1000 | -63.1784 | -62.5515 | -5.2089 | -5.2089 |
| 1.0102 | 0.73 | 750 | 1.1953 | -4.7801 | -4.6716 | 0.4462 | -0.1085 | -63.5956 | -63.0540 | -5.6196 | -5.6196 |
| 0.8289 | 0.78 | 800 | 1.1935 | -4.7729 | -4.6677 | 0.4484 | -0.1051 | -63.5568 | -62.9817 | -5.5697 | -5.5697 |
| 1.8281 | 0.83 | 850 | 1.1860 | -4.7551 | -4.6562 | 0.4484 | -0.0989 | -63.4419 | -62.8043 | -5.1995 | -5.1995 |
| 1.193 | 0.88 | 900 | 1.1845 | -4.7609 | -4.6632 | 0.4484 | -0.0977 | -63.5115 | -62.8620 | -5.1522 | -5.1522 |
| 1.6672 | 0.93 | 950 | 1.1844 | -4.7599 | -4.6622 | 0.4484 | -0.0977 | -63.5018 | -62.8523 | -5.1417 | -5.1417 |
| 1.4906 | 0.98 | 1000 | 1.1843 | -4.7599 | -4.6623 | 0.4484 | -0.0976 | -63.5028 | -62.8524 | -5.1435 | -5.1435 |
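
As a reference point for reading the loss columns, the sketch below computes the per-pair DPO loss at a given reward margin. It assumes the sigmoid DPO loss without label smoothing and assumes the logged rewards are already scaled by beta; the card does not state the loss type, so treat this as an illustration only. The function name is hypothetical.

```python
import math


def dpo_sigmoid_loss(reward_chosen: float, reward_rejected: float) -> float:
    """-log(sigmoid(margin)) for one preference pair, written as softplus(-margin)."""
    margin = reward_chosen - reward_rejected
    # Numerically stable softplus(-margin) = log(1 + exp(-margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# A policy identical to the reference gives zero margin on every pair.
baseline = dpo_sigmoid_loss(0.0, 0.0)
print(f"loss at zero margin: {baseline:.4f}")
print(f"final validation loss 1.1843 exceeds it: {1.1843 > baseline}")
```

Under these assumptions the zero-margin loss is ln 2, about 0.6931, and every validation loss in the table is above it. The reported loss is a mean over pairs, so it cannot be recomputed from the mean margin alone.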

### Framework versions

- Transformers 4.39.1
- PyTorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2
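
When trying to reproduce the run, it can help to compare the local environment against the versions listed above. The following is a minimal sketch using only the standard library; the distribution names in `EXPECTED` (for example `torch` for PyTorch) are assumptions about how the packages were installed, and `version_mismatches` is a hypothetical helper, not part of any of these libraries.

```python
from importlib import metadata

# Versions listed in this card; keys are assumed PyPI distribution names.
EXPECTED = {
    "transformers": "4.39.1",
    "torch": "2.0.0+cu117",
    "datasets": "2.18.0",
    "tokenizers": "0.15.2",
}


def version_mismatches(expected, get_version=metadata.version):
    """Return {package: (expected, installed)} for every package that differs."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in version_mismatches(EXPECTED).items():
        print(f"{name}: card lists {wanted}, found {installed}")
```

The `get_version` parameter exists only so the check can be exercised without the packages installed.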