openhermes-mistral-2.5-7b-dpo-test
This model is a fine-tuned version of teknium/OpenHermes-2.5-Mistral-7B. The training dataset is not specified (the card records it as None). It achieves the following results on the evaluation set:
- Loss: 0.4487
- Rewards/chosen: -0.2951
- Rewards/rejected: -2.2421
- Rewards/accuracies: 0.875
- Rewards/margins: 1.9470
- Logps/rejected: -257.4751
- Logps/chosen: -204.3027
- Logits/rejected: -3.0752
- Logits/chosen: -3.0485
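The metric names above match those typically logged by DPO-style preference training, where the reward margin is the chosen reward minus the rejected reward. The following is a minimal sketch that checks the reported final numbers against that definition. The definition is an assumption (the card does not state how the margin was computed) and the variable names are illustrative.

```python
# Final evaluation metrics copied from the list above.
rewards_chosen = -0.2951
rewards_rejected = -2.2421
reported_margin = 1.9470

# Assumption: margin = chosen reward - rejected reward,
# as is conventional for DPO-style preference training.
computed_margin = rewards_chosen - rewards_rejected

# The reported values are rounded to four decimal places, so compare with a tolerance.
margins_agree = abs(computed_margin - reported_margin) < 1e-4
print(f"computed margin: {computed_margin:.4f}, agrees with report: {margins_agree}")
```

A positive margin means the model assigns a higher implicit reward to the chosen responses than to the rejected ones on the evaluation set.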
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 2
- training_steps: 200
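To make the scheduler settings concrete, here is a small sketch of a linear schedule with warmup, using the learning rate, warmup steps, and training steps listed above. It assumes the common shape (linear ramp from zero during warmup, then linear decay to zero at the last step). The card does not show the training code, so the function `linear_lr` is a hypothetical helper and not the run's actual implementation.

```python
def linear_lr(step, base_lr=1e-4, warmup_steps=2, total_steps=200):
    """Learning rate after `step` optimizer updates (assumed linear warmup, then linear decay)."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Decay: fall linearly from base_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


# Print the learning rate at a few points in the 200-step run.
for step in (0, 1, 2, 101, 200):
    print(f"step {step:3d}: lr = {linear_lr(step):.2e}")
```

With only 2 warmup steps out of 200, the run reaches its peak learning rate almost immediately and spends nearly all of training in the decay phase.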
Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
0.1645 | 0.01 | 10 | 0.5339 | 0.3993 | -0.1483 | 0.6875 | 0.5476 | -236.5374 | -197.3593 | -3.1575 | -3.1872 |
0.0519 | 0.01 | 20 | 0.5521 | 0.2239 | -0.4486 | 0.625 | 0.6725 | -239.5405 | -199.1127 | -3.1969 | -3.2456 |
0.1618 | 0.01 | 30 | 0.5866 | -0.0538 | -0.8893 | 0.5625 | 0.8355 | -243.9472 | -201.8902 | -3.2286 | -3.2525 |
0.1752 | 0.02 | 40 | 0.5943 | -0.2184 | -1.2057 | 0.5 | 0.9873 | -247.1112 | -203.5360 | -3.2201 | -3.2477 |
0.3811 | 0.03 | 50 | 0.6973 | -0.6180 | -1.8146 | 0.5 | 1.1966 | -253.2001 | -207.5316 | -3.1943 | -3.2034 |
1.158 | 0.03 | 60 | 0.6347 | -0.4710 | -1.7363 | 0.5625 | 1.2653 | -252.4173 | -206.0622 | -3.1655 | -3.1197 |
0.8751 | 0.04 | 70 | 0.6103 | -0.4061 | -1.5966 | 0.5625 | 1.1905 | -251.0201 | -205.4132 | -3.1360 | -3.0544 |
0.7811 | 0.04 | 80 | 0.6405 | -0.4774 | -1.6574 | 0.5625 | 1.1799 | -251.6278 | -206.1260 | -3.1337 | -3.0492 |
1.4305 | 0.04 | 90 | 0.6257 | -0.4784 | -1.6184 | 0.5625 | 1.1399 | -251.2379 | -206.1361 | -3.1251 | -3.0489 |
0.5478 | 0.05 | 100 | 0.6191 | -0.5317 | -1.7067 | 0.5625 | 1.1750 | -252.1214 | -206.6691 | -3.1207 | -3.0753 |
0.6344 | 0.06 | 110 | 0.5691 | -0.4827 | -1.7734 | 0.5625 | 1.2907 | -252.7882 | -206.1789 | -3.1075 | -3.0806 |
0.5405 | 0.06 | 120 | 0.5337 | -0.4681 | -2.1739 | 0.8125 | 1.7058 | -256.7935 | -206.0332 | -3.1124 | -3.0733 |
0.7848 | 0.07 | 130 | 0.5390 | -0.5288 | -2.3789 | 0.8125 | 1.8501 | -258.8436 | -206.6404 | -3.1019 | -3.0628 |
1.3119 | 0.07 | 140 | 0.4753 | -0.3276 | -2.0907 | 0.875 | 1.7631 | -255.9614 | -204.6279 | -3.0904 | -3.0648 |
0.3636 | 0.07 | 150 | 0.4555 | -0.2566 | -2.0064 | 0.625 | 1.7498 | -255.1179 | -203.9175 | -3.0804 | -3.0640 |
0.427 | 0.08 | 160 | 0.4614 | -0.2900 | -2.0804 | 0.625 | 1.7904 | -255.8585 | -204.2518 | -3.0721 | -3.0518 |
0.8971 | 0.09 | 170 | 0.4629 | -0.3117 | -2.1791 | 0.875 | 1.8673 | -256.8448 | -204.4694 | -3.0711 | -3.0468 |
0.6219 | 0.09 | 180 | 0.4560 | -0.3042 | -2.2114 | 0.875 | 1.9073 | -257.1686 | -204.3934 | -3.0743 | -3.0485 |
0.7551 | 0.1 | 190 | 0.4520 | -0.3007 | -2.2400 | 0.875 | 1.9392 | -257.4540 | -204.3593 | -3.0755 | -3.0481 |
1.0917 | 0.1 | 200 | 0.4487 | -0.2951 | -2.2421 | 0.875 | 1.9470 | -257.4751 | -204.3027 | -3.0752 | -3.0485 |
Framework versions
- Transformers 4.34.1
- Pytorch 2.1.0+cu121
- Datasets 2.14.6
- Tokenizers 0.14.1
Model tree for voxmenthe/openhermes-mistral-2.5-7b-dpo-test
- Base model: mistralai/Mistral-7B-v0.1
- Finetuned: teknium/OpenHermes-2.5-Mistral-7B