	---
	license: apache-2.0
	base_model: mistralai/Mistral-7B-Instruct-v0.1
	tags:
	- trl
	- dpo
	- generated_from_trainer
	model-index:
	- name: v1_1000_STEPS_1e5_rate_05_beta_DPO
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# v1_1000_STEPS_1e5_rate_05_beta_DPO

	This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) on an unknown dataset.
	It achieves the following results on the evaluation set:
	- Loss: 4.8688
	- Rewards/chosen: -27.6674
	- Rewards/rejected: -27.1162
	- Rewards/accuracies: 0.4330
	- Rewards/margins: -0.5512
	- Logps/rejected: -71.1119
	- Logps/chosen: -70.5878
	- Logits/rejected: -5.9442
	- Logits/chosen: -5.9442
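
The reward figures above are related to one another: in DPO-style logging the margin is normally the chosen reward minus the rejected reward. The sketch below only re-derives the margin from the two reported rewards; the variable names are illustrative and not part of any library.

```python
# Reported evaluation metrics, copied from the list above.
rewards_chosen = -27.6674
rewards_rejected = -27.1162
reported_margin = -0.5512

# Assumption: margin = chosen reward - rejected reward (the usual DPO logging convention).
margin = rewards_chosen - rewards_rejected

# A positive margin would mean the chosen responses receive the higher reward on average.
chosen_preferred_on_average = margin > 0

print(f"margin = {margin:.4f}")
print(f"chosen preferred on average: {chosen_preferred_on_average}")
```

The recomputed value agrees with the reported Rewards/margins to four decimal places.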

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 1e-05
	- train_batch_size: 2
	- eval_batch_size: 1
	- seed: 42
	- gradient_accumulation_steps: 2
	- total_train_batch_size: 4
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_steps: 100
	- training_steps: 1000
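
As a rough illustration of how these settings fit together, the sketch below recomputes the total train batch size and evaluates a linear-warmup, cosine-decay learning-rate curve. It is a sketch under assumptions, not the Trainer's own code: the single-device count (`NUM_DEVICES`) is inferred from 2 x 2 = 4, and the schedule formula is the common warmup-plus-cosine shape, which the card names only as `cosine` with 100 warmup steps.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-5
TRAIN_BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 2
WARMUP_STEPS = 100
TRAINING_STEPS = 1000

# Assumption: one device, which is consistent with total_train_batch_size = 4.
NUM_DEVICES = 1


def total_train_batch_size() -> int:
    """Examples consumed per optimizer step."""
    return TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (linear warmup, then cosine decay to zero)."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 50, 100, 550, 1000):
    print(step, lr_at(step))
```

Under these assumptions the rate peaks at 1e-05 at step 100 and decays to zero by step 1000.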

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
	|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
	| 1.5553 | 0.05 | 50 | 1.8706 | -4.7825 | -4.7649 | 0.4286 | -0.0176 | -26.4094 | -24.8181 | -3.5109 | -3.5109 |
	| 5.8188 | 0.1 | 100 | 5.0281 | -26.6571 | -26.6181 | 0.4308 | -0.0390 | -70.1157 | -68.5673 | -1.3923 | -1.3923 |
	| 5.8033 | 0.15 | 150 | 7.1546 | -40.4235 | -40.6296 | 0.4593 | 0.2060 | -98.1387 | -96.1001 | -3.5667 | -3.5667 |
	| 7.8696 | 0.2 | 200 | 5.5313 | -29.1486 | -29.0376 | 0.4505 | -0.1109 | -74.9547 | -73.5501 | -3.4414 | -3.4414 |
	| 4.4882 | 0.24 | 250 | 5.1766 | -27.5527 | -27.1630 | 0.4308 | -0.3897 | -71.2056 | -70.3585 | -4.9735 | -4.9735 |
	| 6.4403 | 0.29 | 300 | 5.1323 | -27.5513 | -27.0082 | 0.4440 | -0.5431 | -70.8959 | -70.3556 | -5.3879 | -5.3879 |
	| 5.2094 | 0.34 | 350 | 5.0288 | -27.1714 | -26.6651 | 0.4418 | -0.5063 | -70.2098 | -69.5959 | -5.6729 | -5.6729 |
	| 9.8925 | 0.39 | 400 | 4.8892 | -27.3549 | -26.8568 | 0.4462 | -0.4981 | -70.5932 | -69.9629 | -5.8703 | -5.8703 |
	| 8.279 | 0.44 | 450 | 4.8903 | -27.7693 | -27.3098 | 0.4374 | -0.4595 | -71.4991 | -70.7916 | -5.9049 | -5.9049 |
	| 6.9741 | 0.49 | 500 | 4.9634 | -27.7246 | -27.2569 | 0.4484 | -0.4677 | -71.3933 | -70.7022 | -5.9114 | -5.9114 |
	| 7.5287 | 0.54 | 550 | 4.9185 | -27.7575 | -27.2719 | 0.4505 | -0.4857 | -71.4233 | -70.7681 | -5.9444 | -5.9444 |
	| 4.1175 | 0.59 | 600 | 4.9414 | -27.6038 | -27.0763 | 0.4418 | -0.5275 | -71.0321 | -70.4606 | -5.9236 | -5.9236 |
	| 7.6353 | 0.64 | 650 | 4.8901 | -27.4506 | -26.8656 | 0.4308 | -0.5850 | -70.6107 | -70.1542 | -5.9567 | -5.9567 |
	| 6.5311 | 0.68 | 700 | 4.8640 | -27.4782 | -26.9239 | 0.4242 | -0.5543 | -70.7274 | -70.2095 | -5.8651 | -5.8651 |
	| 3.8896 | 0.73 | 750 | 4.8727 | -27.6349 | -27.0700 | 0.4374 | -0.5649 | -71.0195 | -70.5229 | -5.9781 | -5.9781 |
	| 2.4094 | 0.78 | 800 | 4.8792 | -27.7076 | -27.1530 | 0.4352 | -0.5546 | -71.1855 | -70.6682 | -5.9983 | -5.9983 |
	| 8.463 | 0.83 | 850 | 4.8683 | -27.6713 | -27.1213 | 0.4308 | -0.5500 | -71.1221 | -70.5956 | -5.9384 | -5.9384 |
	| 5.1159 | 0.88 | 900 | 4.8691 | -27.6713 | -27.1222 | 0.4352 | -0.5491 | -71.1239 | -70.5956 | -5.9441 | -5.9441 |
	| 7.8796 | 0.93 | 950 | 4.8688 | -27.6673 | -27.1163 | 0.4330 | -0.5510 | -71.1121 | -70.5876 | -5.9442 | -5.9442 |
	| 6.2745 | 0.98 | 1000 | 4.8688 | -27.6674 | -27.1162 | 0.4330 | -0.5512 | -71.1119 | -70.5878 | -5.9442 | -5.9442 |
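
One way to read the table is to compare checkpoints by a single column. The sketch below is a minimal example of that, using a hand-copied subset of rows (steps 50, 150, 700, and 1000). The tuple layout and variable names are illustrative only, and which metric should drive checkpoint selection is left to the reader.

```python
# (step, validation_loss, rewards_accuracy), copied from four rows of the table above.
rows = [
    (50, 1.8706, 0.4286),
    (150, 7.1546, 0.4593),
    (700, 4.8640, 0.4242),
    (1000, 4.8688, 0.4330),
]

# Lowest validation loss among the selected rows.
best_by_loss = min(rows, key=lambda row: row[1])

# Highest reward accuracy among the selected rows.
best_by_accuracy = max(rows, key=lambda row: row[2])

print("lowest validation loss at step", best_by_loss[0])
print("highest reward accuracy at step", best_by_accuracy[0])
```

Extending `rows` to the full table only requires copying the remaining Step, Validation Loss, and Rewards/accuracies values.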


	### Framework versions

	- Transformers 4.39.1
	- Pytorch 2.0.0+cu117
	- Datasets 2.18.0
	- Tokenizers 0.15.2
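
To compare a local environment against the versions listed above, a small check along the following lines may help. It is a sketch that assumes the libraries are installed under the PyPI distribution names `transformers`, `torch`, `datasets`, and `tokenizers`. The helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above, keyed by assumed PyPI distribution name.
PINNED = {
    "transformers": "4.39.1",
    "torch": "2.0.0+cu117",
    "datasets": "2.18.0",
    "tokenizers": "0.15.2",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def compare_versions(pinned):
    """Map each package to (expected, found, matches)."""
    report = {}
    for package, expected in pinned.items():
        found = installed_version(package)
        report[package] = (expected, found, found == expected)
    return report


for package, (expected, found, matches) in compare_versions(PINNED).items():
    status = "ok" if matches else "differs"
    print(f"{package}: expected {expected}, found {found} ({status})")
```

A mismatch does not necessarily mean the model will fail to load; the check only reports differences from the recorded training environment.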