	---
	license: apache-2.0
	base_model: mosaicml/mpt-7b-instruct
	tags:
	- trl
	- dpo
	- generated_from_trainer
	model-index:
	- name: mpt_1000_STEPS_1e5_rate_05_beta_DPO
	results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# mpt_1000_STEPS_1e5_rate_05_beta_DPO

	This model is a fine-tuned version of [mosaicml/mpt-7b-instruct](https://huggingface.co/mosaicml/mpt-7b-instruct) on an unknown dataset.
	It achieves the following results on the evaluation set:
	- Loss: 2.1807
	- Rewards/chosen: -19.4532
	- Rewards/rejected: -19.2274
	- Rewards/accuracies: 0.5033
	- Rewards/margins: -0.2258
	- Logps/rejected: -60.0122
	- Logps/chosen: -59.6986
	- Logits/rejected: 7.5623
	- Logits/chosen: 7.5620
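
As a quick consistency check on the numbers above, the reported margin can be recomputed from the two reward values. This is a minimal sketch that assumes the usual TRL DPO logging convention, where `Rewards/margins` is the mean of (chosen reward minus rejected reward) and `Rewards/accuracies` is the fraction of pairs where the chosen reward is higher. The variable names are illustrative and not part of the model card.

```python
# Final evaluation metrics copied from the list above.
rewards_chosen = -19.4532
rewards_rejected = -19.2274
reported_margin = -0.2258
reported_accuracy = 0.5033

# Assumed convention: margin = chosen reward - rejected reward.
margin = rewards_chosen - rewards_rejected
print(f"recomputed margin: {margin:.4f}")

# A negative margin means the rejected responses score slightly higher on average.
prefers_chosen_on_average = margin > 0
print(f"prefers chosen on average: {prefers_chosen_on_average}")

# Distance from chance level (0.5) for a pairwise preference task.
accuracy_above_chance = reported_accuracy - 0.5
print(f"accuracy above chance: {accuracy_above_chance:.4f}")
```

Under that convention the recomputed margin agrees with the reported value, and an accuracy this close to 0.5 suggests the model separates chosen from rejected responses little better than chance on this evaluation set.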

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 1e-05
	- train_batch_size: 2
	- eval_batch_size: 1
	- seed: 42
	- gradient_accumulation_steps: 2
	- total_train_batch_size: 4
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_steps: 100
	- training_steps: 1000
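
The hyperparameters above can be sanity-checked with a short sketch. It assumes training ran on a single device (the card does not say, but 2 × 2 = 4 matches `total_train_batch_size` only with one device) and that the cosine scheduler uses linear warmup followed by cosine decay to zero, as in the Transformers `get_cosine_schedule_with_warmup` helper with its default half cycle. The function names here are hypothetical.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-05
WARMUP_STEPS = 100
TRAINING_STEPS = 1000
TRAIN_BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 2
NUM_DEVICES = 1  # assumption: not stated in the model card


def effective_batch_size() -> int:
    """Examples consumed per optimizer step."""
    return TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (assumed schedule shape)."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 to the peak learning rate.
        return LEARNING_RATE * step / WARMUP_STEPS
    # Cosine decay from the peak learning rate down to 0.
    progress = (step - WARMUP_STEPS) / (TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


print("effective batch size:", effective_batch_size())
for step in (0, 50, 100, 550, 1000):
    print(f"step {step:4d}: lr = {lr_at(step):.2e}")
```

With these settings the learning rate peaks at step 100, which is also where the training loss in the results table jumps sharply.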

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
	|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
	| 1.5203 | 0.05 | 50 | 1.5171 | -1.5689 | -1.4986 | 0.4791 | -0.0703 | -24.5546 | -23.9299 | 14.9602 | 14.9630 |
	| 4.4339 | 0.1 | 100 | 2.9117 | -11.0118 | -10.8837 | 0.4813 | -0.1281 | -43.3247 | -42.8158 | 22.8545 | 22.8566 |
	| 5.6756 | 0.15 | 150 | 4.3519 | -20.9772 | -20.5347 | 0.4703 | -0.4424 | -62.6269 | -62.7465 | 13.8454 | 13.8456 |
	| 3.4587 | 0.2 | 200 | 3.7953 | -20.5135 | -19.9733 | 0.4549 | -0.5402 | -61.5040 | -61.8193 | 9.3162 | 9.3161 |
	| 3.1326 | 0.24 | 250 | 4.2192 | -16.2805 | -16.0169 | 0.4857 | -0.2636 | -53.5912 | -53.3533 | 17.4741 | 17.4741 |
	| 4.3129 | 0.29 | 300 | 3.2442 | -18.6648 | -18.0875 | 0.4462 | -0.5773 | -57.7325 | -58.1219 | 9.3299 | 9.3300 |
	| 4.1056 | 0.34 | 350 | 3.0391 | -19.9243 | -19.4698 | 0.4659 | -0.4545 | -60.4970 | -60.6408 | 13.8852 | 13.8856 |
	| 3.4604 | 0.39 | 400 | 3.0915 | -16.3912 | -16.0366 | 0.5055 | -0.3546 | -53.6306 | -53.5745 | 9.7129 | 9.7125 |
	| 4.7084 | 0.44 | 450 | 2.7841 | -18.9738 | -18.6116 | 0.4835 | -0.3622 | -58.7806 | -58.7398 | 9.9158 | 9.9143 |
	| 4.1944 | 0.49 | 500 | 2.9877 | -22.1479 | -21.8535 | 0.4901 | -0.2944 | -65.2644 | -65.0879 | 10.6479 | 10.6476 |
	| 3.8283 | 0.54 | 550 | 2.4650 | -19.8299 | -19.7039 | 0.4989 | -0.1260 | -60.9653 | -60.4520 | 5.6892 | 5.6889 |
	| 3.2208 | 0.59 | 600 | 2.3549 | -15.6227 | -15.7624 | 0.5385 | 0.1397 | -53.0822 | -52.0377 | 11.5783 | 11.5782 |
	| 2.1741 | 0.64 | 650 | 2.4777 | -19.7204 | -19.3976 | 0.4945 | -0.3228 | -60.3526 | -60.2330 | 10.8601 | 10.8596 |
	| 2.8376 | 0.68 | 700 | 2.4241 | -18.3119 | -18.1735 | 0.5055 | -0.1384 | -57.9045 | -57.4161 | 8.0859 | 8.0854 |
	| 2.4514 | 0.73 | 750 | 2.2743 | -20.2330 | -20.0266 | 0.5033 | -0.2064 | -61.6106 | -61.2582 | 6.6227 | 6.6223 |
	| 1.8899 | 0.78 | 800 | 2.2326 | -19.6323 | -19.3966 | 0.5121 | -0.2358 | -60.3506 | -60.0568 | 7.6793 | 7.6789 |
	| 2.435 | 0.83 | 850 | 2.1976 | -19.5253 | -19.2881 | 0.5121 | -0.2372 | -60.1336 | -59.8427 | 7.3698 | 7.3695 |
	| 2.7112 | 0.88 | 900 | 2.1806 | -19.4443 | -19.2182 | 0.5011 | -0.2261 | -59.9939 | -59.6808 | 7.5579 | 7.5575 |
	| 2.6506 | 0.93 | 950 | 2.1819 | -19.4556 | -19.2275 | 0.5011 | -0.2280 | -60.0125 | -59.7034 | 7.5627 | 7.5623 |
	| 1.5392 | 0.98 | 1000 | 2.1807 | -19.4532 | -19.2274 | 0.5033 | -0.2258 | -60.0122 | -59.6986 | 7.5623 | 7.5620 |
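
To read the table at a glance, the sketch below picks out the evaluation steps with the lowest validation loss and the largest reward margin, and compares every validation loss against ln 2 ≈ 0.693. That baseline is the value of the sigmoid DPO loss when policy and reference agree (zero margin); whether this run used the default sigmoid loss is an assumption, since the card does not state the loss type. Only four of the table's columns are copied here.

```python
import math

# (step, validation_loss, rewards_accuracy, rewards_margin) copied from the table above.
ROWS = [
    (50, 1.5171, 0.4791, -0.0703),
    (100, 2.9117, 0.4813, -0.1281),
    (150, 4.3519, 0.4703, -0.4424),
    (200, 3.7953, 0.4549, -0.5402),
    (250, 4.2192, 0.4857, -0.2636),
    (300, 3.2442, 0.4462, -0.5773),
    (350, 3.0391, 0.4659, -0.4545),
    (400, 3.0915, 0.5055, -0.3546),
    (450, 2.7841, 0.4835, -0.3622),
    (500, 2.9877, 0.4901, -0.2944),
    (550, 2.4650, 0.4989, -0.1260),
    (600, 2.3549, 0.5385, 0.1397),
    (650, 2.4777, 0.4945, -0.3228),
    (700, 2.4241, 0.5055, -0.1384),
    (750, 2.2743, 0.5033, -0.2064),
    (800, 2.2326, 0.5121, -0.2358),
    (850, 2.1976, 0.5121, -0.2372),
    (900, 2.1806, 0.5011, -0.2261),
    (950, 2.1819, 0.5011, -0.2280),
    (1000, 2.1807, 0.5033, -0.2258),
]

# Sigmoid DPO loss at zero margin: -log(sigmoid(0)) = ln 2 (assumed loss type).
ZERO_MARGIN_LOSS = math.log(2.0)

lowest_loss_row = min(ROWS, key=lambda row: row[1])
best_margin_row = max(ROWS, key=lambda row: row[3])
positive_margin_steps = [step for step, _, _, margin in ROWS if margin > 0]
steps_below_baseline = [step for step, loss, _, _ in ROWS if loss < ZERO_MARGIN_LOSS]

print("lowest validation loss at step", lowest_loss_row[0])
print("largest reward margin at step", best_margin_row[0])
print("steps with positive margin:", positive_margin_steps)
print("steps with loss below ln 2:", steps_below_baseline)
```

Under these assumptions, no evaluation step reaches the zero-margin baseline, and only one step shows a positive reward margin, so the final checkpoint is not obviously the best one to use.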


	### Framework versions

	- Transformers 4.39.1
	- Pytorch 2.0.0+cu117
	- Datasets 2.18.0
	- Tokenizers 0.15.2