	---
	base_model: alignment-handbook/zephyr-7b-sft-full
	datasets:
	- HuggingFaceH4/ultrafeedback_binarized
	library_name: peft
	license: apache-2.0
	tags:
	- alignment-handbook
	- trl
	- dpo
	- generated_from_trainer
	model-index:
	- name: zephyr-7b-dpo-lora
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# zephyr-7b-dpo-lora

	This model is a LoRA adapter (PEFT) for [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full), fine-tuned with DPO on the HuggingFaceH4/ultrafeedback_binarized dataset.
	At the final checkpoint (step 1240, epoch 19.84), it achieves the following results on the evaluation set:
	- Loss: 1.5622
	- Rewards/chosen: -17.0278
	- Rewards/rejected: -19.8457
	- Rewards/accuracies: 0.6500
	- Rewards/margins: 2.8179
	- Logps/rejected: -2233.0220
	- Logps/chosen: -1971.0188
	- Logits/rejected: -1.7584
	- Logits/chosen: -1.7819
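The reward metrics above are related by simple arithmetic: the reported margin is the chosen reward minus the rejected reward. The sketch below checks that, and also evaluates `-log(sigmoid(margin))` at the mean margin. It assumes the default sigmoid DPO loss, which this card does not confirm; the helper name `dpo_sigmoid_loss` is illustrative and not part of any library.

```python
import math

# Final evaluation metrics copied from the list above.
rewards_chosen = -17.0278
rewards_rejected = -19.8457
reported_margin = 2.8179
reported_loss = 1.5622

# Rewards/margins is the mean of (chosen - rejected), which equals the
# difference of the two reported means.
margin = rewards_chosen - rewards_rejected
print(f"margin = {margin:.4f}")


def dpo_sigmoid_loss(m: float) -> float:
    """Numerically stable -log(sigmoid(m)); assumes the sigmoid DPO loss."""
    if m >= 0:
        return math.log1p(math.exp(-m))
    return -m + math.log1p(math.exp(m))


# Loss evaluated at the mean margin (about 0.06). Because -log(sigmoid) is
# convex, the mean per-example loss can only be greater than or equal to this.
loss_at_mean_margin = dpo_sigmoid_loss(margin)
print(f"loss at mean margin = {loss_at_mean_margin:.4f}")
```

Under that assumption, the gap between the reported loss (1.5622) and the loss at the mean margin suggests that per-example margins are widely spread, which is consistent with a reward accuracy of 0.65: roughly a third of evaluation pairs are still ranked the wrong way, some by a large amount.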

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-06
	- train_batch_size: 4
	- eval_batch_size: 8
	- seed: 42
	- distributed_type: multi-GPU
	- gradient_accumulation_steps: 4
	- total_train_batch_size: 16
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_ratio: 0.1
	- num_epochs: 20
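To make the schedule concrete, here is a minimal sketch of how these hyperparameters combine. It assumes a single training process (4 x 4 = 16 matches the reported total batch size), takes the total of 1240 optimizer steps from the last row of the training results table, and uses the usual linear-warmup, half-cosine-decay formula. The exact warmup step count depends on how the library rounds, so `round` here is an assumption.

```python
import math

# Values copied from the hyperparameter list above.
learning_rate = 5e-06
train_batch_size = 4
gradient_accumulation_steps = 4
warmup_ratio = 0.1

# Final optimizer step reported in the training results table.
total_steps = 1240

# Assumption: one process, so the effective batch is per-device batch
# size times gradient accumulation steps.
effective_batch = train_batch_size * gradient_accumulation_steps

# Assumption: warmup length is the ratio applied to total steps, rounded.
warmup_steps = round(total_steps * warmup_ratio)


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (linear warmup, cosine decay)."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 62, warmup_steps, 625, total_steps):
    print(step, lr_at(step))
```

Under these assumptions the learning rate peaks at 5e-06 around step 124 and decays to zero by step 1240.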

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
	|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
	| 0.6861 | 0.992 | 62 | 0.6889 | 0.0006 | -0.0081 | 0.6450 | 0.0087 | -249.2549 | -268.1721 | -2.8525 | -2.8868 |
	| 0.6612 | 2.0 | 125 | 0.6531 | -0.0006 | -0.0929 | 0.6550 | 0.0923 | -257.7365 | -268.2944 | -2.8301 | -2.8604 |
	| 0.498 | 2.992 | 187 | 0.6181 | -0.5345 | -0.7907 | 0.6950 | 0.2561 | -327.5125 | -321.6882 | -2.7597 | -2.7831 |
	| 0.2445 | 4.0 | 250 | 0.6131 | -1.7835 | -2.3662 | 0.6800 | 0.5827 | -485.0650 | -446.5857 | -2.0948 | -2.1082 |
	| 0.1749 | 4.992 | 312 | 0.6447 | -2.2836 | -2.9706 | 0.6650 | 0.6870 | -545.5024 | -496.5903 | -1.8275 | -1.8365 |
	| 0.0492 | 6.0 | 375 | 0.9374 | -7.0520 | -8.3385 | 0.6400 | 1.2865 | -1082.3002 | -973.4358 | -1.3453 | -1.3454 |
	| 0.0064 | 6.992 | 437 | 0.8928 | -8.6496 | -10.2275 | 0.6450 | 1.5779 | -1271.1948 | -1133.1906 | -1.5853 | -1.5996 |
	| 0.01 | 8.0 | 500 | 1.2673 | -13.8405 | -16.0886 | 0.6400 | 2.2482 | -1857.3101 | -1652.2802 | -1.6448 | -1.6610 |
	| 0.0007 | 8.992 | 562 | 1.1752 | -11.4716 | -13.4777 | 0.6300 | 2.0061 | -1596.2178 | -1415.3928 | -1.8498 | -1.8705 |
	| 0.0002 | 10.0 | 625 | 1.3088 | -13.5264 | -15.8880 | 0.6350 | 2.3616 | -1837.2434 | -1620.8707 | -1.8164 | -1.8397 |
	| 0.0003 | 10.992 | 687 | 1.3563 | -15.6686 | -18.2912 | 0.6700 | 2.6225 | -2077.5627 | -1835.0981 | -1.7419 | -1.7643 |
	| 0.0001 | 12.0 | 750 | 1.4799 | -16.0123 | -18.6412 | 0.6400 | 2.6289 | -2112.5684 | -1869.4608 | -1.7532 | -1.7747 |
	| 0.0 | 12.992 | 812 | 1.4863 | -15.9107 | -18.5614 | 0.6450 | 2.6507 | -2104.5852 | -1859.3058 | -1.7792 | -1.8020 |
	| 0.0003 | 14.0 | 875 | 1.5278 | -16.6140 | -19.3716 | 0.6500 | 2.7576 | -2185.6045 | -1929.6328 | -1.7600 | -1.7826 |
	| 0.0438 | 14.992 | 937 | 1.5387 | -16.7605 | -19.5376 | 0.6500 | 2.7771 | -2202.2078 | -1944.2887 | -1.7625 | -1.7854 |
	| 0.0001 | 16.0 | 1000 | 1.5438 | -16.8482 | -19.6450 | 0.6550 | 2.7968 | -2212.9512 | -1953.0580 | -1.7596 | -1.7831 |
	| 0.0435 | 16.992 | 1062 | 1.5527 | -16.9283 | -19.7428 | 0.6500 | 2.8145 | -2222.7285 | -1961.0630 | -1.7629 | -1.7860 |
	| 0.0001 | 18.0 | 1125 | 1.5617 | -16.9933 | -19.8065 | 0.6550 | 2.8133 | -2229.1018 | -1967.5621 | -1.7580 | -1.7814 |
	| 0.0002 | 18.992 | 1187 | 1.5675 | -17.0212 | -19.8377 | 0.6550 | 2.8165 | -2232.2144 | -1970.3562 | -1.7594 | -1.7825 |
	| 0.0001 | 19.84 | 1240 | 1.5622 | -17.0278 | -19.8457 | 0.6500 | 2.8179 | -2233.0220 | -1971.0188 | -1.7584 | -1.7819 |
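The table can be read programmatically to find where validation loss bottoms out. The sketch below copies the Step and Validation Loss columns from the table above and picks the row with the lowest loss; the variable names are illustrative.

```python
# (step, validation loss) pairs copied from the training results table.
eval_history = [
    (62, 0.6889), (125, 0.6531), (187, 0.6181), (250, 0.6131),
    (312, 0.6447), (375, 0.9374), (437, 0.8928), (500, 1.2673),
    (562, 1.1752), (625, 1.3088), (687, 1.3563), (750, 1.4799),
    (812, 1.4863), (875, 1.5278), (937, 1.5387), (1000, 1.5438),
    (1062, 1.5527), (1125, 1.5617), (1187, 1.5675), (1240, 1.5622),
]

# Row with the lowest validation loss.
best_step, best_loss = min(eval_history, key=lambda row: row[1])

# Last logged evaluation, i.e. the checkpoint the card reports on.
final_step, final_loss = eval_history[-1]

print(f"lowest validation loss {best_loss} at step {best_step}")
print(f"final validation loss {final_loss} at step {final_step}")
```

Validation loss is lowest at step 250 (epoch 4.0) and rises afterwards while training loss approaches zero, a pattern usually read as overfitting. If intermediate checkpoints were saved, one from around that step may be worth comparing against the final adapter.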


	### Framework versions

	- PEFT 0.12.0
	- Transformers 4.44.0
	- PyTorch 2.4.0+cu121
	- Datasets 2.20.0
	- Tokenizers 0.19.1
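
With library versions close to those listed above, loading the adapter on top of its base model might look like the following sketch. The adapter repository ID is a hypothetical placeholder, since this card does not state the namespace it is published under. The choice of `bfloat16` is also an assumption. Heavy imports are kept inside the function so that defining it does not require the libraries or trigger any download.

```python
# Base model named in this card's metadata.
BASE_MODEL_ID = "alignment-handbook/zephyr-7b-sft-full"

# Hypothetical placeholder: replace with the actual repository ID or local path.
ADAPTER_ID = "your-namespace/zephyr-7b-dpo-lora"


def load_model(adapter_id: str = ADAPTER_ID, base_model_id: str = BASE_MODEL_ID):
    """Load the base model and attach the LoRA adapter (downloads weights)."""
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base_model = AutoModelForCausalLM.from_pretrained(
        base_model_id,
        torch_dtype=torch.bfloat16,  # assumption: bf16 is supported on the target hardware
    )
    # Wrap the base model with the adapter weights.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return model, tokenizer
```

Calling `load_model()` with the real adapter ID returns the adapted model and the base model's tokenizer, ready for generation.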