	---
	license: mit
	library_name: peft
	tags:
	- generated_from_trainer
	base_model: microsoft/phi-2
	model-index:
	- name: fine-tuning-Phi2-with-webglm-qa-with-lora_8
	results: []
	---


	# fine-tuning-Phi2-with-webglm-qa-with-lora_8

	This model is a fine-tuned version of [microsoft/phi-2](https://huggingface.co/microsoft/phi-2). The Trainer did not record the dataset; the model name suggests WebGLM-QA, but this card does not confirm it.
	It achieves the following results on the evaluation set:
	- Loss: 0.0935
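The loss above can be turned into a perplexity figure, which some readers find easier to compare. This is a minimal sketch that assumes the reported loss is the mean per-token cross-entropy in nats; that is the usual convention for causal language modeling with the Trainer, but the card does not state it.

```python
import math

# Final evaluation loss reported in this card.
eval_loss = 0.0935

# Assumption: the loss is mean per-token cross-entropy in nats,
# so perplexity is exp(loss).
perplexity = math.exp(eval_loss)

print(f"perplexity ~= {perplexity:.3f}")
```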

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-05
	- train_batch_size: 2
	- eval_batch_size: 2
	- seed: 42
	- gradient_accumulation_steps: 5
	- total_train_batch_size: 10
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_steps: 60
	- training_steps: 1000
	- mixed_precision_training: Native AMP
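The relationship between these values can be sketched in a few lines. The snippet below shows how the total batch size of 10 follows from the per-device batch size and gradient accumulation, and how a linear schedule with 60 warmup steps would move the learning rate over 1000 steps. The names are illustrative, and the schedule formula is the common linear warmup and decay rule, assumed here rather than taken from the training script.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 5e-5
TRAIN_BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 5
WARMUP_STEPS = 60
TRAINING_STEPS = 1000

# One optimizer update sees batch_size * accumulation_steps examples
# (assuming a single device, which matches total_train_batch_size: 10).
effective_batch_size = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS


def linear_lr(step: int) -> float:
    """Learning rate after `step` optimizer updates (linear warmup, then linear decay)."""
    if step < WARMUP_STEPS:
        # Ramp up from 0 to the peak learning rate.
        return LEARNING_RATE * step / WARMUP_STEPS
    # Decay linearly from the peak to 0 at the final step.
    remaining = max(0, TRAINING_STEPS - step)
    return LEARNING_RATE * remaining / (TRAINING_STEPS - WARMUP_STEPS)


for step in (0, 30, 60, 500, 1000):
    print(step, linear_lr(step))
```

Under these assumptions the learning rate peaks at 5e-05 on step 60 and reaches zero on step 1000.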

	### Training results

	| Training Loss | Epoch | Step | Validation Loss |
	|:-------------:|:-----:|:----:|:---------------:|
	| 7.1823 | 0.31 | 20 | 6.1082 |
	| 4.0 | 0.63 | 40 | 0.9863 |
	| 0.7159 | 0.94 | 60 | 0.6293 |
	| 0.4994 | 1.26 | 80 | 0.4239 |
	| 0.3187 | 1.57 | 100 | 0.3044 |
	| 0.251 | 1.89 | 120 | 0.2567 |
	| 0.2189 | 2.2 | 140 | 0.2206 |
	| 0.1869 | 2.52 | 160 | 0.2000 |
	| 0.1741 | 2.83 | 180 | 0.1781 |
	| 0.1439 | 3.14 | 200 | 0.1638 |
	| 0.1543 | 3.46 | 220 | 0.1550 |
	| 0.1428 | 3.77 | 240 | 0.1455 |
	| 0.127 | 4.09 | 260 | 0.1394 |
	| 0.1206 | 4.4 | 280 | 0.1314 |
	| 0.1206 | 4.72 | 300 | 0.1298 |
	| 0.1162 | 5.03 | 320 | 0.1246 |
	| 0.109 | 5.35 | 340 | 0.1235 |
	| 0.1088 | 5.66 | 360 | 0.1190 |
	| 0.1062 | 5.97 | 380 | 0.1157 |
	| 0.0938 | 6.29 | 400 | 0.1146 |
	| 0.0945 | 6.6 | 420 | 0.1133 |
	| 0.1012 | 6.92 | 440 | 0.1105 |
	| 0.0881 | 7.23 | 460 | 0.1109 |
	| 0.0897 | 7.55 | 480 | 0.1091 |
	| 0.0837 | 7.86 | 500 | 0.1060 |
	| 0.0899 | 8.18 | 520 | 0.1051 |
	| 0.0803 | 8.49 | 540 | 0.1041 |
	| 0.0792 | 8.81 | 560 | 0.1021 |
	| 0.0885 | 9.12 | 580 | 0.1000 |
	| 0.0844 | 9.43 | 600 | 0.1004 |
	| 0.0704 | 9.75 | 620 | 0.0992 |
	| 0.0681 | 10.06 | 640 | 0.0994 |
	| 0.0727 | 10.38 | 660 | 0.0977 |
	| 0.0712 | 10.69 | 680 | 0.0970 |
	| 0.073 | 11.01 | 700 | 0.0971 |
	| 0.0683 | 11.32 | 720 | 0.0974 |
	| 0.0682 | 11.64 | 740 | 0.0964 |
	| 0.0716 | 11.95 | 760 | 0.0962 |
	| 0.0645 | 12.26 | 780 | 0.0948 |
	| 0.0662 | 12.58 | 800 | 0.0947 |
	| 0.0677 | 12.89 | 820 | 0.0947 |
	| 0.0626 | 13.21 | 840 | 0.0953 |
	| 0.0628 | 13.52 | 860 | 0.0946 |
	| 0.0642 | 13.84 | 880 | 0.0937 |
	| 0.0641 | 14.15 | 900 | 0.0939 |
	| 0.0587 | 14.47 | 920 | 0.0939 |
	| 0.0664 | 14.78 | 940 | 0.0933 |
	| 0.061 | 15.09 | 960 | 0.0931 |
	| 0.0596 | 15.41 | 980 | 0.0934 |
	| 0.0646 | 15.72 | 1000 | 0.0935 |
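Two things can be read off the table with a little arithmetic: which checkpoint had the lowest validation loss, and roughly how many training examples make up one epoch. The sketch below uses only the last six rows (the validation loss is higher in every earlier row) and assumes each step consumes 10 examples, as given by `total_train_batch_size`. The dataset size it prints is an estimate, not a figure stated anywhere in this card.

```python
# (step, epoch, validation_loss) for the last six rows of the table above.
rows = [
    (900, 14.15, 0.0939),
    (920, 14.47, 0.0939),
    (940, 14.78, 0.0933),
    (960, 15.09, 0.0931),
    (980, 15.41, 0.0934),
    (1000, 15.72, 0.0935),
]

# Row with the lowest validation loss among these checkpoints.
best_step, best_epoch, best_loss = min(rows, key=lambda r: r[2])

# Rough training-set size: examples seen so far divided by epochs completed.
# Assumes 10 examples per step and ignores rounding in the Epoch column.
TOTAL_BATCH_SIZE = 10
final_step, final_epoch, _ = rows[-1]
approx_train_examples = final_step * TOTAL_BATCH_SIZE / final_epoch

print(f"lowest validation loss {best_loss} at step {best_step}")
print(f"approx. training examples per epoch: {approx_train_examples:.0f}")
```

The lowest validation loss (0.0931) occurs at step 960, slightly below the 0.0935 reported for the final step, so the curve has essentially flattened by the end of training.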


	### Framework versions

	- PEFT 0.7.1
	- Transformers 4.36.2
	- PyTorch 2.0.0
	- Datasets 2.15.0
	- Tokenizers 0.15.0
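With these libraries installed, the adapter could be loaded on top of the base model roughly as follows. This is a sketch, not the author's documented usage: the repository namespace `your-username` is a placeholder, the helper names `load_model_and_tokenizer` and `answer` are hypothetical, and the prompt format used during fine-tuning is not recorded in this card. Depending on the Transformers version, loading `microsoft/phi-2` may also need `trust_remote_code=True`.

```python
# Hypothetical repository id: replace "your-username" with the real namespace.
ADAPTER_ID = "your-username/fine-tuning-Phi2-with-webglm-qa-with-lora_8"
BASE_MODEL_ID = "microsoft/phi-2"


def load_model_and_tokenizer(adapter_id: str = ADAPTER_ID):
    """Load the phi-2 base model and attach the PEFT adapter weights."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    base_model = AutoModelForCausalLM.from_pretrained(BASE_MODEL_ID)
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference mode: disables dropout
    return model, tokenizer


def answer(model, tokenizer, prompt: str, max_new_tokens: int = 64) -> str:
    """Generate a continuation for `prompt` and return it as text."""
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `load_model_and_tokenizer()` downloads the base model weights, so it is best run once and the result reused across calls to `answer`.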
