---
library_name: transformers
license: mit
base_model: openai-community/gpt2
tags:
- trl
- orpo
- generated_from_trainer
datasets:
- piqa
model-index:
- name: HW2-orpo
results: []
---
# HW2-orpo
This model is a fine-tuned version of [openai-community/gpt2](https://huggingface.co/openai-community/gpt2), trained with ORPO (Odds Ratio Preference Optimization, via the TRL library) on the piqa dataset.
It achieves the following results on the evaluation set:
- Loss: 3.8617
- Rewards/chosen: -0.3716
- Rewards/rejected: -0.3885
- Rewards/accuracies: 0.6390
- Rewards/margins: 0.0170
- Logps/rejected: -3.8851
- Logps/chosen: -3.7156
- Logits/rejected: -3.3968
- Logits/chosen: -3.5059
- Nll Loss: 3.7885
- Log Odds Ratio: -0.7324
- Log Odds Chosen: 0.1830
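For orientation, ORPO's training objective combines the plain language-modeling (NLL) loss with a log-odds-ratio penalty; the decomposition below follows the ORPO paper and TRL's implementation, with β assumed at TRL's default of 0.1 (it is not recorded on this card):

$$
\mathcal{L}_{\text{ORPO}} = \mathcal{L}_{\text{NLL}} - \beta \, \log\sigma\!\left(\log\frac{\operatorname{odds}_\theta(y_w \mid x)}{\operatorname{odds}_\theta(y_l \mid x)}\right), \qquad \operatorname{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}
$$

The reported numbers are consistent with β = 0.1: Nll Loss − β × Log Odds Ratio = 3.7885 − 0.1 × (−0.7324) ≈ 3.8617, the reported evaluation loss.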
## Model description

HW2-orpo is GPT-2 (the 124M-parameter [openai-community/gpt2](https://huggingface.co/openai-community/gpt2) checkpoint) fine-tuned with ORPO via TRL on preference pairs derived from PIQA. ORPO augments the standard NLL loss with an odds-ratio term that pushes the model to assign higher likelihood to the chosen solution than to the rejected one, without requiring a separate reference model.
## Intended uses & limitations

This checkpoint was produced for a homework assignment (HW2) and is intended for coursework and experimentation with ORPO fine-tuning, not for production use. It inherits the limitations and biases of the GPT-2 base model, and the validation loss rises steadily after the second epoch (see the training results below), so the final checkpoint may be overfit.
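A minimal inference sketch; the repo id below is assumed from this card's path, so replace it with wherever the checkpoint is actually hosted:

```python
from transformers import pipeline

# Assumed repo id (not confirmed by this card).
generator = pipeline("text-generation", model="KoNqUeRoR3891/HW2-orpo")

prompt = "To keep a wooden cutting board in good shape, you should"
print(generator(prompt, max_new_tokens=40, do_sample=True)[0]["generated_text"])
```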
## Training and evaluation data

The model was trained and evaluated on [PIQA](https://huggingface.co/datasets/piqa) (Physical Interaction: Question Answering), a physical-commonsense benchmark in which each example pairs an everyday goal with a correct and an incorrect solution. This card does not record how the preference pairs were built; a natural construction (assumed in the sketch below) treats the goal as the prompt, the labeled solution as chosen, and the alternative as rejected.
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 5
- mixed_precision_training: Native AMP
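The training script is not included on this card; the following is a minimal reproduction sketch using TRL's `ORPOTrainer` with the hyperparameters above. The PIQA column mapping (`to_preference_pair`) and `beta=0.1` (TRL's default, consistent with the loss decomposition noted earlier) are assumptions, not taken from this card.

```python
# Hedged reproduction sketch: hyperparameters mirror the list above; the
# PIQA column mapping and beta=0.1 are assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_id = "openai-community/gpt2"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token

def to_preference_pair(example):
    # Assumed mapping: the labeled solution -> "chosen", the other -> "rejected".
    solutions = [example["sol1"], example["sol2"]]
    return {
        "prompt": example["goal"],
        "chosen": solutions[example["label"]],
        "rejected": solutions[1 - example["label"]],
    }

piqa = load_dataset("piqa", trust_remote_code=True)  # script-based dataset
train_ds = piqa["train"].map(to_preference_pair, remove_columns=piqa["train"].column_names)
eval_ds = piqa["validation"].map(to_preference_pair, remove_columns=piqa["validation"].column_names)

args = ORPOConfig(
    output_dir="HW2-orpo",
    learning_rate=5e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=8,  # total train batch size 8
    num_train_epochs=5,
    lr_scheduler_type="linear",
    seed=42,
    fp16=True,  # "Native AMP" mixed precision
    beta=0.1,   # assumed: TRL default, consistent with the reported losses
)

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    tokenizer=tokenizer,  # newer TRL versions take processing_class= instead
)
trainer.train()
```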
### Training results

Validation loss bottoms out around 3.26 during the second epoch and climbs steadily afterward, while rewards/accuracies stays roughly flat near 0.64; an earlier checkpoint may therefore generalize better than the final one.
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Nll Loss | Log Odds Ratio | Log Odds Chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:--------:|:--------------:|:---------------:|
| 3.5511 | 0.2758 | 500 | 3.4162 | -0.3146 | -0.3224 | 0.6303 | 0.0078 | -3.2238 | -3.1457 | -12.1919 | -12.3316 | 3.3464 | -0.6978 | 0.0837 |
| 3.3852 | 0.5517 | 1000 | 3.3345 | -0.3060 | -0.3152 | 0.6421 | 0.0092 | -3.1517 | -3.0602 | -3.3351 | -3.5024 | 3.2656 | -0.6894 | 0.0984 |
| 3.2734 | 0.8275 | 1500 | 3.2903 | -0.3011 | -0.3101 | 0.6309 | 0.0090 | -3.1013 | -3.0113 | -5.6602 | -5.7320 | 3.2211 | -0.6920 | 0.0975 |
| 3.104 | 1.1034 | 2000 | 3.2933 | -0.3021 | -0.3118 | 0.6371 | 0.0097 | -3.1182 | -3.0211 | -0.2253 | -0.3135 | 3.2237 | -0.6956 | 0.1062 |
| 2.8138 | 1.3792 | 2500 | 3.2816 | -0.3018 | -0.3125 | 0.6464 | 0.0107 | -3.1253 | -3.0179 | 1.3216 | 1.2346 | 3.2125 | -0.6916 | 0.1172 |
| 2.8178 | 1.6551 | 3000 | 3.2660 | -0.2998 | -0.3108 | 0.6383 | 0.0109 | -3.1080 | -2.9985 | -0.7475 | -0.8064 | 3.1968 | -0.6923 | 0.1204 |
| 2.8122 | 1.9309 | 3500 | 3.2586 | -0.2992 | -0.3104 | 0.6433 | 0.0112 | -3.1039 | -2.9922 | -2.8285 | -2.9509 | 3.1893 | -0.6925 | 0.1228 |
| 2.4931 | 2.2067 | 4000 | 3.3765 | -0.3130 | -0.3256 | 0.6427 | 0.0127 | -3.2563 | -3.1296 | 1.6707 | 1.5380 | 3.3063 | -0.7020 | 0.1392 |
| 2.3999 | 2.4826 | 4500 | 3.4109 | -0.3174 | -0.3298 | 0.6402 | 0.0125 | -3.2982 | -3.1736 | 1.4695 | 1.2634 | 3.3402 | -0.7069 | 0.1373 |
| 2.4254 | 2.7584 | 5000 | 3.3882 | -0.3150 | -0.3278 | 0.6439 | 0.0128 | -3.2781 | -3.1497 | 2.1282 | 1.9044 | 3.3180 | -0.7018 | 0.1416 |
| 2.373 | 3.0343 | 5500 | 3.5698 | -0.3370 | -0.3515 | 0.6408 | 0.0145 | -3.5149 | -3.3698 | 3.7150 | 3.6601 | 3.4983 | -0.7147 | 0.1595 |
| 2.0541 | 3.3101 | 6000 | 3.6256 | -0.3430 | -0.3570 | 0.6284 | 0.0140 | -3.5700 | -3.4302 | 1.1269 | 0.9714 | 3.5532 | -0.7240 | 0.1540 |
| 2.0641 | 3.5860 | 6500 | 3.6157 | -0.3425 | -0.3577 | 0.6445 | 0.0152 | -3.5771 | -3.4246 | -0.6703 | -0.8165 | 3.5439 | -0.7178 | 0.1665 |
| 2.0747 | 3.8618 | 7000 | 3.6335 | -0.3447 | -0.3598 | 0.6402 | 0.0151 | -3.5983 | -3.4474 | -0.1967 | -0.3291 | 3.5616 | -0.7193 | 0.1640 |
| 1.9377 | 4.1376 | 7500 | 3.8286 | -0.3671 | -0.3838 | 0.6445 | 0.0167 | -3.8381 | -3.6712 | -2.6871 | -2.8058 | 3.7557 | -0.7288 | 0.1800 |
| 1.8001 | 4.4135 | 8000 | 3.8629 | -0.3715 | -0.3882 | 0.6414 | 0.0168 | -3.8822 | -3.7146 | -3.4193 | -3.5370 | 3.7898 | -0.7315 | 0.1810 |
| 1.81 | 4.6893 | 8500 | 3.8574 | -0.3711 | -0.3879 | 0.6396 | 0.0168 | -3.8789 | -3.7110 | -4.2176 | -4.3406 | 3.7842 | -0.7321 | 0.1814 |
| 1.8108 | 4.9652 | 9000 | 3.8617 | -0.3716 | -0.3885 | 0.6390 | 0.0170 | -3.8851 | -3.7156 | -3.3968 | -3.5059 | 3.7885 | -0.7324 | 0.1830 |
### Framework versions
- Transformers 4.44.2
- Pytorch 2.4.0+cu118
- Datasets 2.21.0
- Tokenizers 0.19.1