	---
	library_name: transformers
	tags:
	- trl
	- dpo
	- alignment-handbook
	- generated_from_trainer
	model-index:
	- name: OpenELM-1_1B-DPO-full-max-14-reward
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# OpenELM-1_1B-DPO-full-max-14-reward

	This model was trained from scratch on an unknown dataset.
	It achieves the following results on the evaluation set:
	- Loss: 1.1668
	- Rewards/chosen: -3.5938
	- Rewards/rejected: -4.0
	- Rewards/accuracies: 0.4902
	- Rewards/margins: 0.4121
	- Logps/rejected: -688.0
	- Logps/chosen: -676.0
	- Logits/rejected: -16.375
	- Logits/chosen: -16.875
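
For orientation, reward metrics like those above are usually derived from per-example sequence log-probabilities under the trained policy and a frozen reference model. The sketch below assumes the standard sigmoid DPO definitions, where the implicit reward is `beta` times the policy/reference log-probability gap. The function name, the `beta` value, and the sample numbers are hypothetical; this card does not report them.

```python
import math

def dpo_metrics(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Summarise DPO-style reward metrics from per-example sequence log-probs.

    Assumes the standard DPO implicit reward beta * (policy_logp - ref_logp).
    beta=0.1 is a placeholder, not a value taken from this model card.
    """
    chosen_rewards = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rejected_rewards = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    margins = [c - r for c, r in zip(chosen_rewards, rejected_rewards)]
    n = len(margins)
    # Sigmoid DPO loss per example: -log(sigmoid(margin)), in a numerically stable form.
    losses = [max(-m, 0.0) + math.log1p(math.exp(-abs(m))) for m in margins]
    return {
        "loss": sum(losses) / n,
        "rewards/chosen": sum(chosen_rewards) / n,
        "rewards/rejected": sum(rejected_rewards) / n,
        # Fraction of pairs where the chosen response gets the higher reward.
        "rewards/accuracies": sum(c > r for c, r in zip(chosen_rewards, rejected_rewards)) / n,
        "rewards/margins": sum(margins) / n,
    }

# Made-up log-probabilities for two preference pairs.
metrics = dpo_metrics(
    policy_chosen=[-10.0, -20.0],
    policy_rejected=[-14.0, -18.0],
    ref_chosen=[-8.0, -15.0],
    ref_rejected=[-9.0, -16.0],
    beta=0.5,
)
print(metrics)
```

Under these definitions the mean margin and the accuracy are separate statistics: the margin averages the reward gap over pairs, while the accuracy only counts how often the gap is positive.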

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-05
	- train_batch_size: 8
	- eval_batch_size: 16
	- seed: 42
	- distributed_type: multi-GPU
	- num_devices: 4
	- gradient_accumulation_steps: 2
	- total_train_batch_size: 64
	- total_eval_batch_size: 64
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_ratio: 0.1
	- num_epochs: 3
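
The total batch sizes listed above follow from the per-device sizes, the device count, and the accumulation steps. The sketch below reproduces that arithmetic and adds an illustrative cosine schedule with linear warmup. The variable names, the `lr_at` helper, and the decay-to-zero shape are assumptions for illustration, not code from the training script, and the total step count is not stated in this card.

```python
import math

# Values copied from the hyperparameter list above.
learning_rate = 5e-05
train_batch_size = 8             # per device
eval_batch_size = 16             # per device
num_devices = 4
gradient_accumulation_steps = 2
warmup_ratio = 0.1

# One optimizer step consumes every device's batch across all accumulation steps.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
# Evaluation does not accumulate gradients.
total_eval_batch_size = eval_batch_size * num_devices

def lr_at(step, total_steps):
    """Learning rate at `step`, assuming linear warmup then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))

print(total_train_batch_size, total_eval_batch_size)
# total_steps=1000 is a hypothetical value chosen only to exercise the helper.
print(lr_at(50, 1000), lr_at(100, 1000), lr_at(1000, 1000))
```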

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
	|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
	| 0.0562 | 0.1047 | 100 | 0.6971 | -1.2578 | -1.5703 | 0.5762 | 0.3145 | -446.0 | -444.0 | -9.3125 | -9.5625 |
	| 0.0394 | 0.2094 | 200 | 0.7479 | -0.8516 | -1.0078 | 0.5195 | 0.1572 | -390.0 | -404.0 | -12.3125 | -12.75 |
	| 0.0487 | 0.3141 | 300 | 0.9195 | -1.9922 | -2.3125 | 0.5176 | 0.3203 | -520.0 | -516.0 | -13.4375 | -13.6875 |
	| 0.0454 | 0.4188 | 400 | 0.8309 | -1.4453 | -1.6016 | 0.4961 | 0.1543 | -448.0 | -462.0 | -15.625 | -15.75 |
	| 0.0297 | 0.5236 | 500 | 0.8326 | -3.1094 | -3.375 | 0.5039 | 0.2734 | -628.0 | -628.0 | -15.5 | -15.6875 |
	| 0.0434 | 0.6283 | 600 | 0.8373 | -1.6953 | -1.875 | 0.4941 | 0.1826 | -476.0 | -488.0 | -15.0 | -15.25 |
	| 0.0496 | 0.7330 | 700 | 0.9407 | -3.7344 | -3.9688 | 0.5332 | 0.2236 | -684.0 | -692.0 | -9.5625 | -10.3125 |
	| 0.0289 | 0.8377 | 800 | 1.0108 | -3.1406 | -3.25 | 0.4707 | 0.0991 | -612.0 | -632.0 | -13.0625 | -13.3125 |
	| 0.0259 | 0.9424 | 900 | 1.0869 | -3.6094 | -3.7812 | 0.4648 | 0.1631 | -668.0 | -680.0 | -15.625 | -15.875 |
	| 0.005 | 1.0471 | 1000 | 1.0944 | -3.4375 | -3.625 | 0.4570 | 0.1758 | -652.0 | -664.0 | -15.0625 | -15.25 |
	| 0.0156 | 1.1518 | 1100 | 1.2452 | -4.4062 | -4.5938 | 0.4629 | 0.1973 | -748.0 | -760.0 | -16.5 | -16.625 |
	| 0.0018 | 1.2565 | 1200 | 1.0496 | -3.7344 | -3.9219 | 0.4844 | 0.1885 | -680.0 | -692.0 | -15.5625 | -15.875 |
	| 0.0046 | 1.3613 | 1300 | 1.0484 | -3.375 | -3.6094 | 0.4980 | 0.2402 | -648.0 | -656.0 | -14.9375 | -15.25 |
	| 0.0041 | 1.4660 | 1400 | 0.9980 | -3.5156 | -3.8438 | 0.5137 | 0.3379 | -676.0 | -668.0 | -13.8125 | -14.3125 |
	| 0.0077 | 1.5707 | 1500 | 1.0434 | -3.1719 | -3.5156 | 0.4902 | 0.3535 | -640.0 | -636.0 | -13.875 | -14.375 |
	| 0.0016 | 1.6754 | 1600 | 1.0882 | -3.8594 | -4.2812 | 0.4922 | 0.4141 | -716.0 | -704.0 | -12.4375 | -12.9375 |
	| 0.0042 | 1.7801 | 1700 | 1.0261 | -3.3438 | -3.7656 | 0.4941 | 0.4238 | -664.0 | -652.0 | -15.5 | -15.9375 |
	| 0.0005 | 1.8848 | 1800 | 1.0536 | -3.2344 | -3.5938 | 0.4961 | 0.3555 | -648.0 | -644.0 | -16.625 | -17.0 |
	| 0.0083 | 1.9895 | 1900 | 1.1039 | -3.4844 | -3.8125 | 0.4883 | 0.3242 | -672.0 | -668.0 | -16.25 | -16.625 |
	| 0.0003 | 2.0942 | 2000 | 1.1159 | -3.5156 | -3.8438 | 0.4922 | 0.3301 | -672.0 | -672.0 | -16.125 | -16.625 |
	| 0.0027 | 2.1990 | 2100 | 1.1535 | -3.5938 | -4.0 | 0.4980 | 0.4043 | -688.0 | -680.0 | -16.125 | -16.625 |
	| 0.0003 | 2.3037 | 2200 | 1.1505 | -3.5781 | -3.9844 | 0.4902 | 0.4062 | -688.0 | -676.0 | -16.25 | -16.625 |
	| 0.0006 | 2.4084 | 2300 | 1.1535 | -3.5469 | -3.9531 | 0.4902 | 0.4023 | -684.0 | -672.0 | -16.25 | -16.75 |
	| 0.0002 | 2.5131 | 2400 | 1.1581 | -3.5781 | -3.9844 | 0.4922 | 0.4082 | -688.0 | -676.0 | -16.25 | -16.625 |
	| 0.0001 | 2.6178 | 2500 | 1.1609 | -3.5625 | -3.9688 | 0.4961 | 0.4082 | -684.0 | -672.0 | -16.375 | -16.75 |
	| 0.0008 | 2.7225 | 2600 | 1.1668 | -3.5938 | -4.0 | 0.4922 | 0.4121 | -688.0 | -676.0 | -16.375 | -16.75 |
	| 0.0002 | 2.8272 | 2700 | 1.1668 | -3.5938 | -4.0 | 0.4902 | 0.4121 | -688.0 | -676.0 | -16.375 | -16.75 |
	| 0.0003 | 2.9319 | 2800 | 1.1668 | -3.5938 | -4.0 | 0.4902 | 0.4121 | -688.0 | -676.0 | -16.375 | -16.875 |
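
In the table, training loss falls toward zero while validation loss is lowest at the first evaluation (step 100) and higher at every later one, so the final checkpoint is not the one with the best validation loss. A minimal sketch of selecting the best logged step, using a few (step, validation loss) pairs copied from the table; the variable names are illustrative, and whether intermediate checkpoints were saved is not stated in this card.

```python
# (step, validation loss) pairs copied from the training results table.
eval_log = [
    (100, 0.6971),
    (200, 0.7479),
    (400, 0.8309),
    (1000, 1.0944),
    (2000, 1.1159),
    (2800, 1.1668),
]

# Pick the logged step with the lowest validation loss.
best_step, best_loss = min(eval_log, key=lambda row: row[1])
final_step, final_loss = eval_log[-1]

print(f"best step: {best_step} (validation loss {best_loss})")
print(f"final step: {final_step} (validation loss {final_loss})")
```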


	### Framework versions

	- Transformers 4.45.1
	- PyTorch 2.3.0
	- Datasets 3.0.1
	- Tokenizers 0.20.0