OpenELM-1_1B-DPO-full-max-4-reward

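This card records the model as trained from scratch on an unknown dataset. The name suggests DPO training of an OpenELM 1.1B model, but the card does not document the base checkpoint or the data. At the final logged checkpoint (step 2800), it achieves the following results on the evaluation set: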

Loss: 1.6190
Rewards/chosen: -13.625
Rewards/rejected: -15.0625
Rewards/accuracies: 0.5996
Rewards/margins: 1.4688
Logps/rejected: -1800.0
Logps/chosen: -1680.0
Logits/rejected: 1.0625
Logits/chosen: -0.2695
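
The reward metrics above use names that are common in DPO training code. The following is a minimal sketch of how such quantities relate, assuming the standard sigmoid DPO objective. The card does not state the loss variant or the value of beta, so `beta=0.1` and the helper names `softplus` and `dpo_metrics` are illustrative assumptions, not the configuration used for this model.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_metrics(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Compute DPO-style metrics from per-example sequence log-probabilities.

    Each argument is a list of summed token log-probs, one entry per pair.
    beta is an assumed value; the card does not report it.
    """
    # Implicit reward: beta * (policy log-prob - reference log-prob).
    chosen_rewards = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rejected_rewards = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    margins = [c - r for c, r in zip(chosen_rewards, rejected_rewards)]
    # Sigmoid DPO loss: -log(sigmoid(margin)) == softplus(-margin).
    losses = [softplus(-m) for m in margins]
    n = len(margins)
    return {
        "loss": sum(losses) / n,
        "rewards/chosen": sum(chosen_rewards) / n,
        "rewards/rejected": sum(rejected_rewards) / n,
        "rewards/margins": sum(margins) / n,
        # Fraction of pairs where the chosen response gets the higher reward.
        "rewards/accuracies": sum(m > 0 for m in margins) / n,
    }


# Hypothetical log-probs for a single preference pair.
example = dpo_metrics([-10.0], [-20.0], [-12.0], [-15.0])
print(example)
```

Under this formulation a margin of zero gives a loss of ln 2 ≈ 0.693, which is close to the earliest validation losses in the results table. Because the loss is averaged over examples, it can rise even while the mean margin grows if some pairs are ranked wrongly by a large amount.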

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 16
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 2
total_train_batch_size: 64
total_eval_batch_size: 64
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 3
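
The batch-size and scheduler entries above can be checked with a little arithmetic. The sketch below recomputes the total train batch size from the per-device values and evaluates a cosine schedule with linear warmup. The function names are hypothetical, and the schedule formula is the commonly used half-cosine decay to zero, assumed here rather than confirmed by the card.

```python
import math


def effective_batch_size(per_device, num_devices, grad_accum_steps=1):
    """Examples consumed per optimizer step across all devices."""
    return per_device * num_devices * grad_accum_steps


def cosine_lr(step, total_steps, base_lr=5e-05, warmup_ratio=0.1):
    """Learning rate at `step` for linear warmup followed by half-cosine decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Values from the hyperparameter list above.
print(effective_batch_size(8, 4, 2))   # train: 8 per device * 4 devices * 2 accumulation steps
print(effective_batch_size(16, 4))     # eval: 16 per device * 4 devices
```

Both calls reproduce the totals of 64 listed above. The total number of optimizer steps is not stated in the card, so `total_steps` has to be supplied by the caller.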

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6378	0.0838	80	0.6868	-0.6758	-0.7656	0.5684	0.0918	-366.0	-386.0	-9.875	-10.125
0.6219	0.1675	160	0.6949	-0.9102	-1.0547	0.5977	0.1406	-394.0	-410.0	-10.125	-10.5
0.6151	0.2513	240	0.7637	-2.4531	-2.6562	0.5566	0.2031	-552.0	-564.0	-10.9375	-11.25
0.6607	0.3351	320	0.7307	-2.7344	-2.9375	0.5742	0.1992	-584.0	-592.0	-14.25	-14.4375
0.6304	0.4188	400	0.7129	-2.7344	-3.0156	0.5898	0.2715	-588.0	-592.0	-12.5	-13.0
0.623	0.5026	480	0.7718	-2.5469	-2.9375	0.5859	0.3887	-584.0	-572.0	-8.0625	-9.0
0.6091	0.5864	560	0.7543	-3.3281	-3.6562	0.5957	0.3320	-656.0	-652.0	-12.0	-12.75
0.583	0.6702	640	0.7081	-3.25	-3.7031	0.6406	0.4648	-660.0	-644.0	-9.0	-10.0625
0.6183	0.7539	720	0.7397	-3.7812	-4.0938	0.5996	0.3242	-700.0	-696.0	-8.5625	-9.4375
0.5988	0.8377	800	0.7986	-4.4688	-4.9375	0.5898	0.4609	-784.0	-764.0	-7.9062	-8.9375
0.5882	0.9215	880	0.7997	-3.2656	-3.6562	0.5879	0.3906	-656.0	-644.0	-8.3125	-9.1875
0.4256	1.0052	960	0.7816	-4.5312	-5.1875	0.6172	0.6367	-808.0	-772.0	-6.75	-7.9062
0.2006	1.0890	1040	0.9734	-5.9688	-6.6875	0.6094	0.7383	-960.0	-916.0	-4.7812	-6.0625
0.1977	1.1728	1120	0.9420	-6.25	-7.0	0.6094	0.7578	-988.0	-944.0	-5.0	-6.25
0.1717	1.2565	1200	1.0548	-7.4688	-8.25	0.5918	0.7852	-1112.0	-1064.0	-4.5	-5.8125
0.1881	1.3403	1280	0.9567	-6.9688	-7.8125	0.6035	0.8672	-1072.0	-1012.0	-3.2188	-4.4688
0.1897	1.4241	1360	0.9563	-6.9688	-7.8438	0.6055	0.8867	-1072.0	-1016.0	-4.2812	-5.6875
0.1383	1.5079	1440	1.1196	-8.5625	-9.5	0.6055	0.9922	-1240.0	-1176.0	-2.5938	-3.9062
0.146	1.5916	1520	1.0767	-9.5	-10.5	0.6055	1.0078	-1336.0	-1264.0	-1.6797	-3.0312
0.1831	1.6754	1600	0.9776	-8.0625	-8.9375	0.6055	0.8516	-1184.0	-1128.0	-2.2344	-3.5938
0.1667	1.7592	1680	1.0210	-7.75	-8.625	0.5957	0.9023	-1152.0	-1088.0	-1.7344	-3.2344
0.1514	1.8429	1760	1.0214	-8.6875	-9.6875	0.6133	0.9805	-1256.0	-1184.0	-1.1719	-2.5312
0.1594	1.9267	1840	1.0633	-8.8125	-9.75	0.5977	0.9727	-1264.0	-1200.0	-1.2344	-2.625
0.0307	2.0105	1920	1.0948	-8.75	-9.75	0.6172	1.0312	-1264.0	-1192.0	-1.4531	-2.9844
0.0214	2.0942	2000	1.5354	-12.25	-13.3125	0.6094	1.1016	-1624.0	-1544.0	0.1973	-1.2031
0.0186	2.1780	2080	1.5790	-13.5625	-14.9375	0.6055	1.3906	-1784.0	-1680.0	0.4902	-0.9102
0.0395	2.2618	2160	1.5234	-12.0625	-13.1875	0.6035	1.1406	-1608.0	-1520.0	0.5391	-0.7656
0.0217	2.3455	2240	1.5867	-13.1875	-14.5625	0.6035	1.375	-1744.0	-1632.0	0.8945	-0.4141
0.0268	2.4293	2320	1.5888	-13.0	-14.375	0.6035	1.4219	-1728.0	-1616.0	0.6797	-0.6758
0.0238	2.5131	2400	1.6647	-13.625	-15.0625	0.6055	1.4453	-1792.0	-1680.0	0.9648	-0.3633
0.0227	2.5969	2480	1.5873	-13.125	-14.5625	0.6094	1.4375	-1744.0	-1632.0	0.9258	-0.4199
0.0233	2.6806	2560	1.5836	-13.1875	-14.625	0.6035	1.4297	-1752.0	-1640.0	0.9297	-0.4180
0.021	2.7644	2640	1.5917	-13.4375	-14.9375	0.6094	1.4609	-1776.0	-1664.0	1.0078	-0.3223
0.0221	2.8482	2720	1.6077	-13.5625	-15.0	0.6035	1.4609	-1792.0	-1672.0	1.0469	-0.2793
0.0182	2.9319	2800	1.6190	-13.625	-15.0625	0.5996	1.4688	-1800.0	-1680.0	1.0625	-0.2695
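
The table above is tab-separated, so it can be analysed directly. As a rough sketch, the snippet below parses the first four columns of a few rows copied from the table, finds the row with the lowest validation loss, and estimates the number of optimizer steps per epoch. The names `ROWS` and `parse_rows` are illustrative, and the example count is only an estimate that assumes every step used the full total train batch size of 64.

```python
# Four rows copied from the results table, first four columns only:
# training loss, epoch, step, validation loss.
ROWS = (
    "0.6378\t0.0838\t80\t0.6868\n"
    "0.4256\t1.0052\t960\t0.7816\n"
    "0.0307\t2.0105\t1920\t1.0948\n"
    "0.0182\t2.9319\t2800\t1.6190\n"
)


def parse_rows(text):
    """Parse tab-separated rows into dictionaries with numeric values."""
    records = []
    for line in text.strip().splitlines():
        train_loss, epoch, step, val_loss = line.split("\t")[:4]
        records.append({
            "train_loss": float(train_loss),
            "epoch": float(epoch),
            "step": int(step),
            "val_loss": float(val_loss),
        })
    return records


records = parse_rows(ROWS)

# Row with the lowest validation loss among the parsed rows.
best = min(records, key=lambda r: r["val_loss"])

# Steps per epoch, estimated from the last logged row.
last = records[-1]
steps_per_epoch = last["step"] / last["epoch"]
approx_examples = round(steps_per_epoch) * 64

print(best["step"], round(steps_per_epoch), approx_examples)
```

In these rows the training loss falls steadily while the validation loss rises, so the lowest validation loss belongs to the earliest row. Pasting the full table into `ROWS` gives the same comparison across every logged step.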

Framework versions

Transformers 4.45.1
PyTorch 2.3.0
Datasets 3.0.1
Tokenizers 0.20.0

CharlesLi/OpenELM-1_1B-DPO-full-max-4-reward
