OpenELM-1_1B-DPO-full-2-5

This model was fine-tuned with Direct Preference Optimization (DPO) from an OpenELM-1.1B base on an unspecified preference dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1888
  • Rewards/chosen: -13.5625
  • Rewards/rejected: -17.0
  • Rewards/accuracies: 0.7070
  • Rewards/margins: 3.4062
  • Logps/rejected: -1984.0
  • Logps/chosen: -1672.0
  • Logits/rejected: 6.2188
  • Logits/chosen: 4.5312
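
For context on these metrics: in a typical DPO setup (e.g. TRL's DPOTrainer), the reward columns are implicit rewards, computed as β times the gap between policy and reference log-probabilities, and accuracy is the fraction of pairs where the chosen reward beats the rejected one. A minimal sketch assuming TRL-style conventions; β and the reference log-probabilities below are illustrative, not recorded in this card:

```python
import torch

def implicit_reward(policy_logps: torch.Tensor, ref_logps: torch.Tensor, beta: float = 0.1) -> torch.Tensor:
    # DPO's implicit reward: beta * (log pi_theta(y|x) - log pi_ref(y|x)).
    # beta=0.1 is TRL's default; the value used for this run is not recorded.
    return beta * (policy_logps - ref_logps)

# Illustrative log-probabilities for a batch of two preference pairs (not from this run).
policy_chosen   = torch.tensor([-1672.0, -1600.0])
policy_rejected = torch.tensor([-1984.0, -1900.0])
ref_chosen      = torch.tensor([-1540.0, -1520.0])
ref_rejected    = torch.tensor([-1815.0, -1790.0])

r_chosen   = implicit_reward(policy_chosen, ref_chosen)      # -> Rewards/chosen (batch mean)
r_rejected = implicit_reward(policy_rejected, ref_rejected)  # -> Rewards/rejected
margins    = r_chosen - r_rejected                           # -> Rewards/margins
accuracy   = (r_chosen > r_rejected).float().mean()          # -> Rewards/accuracies
print(margins.mean().item(), accuracy.item())
```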

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5
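
These settings imply an effective train batch size of 8 per device × 4 GPUs × 2 accumulation steps = 64, matching total_train_batch_size, and the optimizer line matches the AdamW defaults. A minimal sketch of how they might map onto TRL's DPOConfig/DPOTrainer; the base model, tokenizer, and dataset names are assumptions, and the TRL version used is not recorded in this card:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Assumed base model; OpenELM repos ship custom modeling code, hence trust_remote_code=True.
model = AutoModelForCausalLM.from_pretrained("apple/OpenELM-1_1B", trust_remote_code=True)
# OpenELM reuses the Llama 2 tokenizer (assumption for this run).
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# Placeholder preference dataset with "prompt"/"chosen"/"rejected" columns.
dataset = load_dataset("some/preference-dataset")

args = DPOConfig(
    output_dir="OpenELM-1_1B-DPO-full-2-5",
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # x 4 GPUs x 2 accumulation steps = 64 effective
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=5,
    seed=42,
    bf16=True,
    # Defaults already give Adam betas=(0.9, 0.999) and epsilon=1e-08, as listed above.
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
)
trainer.train()
```

To reproduce the 4-device setup, such a script would be launched with something like `accelerate launch --num_processes 4 train_dpo.py`.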

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.615 | 0.1047 | 100 | 0.6275 | -0.7383 | -0.9961 | 0.6719 | 0.2578 | -386.0 | -388.0 | -9.625 | -9.8125 |
| 0.5897 | 0.2093 | 200 | 0.6029 | -1.6641 | -2.0938 | 0.6934 | 0.4336 | -496.0 | -480.0 | -9.375 | -9.75 |
| 0.6457 | 0.3140 | 300 | 0.5886 | -1.3828 | -1.8281 | 0.6895 | 0.4473 | -470.0 | -454.0 | -13.5625 | -13.75 |
| 0.6271 | 0.4186 | 400 | 0.5936 | -1.7031 | -2.25 | 0.6992 | 0.5430 | -510.0 | -484.0 | -8.4375 | -8.8125 |
| 0.5746 | 0.5233 | 500 | 0.5886 | -2.0156 | -2.5625 | 0.6816 | 0.5430 | -540.0 | -516.0 | -6.6562 | -7.4062 |
| 0.5484 | 0.6279 | 600 | 0.5710 | -3.9531 | -4.6875 | 0.6973 | 0.7422 | -756.0 | -708.0 | -5.75 | -6.4375 |
| 0.5747 | 0.7326 | 700 | 0.5820 | -2.75 | -3.4844 | 0.6953 | 0.7227 | -632.0 | -592.0 | -6.5312 | -7.5938 |
| 0.5591 | 0.8373 | 800 | 0.5662 | -2.8594 | -3.5156 | 0.7090 | 0.6523 | -636.0 | -600.0 | -3.375 | -4.7812 |
| 0.5892 | 0.9419 | 900 | 0.5821 | -2.625 | -3.2344 | 0.7012 | 0.5977 | -608.0 | -576.0 | -4.8438 | -6.125 |
| 0.261 | 1.0466 | 1000 | 0.5852 | -3.9375 | -4.9688 | 0.7324 | 1.0078 | -780.0 | -708.0 | -0.8672 | -2.1094 |
| 0.2407 | 1.1512 | 1100 | 0.5943 | -4.0625 | -5.0 | 0.6895 | 0.9336 | -784.0 | -720.0 | -0.3672 | -1.9688 |
| 0.2348 | 1.2559 | 1200 | 0.6151 | -4.9375 | -5.9688 | 0.6777 | 1.0547 | -884.0 | -808.0 | 1.5312 | 0.2227 |
| 0.257 | 1.3605 | 1300 | 0.6005 | -4.4688 | -5.4688 | 0.6973 | 0.9883 | -832.0 | -760.0 | 1.5312 | -0.1445 |
| 0.2416 | 1.4652 | 1400 | 0.6023 | -5.1875 | -6.125 | 0.6855 | 0.9258 | -900.0 | -836.0 | 1.9141 | 0.2715 |
| 0.215 | 1.5699 | 1500 | 0.6062 | -5.5938 | -6.7188 | 0.6934 | 1.1328 | -960.0 | -872.0 | 1.9219 | 0.2637 |
| 0.2534 | 1.6745 | 1600 | 0.6013 | -4.6562 | -5.7188 | 0.7129 | 1.0391 | -856.0 | -780.0 | 2.7969 | 1.1406 |
| 0.2463 | 1.7792 | 1700 | 0.6173 | -5.2812 | -6.4375 | 0.6914 | 1.1484 | -928.0 | -844.0 | 1.9688 | 0.0977 |
| 0.23 | 1.8838 | 1800 | 0.6153 | -5.8438 | -7.0625 | 0.7090 | 1.2266 | -992.0 | -896.0 | 2.9062 | 1.0156 |
| 0.2092 | 1.9885 | 1900 | 0.6082 | -5.5625 | -6.7188 | 0.7051 | 1.1641 | -956.0 | -868.0 | 2.9375 | 1.0781 |
| 0.0271 | 2.0931 | 2000 | 0.7202 | -7.625 | -9.375 | 0.7207 | 1.7734 | -1224.0 | -1080.0 | 3.5781 | 1.8516 |
| 0.0367 | 2.1978 | 2100 | 0.8323 | -9.3125 | -11.5 | 0.7168 | 2.1406 | -1432.0 | -1248.0 | 4.7188 | 2.9219 |
| 0.0443 | 2.3025 | 2200 | 0.7840 | -8.0 | -10.0625 | 0.7324 | 2.0625 | -1296.0 | -1112.0 | 3.9375 | 2.0312 |
| 0.0302 | 2.4071 | 2300 | 0.7981 | -8.375 | -10.375 | 0.7070 | 2.0 | -1328.0 | -1152.0 | 4.625 | 2.8125 |
| 0.031 | 2.5118 | 2400 | 0.7786 | -7.9062 | -9.875 | 0.7129 | 1.9922 | -1280.0 | -1104.0 | 4.875 | 3.0156 |
| 0.018 | 2.6164 | 2500 | 0.8584 | -9.9375 | -12.125 | 0.6914 | 2.2031 | -1496.0 | -1312.0 | 5.4688 | 3.6719 |
| 0.0248 | 2.7211 | 2600 | 0.8079 | -8.625 | -10.6875 | 0.7012 | 2.0469 | -1352.0 | -1176.0 | 5.0312 | 3.0938 |
| 0.0263 | 2.8257 | 2700 | 0.8371 | -9.3125 | -11.375 | 0.6914 | 2.0156 | -1424.0 | -1248.0 | 5.2812 | 3.4531 |
| 0.033 | 2.9304 | 2800 | 0.8799 | -9.8125 | -12.1875 | 0.7207 | 2.4062 | -1504.0 | -1296.0 | 5.2188 | 3.3281 |
| 0.0118 | 3.0351 | 2900 | 0.8372 | -9.625 | -11.875 | 0.7246 | 2.2969 | -1472.0 | -1280.0 | 5.6562 | 3.7812 |
| 0.0094 | 3.1397 | 3000 | 0.9555 | -11.0 | -13.6875 | 0.7090 | 2.6875 | -1656.0 | -1416.0 | 6.0938 | 4.3125 |
| 0.0073 | 3.2444 | 3100 | 0.9687 | -11.375 | -14.125 | 0.7129 | 2.7344 | -1696.0 | -1456.0 | 5.9062 | 4.1875 |
| 0.0104 | 3.3490 | 3200 | 1.0111 | -11.75 | -14.5625 | 0.7070 | 2.8438 | -1744.0 | -1488.0 | 6.1875 | 4.4688 |
| 0.01 | 3.4537 | 3300 | 1.0564 | -12.125 | -15.0625 | 0.7051 | 2.9375 | -1792.0 | -1528.0 | 5.9375 | 4.2188 |
| 0.0089 | 3.5583 | 3400 | 0.9822 | -11.375 | -14.0625 | 0.7051 | 2.7031 | -1696.0 | -1448.0 | 5.875 | 4.2188 |
| 0.0106 | 3.6630 | 3500 | 1.0239 | -11.5625 | -14.375 | 0.7070 | 2.8125 | -1720.0 | -1472.0 | 5.9688 | 4.25 |
| 0.0099 | 3.7677 | 3600 | 1.0668 | -11.9375 | -14.9375 | 0.6973 | 3.0 | -1784.0 | -1512.0 | 6.125 | 4.375 |
| 0.0066 | 3.8723 | 3700 | 1.0938 | -12.75 | -15.875 | 0.7070 | 3.1406 | -1872.0 | -1592.0 | 6.2188 | 4.5312 |
| 0.0081 | 3.9770 | 3800 | 1.0255 | -11.6875 | -14.5625 | 0.7129 | 2.8906 | -1744.0 | -1488.0 | 5.9688 | 4.2812 |
| 0.0035 | 4.0816 | 3900 | 1.1112 | -12.75 | -15.875 | 0.7031 | 3.1406 | -1872.0 | -1592.0 | 6.2188 | 4.5312 |
| 0.002 | 4.1863 | 4000 | 1.1127 | -12.8125 | -16.0 | 0.7051 | 3.1562 | -1888.0 | -1600.0 | 6.1875 | 4.5 |
| 0.0036 | 4.2909 | 4100 | 1.1368 | -13.0 | -16.25 | 0.7031 | 3.25 | -1912.0 | -1616.0 | 6.1875 | 4.4688 |
| 0.0069 | 4.3956 | 4200 | 1.1589 | -13.25 | -16.625 | 0.7070 | 3.3125 | -1944.0 | -1640.0 | 6.2188 | 4.5312 |
| 0.0043 | 4.5003 | 4300 | 1.1756 | -13.4375 | -16.75 | 0.7031 | 3.375 | -1968.0 | -1656.0 | 6.2188 | 4.5312 |
| 0.0091 | 4.6049 | 4400 | 1.1842 | -13.5 | -16.875 | 0.7031 | 3.3906 | -1976.0 | -1664.0 | 6.2188 | 4.5312 |
| 0.0058 | 4.7096 | 4500 | 1.1865 | -13.5 | -16.875 | 0.7051 | 3.3906 | -1976.0 | -1664.0 | 6.2188 | 4.5312 |
| 0.0034 | 4.8142 | 4600 | 1.1880 | -13.5625 | -17.0 | 0.7051 | 3.3906 | -1984.0 | -1672.0 | 6.2188 | 4.5312 |
| 0.006 | 4.9189 | 4700 | 1.1888 | -13.5625 | -17.0 | 0.7070 | 3.4062 | -1984.0 | -1672.0 | 6.2188 | 4.5312 |

Framework versions

  • Transformers 4.44.2
  • PyTorch 2.1.2
  • Datasets 2.18.0
  • Tokenizers 0.19.1
Model details

  • Model size: 1.08B params
  • Tensor type: BF16 (Safetensors)
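
Because the serverless Inference API does not support repos containing custom code, the model has to be loaded locally with trust_remote_code=True. A minimal loading sketch; the repo id is assumed from the card title, and the tokenizer choice follows the OpenELM convention of reusing the Llama 2 tokenizer (an assumption for this checkpoint):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id assumed from the card title; OpenELM checkpoints require trust_remote_code=True.
repo = "OpenELM-1_1B-DPO-full-2-5"
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")  # assumed tokenizer

inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```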