OpenELM-1_1B-SimPO

This model was trained from scratch on an unknown dataset. At the final logged evaluation (step 2800, epoch 2.93) it achieves the following results on the evaluation set:

Logits/chosen: -0.5781
Logits/rejected: 1.2422
Logps/chosen: -113.0
Logps/rejected: -171.0
Loss: 0.8496
Nll Loss: 0.0
Rewards/accuracies: 0.6680
Rewards/chosen: -1.1328
Rewards/margins: 0.5742
Rewards/rejected: -1.7031
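
As a quick sanity check, the reported margin should sit close to the difference between the chosen and rejected rewards. The sketch below assumes the margin is logged as the mean of per-example differences, which is the usual convention in preference-optimization trainers but is not confirmed by this card, so a small rounding gap is expected. The tolerance is an arbitrary value chosen for illustration.

```python
# Final evaluation metrics copied from the list above.
rewards_chosen = -1.1328
rewards_rejected = -1.7031
rewards_margin_reported = 0.5742

# Assumption: margin ~= mean(chosen) - mean(rejected), up to rounding.
margin_from_means = rewards_chosen - rewards_rejected
gap = abs(margin_from_means - rewards_margin_reported)

# TOLERANCE is a hypothetical threshold, not a value taken from the card.
TOLERANCE = 0.01
margin_is_consistent = gap < TOLERANCE

print(f"margin from means: {margin_from_means:.4f}")
print(f"reported margin:   {rewards_margin_reported:.4f}")
print(f"consistent within {TOLERANCE}: {margin_is_consistent}")
```

The two figures differ by roughly 0.004, which is plausible for metrics accumulated in reduced precision, though the card does not state the training dtype.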

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 16
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 2
total_train_batch_size: 64
total_eval_batch_size: 64
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 3
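
The two total batch sizes follow from the per-device values. A minimal sketch of that arithmetic, assuming the common convention that the effective training batch is per-device batch size times device count times accumulation steps, and that gradient accumulation does not apply during evaluation:

```python
# Values copied from the hyperparameter list above.
train_batch_size = 8             # per device
eval_batch_size = 16             # per device
num_devices = 4
gradient_accumulation_steps = 2

# Assumed convention: one optimizer step consumes every device's
# micro-batches across all accumulation steps.
total_train_batch_size = (
    train_batch_size * num_devices * gradient_accumulation_steps
)

# Evaluation has no gradient accumulation, so only the device count multiplies.
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size, total_eval_batch_size)  # prints: 64 64
```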

Training results

Training Loss	Epoch	Step	Logits/chosen	Logits/rejected	Logps/chosen	Logps/rejected	Validation Loss	Rewards/accuracies	Rewards/chosen	Rewards/margins	Rewards/rejected
0.9346	0.1047	100	-8.5625	-7.9688	-33.25	-41.75	0.9349	0.6133	-0.3320	0.0864	-0.4180
0.9139	0.2093	200	-3.4531	-2.4375	-48.5	-63.5	0.9069	0.6270	-0.4844	0.1504	-0.6367
0.907	0.3140	300	-5.1875	-4.0	-69.5	-83.5	0.9099	0.6055	-0.6914	0.1416	-0.8359
0.901	0.4186	400	-1.7422	0.0164	-84.0	-101.0	0.8957	0.6328	-0.8359	0.1748	-1.0156
0.8752	0.5233	500	-0.5625	0.8555	-72.5	-95.5	0.8768	0.6582	-0.7266	0.2324	-0.9570
0.8808	0.6279	600	2.1562	3.2344	-86.0	-109.5	0.8742	0.6445	-0.8633	0.2334	-1.0938
0.8277	0.7326	700	-0.7930	0.3496	-52.0	-77.5	0.8679	0.6445	-0.5195	0.2520	-0.7734
0.8341	0.8373	800	0.2188	1.3047	-80.5	-108.5	0.8503	0.6602	-0.8047	0.2773	-1.0859
0.8333	0.9419	900	0.6406	1.8438	-90.0	-121.5	0.8454	0.6660	-0.8984	0.3184	-1.2188
0.8071	1.0466	1000	0.1504	1.3516	-100.0	-133.0	0.8441	0.6699	-1.0	0.3340	-1.3359
0.7845	1.1512	1100	-1.5078	0.3301	-84.5	-122.5	0.8307	0.6660	-0.8477	0.3809	-1.2266
0.7483	1.2559	1200	-0.4160	0.9805	-94.5	-133.0	0.8353	0.6758	-0.9453	0.3809	-1.3281
0.7802	1.3605	1300	-1.5859	0.3418	-62.0	-100.5	0.8363	0.7051	-0.6211	0.3828	-1.0
0.7499	1.4652	1400	-0.1719	1.4531	-97.0	-141.0	0.8228	0.7012	-0.9727	0.4414	-1.4141
0.6966	1.5699	1500	-0.3301	1.5	-106.0	-152.0	0.8231	0.6836	-1.0625	0.4609	-1.5234
0.6921	1.6745	1600	0.6133	2.25	-107.0	-155.0	0.8222	0.6875	-1.0703	0.4766	-1.5469
0.7162	1.7792	1700	0.6992	2.4688	-103.0	-154.0	0.8106	0.6953	-1.0312	0.5078	-1.5391
0.714	1.8838	1800	0.0579	2.1875	-109.5	-162.0	0.8183	0.6855	-1.0938	0.5312	-1.625
0.7068	1.9885	1900	0.3184	1.9922	-97.5	-151.0	0.8164	0.7031	-0.9727	0.5352	-1.5078
0.4781	2.0931	2000	0.0977	1.7344	-119.0	-171.0	0.8475	0.6797	-1.1875	0.5273	-1.7109
0.4964	2.1978	2100	-0.9258	0.9219	-100.0	-155.0	0.8455	0.6875	-1.0	0.5547	-1.5547
0.4723	2.3025	2200	-0.4648	1.2969	-110.0	-166.0	0.8475	0.6934	-1.1016	0.5586	-1.6562
0.5051	2.4071	2300	-0.2891	1.4141	-113.0	-170.0	0.8480	0.6895	-1.1328	0.5664	-1.6953
0.4647	2.5118	2400	-0.3496	1.4531	-114.0	-171.0	0.8463	0.6758	-1.1406	0.5742	-1.7188
0.4442	2.6164	2500	-0.1436	1.5859	-123.5	-180.0	0.8527	0.6680	-1.2344	0.5664	-1.7969
0.4349	2.7211	2600	-0.5898	1.2422	-112.0	-169.0	0.8505	0.6699	-1.1172	0.5742	-1.6953
0.4514	2.8257	2700	-0.6406	1.1953	-112.0	-169.0	0.8493	0.6738	-1.1172	0.5781	-1.6953
0.459	2.9304	2800	-0.5781	1.2422	-113.0	-171.0	0.8496	0.6680	-1.1328	0.5742	-1.7031
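
The table suggests that validation loss reaches its minimum before the end of training while training loss keeps falling. The sketch below locates that point from the logged values. Treating the lowest validation loss as the "best" checkpoint is an assumption made here for illustration; the card reports the step-2800 metrics as its headline results and does not say how a checkpoint was selected.

```python
# (step, training_loss, validation_loss) copied from the results table above.
rows = [
    (100, 0.9346, 0.9349), (200, 0.9139, 0.9069), (300, 0.907, 0.9099),
    (400, 0.901, 0.8957), (500, 0.8752, 0.8768), (600, 0.8808, 0.8742),
    (700, 0.8277, 0.8679), (800, 0.8341, 0.8503), (900, 0.8333, 0.8454),
    (1000, 0.8071, 0.8441), (1100, 0.7845, 0.8307), (1200, 0.7483, 0.8353),
    (1300, 0.7802, 0.8363), (1400, 0.7499, 0.8228), (1500, 0.6966, 0.8231),
    (1600, 0.6921, 0.8222), (1700, 0.7162, 0.8106), (1800, 0.714, 0.8183),
    (1900, 0.7068, 0.8164), (2000, 0.4781, 0.8475), (2100, 0.4964, 0.8455),
    (2200, 0.4723, 0.8475), (2300, 0.5051, 0.8480), (2400, 0.4647, 0.8463),
    (2500, 0.4442, 0.8527), (2600, 0.4349, 0.8505), (2700, 0.4514, 0.8493),
    (2800, 0.459, 0.8496),
]

# Row with the smallest validation loss (third field of each tuple).
best_step, _, best_val = min(rows, key=lambda r: r[2])

# Last logged evaluation, which matches the headline metrics of the card.
final_step, final_train, final_val = rows[-1]

print(f"lowest validation loss: {best_val:.4f} at step {best_step}")
print(f"final validation loss:  {final_val:.4f} at step {final_step}")
print(f"final train/validation gap: {final_val - final_train:.4f}")
```

On these numbers the minimum falls at step 1700 (0.8106). Training loss then drops sharply at the start of the third epoch (0.7068 to 0.4781) while validation loss rises, a pattern usually read as overfitting, although the reward margins continue to grow slightly through the third epoch.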

Framework versions

Transformers 4.44.2
Pytorch 2.3.0
Datasets 3.0.0
Tokenizers 0.19.1

CharlesLi/OpenELM-1_1B-SimPO
