metadata

library_name: transformers
tags:
  - trl
  - dpo
  - alignment-handbook
  - generated_from_trainer
model-index:
  - name: OpenELM-1_1B-SLiC
    results: []

OpenELM-1_1B-SLiC

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set (these match the final logged evaluation, step 2800, in the training results):

Logits/chosen: -10.0625
Logits/rejected: -8.75
Logps/chosen: -752.0
Logps/rejected: -824.0
Loss: 0.6883
Rewards/accuracies: 0.7344
Rewards/chosen: -4.3438
Rewards/margins: 0.9922
Rewards/rejected: -5.3438
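
As a quick sanity check on the figures above, the reward margin can be compared with the gap between the chosen and rejected rewards. This is a minimal sketch that assumes the usual preference-training logging convention, where Rewards/margins is the mean of (chosen reward minus rejected reward); the card itself does not define the metrics. The small gap between the computed and reported values is likely rounding in the logged numbers.

```python
# Final evaluation metrics, copied from the list above.
rewards_chosen = -4.3438
rewards_rejected = -5.3438
reported_margin = 0.9922

# ASSUMPTION: the margin is the gap between chosen and rejected rewards.
computed_margin = rewards_chosen - rewards_rejected

print(f"computed margin: {computed_margin:.4f}")
print(f"reported margin: {reported_margin:.4f}")
print(f"difference:      {abs(computed_margin - reported_margin):.4f}")
```

A Rewards/accuracies value of 0.7344 then reads as the fraction of evaluation pairs in which the chosen response received the higher reward, under the same assumption.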

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 16
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 2
total_train_batch_size: 64
total_eval_batch_size: 64
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 3
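
The totals in this list follow from the per-device values, and the scheduler settings describe a linear warmup followed by cosine decay. The sketch below reproduces the batch-size arithmetic and illustrates the shape of such a schedule. The `cosine_lr` helper and the `total_steps` value are hypothetical and only for illustration; the card does not state the total number of optimizer steps, and the trainer's own scheduler may differ in detail.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 8             # per device
eval_batch_size = 16             # per device
num_devices = 4
gradient_accumulation_steps = 2
peak_lr = 5e-05
warmup_ratio = 0.1

# Effective batch sizes across devices (and accumulation steps for training).
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices


def cosine_lr(step: int, total_steps: int) -> float:
    """Hypothetical helper: linear warmup, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# ASSUMPTION: total_steps is illustrative, not taken from the card.
total_steps = 1000
print("total train batch size:", total_train_batch_size)
print("total eval batch size: ", total_eval_batch_size)
for step in (0, 50, 100, 550, 1000):
    print(f"step {step:>4}: lr = {cosine_lr(step, total_steps):.2e}")
```

With these settings the learning rate rises to 5e-05 over the first 10% of steps and decays toward zero by the end of training.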

Training results

Training Loss	Epoch	Step	Logits/chosen	Logits/rejected	Logps/chosen	Logps/rejected	Validation Loss	Rewards/accuracies	Rewards/chosen	Rewards/margins	Rewards/rejected
0.7634	0.1047	100	-13.0625	-12.9375	-392.0	-392.0	0.7878	0.6406	-0.7461	0.2832	-1.0312
0.7498	0.2093	200	-12.75	-12.4375	-436.0	-444.0	0.7468	0.6719	-1.1719	0.3809	-1.5547
0.8142	0.3140	300	-14.8125	-14.75	-504.0	-516.0	0.7466	0.6914	-1.8594	0.4141	-2.2812
0.7764	0.4186	400	-14.5625	-14.4375	-516.0	-528.0	0.7499	0.6699	-1.9688	0.4316	-2.4062
0.731	0.5233	500	-11.0	-10.5	-560.0	-576.0	0.7240	0.6914	-2.4219	0.4375	-2.8594
0.665	0.6279	600	-10.75	-10.0625	-660.0	-696.0	0.7045	0.6973	-3.4062	0.6680	-4.0625
0.6806	0.7326	700	-13.875	-13.4375	-568.0	-604.0	0.6912	0.7070	-2.5156	0.6523	-3.1562
0.6597	0.8373	800	-13.5	-13.3125	-548.0	-576.0	0.7087	0.6777	-2.2969	0.5664	-2.8594
0.7325	0.9419	900	-14.0	-13.25	-588.0	-624.0	0.6838	0.7090	-2.6875	0.6602	-3.3594
0.2677	1.0466	1000	-12.1875	-11.0625	-640.0	-688.0	0.6726	0.7070	-3.2344	0.7734	-4.0
0.2256	1.1512	1100	-11.125	-10.0625	-676.0	-728.0	0.6992	0.7090	-3.5938	0.7969	-4.375
0.1954	1.2559	1200	-11.3125	-10.125	-664.0	-720.0	0.7033	0.7051	-3.4688	0.8477	-4.3125
0.2289	1.3605	1300	-11.0	-9.9375	-692.0	-740.0	0.6722	0.7344	-3.7344	0.7852	-4.5
0.2227	1.4652	1400	-12.5	-11.8125	-676.0	-720.0	0.6925	0.6953	-3.5781	0.7383	-4.3125
0.1902	1.5699	1500	-12.0625	-11.125	-736.0	-792.0	0.6758	0.7148	-4.1875	0.8320	-5.0312
0.2192	1.6745	1600	-13.625	-12.875	-704.0	-748.0	0.6833	0.7148	-3.8438	0.7695	-4.625
0.2137	1.7792	1700	-11.9375	-11.0	-716.0	-764.0	0.6734	0.7207	-3.9688	0.8008	-4.7812
0.2001	1.8838	1800	-12.125	-11.3125	-692.0	-740.0	0.6734	0.7207	-3.7344	0.7617	-4.5
0.1713	1.9885	1900	-10.4375	-9.25	-712.0	-768.0	0.6680	0.7383	-3.9375	0.8789	-4.8125
0.0184	2.0931	2000	-11.0625	-9.875	-704.0	-768.0	0.6845	0.7305	-3.8594	0.9453	-4.8125
0.0313	2.1978	2100	-11.25	-10.125	-720.0	-784.0	0.6798	0.7402	-4.0	0.9570	-4.9688
0.0401	2.3025	2200	-10.6875	-9.375	-732.0	-800.0	0.6865	0.7363	-4.1562	0.9492	-5.0938
0.0211	2.4071	2300	-10.125	-8.75	-740.0	-812.0	0.6874	0.7383	-4.2188	1.0078	-5.2188
0.0239	2.5118	2400	-10.1875	-8.875	-736.0	-800.0	0.6858	0.7383	-4.1562	0.9766	-5.125
0.0188	2.6164	2500	-10.125	-8.8125	-744.0	-816.0	0.6902	0.7324	-4.2812	0.9883	-5.25
0.0145	2.7211	2600	-10.125	-8.8125	-748.0	-816.0	0.6874	0.7383	-4.2812	0.9844	-5.2812
0.0229	2.8257	2700	-10.0625	-8.75	-752.0	-824.0	0.6883	0.7344	-4.3438	0.9922	-5.3438
0.0298	2.9304	2800	-10.0625	-8.75	-752.0	-824.0	0.6883	0.7344	-4.3438	0.9922	-5.3438
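
The table shows training loss falling from about 0.76 to about 0.02 while validation loss stays between roughly 0.67 and 0.69 after the first epoch. The sketch below, using only the Step, Epoch, Training Loss, and Validation Loss columns copied from the table, locates the checkpoint with the lowest validation loss and estimates the number of optimizer steps per epoch. It is an illustration of how to read the log, not a statement about which checkpoint was published.

```python
# (step, epoch, training_loss, validation_loss), copied from the table above.
rows = [
    (100, 0.1047, 0.7634, 0.7878), (200, 0.2093, 0.7498, 0.7468),
    (300, 0.3140, 0.8142, 0.7466), (400, 0.4186, 0.7764, 0.7499),
    (500, 0.5233, 0.7310, 0.7240), (600, 0.6279, 0.6650, 0.7045),
    (700, 0.7326, 0.6806, 0.6912), (800, 0.8373, 0.6597, 0.7087),
    (900, 0.9419, 0.7325, 0.6838), (1000, 1.0466, 0.2677, 0.6726),
    (1100, 1.1512, 0.2256, 0.6992), (1200, 1.2559, 0.1954, 0.7033),
    (1300, 1.3605, 0.2289, 0.6722), (1400, 1.4652, 0.2227, 0.6925),
    (1500, 1.5699, 0.1902, 0.6758), (1600, 1.6745, 0.2192, 0.6833),
    (1700, 1.7792, 0.2137, 0.6734), (1800, 1.8838, 0.2001, 0.6734),
    (1900, 1.9885, 0.1713, 0.6680), (2000, 2.0931, 0.0184, 0.6845),
    (2100, 2.1978, 0.0313, 0.6798), (2200, 2.3025, 0.0401, 0.6865),
    (2300, 2.4071, 0.0211, 0.6874), (2400, 2.5118, 0.0239, 0.6858),
    (2500, 2.6164, 0.0188, 0.6902), (2600, 2.7211, 0.0145, 0.6874),
    (2700, 2.8257, 0.0229, 0.6883), (2800, 2.9304, 0.0298, 0.6883),
]

# Checkpoint with the lowest validation loss.
best = min(rows, key=lambda row: row[3])
final = rows[-1]

# Steps per epoch, estimated from the last logged step and its epoch value.
steps_per_epoch = final[0] / final[1]

print(f"best validation loss:  {best[3]:.4f} at step {best[0]}")
print(f"final validation loss: {final[3]:.4f} at step {final[0]}")
print(f"estimated steps per epoch: {steps_per_epoch:.1f}")
```

The widening gap between training and validation loss in the third epoch is consistent with the model fitting the training pairs much more closely than the held-out ones.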

Framework versions

Transformers 4.44.2
PyTorch 2.3.0
Datasets 3.0.0
Tokenizers 0.19.1
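
To compare a local environment against the versions listed here, a small check along the following lines may help. This is a sketch using only the Python standard library; `check_versions` is a hypothetical helper, and `torch` is assumed to be the installed distribution name for PyTorch.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this card, keyed by distribution name.
EXPECTED = {
    "transformers": "4.44.2",
    "torch": "2.3.0",
    "datasets": "3.0.0",
    "tokenizers": "0.19.1",
}


def check_versions(expected):
    """Return {package: (installed, wanted, matches)} for each package."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        # Ignore local build suffixes such as "+cu121" when comparing.
        matches = installed is not None and installed.split("+")[0] == wanted
        report[package] = (installed, wanted, matches)
    return report


for name, (installed, wanted, ok) in check_versions(EXPECTED).items():
    status = "ok" if ok else "differs"
    print(f"{name}: installed={installed} expected={wanted} ({status})")
```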