# Llama-3.1-8B-Instruct-SAA-1000
This model is a fine-tuned version of meta-llama/Llama-3.1-8B-Instruct on the bct_non_cot_dpo_1000 dataset. It achieves the following results on the evaluation set:
- Loss: 0.1041
- Rewards/chosen: -0.0071
- Rewards/rejected: -0.0574
- Rewards/accuracies: 0.8700
- Rewards/margins: 0.0503
- Logps/rejected: -0.5741
- Logps/chosen: -0.0707
- Logits/rejected: -0.3997
- Logits/chosen: -0.3439
- Sft Loss: 0.0083
- Odds Ratio Loss: 0.9577
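
The two auxiliary terms above suggest that the reported validation loss is the SFT loss plus a weighted odds-ratio loss (an ORPO-style objective). This is an inference from the reported numbers, not something stated in the original card; with a weight of about 0.1 the sum reproduces the reported loss:

$$
\mathcal{L} \approx \mathcal{L}_{\text{SFT}} + 0.1 \cdot \mathcal{L}_{\text{OR}} = 0.0083 + 0.1 \times 0.9577 \approx 0.1041
$$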
## Model description
More information needed
## Intended uses & limitations
More information needed
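
No usage guidance was given, so the following is only a minimal inference sketch. It assumes this repository hosts a PEFT (LoRA) adapter on top of the base model, as the PEFT entry under framework versions suggests; the prompt and generation settings are illustrative, not part of the original card.

```python
# Minimal inference sketch (assumes this repo is a PEFT adapter for the base model).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.1-8B-Instruct"
adapter_id = "chchen/Llama-3.1-8B-Instruct-SAA-1000"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

# Build a chat-formatted prompt and generate a short reply.
messages = [{"role": "user", "content": "Summarize what a preference-tuned model is."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=False)

print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```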
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10.0
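
The effective batch size follows directly from these values: 2 per device × 8 gradient-accumulation steps = 16. For convenience, the same settings are sketched below as Hugging Face `TrainingArguments`; the original run may have been launched through a different toolkit, so treat this as an illustration of the listed values rather than the actual launch script.

```python
from transformers import TrainingArguments

# Restates the hyperparameters listed above; output_dir is illustrative.
training_args = TrainingArguments(
    output_dir="Llama-3.1-8B-Instruct-SAA-1000",
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,   # 2 x 8 = total train batch size of 16
    num_train_epochs=10.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,                  # Adam with betas=(0.9, 0.999), epsilon=1e-8
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```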
### Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Sft Loss | Odds Ratio Loss |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1.61 | 0.8889 | 50 | 1.4462 | -0.1395 | -0.1818 | 0.7900 | 0.0423 | -1.8179 | -1.3950 | -0.4872 | -0.4121 | 0.1643 | 12.8185 |
0.3241 | 1.7778 | 100 | 0.2648 | -0.0222 | -0.0659 | 0.8200 | 0.0438 | -0.6595 | -0.2217 | -0.4637 | -0.3875 | 0.0232 | 2.4164 |
0.1509 | 2.6667 | 150 | 0.1238 | -0.0084 | -0.0490 | 0.8600 | 0.0406 | -0.4900 | -0.0840 | -0.4176 | -0.3601 | 0.0101 | 1.1374 |
0.1335 | 3.5556 | 200 | 0.1089 | -0.0074 | -0.0505 | 0.8600 | 0.0432 | -0.5055 | -0.0738 | -0.4038 | -0.3492 | 0.0087 | 1.0023 |
0.1253 | 4.4444 | 250 | 0.1136 | -0.0078 | -0.0536 | 0.8800 | 0.0458 | -0.5355 | -0.0776 | -0.3998 | -0.3449 | 0.0097 | 1.0396 |
0.0851 | 5.3333 | 300 | 0.1041 | -0.0071 | -0.0574 | 0.8700 | 0.0503 | -0.5741 | -0.0707 | -0.3997 | -0.3439 | 0.0083 | 0.9577 |
0.0824 | 6.2222 | 350 | 0.1065 | -0.0073 | -0.0587 | 0.8700 | 0.0514 | -0.5869 | -0.0728 | -0.3969 | -0.3419 | 0.0088 | 0.9767 |
0.0869 | 7.1111 | 400 | 0.1160 | -0.0080 | -0.0625 | 0.8800 | 0.0545 | -0.6250 | -0.0801 | -0.3942 | -0.3392 | 0.0102 | 1.0581 |
0.0715 | 8.0 | 450 | 0.1095 | -0.0075 | -0.0618 | 0.8800 | 0.0543 | -0.6184 | -0.0750 | -0.3933 | -0.3379 | 0.0092 | 1.0028 |
0.0751 | 8.8889 | 500 | 0.1095 | -0.0075 | -0.0618 | 0.8800 | 0.0543 | -0.6181 | -0.0752 | -0.3939 | -0.3386 | 0.0093 | 1.0026 |
0.0784 | 9.7778 | 550 | 0.1089 | -0.0075 | -0.0622 | 0.8700 | 0.0547 | -0.6221 | -0.0747 | -0.3937 | -0.3381 | 0.0091 | 0.9983 |
### Framework versions
- PEFT 0.12.0
- Transformers 4.45.2
- Pytorch 2.3.0
- Datasets 2.19.0
- Tokenizers 0.20.0