# Llama-3.1-8B-Instruct-SAA-600
This model is a fine-tuned version of meta-llama/Llama-3.1-8B-Instruct on the bct_non_cot_dpo_600 dataset. It achieves the following results on the evaluation set:
- Loss: 0.0943
- Rewards/chosen: -0.0072
- Rewards/rejected: -0.0623
- Rewards/accuracies: 0.8833
- Rewards/margins: 0.0551
- Logps/rejected: -0.6233
- Logps/chosen: -0.0722
- Logits/rejected: -0.4048
- Logits/chosen: -0.3432
- Sft Loss: 0.0119
- Odds Ratio Loss: 0.8243
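
Two auxiliary losses are reported separately (Sft Loss and Odds Ratio Loss), which is characteristic of an ORPO-style preference objective; this is an inference from the metric names, not something the card states. Under that reading, the evaluation loss decomposes as

$$
\mathcal{L} = \mathcal{L}_{\text{SFT}} + \lambda \, \mathcal{L}_{\text{OR}},
$$

and the reported numbers are consistent with $\lambda = 0.1$: $0.0119 + 0.1 \times 0.8243 \approx 0.0943$, matching the evaluation loss above. The same check holds for every row of the training-results table below.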
## Model description
More information needed
## Intended uses & limitations
More information needed
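
No official usage guidance is provided yet. As a minimal sketch, assuming this repository hosts a PEFT (LoRA) adapter on top of meta-llama/Llama-3.1-8B-Instruct (PEFT is listed under framework versions below), loading and running the model could look like:

```python
# Minimal sketch, not official usage: assumes the repo contains a PEFT/LoRA
# adapter for the instruct base model.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.1-8B-Instruct"
adapter_id = "chchen/Llama-3.1-8B-Instruct-SAA-600"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)  # attach the fine-tuned adapter

messages = [{"role": "user", "content": "Hello!"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```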
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10.0
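
The training framework is not stated in this card. As a hedged reconstruction (not the original script), the hyperparameters above could be reproduced with TRL's `ORPOTrainer`, chosen because the SFT + odds-ratio loss metrics are characteristic of ORPO; the dataset path and preference-pair column names are assumptions:

```python
# Hedged sketch, not the original training script.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

base_id = "meta-llama/Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Hypothetical local path: the card names the dataset bct_non_cot_dpo_600 but
# not where it lives. Expected columns: "prompt", "chosen", "rejected".
train_dataset = load_dataset("json", data_files="bct_non_cot_dpo_600.json", split="train")

config = ORPOConfig(
    output_dir="Llama-3.1-8B-Instruct-SAA-600",
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,  # 2 x 8 = total train batch size of 16
    num_train_epochs=10.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    beta=0.1,  # odds-ratio weight; consistent with the loss decomposition above
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # `tokenizer=` in older TRL releases
    peft_config=LoraConfig(task_type="CAUSAL_LM"),  # PEFT is listed under framework versions
)
trainer.train()
```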
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Sft Loss | Odds Ratio Loss |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.3352 | 1.4815 | 50 | 1.0317 | -0.0989 | -0.1576 | 0.8333 | 0.0587 | -1.5758 | -0.9889 | -0.4812 | -0.4002 | 0.1167 | 9.1492 |
| 0.2371 | 2.9630 | 100 | 0.1655 | -0.0135 | -0.0699 | 0.8833 | 0.0564 | -0.6987 | -0.1348 | -0.4551 | -0.3813 | 0.0177 | 1.4782 |
| 0.1421 | 4.4444 | 150 | 0.1010 | -0.0077 | -0.0577 | 0.8833 | 0.0500 | -0.5773 | -0.0770 | -0.4107 | -0.3473 | 0.0124 | 0.8869 |
| 0.1291 | 5.9259 | 200 | 0.0984 | -0.0075 | -0.0594 | 0.8833 | 0.0518 | -0.5936 | -0.0752 | -0.4066 | -0.3442 | 0.0123 | 0.8613 |
| 0.1246 | 7.4074 | 250 | 0.0943 | -0.0072 | -0.0623 | 0.8833 | 0.0551 | -0.6233 | -0.0722 | -0.4048 | -0.3432 | 0.0119 | 0.8243 |
| 0.1045 | 8.8889 | 300 | 0.0948 | -0.0072 | -0.0628 | 0.8833 | 0.0555 | -0.6277 | -0.0724 | -0.4046 | -0.3432 | 0.0119 | 0.8292 |
### Framework versions
- PEFT 0.12.0
- Transformers 4.45.2
- Pytorch 2.3.0
- Datasets 2.19.0
- Tokenizers 0.20.0