# Llama-3-8b-ultra-p-0.05-e2
This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.5220
- Rewards/chosen: -0.9441
- Rewards/rejected: -1.8650
- Rewards/accuracies: 0.7109
- Rewards/margins: 0.9209
- Logps/rejected: -451.1644
- Logps/chosen: -350.9630
- Logits/rejected: 0.7109
- Logits/chosen: 0.5782
## Model description
More information needed
## Intended uses & limitations
More information needed
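Absent official usage guidance, below is a minimal inference sketch with `transformers`, assuming the checkpoint inherits the Llama 3 Instruct chat template from its base model. The Hub repo id is a placeholder, since the card does not state where this checkpoint is hosted.

```python
# Minimal inference sketch. The repo id below is a placeholder: the card
# does not state where this checkpoint is actually hosted.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/Llama-3-8b-ultra-p-0.05-e2"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Summarize direct preference optimization in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```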
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 2
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 8
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 2.0
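The reward, margin, and log-probability metrics reported above are the columns logged by TRL's `DPOTrainer`, which suggests (but does not confirm) that this run was DPO preference training. The sketch below is a hedged reconstruction under that assumption: the preference dataset, the DPO `beta`, and the training precision are unknown and marked as placeholders; only the hyperparameters listed above come from this card.

```python
# Hedged reconstruction of the training setup, assuming TRL's DPOTrainer.
# The dataset path is a placeholder and beta is a guess; only the listed
# hyperparameters are taken from this model card.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Placeholder: the actual dataset is undocumented. DPOTrainer expects
# "prompt"/"chosen"/"rejected" columns.
train_dataset = load_dataset("your-org/preference-dataset", split="train")

args = DPOConfig(
    output_dir="Llama-3-8b-ultra-p-0.05-e2",
    learning_rate=5e-7,
    per_device_train_batch_size=2,  # x 8 GPUs x 8 grad-accum steps = 128 effective
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,
    num_train_epochs=2.0,
    lr_scheduler_type="linear",
    seed=42,
    bf16=True,  # assumption; precision is not stated on the card
    beta=0.01,  # assumption; the DPO beta is not stated on the card
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,       # TRL clones the policy as the frozen reference
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,  # newer TRL versions rename this to processing_class
)
trainer.train()
```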
### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6104        | 0.2060 | 100  | 0.6072          | -0.3684        | -0.6016          | 0.6797             | 0.2333          | -324.8277      | -293.3899    | 0.3070          | 0.2457        |
| 0.5840        | 0.4119 | 200  | 0.5793          | -0.4564        | -0.8408          | 0.6875             | 0.3845          | -348.7453      | -302.1898    | 0.2960          | 0.2163        |
| 0.5603        | 0.6179 | 300  | 0.5645          | -0.5560        | -1.0698          | 0.7031             | 0.5137          | -371.6387      | -312.1552    | 0.4682          | 0.3453        |
| 0.5484        | 0.8239 | 400  | 0.5480          | -0.6169        | -1.2393          | 0.7031             | 0.6224          | -388.5962      | -318.2488    | 0.5253          | 0.4021        |
| 0.5107        | 1.0299 | 500  | 0.5334          | -0.8225        | -1.6081          | 0.7344             | 0.7856          | -425.4747      | -338.8083    | 0.6083          | 0.4722        |
| 0.4572        | 1.2358 | 600  | 0.5305          | -1.0575        | -1.9794          | 0.6953             | 0.9220          | -462.6064      | -362.2999    | 0.7654          | 0.6337        |
| 0.4690        | 1.4418 | 700  | 0.5242          | -0.9388        | -1.8351          | 0.7188             | 0.8963          | -448.1704      | -350.4342    | 0.7212          | 0.5900        |
| 0.4684        | 1.6478 | 800  | 0.5208          | -0.9854        | -1.9102          | 0.7188             | 0.9248          | -455.6850      | -355.0974    | 0.7644          | 0.6311        |
| 0.4730        | 1.8538 | 900  | 0.5217          | -0.9660        | -1.8992          | 0.7031             | 0.9332          | -454.5815      | -353.1494    | 0.7253          | 0.5920        |
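If these are indeed TRL DPO metrics, `rewards/chosen` and `rewards/rejected` are the implicit rewards beta * (log pi_theta(y|x) - log pi_ref(y|x)) averaged over chosen and rejected completions, and `rewards/margins` is simply their difference, which the final evaluation numbers bear out:

```python
# Sanity check on the evaluation-set metrics above, assuming TRL's DPO
# logging convention where rewards/margins = rewards/chosen - rewards/rejected.
chosen, rejected, margins = -0.9441, -1.8650, 0.9209
assert abs((chosen - rejected) - margins) < 1e-4  # -0.9441 - (-1.8650) = 0.9209
```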
### Framework versions
- Transformers 4.45.1
- Pytorch 2.4.1+cu121
- Datasets 3.0.0
- Tokenizers 0.20.0