# Llama-3-8b-ultra-p-0.05-e3
This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.5170
- Rewards/chosen: -1.6079
- Rewards/rejected: -2.9565
- Rewards/accuracies: 0.7188
- Rewards/margins: 1.3486
- Logps/rejected: -560.3118
- Logps/chosen: -417.3455
- Logits/rejected: 0.8998
- Logits/chosen: 0.8373
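Below is a minimal sketch of how to query this checkpoint with `transformers` (versions pinned under "Framework versions" below). The prompt and generation settings are illustrative placeholders, not taken from this card:

```python
# Minimal inference sketch; assumes GPU access and permission to download the weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tongliuphysics/Llama-3-8b-ultra-p-0.05-e3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 inference on a recent GPU
    device_map="auto",
)

# Llama-3-Instruct checkpoints expect the chat template shipped with the tokenizer.
messages = [{"role": "user", "content": "Summarize direct preference optimization in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```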
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a hedged code reconstruction follows the list):
- learning_rate: 5e-07
- train_batch_size: 2
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 8
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3.0
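The metric names in this card (rewards/chosen, rewards/rejected, logps) match what trl's `DPOTrainer` logs, so the sketch below reconstructs a plausible DPO training setup from the hyperparameters above. The preference dataset (guessed from the "ultra" in the model name), `beta`, and precision are assumptions, not documented in this card; the trainer call follows the trl 0.11-era API that pairs with Transformers 4.45:

```python
# Hypothetical reconstruction of the training setup; the dataset, beta, and bf16
# settings are assumptions. Launched across 8 GPUs with gradient accumulation 8
# to reach the effective batch size of 128 listed above.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Placeholder preference dataset with "prompt"/"chosen"/"rejected" columns.
train_dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

args = DPOConfig(
    output_dir="Llama-3-8b-ultra-p-0.05-e3",
    learning_rate=5e-7,
    per_device_train_batch_size=2,  # x 8 GPUs x 8 accumulation steps = 128 effective
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,
    num_train_epochs=3.0,
    lr_scheduler_type="linear",
    seed=42,
    beta=0.1,   # assumption: trl's default DPO beta
    bf16=True,  # assumption
)

# ref_model=None: DPOTrainer clones the initial policy as the frozen reference.
trainer = DPOTrainer(model=model, ref_model=None, args=args,
                     train_dataset=train_dataset, tokenizer=tokenizer)
trainer.train()
```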
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6098 | 0.2060 | 100 | 0.6064 | -0.3781 | -0.6162 | 0.6719 | 0.2381 | -326.2885 | -294.3692 | 0.3025 | 0.2412 |
| 0.5824 | 0.4119 | 200 | 0.5766 | -0.4669 | -0.8684 | 0.6875 | 0.4014 | -351.5011 | -303.2473 | 0.2978 | 0.2161 |
| 0.558 | 0.6179 | 300 | 0.5602 | -0.5726 | -1.1031 | 0.7031 | 0.5305 | -374.9736 | -313.8190 | 0.4883 | 0.3629 |
| 0.5438 | 0.8239 | 400 | 0.5401 | -0.6318 | -1.2870 | 0.7188 | 0.6552 | -393.3680 | -319.7343 | 0.5810 | 0.4526 |
| 0.4996 | 1.0299 | 500 | 0.5226 | -0.9294 | -1.7973 | 0.75 | 0.8679 | -444.3959 | -349.4941 | 0.6763 | 0.5397 |
| 0.4404 | 1.2358 | 600 | 0.5192 | -1.1485 | -2.1556 | 0.7109 | 1.0070 | -480.2187 | -371.4092 | 0.8806 | 0.7545 |
| 0.4514 | 1.4418 | 700 | 0.5123 | -1.0111 | -2.0122 | 0.75 | 1.0011 | -465.8849 | -357.6671 | 0.8326 | 0.7097 |
| 0.4485 | 1.6478 | 800 | 0.5089 | -1.0391 | -2.0491 | 0.7188 | 1.0101 | -469.5780 | -360.4607 | 0.8489 | 0.7355 |
| 0.454 | 1.8538 | 900 | 0.5120 | -1.0757 | -2.1427 | 0.7422 | 1.0670 | -478.9369 | -364.1291 | 0.7763 | 0.6660 |
| 0.3813 | 2.0597 | 1000 | 0.5197 | -1.4982 | -2.7845 | 0.7344 | 1.2863 | -543.1158 | -406.3754 | 0.7819 | 0.7006 |
| 0.3665 | 2.2657 | 1100 | 0.5132 | -1.4188 | -2.6985 | 0.7422 | 1.2797 | -534.5103 | -398.4331 | 0.8288 | 0.7620 |
| 0.3692 | 2.4717 | 1200 | 0.5156 | -1.5090 | -2.8103 | 0.7422 | 1.3013 | -545.6944 | -407.4524 | 0.8530 | 0.7832 |
| 0.3733 | 2.6777 | 1300 | 0.5157 | -1.4882 | -2.7655 | 0.7422 | 1.2774 | -541.2150 | -405.3702 | 0.8711 | 0.8075 |
| 0.3498 | 2.8836 | 1400 | 0.5175 | -1.6115 | -2.9606 | 0.7344 | 1.3492 | -560.7245 | -417.7000 | 0.8930 | 0.8310 |
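For readers unfamiliar with these columns: assuming the standard trl DPO logging, each reward is the policy's implicit reward under the DPO objective,

$$
r(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right),
$$

so Rewards/margins is the mean gap between chosen and rejected rewards (the header metrics check out: -1.6079 - (-2.9565) = 1.3486), and Rewards/accuracies is the fraction of evaluation pairs where the chosen completion's reward exceeds the rejected one's. In the table, the validation loss flattens near 0.51 after the first epoch while the reward margin keeps widening.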
### Framework versions
- Transformers 4.45.1
- PyTorch 2.4.1+cu121
- Datasets 3.0.0
- Tokenizers 0.20.0
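A quick sanity check that a local environment matches these pins before loading the model (a sketch; the `+cu121` suffix on the PyTorch version reflects the CUDA build used for training and need not match exactly):

```python
# Print installed versions next to the pins from "Framework versions" above.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": "4.45.1",
    "torch": "2.4.1+cu121",
    "datasets": "3.0.0",
    "tokenizers": "0.20.0",
}
installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, pin in expected.items():
    status = "OK" if installed[name] == pin else "MISMATCH"
    print(f"{name}: installed {installed[name]}, pinned {pin} -> {status}")
```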