model_hh_usp1_400

This model is a fine-tuned version of meta-llama/Llama-2-7b-chat-hf on an unknown dataset. Its evaluation-set results are reported per checkpoint in the table at the end of this card.

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The following hyperparameters were used during training:

More information needed

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.0025	4.0	100	1.8113	-1.7152	-5.1212	0.6100	3.4059	-119.7389	-112.3466	-0.1554	-0.1598
0.1942	8.0	200	3.6090	-1.4379	-8.2063	0.6100	6.7684	-123.1668	-112.0384	-1.0994	-1.1187
0.0502	12.0	300	3.3229	-9.0906	-16.5854	0.6200	7.4948	-132.4769	-120.5415	-0.9988	-1.0079
0.0	16.0	400	3.4296	-2.1656	-10.3972	0.6900	8.2316	-125.6012	-112.8470	-0.9657	-0.9623
0.0	20.0	500	3.4471	-2.1796	-10.4172	0.7100	8.2376	-125.6234	-112.8626	-0.9676	-0.9637
0.0	24.0	600	3.4031	-2.1735	-10.4669	0.7000	8.2933	-125.6786	-112.8558	-0.9675	-0.9640
0.0	28.0	700	3.4346	-2.1542	-10.4272	0.7000	8.2730	-125.6345	-112.8343	-0.9673	-0.9639
0.0	32.0	800	3.4246	-2.1606	-10.4103	0.6900	8.2497	-125.6157	-112.8415	-0.9675	-0.9642
0.0	36.0	900	3.4315	-2.1805	-10.4501	0.7000	8.2696	-125.6599	-112.8635	-0.9674	-0.9639
0.0	40.0	1000	3.4291	-2.1852	-10.4536	0.6900	8.2684	-125.6639	-112.8688	-0.9672	-0.9637
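
The table can be sanity-checked with a few lines of Python. The sketch below is illustrative, not part of the original training code. It copies six columns from the table and checks that Rewards/margins equals Rewards/chosen minus Rewards/rejected up to rounding, then reports which checkpoints have the lowest validation loss and the highest reward accuracy. The column names resemble those logged by preference-optimization trainers such as TRL's DPOTrainer, but that is an assumption: the card does not name the training method. The function names (`max_margin_gap`, `lowest_val_loss_step`, `best_accuracy_step`) are hypothetical helpers introduced only for this example.

```python
# Rows copied from the training results table above:
# (step, validation loss, rewards/chosen, rewards/rejected,
#  rewards/accuracies, rewards/margins)
ROWS = [
    (100, 1.8113, -1.7152, -5.1212, 0.6100, 3.4059),
    (200, 3.6090, -1.4379, -8.2063, 0.6100, 6.7684),
    (300, 3.3229, -9.0906, -16.5854, 0.6200, 7.4948),
    (400, 3.4296, -2.1656, -10.3972, 0.6900, 8.2316),
    (500, 3.4471, -2.1796, -10.4172, 0.7100, 8.2376),
    (600, 3.4031, -2.1735, -10.4669, 0.7000, 8.2933),
    (700, 3.4346, -2.1542, -10.4272, 0.7000, 8.2730),
    (800, 3.4246, -2.1606, -10.4103, 0.6900, 8.2497),
    (900, 3.4315, -2.1805, -10.4501, 0.7000, 8.2696),
    (1000, 3.4291, -2.1852, -10.4536, 0.6900, 8.2684),
]


def max_margin_gap(rows):
    """Largest |chosen - rejected - margin| over all rows (rounding error only)."""
    return max(abs(chosen - rejected - margin)
               for _, _, chosen, rejected, _, margin in rows)


def lowest_val_loss_step(rows):
    """Step of the checkpoint with the lowest validation loss."""
    return min(rows, key=lambda row: row[1])[0]


def best_accuracy_step(rows):
    """Step of the checkpoint with the highest reward accuracy (first on ties)."""
    return max(rows, key=lambda row: row[4])[0]


if __name__ == "__main__":
    print("max margin gap:", max_margin_gap(ROWS))
    print("lowest validation loss at step:", lowest_val_loss_step(ROWS))
    print("highest reward accuracy at step:", best_accuracy_step(ROWS))
```

On these rows the two selection criteria disagree: validation loss is lowest at the first logged checkpoint, while reward accuracy peaks later. Training loss also reaches 0.0 from step 400 onward while validation loss stays near 3.4, which is a pattern commonly associated with overfitting; which checkpoint is preferable depends on the metric that matters for the intended use.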