model_hh_shp3_200

This model is a fine-tuned version of meta-llama/Llama-2-7b-chat-hf on an unknown dataset. It achieves the following results on the evaluation set:

Model description

More information needed

More information needed

More information needed

The following hyperparameters were used during training:

Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
8.0	100	2.2133	-1.8646	-2.4313	0.5300	0.5667	-215.8960	-234.5382	-0.6413	-0.6979
16.0	200	2.2571	-1.9454	-2.5096	0.5300	0.5642	-215.9830	-234.6279	-0.6423	-0.6991
24.0	300	2.2275	-1.9722	-2.5264	0.5200	0.5542	-216.0016	-234.6577	-0.6429	-0.6988
32.0	400	2.2729	-2.0276	-2.5437	0.5200	0.5161	-216.0209	-234.7193	-0.6425	-0.6991
40.0	500	2.2476	-2.0622	-2.6344	0.5300	0.5723	-216.1217	-234.7577	-0.6440	-0.7005
48.0	600	2.2449	-2.0779	-2.6423	0.5300	0.5645	-216.1305	-234.7751	-0.6434	-0.6996
56.0	700	2.2415	-2.0486	-2.6063	0.5300	0.5577	-216.0904	-234.7426	-0.6439	-0.7000
64.0	800	2.2311	-2.0778	-2.6332	0.5300	0.5554	-216.1204	-234.7751	-0.6440	-0.7000
72.0	900	2.2534	-2.0857	-2.6363	0.5300	0.5507	-216.1238	-234.7838	-0.6437	-0.6996
80.0	1000	2.2700	-2.0808	-2.6185	0.5300	0.5377	-216.1040	-234.7784	-0.6432	-0.6992