zephyr-7b-dpo-qlora

This model is a DPO-trained QLoRA adapter for alignment-handbook/zephyr-7b-sft-qlora, fine-tuned on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set (a short sketch of how the reward metrics relate to one another follows the list):

  • Loss: 0.5036
  • Rewards/chosen: -2.0892
  • Rewards/rejected: -3.1197
  • Rewards/accuracies: 0.7295
  • Rewards/margins: 1.0304
  • Logps/rejected: -560.7722
  • Logps/chosen: -477.4810
  • Logits/rejected: 2.3638
  • Logits/chosen: 1.7891
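
The reward columns follow the standard DPO convention: each reward is the policy-versus-reference log-probability difference scaled by the DPO beta, the margin is chosen minus rejected, and the accuracy is the fraction of preference pairs with a positive margin. A minimal sketch, assuming beta = 0.1 (the typical alignment-handbook value; the actual beta is not stated in this card):

```python
# Hedged sketch of how the DPO reward metrics above relate to one another.
# beta = 0.1 is an assumption; it is not stated in this card.
import torch

beta = 0.1  # DPO temperature (assumed)

def implicit_reward(policy_logps: torch.Tensor, ref_logps: torch.Tensor) -> torch.Tensor:
    # Rewards/chosen and Rewards/rejected are beta * (log pi_theta - log pi_ref),
    # with log-probabilities summed over the response tokens (cf. Logps/chosen, Logps/rejected).
    return beta * (policy_logps - ref_logps)

# Plugging in the final evaluation numbers reported above:
rewards_chosen = torch.tensor(-2.0892)
rewards_rejected = torch.tensor(-3.1197)
margin = rewards_chosen - rewards_rejected   # Rewards/margins: ~1.0304 (up to rounding)
# Rewards/accuracies (0.7295) is the fraction of evaluation pairs with a positive margin.
print(margin)
```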

Model description

This repository contains a QLoRA (PEFT) adapter produced by Direct Preference Optimization (DPO) training on top of alignment-handbook/zephyr-7b-sft-qlora, apparently following the alignment-handbook Zephyr recipe. Only the adapter weights are stored here; the SFT base model is required to run the model.

Intended uses & limitations

More information needed

Training and evaluation data

The model was trained and evaluated on the HuggingFaceH4/ultrafeedback_binarized dataset, a binarized (chosen vs. rejected) preference version of UltraFeedback.
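
A minimal loading sketch, assuming the dataset's standard train_prefs / test_prefs preference splits:

```python
# Hedged sketch: loading the preference pairs used for DPO training.
# The split names train_prefs / test_prefs are an assumption about this dataset.
from datasets import load_dataset

ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized")
train = ds["train_prefs"]       # fields include "prompt", "chosen", "rejected"
evaluation = ds["test_prefs"]
print(train[0]["prompt"])
```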

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
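
The list above can be read as a transformers.TrainingArguments configuration. The sketch below is a hedged reconstruction; output_dir and bf16 are assumptions, and the actual run used the alignment-handbook DPO training scripts rather than this exact snippet:

```python
# Hedged reconstruction of the hyperparameters above as TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zephyr-7b-dpo-qlora",   # hypothetical output path
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,      # 4 per device * 2 accumulation steps = total batch size 8
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                          # assumption; not stated in this card
)
```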

Training results

Training Loss Epoch Step Validation Loss Rewards/chosen Rewards/rejected Rewards/accuracies Rewards/margins Logps/rejected Logps/chosen Logits/rejected Logits/chosen
0.6931 0.01 100 0.6930 0.0005 0.0002 0.5135 0.0003 -248.7855 -268.5095 -2.1335 -2.2163
0.6926 0.03 200 0.6924 0.0023 0.0008 0.5885 0.0014 -248.7224 -268.3331 -2.1330 -2.2157
0.6904 0.04 300 0.6901 0.0125 0.0064 0.6475 0.0062 -248.1708 -267.3080 -2.1373 -2.2194
0.6868 0.05 400 0.6830 0.0380 0.0168 0.6610 0.0211 -247.1243 -264.7627 -2.1356 -2.2179
0.6781 0.07 500 0.6679 0.0202 -0.0356 0.6785 0.0558 -252.3708 -266.5388 -2.0748 -2.1590
0.6565 0.08 600 0.6403 -0.1036 -0.2364 0.6805 0.1327 -272.4421 -278.9226 -1.9763 -2.0685
0.6411 0.09 700 0.6254 -0.1531 -0.3350 0.6820 0.1819 -282.3092 -283.8720 -1.9197 -2.0181
0.6177 0.1 800 0.6134 -0.3846 -0.6431 0.6765 0.2585 -313.1128 -307.0186 -1.8202 -1.9304
0.6333 0.12 900 0.6082 -0.4006 -0.6835 0.6740 0.2829 -317.1526 -308.6199 -1.8566 -1.9660
0.5776 0.13 1000 0.6066 -0.6650 -1.0307 0.6735 0.3657 -351.8794 -335.0627 -1.8956 -2.0038
0.6093 0.14 1100 0.6075 -0.5592 -0.9272 0.6740 0.3679 -341.5230 -324.4846 -1.9019 -2.0022
0.5607 0.16 1200 0.5970 -0.8428 -1.2654 0.6800 0.4226 -375.3466 -352.8372 -1.8081 -1.9182
0.5627 0.17 1300 0.5935 -1.4339 -1.8498 0.6850 0.4160 -433.7877 -411.9446 -1.1519 -1.3203
0.5853 0.18 1400 0.5842 -1.2099 -1.6843 0.6950 0.4743 -417.2325 -389.5525 -0.8708 -1.0520
0.5622 0.2 1500 0.5712 -1.5071 -2.0510 0.6990 0.5439 -453.9020 -419.2693 -0.4323 -0.6561
0.4815 0.21 1600 0.5663 -1.5246 -2.1580 0.7035 0.6333 -464.6043 -421.0228 -0.3415 -0.5810
0.4698 0.22 1700 0.5697 -1.8165 -2.4986 0.6990 0.6821 -498.6652 -450.2103 0.5641 0.2594
0.5213 0.24 1800 0.5670 -1.4236 -2.1011 0.7055 0.6776 -458.9214 -410.9152 0.6173 0.2952
0.5295 0.25 1900 0.5606 -1.9797 -2.6952 0.6945 0.7155 -518.3280 -466.5294 0.8941 0.5819
0.6074 0.26 2000 0.5525 -1.1848 -1.7881 0.7165 0.6033 -427.6170 -387.0396 0.3449 0.0271
0.568 0.27 2100 0.5388 -1.5667 -2.2488 0.7220 0.6822 -473.6912 -425.2263 1.3497 0.9786
0.5643 0.29 2200 0.5539 -1.8112 -2.6184 0.7145 0.8072 -510.6461 -449.6774 1.9603 1.5565
0.5226 0.3 2300 0.5354 -1.6020 -2.3588 0.7245 0.7568 -484.6839 -428.7553 1.3673 0.9661
0.4144 0.31 2400 0.5338 -2.0110 -2.8276 0.7245 0.8167 -531.5681 -469.6557 1.6609 1.2542
0.5233 0.33 2500 0.5387 -1.9001 -2.7290 0.7245 0.8289 -521.7109 -458.5734 1.7390 1.3093
0.5425 0.34 2600 0.5430 -2.4619 -3.3366 0.7225 0.8747 -582.4704 -514.7514 2.4431 1.9262
0.4719 0.35 2700 0.5309 -1.9122 -2.7118 0.7285 0.7996 -519.9872 -459.7816 2.0586 1.6066
0.5543 0.37 2800 0.5268 -1.7066 -2.4623 0.7225 0.7557 -495.0332 -439.2162 1.5924 1.1721
0.5409 0.38 2900 0.5400 -2.1879 -3.1551 0.7175 0.9673 -564.3220 -487.3477 2.0890 1.6062
0.4956 0.39 3000 0.5285 -1.8388 -2.7165 0.7285 0.8777 -520.4593 -452.4431 1.6464 1.1679
0.4572 0.41 3100 0.5198 -1.6639 -2.4269 0.7265 0.7630 -491.4958 -434.9505 1.7627 1.2994
0.4962 0.42 3200 0.5181 -1.6914 -2.5214 0.7265 0.8300 -500.9511 -437.6994 1.6452 1.1780
0.6098 0.43 3300 0.5188 -1.6044 -2.4380 0.7310 0.8336 -492.6022 -428.9995 1.5141 1.0617
0.5349 0.44 3400 0.5210 -1.4720 -2.3090 0.7285 0.8370 -479.7061 -415.7578 1.4965 1.0371
0.4773 0.46 3500 0.5206 -1.4425 -2.2285 0.7280 0.7861 -471.6597 -412.8062 1.8090 1.3264
0.5312 0.47 3600 0.5196 -1.8128 -2.6719 0.7320 0.8591 -515.9943 -449.8387 2.5339 2.0191
0.5879 0.48 3700 0.5128 -1.9225 -2.7975 0.7355 0.8750 -528.5556 -460.8123 2.9390 2.3934
0.5202 0.5 3800 0.5155 -1.8291 -2.7153 0.7330 0.8863 -520.3419 -451.4667 2.2728 1.7445
0.5116 0.51 3900 0.5188 -2.0732 -3.0427 0.7285 0.9696 -553.0799 -475.8752 2.2721 1.7291
0.5521 0.52 4000 0.5161 -2.3283 -3.3054 0.7255 0.9771 -579.3469 -501.3872 2.2577 1.7449
0.5107 0.54 4100 0.5197 -1.8192 -2.7348 0.7215 0.9156 -522.2897 -450.4803 1.7678 1.2222
0.4773 0.55 4200 0.5163 -2.1894 -3.1554 0.7265 0.9660 -564.3451 -487.4992 1.8497 1.3121
0.4315 0.56 4300 0.5097 -2.0873 -3.0416 0.7340 0.9544 -552.9705 -477.2872 2.2039 1.6783
0.5176 0.58 4400 0.5097 -2.2486 -3.2409 0.7290 0.9924 -572.8979 -493.4146 2.0782 1.5387
0.4487 0.59 4500 0.5132 -2.0257 -3.0144 0.7245 0.9887 -550.2475 -471.1282 2.0676 1.4968
0.478 0.6 4600 0.5082 -2.0565 -3.0343 0.7270 0.9778 -552.2376 -474.2084 2.1065 1.5402
0.5351 0.62 4700 0.5038 -1.9625 -2.8993 0.7285 0.9368 -538.7390 -464.8120 2.0488 1.5017
0.4942 0.63 4800 0.5058 -2.2570 -3.2479 0.7305 0.9909 -573.5954 -494.2575 2.5210 1.9471
0.4918 0.64 4900 0.5129 -2.4781 -3.5322 0.7350 1.0541 -602.0275 -516.3653 2.8295 2.2468
0.4693 0.65 5000 0.5131 -2.2974 -3.3589 0.7315 1.0615 -584.6987 -498.2968 2.6931 2.1137
0.5796 0.67 5100 0.5084 -2.1485 -3.1709 0.7300 1.0224 -565.8975 -483.4113 2.4925 1.9365
0.5137 0.68 5200 0.5012 -2.0083 -2.9370 0.7365 0.9287 -542.5073 -469.3903 2.0969 1.5738
0.4484 0.69 5300 0.5022 -2.1149 -3.0765 0.7345 0.9616 -556.4618 -480.0531 2.2539 1.7154
0.4608 0.71 5400 0.5035 -2.1639 -3.1586 0.7380 0.9947 -564.6663 -484.9485 2.2224 1.6704
0.5746 0.72 5500 0.5045 -2.3599 -3.4023 0.7320 1.0424 -589.0370 -504.5520 2.2134 1.6562
0.5768 0.73 5600 0.5011 -2.0662 -3.0430 0.7375 0.9767 -553.1031 -475.1830 1.8199 1.2667
0.4359 0.75 5700 0.5032 -2.0933 -3.1100 0.7350 1.0166 -559.8049 -477.8932 1.9073 1.3503
0.4812 0.76 5800 0.5056 -2.2931 -3.3640 0.7320 1.0709 -585.2068 -497.8671 2.1234 1.5508
0.5048 0.77 5900 0.5036 -1.9424 -2.9286 0.7335 0.9862 -541.6672 -462.8024 1.7970 1.2367
0.4505 0.79 6000 0.5053 -1.9881 -2.9896 0.7330 1.0015 -547.7703 -467.3695 1.9582 1.3843
0.5197 0.8 6100 0.5071 -2.0238 -3.0391 0.7315 1.0152 -552.7153 -470.9445 2.0118 1.4341
0.6046 0.81 6200 0.5064 -2.0803 -3.1116 0.7310 1.0313 -559.9708 -476.5939 2.1151 1.5328
0.4669 0.82 6300 0.5072 -2.1010 -3.1541 0.7310 1.0531 -564.2192 -478.6570 2.2264 1.6394
0.5631 0.84 6400 0.5055 -2.0938 -3.1385 0.7305 1.0447 -562.6528 -477.9385 2.3072 1.7230
0.433 0.85 6500 0.5044 -2.0630 -3.0936 0.7290 1.0306 -558.1638 -474.8586 2.2760 1.6963
0.4908 0.86 6600 0.5043 -2.0569 -3.0863 0.7295 1.0294 -557.4365 -474.2540 2.3343 1.7557
0.522 0.88 6700 0.5039 -2.0755 -3.1060 0.7300 1.0304 -559.4037 -476.1125 2.3469 1.7706
0.4953 0.89 6800 0.5039 -2.0918 -3.1235 0.7290 1.0317 -561.1605 -477.7388 2.3881 1.8129
0.5683 0.9 6900 0.5036 -2.0899 -3.1203 0.7300 1.0304 -560.8373 -477.5472 2.3649 1.7897
0.5399 0.92 7000 0.5037 -2.0831 -3.1119 0.7295 1.0288 -560.0004 -476.8721 2.3590 1.7832
0.4628 0.93 7100 0.5035 -2.0882 -3.1188 0.7300 1.0307 -560.6896 -477.3761 2.3659 1.7910
0.5273 0.94 7200 0.5036 -2.0897 -3.1202 0.7295 1.0305 -560.8275 -477.5317 2.3594 1.7853
0.4445 0.96 7300 0.5035 -2.0889 -3.1197 0.7305 1.0308 -560.7729 -477.4447 2.3614 1.7871
0.4839 0.97 7400 0.5035 -2.0894 -3.1199 0.7310 1.0304 -560.7961 -477.5042 2.3646 1.7896
0.4425 0.98 7500 0.5036 -2.0892 -3.1197 0.7295 1.0304 -560.7722 -477.4810 2.3638 1.7891
0.5195 0.99 7600 0.5036 -2.0892 -3.1197 0.7295 1.0304 -560.7722 -477.4810 2.3638 1.7891

Framework versions

  • PEFT 0.7.1
  • Transformers 4.36.2
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.0
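
Since this repository stores only a PEFT adapter, one hedged way to run it is to load the adapter on top of the SFT base model; the chat-template call assumes the base tokenizer ships a Zephyr-style chat template:

```python
# Hedged usage sketch: loading the DPO QLoRA adapter for inference with PEFT.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "wirthdrew1/zephyr-7b-dpo-qlora"
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id, torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("alignment-handbook/zephyr-7b-sft-qlora")

messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Loading the adapter this way also downloads the base model weights referenced in the adapter config.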