Mistral-7B-v0.3-gen-dpo-10k

This model is a fine-tuned version of mistralai/Mistral-7B-v0.3 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4487
  • Rewards/real: 7.5104
  • Rewards/generated: 1.3739
  • Rewards/accuracies: 0.8462
  • Rewards/margins: 6.1365
  • Logps/generated: -218.1546
  • Logps/real: -153.6829
  • Logits/generated: -1.8523
  • Logits/real: -2.6888
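These metric names follow the conventions of DPO (Direct Preference Optimization) training. As a minimal sketch, assuming the standard DPO reward, a β-scaled log-probability ratio against a frozen reference model (all names below are illustrative, not taken from the actual training code):

```python
import torch

# Hypothetical illustration of the standard DPO reward:
# reward = beta * (policy log-prob - reference log-prob) of a response.
# "real" = preferred reference responses, "generated" = model-sampled ones.
beta = 0.1  # illustrative; the beta actually used here is not documented

def dpo_eval_metrics(policy_logps_real, ref_logps_real,
                     policy_logps_gen, ref_logps_gen):
    rewards_real = beta * (policy_logps_real - ref_logps_real)  # Rewards/real
    rewards_gen = beta * (policy_logps_gen - ref_logps_gen)     # Rewards/generated
    margins = rewards_real - rewards_gen                        # Rewards/margins
    accuracy = (rewards_real > rewards_gen).float().mean()      # Rewards/accuracies
    return rewards_real.mean(), rewards_gen.mean(), margins.mean(), accuracy
```

Consistent with this, the reported Rewards/margins is exactly Rewards/real minus Rewards/generated (7.5104 − 1.3739 = 6.1365), and Rewards/accuracies is the fraction of evaluation pairs whose real response outscores its generated counterpart.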

Model description

A 7.25B-parameter causal language model stored as BF16 safetensors, fine-tuned from mistralai/Mistral-7B-v0.3. The model name and the reward-style evaluation metrics above suggest DPO-style preference training on paired real and generated responses; no further details have been provided.

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
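These settings multiply out to the reported effective batch sizes: 4 per device × 4 GPUs × 2 accumulation steps = 32 for training, and 4 × 4 = 16 for evaluation. A minimal sketch of an equivalent configuration with transformers' TrainingArguments follows; output_dir and the bf16 flag are assumptions (the latter inferred from the BF16 checkpoint dtype), not documented settings:

```python
from transformers import TrainingArguments

# Sketch of the documented hyperparameters. The 4 GPUs come from the
# distributed launcher, not from these arguments; output_dir and bf16
# are assumptions.
training_args = TrainingArguments(
    output_dir="Mistral-7B-v0.3-gen-dpo-10k",  # assumed
    learning_rate=5e-7,
    per_device_train_batch_size=4,   # x 4 GPUs x 2 accumulation steps = 32
    per_device_eval_batch_size=4,    # x 4 GPUs = 16
    gradient_accumulation_steps=2,
    num_train_epochs=3,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                       # assumed from the BF16 checkpoint
)
```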

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/real | Rewards/generated | Rewards/accuracies | Rewards/margins | Logps/generated | Logps/real | Logits/generated | Logits/real |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.877 | 0.0992 | 31 | 0.8103 | 0.1264 | -0.1498 | 0.7308 | 0.2762 | -233.3916 | -227.5227 | -3.0962 | -3.1732 |
| 0.6795 | 0.1984 | 62 | 0.6395 | 0.2309 | -1.2902 | 0.8462 | 1.5211 | -244.7957 | -226.4777 | -2.7947 | -2.9556 |
| 0.5656 | 0.2976 | 93 | 0.5345 | -0.4078 | -2.8686 | 0.8846 | 2.4608 | -260.5800 | -232.8651 | -2.6125 | -2.8094 |
| 0.5392 | 0.3968 | 124 | 0.4632 | 0.7347 | -2.5555 | 0.9038 | 3.2903 | -257.4490 | -221.4394 | -2.6252 | -2.8182 |
| 0.519 | 0.496 | 155 | 0.4194 | 0.8203 | -2.7159 | 0.9038 | 3.5362 | -259.0524 | -220.5834 | -2.5450 | -2.7892 |
| 0.4522 | 0.5952 | 186 | 0.4109 | 1.1478 | -3.3424 | 0.9038 | 4.4902 | -265.3172 | -217.3086 | -2.1889 | -2.5745 |
| 0.4175 | 0.6944 | 217 | 0.4189 | 1.7292 | -3.5464 | 0.8846 | 5.2756 | -267.3578 | -211.4943 | -2.0150 | -2.4367 |
| 0.5123 | 0.7936 | 248 | 0.4031 | 1.5992 | -2.5850 | 0.9038 | 4.1842 | -257.7434 | -212.7944 | -2.3289 | -2.6748 |
| 0.4467 | 0.8928 | 279 | 0.4215 | 2.1259 | -3.1648 | 0.8654 | 5.2908 | -263.5421 | -207.5273 | -1.9457 | -2.5122 |
| 0.432 | 0.992 | 310 | 0.3889 | 2.4989 | -2.2218 | 0.9038 | 4.7207 | -254.1118 | -203.7978 | -2.1945 | -2.6616 |
| 0.206 | 1.0912 | 341 | 0.3944 | 3.9149 | -1.2192 | 0.8654 | 5.1341 | -244.0859 | -189.6380 | -2.1800 | -2.7888 |
| 0.1884 | 1.1904 | 372 | 0.3790 | 4.2792 | -1.2916 | 0.8846 | 5.5708 | -244.8093 | -185.9946 | -2.2022 | -2.8626 |
| 0.1866 | 1.2896 | 403 | 0.3799 | 4.7981 | -1.0761 | 0.8654 | 5.8742 | -242.6544 | -180.8058 | -2.1602 | -2.8470 |
| 0.195 | 1.3888 | 434 | 0.3898 | 5.3519 | -0.2792 | 0.8462 | 5.6311 | -234.6861 | -175.2681 | -2.2097 | -2.9184 |
| 0.1787 | 1.488 | 465 | 0.4027 | 5.4325 | -0.2501 | 0.8462 | 5.6826 | -234.3951 | -174.4621 | -2.3064 | -2.9594 |
| 0.1808 | 1.5872 | 496 | 0.3806 | 5.3354 | -0.7266 | 0.8654 | 6.0620 | -239.1595 | -175.4325 | -2.1412 | -2.9272 |
| 0.1629 | 1.6864 | 527 | 0.3708 | 5.4311 | -0.4097 | 0.8846 | 5.8408 | -235.9910 | -174.4760 | -2.3029 | -2.9067 |
| 0.1993 | 1.7856 | 558 | 0.3883 | 6.1042 | 0.4673 | 0.8654 | 5.6370 | -227.2212 | -167.7442 | -2.2351 | -2.9173 |
| 0.1687 | 1.8848 | 589 | 0.3744 | 5.8543 | -0.2070 | 0.8846 | 6.0613 | -233.9639 | -170.2437 | -2.1107 | -2.7838 |
| 0.1721 | 1.984 | 620 | 0.3694 | 5.9535 | -0.1297 | 0.8846 | 6.0832 | -233.1905 | -169.2515 | -2.1660 | -2.8452 |
| 0.1383 | 2.0832 | 651 | 0.3671 | 6.4065 | 0.2173 | 0.8846 | 6.1892 | -229.7208 | -164.7217 | -1.9638 | -2.7078 |
| 0.1365 | 2.1824 | 682 | 0.3923 | 7.0262 | 0.7925 | 0.8654 | 6.2337 | -223.9683 | -158.5246 | -1.8941 | -2.6932 |
| 0.1396 | 2.2816 | 713 | 0.4887 | 7.4973 | 2.3669 | 0.8077 | 5.1304 | -208.2246 | -153.8134 | -2.1548 | -2.9114 |
| 0.1397 | 2.3808 | 744 | 0.4148 | 7.3082 | 1.2357 | 0.8462 | 6.0725 | -219.5363 | -155.7047 | -1.9055 | -2.6924 |
| 0.1351 | 2.48 | 775 | 0.4137 | 7.3508 | 1.1950 | 0.8654 | 6.1558 | -219.9435 | -155.2784 | -1.8891 | -2.6999 |
| 0.14 | 2.5792 | 806 | 0.4429 | 7.5628 | 1.7969 | 0.8654 | 5.7659 | -213.9247 | -153.1584 | -1.9732 | -2.7988 |
| 0.1303 | 2.6784 | 837 | 0.4819 | 7.7271 | 2.2012 | 0.8654 | 5.5260 | -209.8819 | -151.5153 | -1.9800 | -2.8090 |
| 0.1329 | 2.7776 | 868 | 0.4405 | 7.4045 | 1.1559 | 0.8462 | 6.2486 | -220.3349 | -154.7421 | -1.8232 | -2.6413 |
| 0.1353 | 2.8768 | 899 | 0.4549 | 7.5125 | 1.3390 | 0.8462 | 6.1735 | -218.5042 | -153.6622 | -1.8288 | -2.6739 |
| 0.1319 | 2.976 | 930 | 0.4487 | 7.5104 | 1.3739 | 0.8462 | 6.1365 | -218.1546 | -153.6829 | -1.8523 | -2.6888 |

Framework versions

  • Transformers 4.43.3
  • Pytorch 2.2.2+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
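As a usage sketch, the checkpoint should load with standard transformers APIs; the repository id is taken from the model tree below, torch.bfloat16 matches the checkpoint dtype, and device_map="auto" (which requires accelerate) is an assumption:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AmberYifan/Mistral-7B-v0.3-gen-dpo-10k"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 checkpoint
    device_map="auto",           # assumed; requires accelerate
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```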
Model tree for AmberYifan/Mistral-7B-v0.3-gen-dpo-10k

  • Base model: mistralai/Mistral-7B-v0.3 → this model