Mistral-7B-v0.3-dpo-10k

This model is a fine-tuned version of mistralai/Mistral-7B-v0.3 on an unknown dataset. It achieves the following results on the evaluation set (a sketch of how these DPO diagnostics are computed follows the list):

  • Loss: 1.4922
  • Rewards/real: -4.5129
  • Rewards/generated: -4.6699
  • Rewards/accuracies: 0.4423
  • Rewards/margins: 0.1570
  • Logps/generated: -155.2124
  • Logps/real: -181.5346
  • Logits/generated: -2.1203
  • Logits/real: -2.3164
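
These are the standard diagnostics logged during DPO training: "real" refers to the preferred (chosen) completion and "generated" to the rejected one, and the rewards are the implicit DPO rewards rather than outputs of a separate reward model. A minimal sketch of how these quantities are derived, assuming per-sequence log-probabilities from the policy and a frozen reference model, and a DPO beta of 0.1 (a common default; the actual value is not recorded in this card):

```python
import torch
import torch.nn.functional as F

def dpo_metrics(policy_logps_real, policy_logps_generated,
                ref_logps_real, ref_logps_generated, beta=0.1):
    """Compute the DPO loss and the reward diagnostics reported above.

    Each input is a tensor of per-sequence summed token log-probabilities;
    beta=0.1 is an assumed value, not one recorded in this card.
    """
    # Implicit DPO rewards: beta-scaled log-ratio of policy vs. reference.
    rewards_real = beta * (policy_logps_real - ref_logps_real)
    rewards_generated = beta * (policy_logps_generated - ref_logps_generated)

    margins = rewards_real - rewards_generated            # Rewards/margins
    accuracies = (margins > 0).float().mean()             # Rewards/accuracies

    # DPO objective: negative log-sigmoid of the reward margin.
    loss = -F.logsigmoid(margins).mean()
    return loss, rewards_real.mean(), rewards_generated.mean(), accuracies
```

Under this reading, Rewards/accuracies = 0.4423 means the final checkpoint assigns the preferred response a higher implicit reward than the rejected one on roughly 44% of evaluation pairs.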

Model description

More information needed

Intended uses & limitations

More information needed
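
No usage guidance was provided. A minimal loading sketch with transformers, assuming the model is hosted at AmberYifan/Mistral-7B-v0.3-dpo-10k and that, like the base model, it is a plain (non-chat) causal LM stored in BF16:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AmberYifan/Mistral-7B-v0.3-dpo-10k"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are stored in BF16
    device_map="auto",           # requires the accelerate package
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```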

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (reconstructed as a trl configuration in the sketch after this list):

  • learning_rate: 5e-07
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
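
The exact training script is not included in this card. A sketch of how these settings map onto a trl DPOConfig (field names follow trl/transformers conventions and are assumptions, not taken from the original run):

```python
from trl import DPOConfig

# Reconstruction of the hyperparameters listed above. With 4 GPUs,
# a per-device batch of 4, and 2 accumulation steps, the effective
# batch size is 4 * 4 * 2 = 32, matching total_train_batch_size.
config = DPOConfig(
    output_dir="Mistral-7B-v0.3-dpo-10k",
    learning_rate=5e-7,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,
    seed=42,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=3,
    bf16=True,  # assumed from the BF16 tensor type of the released weights
)
```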

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/real | Rewards/generated | Rewards/accuracies | Rewards/margins | Logps/generated | Logps/real | Logits/generated | Logits/real |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.7877 | 0.0992 | 31 | 0.7717 | 0.6905 | 0.4761 | 0.7308 | 0.2144 | -103.7523 | -129.5006 | -2.5223 | -2.6981 |
| 0.6309 | 0.1984 | 62 | 0.7320 | 2.1415 | 1.7428 | 0.7115 | 0.3987 | -91.0858 | -114.9906 | -2.5564 | -2.7396 |
| 0.5309 | 0.2976 | 93 | 0.7175 | 1.5709 | 0.9016 | 0.6538 | 0.6692 | -99.4969 | -120.6967 | -2.4703 | -2.6171 |
| 0.4323 | 0.3968 | 124 | 0.7714 | 1.9586 | 1.4739 | 0.6923 | 0.4847 | -93.7744 | -116.8195 | -2.6808 | -2.8349 |
| 0.297 | 0.496 | 155 | 0.7161 | 2.2903 | 1.7549 | 0.8077 | 0.5355 | -90.9648 | -113.5018 | -2.6256 | -2.7696 |
| 0.2144 | 0.5952 | 186 | 0.8257 | 1.6038 | 1.0213 | 0.7115 | 0.5825 | -98.3000 | -120.3671 | -2.8900 | -3.0602 |
| 0.2497 | 0.6944 | 217 | 0.6849 | 2.4543 | 1.7831 | 0.8077 | 0.6712 | -90.6823 | -111.8619 | -2.6469 | -2.8201 |
| 0.1112 | 0.7936 | 248 | 0.6993 | 2.2831 | 1.4322 | 0.7885 | 0.8508 | -94.1910 | -113.5747 | -2.7020 | -2.8645 |
| 0.176 | 0.8928 | 279 | 0.6700 | 2.7841 | 2.2447 | 0.7692 | 0.5394 | -86.0663 | -108.5641 | -2.8051 | -2.9280 |
| 0.1135 | 0.992 | 310 | 0.6956 | 2.3849 | 1.8198 | 0.7885 | 0.5651 | -90.3149 | -112.5561 | -2.8024 | -2.9203 |
| 0.1221 | 1.0912 | 341 | 0.7314 | 2.2046 | 1.6143 | 0.7308 | 0.5903 | -92.3708 | -114.3593 | -2.5886 | -2.7365 |
| 0.0864 | 1.1904 | 372 | 0.7718 | 2.3206 | 1.9459 | 0.6346 | 0.3747 | -89.0543 | -113.1994 | -2.5355 | -2.7014 |
| 0.0871 | 1.2896 | 403 | 0.8231 | 1.9873 | 1.7063 | 0.5962 | 0.2810 | -91.4506 | -116.5322 | -2.5240 | -2.6833 |
| 0.1454 | 1.3888 | 434 | 0.7980 | 1.7358 | 1.2782 | 0.6731 | 0.4576 | -95.7309 | -119.0471 | -2.4325 | -2.6120 |
| 0.0747 | 1.488 | 465 | 0.8086 | 1.9033 | 1.4938 | 0.6538 | 0.4094 | -93.5750 | -117.3725 | -2.3557 | -2.5683 |
| 0.0882 | 1.5872 | 496 | 0.9281 | 0.8252 | 0.4834 | 0.5192 | 0.3418 | -103.6798 | -128.1537 | -2.2722 | -2.4783 |
| 0.0693 | 1.6864 | 527 | 0.8954 | 0.5032 | -0.0439 | 0.6154 | 0.5471 | -108.9523 | -131.3737 | -2.1399 | -2.3681 |
| 0.0982 | 1.7856 | 558 | 0.8777 | 1.0122 | 0.5411 | 0.6538 | 0.4711 | -103.1028 | -126.2834 | -2.3326 | -2.5183 |
| 0.0674 | 1.8848 | 589 | 0.9360 | -0.0587 | -0.5311 | 0.5962 | 0.4724 | -113.8238 | -136.9920 | -2.3026 | -2.4848 |
| 0.0424 | 1.984 | 620 | 0.9421 | -0.2586 | -0.6968 | 0.5769 | 0.4382 | -115.4816 | -138.9915 | -2.2955 | -2.4846 |
| 0.0235 | 2.0832 | 651 | 1.0939 | -1.6766 | -2.0193 | 0.5 | 0.3428 | -128.7065 | -153.1709 | -2.2115 | -2.3974 |
| 0.024 | 2.1824 | 682 | 1.1491 | -2.1565 | -2.5396 | 0.5 | 0.3831 | -133.9093 | -157.9701 | -2.2049 | -2.3936 |
| 0.0469 | 2.2816 | 713 | 1.1324 | -2.0618 | -2.4801 | 0.5 | 0.4183 | -133.3140 | -157.0232 | -2.2161 | -2.4094 |
| 0.0328 | 2.3808 | 744 | 1.1837 | -2.4534 | -2.7702 | 0.4808 | 0.3168 | -136.2151 | -160.9390 | -2.2080 | -2.3952 |
| 0.0367 | 2.48 | 775 | 1.1779 | -2.6139 | -2.9724 | 0.4808 | 0.3585 | -138.2376 | -162.5442 | -2.1815 | -2.3777 |
| 0.0596 | 2.5792 | 806 | 1.2847 | -3.3490 | -3.6206 | 0.4231 | 0.2716 | -144.7193 | -169.8953 | -2.1523 | -2.3458 |
| 0.0395 | 2.6784 | 837 | 1.3358 | -3.6588 | -3.9010 | 0.4231 | 0.2422 | -147.5237 | -172.9937 | -2.1399 | -2.3346 |
| 0.0302 | 2.7776 | 868 | 1.3725 | -3.7911 | -4.0386 | 0.4231 | 0.2474 | -148.8990 | -174.3167 | -2.1529 | -2.3475 |
| 0.0132 | 2.8768 | 899 | 1.4969 | -4.4629 | -4.6237 | 0.4423 | 0.1607 | -154.7499 | -181.0344 | -2.1227 | -2.3178 |
| 0.034 | 2.976 | 930 | 1.4922 | -4.5129 | -4.6699 | 0.4423 | 0.1570 | -155.2124 | -181.5346 | -2.1203 | -2.3164 |

Framework versions

  • Transformers 4.43.3
  • PyTorch 2.2.2+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
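
To check a local environment against these versions before trying to reproduce the run, a quick sketch:

```python
import datasets, tokenizers, torch, transformers

# Expected: 4.43.3, 2.2.2+cu121, 2.20.0, 0.19.1
print(transformers.__version__, torch.__version__,
      datasets.__version__, tokenizers.__version__)
```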
