ROE_QA_TAIDE-LX-7B-Chat_Q100_80_20_V6

This model is a fine-tuned version of taide/TAIDE-LX-7B-Chat. The training dataset is not specified. It achieves the following results on the evaluation set:

Loss: 0.3442 (validation loss at step 9700, epoch 3.1170, the last row of the training results table)

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 1
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 4
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 5
mixed_precision_training: Native AMP
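
The batch-size and scheduler entries above interact, and a short sketch makes that concrete. The Python below is a minimal illustration, not the training script. It reproduces total_train_batch_size from train_batch_size and gradient_accumulation_steps, assuming a single training device (an assumption, though consistent with 1 x 4 = 4), and it evaluates a generic linear warmup-then-decay schedule with a warmup ratio of 0.1. The total of 15,560 steps is hypothetical: it extrapolates roughly 3,112 steps per epoch, estimated from the training results, over the configured 5 epochs. The function names are illustrative.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 5e-05
TRAIN_BATCH_SIZE = 1
GRADIENT_ACCUMULATION_STEPS = 4
WARMUP_RATIO = 0.1

# Assumption: one training device, which is consistent with 1 * 4 = 4.
NUM_DEVICES = 1

# Assumption: about 3,112 optimizer steps per epoch (estimated from the
# training results table) times the configured 5 epochs.
TOTAL_STEPS = 15_560


def effective_batch_size(per_device: int, accumulation: int, devices: int) -> int:
    """Number of examples that contribute to each optimizer step."""
    return per_device * accumulation * devices


def linear_lr(step: int, total_steps: int, warmup_ratio: float, peak_lr: float) -> float:
    """Linear warmup from 0 to peak_lr, then linear decay back to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    print(effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, NUM_DEVICES))
    for step in (0, 778, 1556, 9700, TOTAL_STEPS):
        print(step, linear_lr(step, TOTAL_STEPS, WARMUP_RATIO, LEARNING_RATE))
```

Under these assumptions the learning rate would still be rising for roughly the first 1,556 steps, so the earliest rows of the training results fall inside the warmup phase.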

Training results

Training Loss	Epoch	Step	Validation Loss
4.8628	0.0321	100	3.4692
4.6437	0.0643	200	3.2308
3.8154	0.0964	300	2.8144
3.3006	0.1285	400	2.5033
2.953	0.1607	500	2.2690
2.2844	0.1928	600	1.9550
2.134	0.2249	700	1.7888
1.8182	0.2571	800	1.6310
1.6776	0.2892	900	1.5062
1.6317	0.3213	1000	1.4026
1.6311	0.3535	1100	1.2253
1.1003	0.3856	1200	1.0703
1.2014	0.4177	1300	1.0672
0.9489	0.4499	1400	0.9135
1.0068	0.4820	1500	0.8950
1.0342	0.5141	1600	0.8074
0.9713	0.5463	1700	0.7767
0.9598	0.5784	1800	0.6995
0.9357	0.6105	1900	0.6643
0.8298	0.6427	2000	0.6456
0.8803	0.6748	2100	0.6164
0.8899	0.7069	2200	0.5867
0.7877	0.7391	2300	0.5748
0.7792	0.7712	2400	0.5513
0.8387	0.8033	2500	0.5267
0.8625	0.8355	2600	0.5111
0.8514	0.8676	2700	0.4992
0.762	0.8997	2800	0.4881
0.779	0.9319	2900	0.4840
0.8938	0.9640	3000	0.4676
0.7007	0.9961	3100	0.4668
0.7006	1.0283	3200	0.4500
0.6622	1.0604	3300	0.4383
0.6724	1.0925	3400	0.4404
0.6551	1.1247	3500	0.4372
0.6395	1.1568	3600	0.4273
0.6317	1.1889	3700	0.4217
0.6683	1.2211	3800	0.4218
0.6987	1.2532	3900	0.4136
0.7309	1.2853	4000	0.4151
0.6192	1.3175	4100	0.4092
0.6013	1.3496	4200	0.4109
0.6138	1.3817	4300	0.4069
0.6224	1.4139	4400	0.4001
0.6414	1.4460	4500	0.3942
0.6234	1.4781	4600	0.3949
0.6365	1.5103	4700	0.3914
0.6602	1.5424	4800	0.3908
0.6646	1.5746	4900	0.3916
0.6295	1.6067	5000	0.3832
0.6379	1.6388	5100	0.3828
0.6544	1.6710	5200	0.3836
0.6224	1.7031	5300	0.3905
0.5892	1.7352	5400	0.3785
0.5985	1.7674	5500	0.3794
0.5776	1.7995	5600	0.3751
0.5799	1.8316	5700	0.3745
0.6247	1.8638	5800	0.3742
0.5724	1.8959	5900	0.3716
0.627	1.9280	6000	0.3750
0.566	1.9602	6100	0.3720
0.5754	1.9923	6200	0.3706
0.3561	2.0244	6300	0.3685
0.4468	2.0566	6400	0.3663
0.392	2.0887	6500	0.3669
0.4049	2.1208	6600	0.3659
0.3967	2.1530	6700	0.3649
0.418	2.1851	6800	0.3662
0.4973	2.2172	6900	0.3630
0.3685	2.2494	7000	0.3627
0.3967	2.2815	7100	0.3618
0.3921	2.3136	7200	0.3580
0.3905	2.3458	7300	0.3595
0.3888	2.3779	7400	0.3574
0.4403	2.4100	7500	0.3571
0.4343	2.4422	7600	0.3574
0.4088	2.4743	7700	0.3549
0.3924	2.5064	7800	0.3560
0.4383	2.5386	7900	0.3540
0.3786	2.5707	8000	0.3525
0.3779	2.6028	8100	0.3518
0.4555	2.6350	8200	0.3519
0.3813	2.6671	8300	0.3497
0.375	2.6992	8400	0.3505
0.3889	2.7314	8500	0.3493
0.3524	2.7635	8600	0.3469
0.431	2.7956	8700	0.3480
0.4185	2.8278	8800	0.3471
0.378	2.8599	8900	0.3487
0.3978	2.8920	9000	0.3457
0.421	2.9242	9100	0.3461
0.3691	2.9563	9200	0.3443
0.4019	2.9884	9300	0.3446
0.2392	3.0206	9400	0.3473
0.2213	3.0527	9500	0.3443
0.2189	3.0848	9600	0.3437
0.2363	3.1170	9700	0.3442
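
The Epoch and Step columns make it possible to estimate a few quantities the card does not state. The sketch below is an illustration built on a handful of rows copied from the table, not a reconstruction of the training run. It estimates the optimizer steps per epoch, the approximate number of training examples per epoch (assuming the total_train_batch_size of 4 listed in the hyperparameters), and the step with the lowest validation loss. All estimates are approximate because the Epoch column is rounded to four decimals; the function names are illustrative.

```python
# A few (training_loss, epoch, step, validation_loss) rows copied from the
# table above; the full table has 97 rows.
ROWS = [
    (4.8628, 0.0321, 100, 3.4692),
    (0.7007, 0.9961, 3100, 0.4668),
    (0.5754, 1.9923, 6200, 0.3706),
    (0.4019, 2.9884, 9300, 0.3446),
    (0.2392, 3.0206, 9400, 0.3473),
    (0.2213, 3.0527, 9500, 0.3443),
    (0.2189, 3.0848, 9600, 0.3437),
    (0.2363, 3.1170, 9700, 0.3442),
]

# Values copied from the hyperparameter list.
TOTAL_TRAIN_BATCH_SIZE = 4
NUM_EPOCHS = 5


def steps_per_epoch(rows):
    """Estimate steps per epoch from the last row, where rounding error in
    the Epoch column matters least."""
    _, epoch, step, _ = rows[-1]
    return step / epoch


def best_validation_row(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[3])


if __name__ == "__main__":
    per_epoch = steps_per_epoch(ROWS)
    print("steps per epoch (approx.):", round(per_epoch))
    print("examples per epoch (approx.):", round(per_epoch) * TOTAL_TRAIN_BATCH_SIZE)
    print("steps for all configured epochs (approx.):", round(per_epoch * NUM_EPOCHS))
    print("lowest validation loss row:", best_validation_row(ROWS))
```

Two observations follow from the table itself: the last logged row is at epoch 3.1170 although num_epochs is 5, and the lowest validation loss in the table (0.3437 at step 9600) is slightly below the reported final value of 0.3442.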

Framework versions

PEFT 0.12.1.dev0
Transformers 4.44.2
PyTorch 2.3.0+cu121
Datasets 2.19.1
Tokenizers 0.19.1
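
Because PEFT appears among the framework versions, the repository presumably holds a PEFT adapter rather than full model weights; the card does not confirm this. Under that assumption, loading could look like the sketch below. It uses the standard Transformers and PEFT loading calls, takes the tokenizer from the base model (also an assumption), and picks float16 and max_new_tokens=64 purely for illustration. The helper names load_model and answer are hypothetical, and the base model may require access approval on the Hub.

```python
BASE_MODEL_ID = "taide/TAIDE-LX-7B-Chat"
# Assumption: this repository contains PEFT adapter weights for the base model.
ADAPTER_ID = "allen0909/ROE_QA_TAIDE-LX-7B-Chat_Q100_80_20_V6"


def load_model(base_model_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model and attach the adapter weights on top of it."""
    # Imported lazily so that importing this module does not require the
    # heavy dependencies to be installed.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the adapter reuses the base model's tokenizer unchanged.
    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base_model = AutoModelForCausalLM.from_pretrained(
        base_model_id, torch_dtype=torch.float16
    )
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return tokenizer, model


def answer(question: str, tokenizer, model, max_new_tokens: int = 64) -> str:
    """Generate a reply for a plain-text question (no chat template applied)."""
    import torch

    inputs = tokenizer(question, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

A typical call would be `tokenizer, model = load_model()` followed by `answer("...", tokenizer, model)`. The prompt format used during fine-tuning is not documented in the card, so output quality with a raw question is uncertain.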

Model repository: allen0909/ROE_QA_TAIDE-LX-7B-Chat_Q100_80_20_V6