bengali_qa_microsoft_model

This model is a fine-tuned version of microsoft/mdeberta-v3-base on an unknown dataset. It achieves the following results on the evaluation set:

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 8
eval_batch_size: 8
seed: 3407
gradient_accumulation_steps: 8
total_train_batch_size: 64
optimizer: adamw_torch (PyTorch AdamW) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
training_steps: 50
mixed_precision_training: Native AMP
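The hyperparameters above determine the effective batch size and the learning-rate schedule. The pure-Python sketch below shows how they combine. It rests on three assumptions. First, training ran on a single device, which is consistent with 8 × 8 = 64 but is not stated in the card. Second, the cosine schedule follows the formula used by Hugging Face Transformers' `get_cosine_schedule_with_warmup` with the default half-cosine decay. Third, warmup length is ceil(training_steps × warmup_ratio), which gives 5 of the 50 steps. The helper names `effective_batch_size` and `cosine_lr` are hypothetical and for illustration only.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 2e-5
TRAIN_BATCH_SIZE = 8            # per-device micro-batch size
GRADIENT_ACCUMULATION_STEPS = 8
NUM_DEVICES = 1                 # assumption: single device (not stated in the card)
TRAINING_STEPS = 50
WARMUP_RATIO = 0.1


def effective_batch_size(per_device: int, accum_steps: int, devices: int = 1) -> int:
    """Examples consumed per optimizer update (hypothetical helper)."""
    return per_device * accum_steps * devices


def cosine_lr(step: int, base_lr: float, total_steps: int, warmup_ratio: float) -> float:
    """Linear warmup to base_lr, then half-cosine decay to zero (hypothetical helper).

    Assumption: warmup steps = ceil(total_steps * warmup_ratio), as in the
    Transformers Trainer when only a warmup ratio is given.
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 at step 0 up to base_lr at step == warmup_steps.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print("effective batch size:",
          effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, NUM_DEVICES))
    for s in (0, 1, 5, 10, 25, 50):
        print(s, f"{cosine_lr(s, LEARNING_RATE, TRAINING_STEPS, WARMUP_RATIO):.3e}")
```

Under these assumptions the learning rate peaks at 2e-05 after 5 steps and decays to zero by step 50. The effective batch size of 64 matches total_train_batch_size.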

Training results

Training Loss	Epoch	Step	Validation Loss	Exact Match (%)	F1 Score (%)
6.6257	0.0053	1	6.6659	0.1504	23.7019
6.6148	0.0107	2	6.5377	0.7519	36.0663
6.5217	0.0160	3	6.3226	1.9549	43.4507
6.1229	0.0214	4	5.6034	3.7594	56.8030
5.7015	0.0267	5	5.3006	5.7143	56.8976
5.3474	0.0321	6	5.1196	10.9023	56.6826
5.2375	0.0374	7	4.8802	16.9925	57.8356
5.067	0.0428	8	4.6180	19.8496	57.7216
4.7947	0.0481	9	4.3354	22.7820	58.3592
4.4271	0.0534	10	4.0533	26.0902	58.7345
4.3096	0.0588	11	3.7947	31.5789	60.0493
4.1219	0.0641	12	3.5726	35.2632	61.0031
3.9806	0.0695	13	3.4024	38.4962	62.6457
3.653	0.0748	14	3.2782	41.7293	63.9058
3.474	0.0802	15	3.1569	43.9850	65.0367
3.3639	0.0855	16	3.0200	45.4887	64.9433
3.2411	0.0908	17	2.8749	46.4662	64.8581
3.0711	0.0962	18	2.7349	47.8195	65.1471
3.0726	0.1015	19	2.6103	48.4962	64.6205
2.9879	0.1069	20	2.4996	49.2481	64.7963
2.9237	0.1122	21	2.3950	50.6767	65.0445
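The table reports Exact Match and F1 on a 0-100 scale, but the card does not say which metric implementation produced them. A common choice for extractive QA is the SQuAD-style definition. Exact Match gives 1 only when the normalized predicted span equals the reference. F1 gives partial credit for token overlap, which is why F1 sits far above Exact Match in the early steps of the table. The sketch below assumes that definition with a simplified normalization: lowercasing, removing ASCII punctuation, and collapsing whitespace. The official script's English-article removal is omitted here, and Bengali punctuation such as the danda (।) is not stripped. All function names are hypothetical.

```python
import string
from collections import Counter

_PUNCT = set(string.punctuation)  # ASCII punctuation only (assumption)


def normalize_answer(text: str) -> str:
    """Simplified SQuAD-style normalization (hypothetical helper)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    return " ".join(text.split())  # collapse runs of whitespace


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    # Multiset intersection counts each shared token at most min(count) times.
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def corpus_scores(predictions: list[str], references: list[str]) -> tuple[float, float]:
    """Average per-example scores, scaled to 0-100 like the table."""
    n = len(predictions)
    em = 100.0 * sum(exact_match(p, r) for p, r in zip(predictions, references)) / n
    f1 = 100.0 * sum(token_f1(p, r) for p, r in zip(predictions, references)) / n
    return em, f1


if __name__ == "__main__":
    preds = ["ঢাকা", "রবীন্দ্রনাথ ঠাকুর"]
    refs = ["ঢাকা", "ঠাকুর"]
    print(corpus_scores(preds, refs))
```

In the usage example, the second prediction overlaps the reference by one token. It therefore earns no Exact Match credit but a token F1 of about 0.67. This is the same gap between the two columns that appears in the table.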