ModernBERT-large_v3_scratch

This model is a fine-tuned version of answerdotai/ModernBERT-large on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1638
  • Accuracy: 0.9008
  • Precision Macro: 0.7724
  • Recall Macro: 0.7784
  • F1 Macro: 0.7752
  • F1 Weighted: 0.9013
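Metrics in this style (accuracy plus macro- and weighted-averaged precision/recall/F1) can be reproduced with scikit-learn. The labels below are toy values for illustration, not the actual evaluation set:

```python
from sklearn.metrics import accuracy_score, f1_score, precision_recall_fscore_support

# Toy 3-class labels and predictions (illustrative only; not this model's eval data)
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 0, 2, 2]

acc = accuracy_score(y_true, y_pred)
# average="macro": unweighted mean over classes, as in the card's "Macro" metrics
prec_macro, rec_macro, f1_macro, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)
# average="weighted": mean over classes weighted by support ("F1 Weighted")
f1_weighted = f1_score(y_true, y_pred, average="weighted")
print(acc, prec_macro, rec_macro, f1_macro, f1_weighted)
```

Macro averaging treats every class equally regardless of size, which is why the macro scores above sit well below the accuracy and weighted F1: the dataset is presumably class-imbalanced.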

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 64
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 40
  • mixed_precision_training: Native AMP

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision Macro | Recall Macro | F1 Macro | F1 Weighted |
|---------------|-------|------|-----------------|----------|-----------------|--------------|----------|-------------|
| 2.1409 | 1.0 | 179 | 0.4797 | 0.8155 | 0.7656 | 0.5858 | 0.5889 | 0.8001 |
| 1.8913 | 2.0 | 358 | 0.4433 | 0.8383 | 0.7709 | 0.6087 | 0.6125 | 0.8239 |
| 1.7772 | 3.0 | 537 | 0.3867 | 0.8629 | 0.7665 | 0.6576 | 0.6777 | 0.8535 |
| 1.3739 | 4.0 | 716 | 0.3396 | 0.8819 | 0.7647 | 0.6833 | 0.7033 | 0.8742 |
| 1.121 | 5.0 | 895 | 0.3194 | 0.8926 | 0.7935 | 0.7307 | 0.7533 | 0.8884 |
| 0.8297 | 6.0 | 1074 | 0.4077 | 0.8800 | 0.8479 | 0.6714 | 0.7001 | 0.8696 |
| 0.7174 | 7.0 | 1253 | 0.4211 | 0.8737 | 0.7463 | 0.7607 | 0.7510 | 0.8748 |
| 0.5598 | 8.0 | 1432 | 0.4373 | 0.8932 | 0.7960 | 0.6906 | 0.7144 | 0.8848 |
| 0.4317 | 9.0 | 1611 | 0.5494 | 0.8711 | 0.7343 | 0.7678 | 0.7460 | 0.8748 |
| 0.3809 | 10.0 | 1790 | 0.4896 | 0.8920 | 0.7838 | 0.7139 | 0.7367 | 0.8865 |
| 0.2739 | 11.0 | 1969 | 0.6534 | 0.8888 | 0.7627 | 0.7727 | 0.7671 | 0.8896 |
| 0.1934 | 12.0 | 2148 | 0.5885 | 0.9008 | 0.8028 | 0.7404 | 0.7633 | 0.8968 |
| 0.1742 | 13.0 | 2327 | 0.7146 | 0.8825 | 0.8056 | 0.7260 | 0.7535 | 0.8781 |
| 0.0825 | 14.0 | 2506 | 0.8700 | 0.8970 | 0.7733 | 0.7348 | 0.7497 | 0.8938 |
| 0.0688 | 15.0 | 2685 | 0.8066 | 0.8939 | 0.7636 | 0.7315 | 0.7448 | 0.8910 |
| 0.0796 | 16.0 | 2864 | 0.8853 | 0.8970 | 0.8123 | 0.7289 | 0.7564 | 0.8920 |
| 0.1044 | 17.0 | 3043 | 0.8411 | 0.8913 | 0.7614 | 0.7502 | 0.7554 | 0.8904 |
| 0.0893 | 18.0 | 3222 | 0.8432 | 0.8983 | 0.7941 | 0.7347 | 0.7564 | 0.8942 |
| 0.0274 | 19.0 | 3401 | 0.9003 | 0.8926 | 0.7772 | 0.7345 | 0.7515 | 0.8894 |
| 0.0161 | 20.0 | 3580 | 1.0964 | 0.8907 | 0.7648 | 0.7677 | 0.7659 | 0.8909 |
| 0.0066 | 21.0 | 3759 | 0.9782 | 0.8958 | 0.7639 | 0.7616 | 0.7627 | 0.8956 |
| 0.027 | 22.0 | 3938 | 1.0439 | 0.8913 | 0.7557 | 0.7800 | 0.7663 | 0.8935 |
| 0.0569 | 23.0 | 4117 | 0.9039 | 0.9033 | 0.8002 | 0.7709 | 0.7838 | 0.9016 |
| 0.0126 | 24.0 | 4296 | 0.9952 | 0.9002 | 0.7845 | 0.7529 | 0.7663 | 0.8979 |
| 0.0047 | 25.0 | 4475 | 0.9702 | 0.9052 | 0.7872 | 0.7849 | 0.7860 | 0.9051 |
| 0.0091 | 26.0 | 4654 | 1.0793 | 0.8970 | 0.7821 | 0.7575 | 0.7682 | 0.8953 |
| 0.0038 | 27.0 | 4833 | 1.0187 | 0.9027 | 0.7781 | 0.7714 | 0.7745 | 0.9022 |
| 0.0028 | 28.0 | 5012 | 1.0220 | 0.9015 | 0.7739 | 0.7746 | 0.7742 | 0.9015 |
| 0.0025 | 29.0 | 5191 | 1.0514 | 0.9015 | 0.7757 | 0.7746 | 0.7751 | 0.9014 |
| 0.0002 | 30.0 | 5370 | 1.0703 | 0.9027 | 0.7771 | 0.7796 | 0.7783 | 0.9029 |
| 0.0138 | 31.0 | 5549 | 1.0361 | 0.9021 | 0.7767 | 0.7790 | 0.7778 | 0.9023 |
| 0.0017 | 32.0 | 5728 | 1.0631 | 0.9027 | 0.7777 | 0.7836 | 0.7806 | 0.9032 |
| 0.0015 | 33.0 | 5907 | 1.0906 | 0.9008 | 0.7708 | 0.7782 | 0.7743 | 0.9014 |
| 0.0111 | 34.0 | 6086 | 1.1079 | 0.9002 | 0.7703 | 0.7778 | 0.7739 | 0.9008 |
| 0.0001 | 35.0 | 6265 | 1.1265 | 0.8996 | 0.7698 | 0.7774 | 0.7735 | 0.9002 |
| 0.0012 | 36.0 | 6444 | 1.1395 | 0.9008 | 0.7707 | 0.7783 | 0.7743 | 0.9014 |
| 0.0001 | 37.0 | 6623 | 1.1534 | 0.9015 | 0.7728 | 0.7788 | 0.7757 | 0.9019 |
| 0.0001 | 38.0 | 6802 | 1.1619 | 0.9008 | 0.7724 | 0.7784 | 0.7752 | 0.9013 |
| 0.0001 | 39.0 | 6981 | 1.1634 | 0.9015 | 0.7728 | 0.7788 | 0.7757 | 0.9019 |
| 0.0007 | 40.0 | 7160 | 1.1638 | 0.9008 | 0.7724 | 0.7784 | 0.7752 | 0.9013 |
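The step counts imply 179 optimizer steps per epoch and 7160 steps total, so the warmup ratio of 0.1 corresponds to 716 warmup steps. A minimal sketch of that schedule, using `get_cosine_schedule_with_warmup` on a dummy parameter (the scheduler `transformers` selects for `lr_scheduler_type: cosine`):

```python
import torch
from transformers import get_cosine_schedule_with_warmup

# Dummy parameter and optimizer just to drive the scheduler; the step counts
# (716 warmup of 7160 total) are derived from the table above.
param = torch.nn.Parameter(torch.zeros(1))
opt = torch.optim.AdamW([param], lr=1e-4, betas=(0.9, 0.999), eps=1e-8)
sched = get_cosine_schedule_with_warmup(
    opt, num_warmup_steps=716, num_training_steps=7160
)

for _ in range(716):  # linear warmup phase
    opt.step()
    sched.step()
print(sched.get_last_lr()[0])  # full 1e-4 once warmup completes
```

After step 716 the learning rate decays along a cosine curve toward zero at step 7160, which is consistent with the pattern in the table: training loss keeps falling to near zero while validation loss bottoms out around epoch 5 and then climbs, i.e. the later epochs mostly overfit.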

Framework versions

  • Transformers 4.55.0
  • PyTorch 2.7.0+cu126
  • Datasets 4.0.0
  • Tokenizers 0.21.4
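A matching environment could be pinned roughly as follows; the `+cu126` build tag suggests a CUDA 12.6 wheel, so the PyTorch index URL below is an assumption about how that build was obtained:

```shell
pip install "transformers==4.55.0" "datasets==4.0.0" "tokenizers==0.21.4"
pip install "torch==2.7.0" --index-url https://download.pytorch.org/whl/cu126
```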