ModernBERT-large_v3_scratch

This model is a fine-tuned version of answerdotai/ModernBERT-large on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1638
  • Accuracy: 0.9008
  • Precision Macro: 0.7724
  • Recall Macro: 0.7784
  • F1 Macro: 0.7752
  • F1 Weighted: 0.9013
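Metrics in this style (accuracy plus macro- and weighted-averaged precision/recall/F1) can be reproduced with scikit-learn. The labels below are toy values for illustration, not the actual evaluation set:

```python
from sklearn.metrics import accuracy_score, f1_score, precision_recall_fscore_support

# Toy 3-class labels and predictions (illustrative only; not this model's eval data)
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 0, 2, 2]

acc = accuracy_score(y_true, y_pred)
# average="macro": unweighted mean over classes, as in the card's "Macro" metrics
prec_macro, rec_macro, f1_macro, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)
# average="weighted": mean over classes weighted by support ("F1 Weighted")
f1_weighted = f1_score(y_true, y_pred, average="weighted")
print(acc, prec_macro, rec_macro, f1_macro, f1_weighted)
```

Macro averaging treats every class equally regardless of size, which is why the macro scores above sit well below the accuracy and weighted F1: the dataset is presumably class-imbalanced.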

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 64
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 40
  • mixed_precision_training: Native AMP

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision Macro | Recall Macro | F1 Macro | F1 Weighted |
|---------------|-------|------|-----------------|----------|-----------------|--------------|----------|-------------|
| 2.1409 | 1.0 | 179 | 0.4797 | 0.8155 | 0.7656 | 0.5858 | 0.5889 | 0.8001 |
| 1.8913 | 2.0 | 358 | 0.4433 | 0.8383 | 0.7709 | 0.6087 | 0.6125 | 0.8239 |
| 1.7772 | 3.0 | 537 | 0.3867 | 0.8629 | 0.7665 | 0.6576 | 0.6777 | 0.8535 |
| 1.3739 | 4.0 | 716 | 0.3396 | 0.8819 | 0.7647 | 0.6833 | 0.7033 | 0.8742 |
| 1.121 | 5.0 | 895 | 0.3194 | 0.8926 | 0.7935 | 0.7307 | 0.7533 | 0.8884 |
| 0.8297 | 6.0 | 1074 | 0.4077 | 0.8800 | 0.8479 | 0.6714 | 0.7001 | 0.8696 |
| 0.7174 | 7.0 | 1253 | 0.4211 | 0.8737 | 0.7463 | 0.7607 | 0.7510 | 0.8748 |
| 0.5598 | 8.0 | 1432 | 0.4373 | 0.8932 | 0.7960 | 0.6906 | 0.7144 | 0.8848 |
| 0.4317 | 9.0 | 1611 | 0.5494 | 0.8711 | 0.7343 | 0.7678 | 0.7460 | 0.8748 |
| 0.3809 | 10.0 | 1790 | 0.4896 | 0.8920 | 0.7838 | 0.7139 | 0.7367 | 0.8865 |
| 0.2739 | 11.0 | 1969 | 0.6534 | 0.8888 | 0.7627 | 0.7727 | 0.7671 | 0.8896 |
| 0.1934 | 12.0 | 2148 | 0.5885 | 0.9008 | 0.8028 | 0.7404 | 0.7633 | 0.8968 |
| 0.1742 | 13.0 | 2327 | 0.7146 | 0.8825 | 0.8056 | 0.7260 | 0.7535 | 0.8781 |
| 0.0825 | 14.0 | 2506 | 0.8700 | 0.8970 | 0.7733 | 0.7348 | 0.7497 | 0.8938 |
| 0.0688 | 15.0 | 2685 | 0.8066 | 0.8939 | 0.7636 | 0.7315 | 0.7448 | 0.8910 |
| 0.0796 | 16.0 | 2864 | 0.8853 | 0.8970 | 0.8123 | 0.7289 | 0.7564 | 0.8920 |
| 0.1044 | 17.0 | 3043 | 0.8411 | 0.8913 | 0.7614 | 0.7502 | 0.7554 | 0.8904 |
| 0.0893 | 18.0 | 3222 | 0.8432 | 0.8983 | 0.7941 | 0.7347 | 0.7564 | 0.8942 |
| 0.0274 | 19.0 | 3401 | 0.9003 | 0.8926 | 0.7772 | 0.7345 | 0.7515 | 0.8894 |
| 0.0161 | 20.0 | 3580 | 1.0964 | 0.8907 | 0.7648 | 0.7677 | 0.7659 | 0.8909 |
| 0.0066 | 21.0 | 3759 | 0.9782 | 0.8958 | 0.7639 | 0.7616 | 0.7627 | 0.8956 |
| 0.027 | 22.0 | 3938 | 1.0439 | 0.8913 | 0.7557 | 0.7800 | 0.7663 | 0.8935 |
| 0.0569 | 23.0 | 4117 | 0.9039 | 0.9033 | 0.8002 | 0.7709 | 0.7838 | 0.9016 |
| 0.0126 | 24.0 | 4296 | 0.9952 | 0.9002 | 0.7845 | 0.7529 | 0.7663 | 0.8979 |
| 0.0047 | 25.0 | 4475 | 0.9702 | 0.9052 | 0.7872 | 0.7849 | 0.7860 | 0.9051 |
| 0.0091 | 26.0 | 4654 | 1.0793 | 0.8970 | 0.7821 | 0.7575 | 0.7682 | 0.8953 |
| 0.0038 | 27.0 | 4833 | 1.0187 | 0.9027 | 0.7781 | 0.7714 | 0.7745 | 0.9022 |
| 0.0028 | 28.0 | 5012 | 1.0220 | 0.9015 | 0.7739 | 0.7746 | 0.7742 | 0.9015 |
| 0.0025 | 29.0 | 5191 | 1.0514 | 0.9015 | 0.7757 | 0.7746 | 0.7751 | 0.9014 |
| 0.0002 | 30.0 | 5370 | 1.0703 | 0.9027 | 0.7771 | 0.7796 | 0.7783 | 0.9029 |
| 0.0138 | 31.0 | 5549 | 1.0361 | 0.9021 | 0.7767 | 0.7790 | 0.7778 | 0.9023 |
| 0.0017 | 32.0 | 5728 | 1.0631 | 0.9027 | 0.7777 | 0.7836 | 0.7806 | 0.9032 |
| 0.0015 | 33.0 | 5907 | 1.0906 | 0.9008 | 0.7708 | 0.7782 | 0.7743 | 0.9014 |
| 0.0111 | 34.0 | 6086 | 1.1079 | 0.9002 | 0.7703 | 0.7778 | 0.7739 | 0.9008 |
| 0.0001 | 35.0 | 6265 | 1.1265 | 0.8996 | 0.7698 | 0.7774 | 0.7735 | 0.9002 |
| 0.0012 | 36.0 | 6444 | 1.1395 | 0.9008 | 0.7707 | 0.7783 | 0.7743 | 0.9014 |
| 0.0001 | 37.0 | 6623 | 1.1534 | 0.9015 | 0.7728 | 0.7788 | 0.7757 | 0.9019 |
| 0.0001 | 38.0 | 6802 | 1.1619 | 0.9008 | 0.7724 | 0.7784 | 0.7752 | 0.9013 |
| 0.0001 | 39.0 | 6981 | 1.1634 | 0.9015 | 0.7728 | 0.7788 | 0.7757 | 0.9019 |
| 0.0007 | 40.0 | 7160 | 1.1638 | 0.9008 | 0.7724 | 0.7784 | 0.7752 | 0.9013 |
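The step counts imply 179 optimizer steps per epoch and 7160 steps total, so the warmup ratio of 0.1 corresponds to 716 warmup steps. A minimal sketch of that schedule, using `get_cosine_schedule_with_warmup` on a dummy parameter (the scheduler `transformers` selects for `lr_scheduler_type: cosine`):

```python
import torch
from transformers import get_cosine_schedule_with_warmup

# Dummy parameter and optimizer just to drive the scheduler; the step counts
# (716 warmup of 7160 total) are derived from the table above.
param = torch.nn.Parameter(torch.zeros(1))
opt = torch.optim.AdamW([param], lr=1e-4, betas=(0.9, 0.999), eps=1e-8)
sched = get_cosine_schedule_with_warmup(
    opt, num_warmup_steps=716, num_training_steps=7160
)

for _ in range(716):  # linear warmup phase
    opt.step()
    sched.step()
print(sched.get_last_lr()[0])  # full 1e-4 once warmup completes
```

After step 716 the learning rate decays along a cosine curve toward zero at step 7160, which is consistent with the pattern in the table: training loss keeps falling to near zero while validation loss bottoms out around epoch 5 and then climbs, i.e. the later epochs mostly overfit.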

Framework versions

  • Transformers 4.55.0
  • PyTorch 2.7.0+cu126
  • Datasets 4.0.0
  • Tokenizers 0.21.4
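A matching environment could be pinned roughly as follows; the `+cu126` build tag suggests a CUDA 12.6 wheel, so the PyTorch index URL below is an assumption about how that build was obtained:

```shell
pip install "transformers==4.55.0" "datasets==4.0.0" "tokenizers==0.21.4"
pip install "torch==2.7.0" --index-url https://download.pytorch.org/whl/cu126
```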