library_name: transformers
license: mit
base_model: FacebookAI/roberta-base
tags:
  - generated_from_trainer
model-index:
  - name: bengali_qa_model_AGGRO_roberta-base
    results: []

bengali_qa_model_AGGRO_roberta-base

This model is a fine-tuned version of FacebookAI/roberta-base on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1743
  • Exact Match: 96.2857
  • F1 Score: 97.2732
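
The card does not say how Exact Match and F1 were computed. A minimal sketch of the common SQuAD-style definitions is shown here, assuming that is what the evaluation used; the function names are illustrative, and the English article stripping in `normalize_answer` is part of the SQuAD convention rather than something confirmed for this model.

```python
import re
import string
from collections import Counter

_PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop ASCII punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def corpus_scores(predictions, references):
    """Average both metrics over the set and scale to percentages."""
    n = len(predictions)
    em = 100.0 * sum(exact_match(p, r) for p, r in zip(predictions, references)) / n
    f1 = 100.0 * sum(token_f1(p, r) for p, r in zip(predictions, references)) / n
    return em, f1
```

Under these definitions F1 is never lower than Exact Match, which is consistent with the figures reported above.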

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 3407
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 64
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • training_steps: 100
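
As a rough illustration of how these values interact, the sketch below derives the total batch size and the learning rate at a given optimizer step. It assumes a single device and the usual linear-warmup-then-cosine-decay shape with half a cosine cycle; the function names are illustrative and are not taken from the training script.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-05
TRAIN_BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 16
TRAINING_STEPS = 100
WARMUP_RATIO = 0.1


def effective_batch_size(per_device: int, accum_steps: int, num_devices: int = 1) -> int:
    """Examples consumed per optimizer step (num_devices=1 is an assumption)."""
    return per_device * accum_steps * num_devices


def lr_at_step(step: int,
               base_lr: float = LEARNING_RATE,
               total_steps: int = TRAINING_STEPS,
               warmup_ratio: float = WARMUP_RATIO) -> float:
    """Linear warmup from 0 to base_lr, then cosine decay to 0 at total_steps."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("total_train_batch_size:", effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
    for step in (0, 5, 10, 55, 100):
        print(step, lr_at_step(step))
```

With a warmup ratio of 0.1 over 100 steps, the first 10 steps are warmup, so the learning rate peaks at 1e-05 at step 10 and decays from there.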

Training results

| Training Loss | Epoch | Step | Validation Loss | Exact Match | F1 Score |
|---|---|---|---|---|---|
| 5.9958 | 0.0053 | 1 | 6.0009 | 0.0 | 14.2511 |
| 6.0212 | 0.0107 | 2 | 5.9880 | 0.0 | 14.2134 |
| 5.9773 | 0.0160 | 3 | 5.9623 | 0.0 | 14.2530 |
| 5.9605 | 0.0214 | 4 | 5.9240 | 0.0 | 14.7064 |
| 5.922 | 0.0267 | 5 | 5.8733 | 0.0 | 14.2565 |
| 5.8831 | 0.0321 | 6 | 5.8088 | 0.0 | 14.4523 |
| 5.8306 | 0.0374 | 7 | 5.7290 | 0.0 | 16.0421 |
| 5.7652 | 0.0428 | 8 | 5.6310 | 5.6391 | 34.9089 |
| 5.6731 | 0.0481 | 9 | 5.5117 | 17.1429 | 51.1515 |
| 5.5294 | 0.0535 | 10 | 5.3626 | 31.5789 | 59.2927 |
| 5.4532 | 0.0588 | 11 | 5.1749 | 43.9098 | 64.9926 |
| 5.2211 | 0.0641 | 12 | 4.9679 | 49.9248 | 69.2611 |
| 5.0949 | 0.0695 | 13 | 4.7511 | 53.0075 | 71.4743 |
| 4.8805 | 0.0748 | 14 | 4.5299 | 55.1880 | 72.9875 |
| 4.651 | 0.0802 | 15 | 4.2956 | 57.8195 | 74.4621 |
| 4.4113 | 0.0855 | 16 | 4.0291 | 60.0752 | 75.8558 |
| 4.2577 | 0.0909 | 17 | 3.7318 | 62.2556 | 76.3222 |
| 4.0153 | 0.0962 | 18 | 3.4295 | 63.6842 | 77.0091 |
| 3.6706 | 0.1016 | 19 | 3.1520 | 61.1278 | 76.5172 |
| 3.5342 | 0.1069 | 20 | 2.8936 | 52.7820 | 73.9467 |
| 3.2798 | 0.1123 | 21 | 2.6488 | 46.1654 | 71.9439 |
| 3.1167 | 0.1176 | 22 | 2.4104 | 47.1429 | 71.3425 |
| 2.6525 | 0.1230 | 23 | 2.1745 | 52.2556 | 71.5805 |
| 2.497 | 0.1283 | 24 | 1.9401 | 58.4211 | 72.4925 |
| 2.3689 | 0.1336 | 25 | 1.7185 | 60.2256 | 72.6269 |
| 2.0833 | 0.1390 | 26 | 1.5153 | 60.2256 | 72.8689 |
| 1.8679 | 0.1443 | 27 | 1.3483 | 61.2782 | 74.1167 |
| 1.7384 | 0.1497 | 28 | 1.2158 | 64.5865 | 76.8679 |
| 1.47 | 0.1550 | 29 | 1.1047 | 67.2932 | 78.9366 |
| 1.397 | 0.1604 | 30 | 1.0146 | 70.5263 | 81.1621 |
| 1.2822 | 0.1657 | 31 | 0.9423 | 73.3083 | 83.6952 |
| 0.9928 | 0.1711 | 32 | 0.8767 | 75.2632 | 85.0494 |
| 0.7992 | 0.1764 | 33 | 0.8122 | 77.8947 | 86.9631 |
| 0.897 | 0.1818 | 34 | 0.7455 | 80.6767 | 89.1149 |
| 0.8307 | 0.1871 | 35 | 0.6772 | 83.3835 | 91.3579 |
| 0.8469 | 0.1924 | 36 | 0.6040 | 86.2406 | 93.9573 |
| 0.6431 | 0.1978 | 37 | 0.5333 | 86.8421 | 94.6721 |
| 0.8116 | 0.2031 | 38 | 0.4519 | 87.6692 | 95.6610 |
| 0.6474 | 0.2085 | 39 | 0.3950 | 87.6692 | 95.7701 |
| 0.6241 | 0.2138 | 40 | 0.3626 | 87.6692 | 95.9608 |
| 0.6299 | 0.2192 | 41 | 0.3394 | 87.7444 | 95.9051 |
| 0.2552 | 0.2245 | 42 | 0.3260 | 87.7444 | 95.9297 |
| 0.3891 | 0.2299 | 43 | 0.3234 | 87.6692 | 95.8513 |
| 0.3552 | 0.2352 | 44 | 0.3129 | 87.9699 | 95.6941 |
| 0.2864 | 0.2406 | 45 | 0.2998 | 88.0451 | 95.3209 |
| 0.4347 | 0.2459 | 46 | 0.2798 | 89.4737 | 95.0850 |
| 0.2938 | 0.2513 | 47 | 0.2587 | 90.3759 | 94.9503 |
| 0.2821 | 0.2566 | 48 | 0.2445 | 90.9023 | 95.1257 |
| 0.3619 | 0.2619 | 49 | 0.2320 | 91.3534 | 94.9029 |
| 0.4783 | 0.2673 | 50 | 0.2176 | 91.7293 | 95.0914 |
| 0.1834 | 0.2726 | 51 | 0.2116 | 91.8797 | 95.1105 |
| 0.3803 | 0.2780 | 52 | 0.2054 | 92.1805 | 94.9606 |
| 0.2242 | 0.2833 | 53 | 0.2052 | 92.3308 | 94.9873 |
| 0.1771 | 0.2887 | 54 | 0.2033 | 92.4812 | 95.3112 |
| 0.3369 | 0.2940 | 55 | 0.1978 | 93.0827 | 95.7403 |
| 0.2277 | 0.2994 | 56 | 0.1936 | 93.7594 | 96.3688 |
| 0.2296 | 0.3047 | 57 | 0.1947 | 93.8346 | 96.6249 |
| 0.2281 | 0.3101 | 58 | 0.1939 | 93.9098 | 96.8548 |
| 0.1287 | 0.3154 | 59 | 0.1905 | 94.4361 | 96.9572 |
| 0.191 | 0.3207 | 60 | 0.1865 | 95.0376 | 97.2070 |
| 0.1435 | 0.3261 | 61 | 0.1868 | 94.9624 | 97.1697 |
| 0.1648 | 0.3314 | 62 | 0.1900 | 94.5865 | 96.8381 |
| 0.1668 | 0.3368 | 63 | 0.1889 | 94.9624 | 96.9874 |
| 0.1634 | 0.3421 | 64 | 0.1850 | 95.3383 | 97.0437 |
| 0.2374 | 0.3475 | 65 | 0.1797 | 95.7895 | 97.4394 |
| 0.1382 | 0.3528 | 66 | 0.1768 | 96.3910 | 97.6053 |
| 0.2683 | 0.3582 | 67 | 0.1736 | 96.5414 | 97.6811 |
| 0.1452 | 0.3635 | 68 | 0.1720 | 96.3910 | 97.4557 |
| 0.1796 | 0.3689 | 69 | 0.1704 | 96.4662 | 97.4221 |
| 0.0786 | 0.3742 | 70 | 0.1686 | 96.5414 | 97.4985 |
| 0.2424 | 0.3796 | 71 | 0.1669 | 96.6917 | 97.5989 |
| 0.089 | 0.3849 | 72 | 0.1656 | 96.7669 | 97.6242 |
| 0.2073 | 0.3902 | 73 | 0.1654 | 96.7669 | 97.6238 |
| 0.1657 | 0.3956 | 74 | 0.1663 | 96.5414 | 97.4733 |
| 0.0868 | 0.4009 | 75 | 0.1677 | 96.3158 | 97.4407 |
| 0.1281 | 0.4063 | 76 | 0.1697 | 96.0150 | 97.1804 |
| 0.1729 | 0.4116 | 77 | 0.1705 | 95.8647 | 97.1085 |
| 0.1871 | 0.4170 | 78 | 0.1703 | 96.0150 | 97.2090 |
| 0.1735 | 0.4223 | 79 | 0.1695 | 96.0150 | 97.2090 |
| 0.1239 | 0.4277 | 80 | 0.1700 | 95.9398 | 97.2144 |
| 0.0944 | 0.4330 | 81 | 0.1696 | 95.8647 | 97.1392 |
| 0.2494 | 0.4384 | 82 | 0.1696 | 96.0150 | 97.2896 |
| 0.0746 | 0.4437 | 83 | 0.1689 | 95.8647 | 97.1392 |
| 0.1175 | 0.4490 | 84 | 0.1680 | 96.0150 | 97.2090 |
| 0.2597 | 0.4544 | 85 | 0.1665 | 96.0902 | 97.2082 |
| 0.1567 | 0.4597 | 86 | 0.1656 | 96.0150 | 97.1330 |
| 0.0738 | 0.4651 | 87 | 0.1647 | 96.1654 | 97.2834 |
| 0.1551 | 0.4704 | 88 | 0.1641 | 96.2406 | 97.3586 |
| 0.0965 | 0.4758 | 89 | 0.1634 | 96.0902 | 97.2833 |
| 0.1466 | 0.4811 | 90 | 0.1625 | 96.1654 | 97.3085 |
| 0.115 | 0.4865 | 91 | 0.1619 | 96.6165 | 97.6096 |
| 0.1848 | 0.4918 | 92 | 0.1613 | 96.6165 | 97.5345 |
| 0.0955 | 0.4972 | 93 | 0.1607 | 96.6165 | 97.5405 |
| 0.1348 | 0.5025 | 94 | 0.1603 | 96.6165 | 97.5405 |
| 0.1316 | 0.5079 | 95 | 0.1600 | 96.6165 | 97.4655 |
| 0.1544 | 0.5132 | 96 | 0.1598 | 96.6917 | 97.5407 |
| 0.1746 | 0.5185 | 97 | 0.1596 | 96.6917 | 97.5407 |
| 0.0762 | 0.5239 | 98 | 0.1596 | 96.5414 | 97.3903 |
| 0.1685 | 0.5292 | 99 | 0.1595 | 96.5414 | 97.3903 |
| 0.1243 | 0.5346 | 100 | 0.1595 | 96.6917 | 97.5407 |
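
Because evaluation ran at every step, the log can be used to compare candidate checkpoints. The sketch below, with a hypothetical `best_step` helper, picks the step that optimizes one metric. For brevity it embeds only a handful of rows copied from the table above; these happen to include the best value of each metric in the full log. Whether a checkpoint was actually saved at each of these steps is not stated in the card.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    step: int
    validation_loss: float
    exact_match: float
    f1: float


# A subset of rows copied from the training results table.
ROWS = [
    EvalRow(60, 0.1865, 95.0376, 97.2070),
    EvalRow(67, 0.1736, 96.5414, 97.6811),
    EvalRow(72, 0.1656, 96.7669, 97.6242),
    EvalRow(73, 0.1654, 96.7669, 97.6238),
    EvalRow(99, 0.1595, 96.5414, 97.3903),
    EvalRow(100, 0.1595, 96.6917, 97.5407),
]


def best_step(rows, metric: str, higher_is_better: bool = True) -> int:
    """Return the step with the best value of `metric`; ties go to the earliest row."""
    pick = max if higher_is_better else min
    return pick(rows, key=lambda row: getattr(row, metric)).step


if __name__ == "__main__":
    print("best exact match at step", best_step(ROWS, "exact_match"))
    print("best F1 at step", best_step(ROWS, "f1"))
    print("lowest validation loss at step",
          best_step(ROWS, "validation_loss", higher_is_better=False))
```

The three criteria disagree here: F1 peaks at step 67, Exact Match at step 72, and validation loss is lowest at steps 99 and 100, so the choice of checkpoint depends on which metric matters most for the intended use.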

Framework versions

  • Transformers 4.46.3
  • Pytorch 2.4.0
  • Datasets 3.1.0
  • Tokenizers 0.20.3