arxiv-summarization-t5-base-2022-09-21
This model is a fine-tuned version of t5-base on the ccdv/arxiv-summarization dataset. It achieves the following results on the evaluation set:
- Loss: 1.8650
- Rouge1: 40.6781
- Rouge2: 14.7167
- Rougel: 26.6375
- Rougelsum: 35.5959
- Gen Len: 117.1969
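For readers unfamiliar with the Rouge1 and Rouge2 numbers above, the following is a minimal sketch of an n-gram overlap F1 score. It is a simplification, not the scorer used to produce the reported results: it assumes lowercased whitespace tokenization and no stemming, and the helper names `ngrams` and `rouge_n_f1` and the two example sentences are made up for illustration. The reported scores appear to be on a 0 to 100 scale, so the sketch multiplies by 100 when printing.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(reference, candidate, n=1):
    """Simplified ROUGE-N F1 on lowercased, whitespace-split tokens."""
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    overlap = sum((ref & cand).values())  # clipped n-gram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Illustrative sentences only; they are not taken from the dataset.
reference = "the model summarizes long scientific papers"
candidate = "the model summarizes scientific articles"

print("ROUGE-1 F1:", round(100 * rouge_n_f1(reference, candidate, n=1), 2))
print("ROUGE-2 F1:", round(100 * rouge_n_f1(reference, candidate, n=2), 2))
```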
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3.0
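As a rough illustration of what `lr_scheduler_type: linear` means with `learning_rate: 5e-05`, the sketch below computes the learning rate at a given step under linear decay to zero. Two things are assumptions: the card lists no warmup, so `warmup_steps` defaults to 0, and the card does not state the total number of optimizer steps, so `TOTAL_STEPS` is a hypothetical round number. The function name `linear_lr` is also made up for this example.

```python
def linear_lr(step, total_steps, base_lr=5e-05, warmup_steps=0):
    """Learning rate at `step` under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * (step / max(1, warmup_steps))
    # Linear decay from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / max(1, total_steps - warmup_steps))


# Hypothetical total; the card does not report the exact number of steps.
TOTAL_STEPS = 600_000

for step in (0, 150_000, 300_000, 450_000, 600_000):
    print(step, linear_lr(step, TOTAL_STEPS))
```

Under these assumptions the rate halves by the midpoint of training and reaches zero at the final step.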
Training results
Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Gen Len |
---|---|---|---|---|---|---|---|---|
2.3291 | 0.05 | 10000 | 2.1906 | 18.6571 | 7.1341 | 14.8347 | 16.9545 | 19.0 |
2.2454 | 0.1 | 20000 | 2.1549 | 18.5037 | 7.1908 | 14.7141 | 16.8233 | 18.9997 |
2.2107 | 0.15 | 30000 | 2.1013 | 18.7638 | 7.326 | 14.9437 | 17.072 | 19.0 |
2.1486 | 0.2 | 40000 | 2.0845 | 18.6879 | 7.2441 | 14.8835 | 16.983 | 19.0 |
2.158 | 0.25 | 50000 | 2.0699 | 18.8314 | 7.3712 | 15.0166 | 17.1215 | 19.0 |
2.1476 | 0.3 | 60000 | 2.0424 | 18.9783 | 7.4138 | 15.1121 | 17.2778 | 18.9981 |
2.1164 | 0.34 | 70000 | 2.0349 | 18.9257 | 7.4649 | 15.0335 | 17.1819 | 19.0 |
2.079 | 0.39 | 80000 | 2.0208 | 18.643 | 7.4096 | 14.8927 | 16.9786 | 18.9994 |
2.101 | 0.44 | 90000 | 2.0113 | 19.3881 | 7.7012 | 15.3981 | 17.6516 | 19.0 |
2.0576 | 0.49 | 100000 | 2.0022 | 18.9985 | 7.542 | 15.1157 | 17.2972 | 18.9992 |
2.0983 | 0.54 | 110000 | 1.9941 | 18.7691 | 7.4625 | 15.0256 | 17.1146 | 19.0 |
2.053 | 0.59 | 120000 | 1.9855 | 19.002 | 7.5602 | 15.1497 | 17.2963 | 19.0 |
2.0434 | 0.64 | 130000 | 1.9786 | 19.2385 | 7.6533 | 15.3094 | 17.5439 | 18.9994 |
2.0354 | 0.69 | 140000 | 1.9746 | 19.184 | 7.7307 | 15.2897 | 17.491 | 18.9992 |
2.0347 | 0.74 | 150000 | 1.9639 | 19.2408 | 7.693 | 15.3357 | 17.5297 | 19.0 |
2.0236 | 0.79 | 160000 | 1.9590 | 19.0781 | 7.6256 | 15.1932 | 17.3486 | 18.9998 |
2.0187 | 0.84 | 170000 | 1.9532 | 19.0343 | 7.6792 | 15.1884 | 17.3519 | 19.0 |
1.9939 | 0.89 | 180000 | 1.9485 | 18.8247 | 7.5005 | 15.0246 | 17.1485 | 18.9998 |
1.9961 | 0.94 | 190000 | 1.9504 | 19.0695 | 7.6559 | 15.2139 | 17.3814 | 19.0 |
2.0197 | 0.99 | 200000 | 1.9399 | 19.2821 | 7.6685 | 15.3029 | 17.5374 | 18.9988 |
1.9457 | 1.03 | 210000 | 1.9350 | 19.053 | 7.6502 | 15.2123 | 17.3793 | 19.0 |
1.9552 | 1.08 | 220000 | 1.9317 | 19.1878 | 7.7235 | 15.3272 | 17.5252 | 18.9998 |
1.9772 | 1.13 | 230000 | 1.9305 | 19.0855 | 7.6303 | 15.1943 | 17.3942 | 18.9997 |
1.9171 | 1.18 | 240000 | 1.9291 | 19.0711 | 7.6437 | 15.2175 | 17.3893 | 18.9995 |
1.9393 | 1.23 | 250000 | 1.9230 | 19.276 | 7.725 | 15.3826 | 17.586 | 18.9995 |
1.9295 | 1.28 | 260000 | 1.9197 | 19.2999 | 7.7958 | 15.3961 | 17.6056 | 18.9975 |
1.9725 | 1.33 | 270000 | 1.9173 | 19.2958 | 7.7121 | 15.3659 | 17.584 | 19.0 |
1.9668 | 1.38 | 280000 | 1.9129 | 19.089 | 7.6846 | 15.2395 | 17.3879 | 18.9998 |
1.941 | 1.43 | 290000 | 1.9132 | 19.2127 | 7.7336 | 15.311 | 17.4742 | 18.9995 |
1.9427 | 1.48 | 300000 | 1.9108 | 19.217 | 7.7591 | 15.334 | 17.53 | 18.9998 |
1.9521 | 1.53 | 310000 | 1.9041 | 19.1285 | 7.6736 | 15.2625 | 17.458 | 19.0 |
1.9352 | 1.58 | 320000 | 1.9041 | 19.1656 | 7.723 | 15.3035 | 17.4818 | 18.9991 |
1.9342 | 1.63 | 330000 | 1.9004 | 19.2573 | 7.7766 | 15.3558 | 17.5382 | 19.0 |
1.9631 | 1.68 | 340000 | 1.8978 | 19.236 | 7.7584 | 15.3408 | 17.4993 | 18.9998 |
1.8987 | 1.72 | 350000 | 1.8968 | 19.1716 | 7.7231 | 15.2836 | 17.4655 | 18.9997 |
1.9433 | 1.77 | 360000 | 1.8924 | 19.2644 | 7.8294 | 15.4018 | 17.5808 | 18.9998 |
1.9159 | 1.82 | 370000 | 1.8912 | 19.1833 | 7.8267 | 15.3175 | 17.4918 | 18.9995 |
1.9516 | 1.87 | 380000 | 1.8856 | 19.3077 | 7.7432 | 15.3723 | 17.6115 | 19.0 |
1.9218 | 1.92 | 390000 | 1.8880 | 19.2668 | 7.8231 | 15.3834 | 17.5701 | 18.9994 |
1.9159 | 1.97 | 400000 | 1.8860 | 19.2224 | 7.7903 | 15.3488 | 17.4992 | 18.9997 |
1.8741 | 2.02 | 410000 | 1.8854 | 19.2572 | 7.741 | 15.3405 | 17.5351 | 19.0 |
1.8668 | 2.07 | 420000 | 1.8854 | 19.3658 | 7.8593 | 15.4418 | 17.656 | 18.9995 |
1.8638 | 2.12 | 430000 | 1.8831 | 19.305 | 7.8218 | 15.3843 | 17.5861 | 18.9997 |
1.8334 | 2.17 | 440000 | 1.8817 | 19.3269 | 7.8249 | 15.4231 | 17.5958 | 18.9994 |
1.8893 | 2.22 | 450000 | 1.8803 | 19.2949 | 7.7885 | 15.3947 | 17.585 | 18.9997 |
1.8929 | 2.27 | 460000 | 1.8783 | 19.291 | 7.8346 | 15.428 | 17.5797 | 18.9997 |
1.861 | 2.32 | 470000 | 1.8766 | 19.4284 | 7.8832 | 15.4746 | 17.6946 | 18.9997 |
1.8719 | 2.37 | 480000 | 1.8751 | 19.1525 | 7.7641 | 15.3348 | 17.47 | 18.9998 |
1.8889 | 2.41 | 490000 | 1.8742 | 19.1743 | 7.768 | 15.3292 | 17.4665 | 18.9998 |
1.8834 | 2.46 | 500000 | 1.8723 | 19.3069 | 7.7935 | 15.3987 | 17.5913 | 18.9998 |
1.8564 | 2.51 | 510000 | 1.8695 | 19.3217 | 7.8292 | 15.4063 | 17.6081 | 19.0 |
1.8706 | 2.56 | 520000 | 1.8697 | 19.294 | 7.8217 | 15.3964 | 17.581 | 18.9998 |
1.883 | 2.61 | 530000 | 1.8703 | 19.2784 | 7.8634 | 15.404 | 17.5942 | 18.9995 |
1.8622 | 2.66 | 540000 | 1.8677 | 19.3165 | 7.8378 | 15.4259 | 17.6064 | 18.9988 |
1.8781 | 2.71 | 550000 | 1.8676 | 19.3237 | 7.7954 | 15.3995 | 17.6008 | 19.0 |
1.8793 | 2.76 | 560000 | 1.8685 | 19.2141 | 7.7605 | 15.3345 | 17.5268 | 18.9997 |
1.8795 | 2.81 | 570000 | 1.8675 | 19.2694 | 7.8082 | 15.3996 | 17.5831 | 19.0 |
1.8425 | 2.86 | 580000 | 1.8659 | 19.2886 | 7.7987 | 15.4005 | 17.5859 | 18.9997 |
1.8605 | 2.91 | 590000 | 1.8650 | 19.2778 | 7.7934 | 15.3931 | 17.5809 | 18.9997 |
1.8448 | 2.96 | 600000 | 1.8655 | 19.2884 | 7.8087 | 15.4025 | 17.5856 | 19.0 |
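The Epoch and Step columns can be used to estimate how many optimizer steps make up one epoch. The sketch below takes a few (epoch, step) pairs from the table and, because the epoch values are rounded to two decimals, intersects the bounds each row implies. With `train_batch_size: 1`, and assuming a single device and no gradient accumulation (neither is stated in the card), this estimate would also approximate the number of training examples. The function name `steps_per_epoch_bounds` is made up for this example.

```python
def steps_per_epoch_bounds(epoch_logged, step, decimals=2):
    """Bound steps-per-epoch given an epoch value rounded to `decimals` places."""
    half = 0.5 * 10 ** (-decimals)
    low = step / (epoch_logged + half)
    high = step / (epoch_logged - half)
    return low, high


# (epoch, step) pairs copied from rows of the training results table.
rows = [(0.05, 10000), (0.99, 200000), (1.97, 400000), (2.96, 600000)]

# Intersect the per-row intervals to get the tightest consistent range.
lo, hi = 0.0, float("inf")
for epoch, step in rows:
    row_lo, row_hi = steps_per_epoch_bounds(epoch, step)
    lo, hi = max(lo, row_lo), min(hi, row_hi)

print(f"steps per epoch is roughly between {lo:,.0f} and {hi:,.0f}")
```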
Framework versions
- Transformers 4.23.0.dev0
- PyTorch 1.12.0
- Datasets 2.5.1
- Tokenizers 0.13.0