---
tags:
- generated_from_trainer
datasets:
- xsum
metrics:
- rouge
model-index:
- name: t5-small-finetuned_xsum
  results:
  - task:
      name: Sequence-to-sequence Language Modeling
      type: text2text-generation
    dataset:
      name: xsum
      type: xsum
      args: default
    metrics:
    - name: Rouge1
      type: rouge
      value: 33.1688
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# t5-small-finetuned_xsum

This model is a fine-tuned version of [t5-small](https://huggingface.co/t5-small) on the xsum dataset.
It achieves the following results on the evaluation set:
- Loss: 2.0881
- Rouge1: 33.1688
- Rouge2: 11.831
- Rougel: 26.796
- Rougelsum: 26.7931
- Gen Len: 18.7957

## Model description

More information needed

## Intended uses & limitations

The Extreme Summarization (XSum) dataset is a benchmark for evaluating abstractive single-document summarization systems. The goal is to create a short, one-sentence news summary answering the question “What is the article about?”. The dataset consists of 226,711 news articles, each accompanied by a one-sentence summary. The articles were collected from the BBC (2010 to 2017) and cover a wide variety of domains (e.g., News, Politics, Sports, Weather, Business, Technology, Science, Health, Family, Education, Entertainment and Arts). The official random split contains 204,045 (90%), 11,332 (5%) and 11,334 (5%) documents in the training, validation and test sets, respectively.

T5, or Text-to-Text Transfer Transformer, is a Transformer-based architecture that uses a text-to-text approach. Every task, including translation, question answering, and classification, is cast as feeding the model text as input and training it to generate some target text. This allows the same model, loss function, hyperparameters, etc. to be used across a diverse set of tasks. The changes compared to BERT include:

- adding a causal decoder to the bidirectional architecture.
- replacing the fill-in-the-blank cloze task with a mix of alternative pre-training tasks.
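
Because T5 casts summarization as text-to-text generation, the fine-tuned checkpoint can be used with the standard `transformers` sequence-to-sequence API. The snippet below is a minimal sketch only: the model identifier is a placeholder for this repository's path, and it assumes the model was fine-tuned with the conventional `summarize:` source prefix used for T5.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Placeholder identifier: replace with this repository's Hub id or a local checkpoint directory.
model_id = "t5-small-finetuned_xsum"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

article = "Put the full text of a news article here."

# Assumes the "summarize: " task prefix conventional for T5; drop it if training used no prefix.
inputs = tokenizer("summarize: " + article, return_tensors="pt", truncation=True, max_length=512)
summary_ids = model.generate(**inputs, max_length=64, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```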

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 50
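
As a rough sketch, these settings correspond to `Seq2SeqTrainingArguments` along the following lines. The Adam betas and epsilon listed above are the library defaults, so they are not set explicitly; `output_dir` and `predict_with_generate` are assumptions, not values recorded in this card.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="t5-small-finetuned_xsum",  # assumed output directory
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=50,
    predict_with_generate=True,  # assumed, so ROUGE can be computed from generated summaries
)
```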

### Training results

| Training Loss | Epoch | Step   | Validation Loss | Rouge1  | Rouge2  | Rougel  | Rougelsum | Gen Len |
|:-------------:|:-----:|:------:|:---------------:|:-------:|:-------:|:-------:|:---------:|:-------:|
| 2.3789        | 1.0   | 12753  | 2.2274          | 31.3107 | 10.1407 | 25.0522 | 25.0423   | 18.8193 |
| 2.3565        | 2.0   | 25506  | 2.2159          | 31.5958 | 10.4022 | 25.3267 | 25.3228   | 18.7992 |
| 2.3504        | 3.0   | 38259  | 2.2037          | 31.8838 | 10.5974 | 25.5777 | 25.5786   | 18.7928 |
| 2.3345        | 4.0   | 51012  | 2.1956          | 31.8402 | 10.5656 | 25.5027 | 25.4994   | 18.8163 |
| 2.3175        | 5.0   | 63765  | 2.1868          | 31.9412 | 10.7187 | 25.6688 | 25.6719   | 18.7902 |
| 2.3177        | 6.0   | 76518  | 2.1805          | 31.9831 | 10.7074 | 25.6869 | 25.6863   | 18.8099 |
| 2.3027        | 7.0   | 89271  | 2.1734          | 32.0714 | 10.7714 | 25.7193 | 25.7141   | 18.7961 |
| 2.289         | 8.0   | 102024 | 2.1667          | 32.1598 | 10.883  | 25.8608 | 25.8605   | 18.8144 |
| 2.2875        | 9.0   | 114777 | 2.1622          | 32.0933 | 10.9046 | 25.8399 | 25.8329   | 18.8009 |
| 2.2796        | 10.0  | 127530 | 2.1547          | 32.391  | 11.112  | 26.0903 | 26.0931   | 18.7992 |
| 2.286         | 11.0  | 140283 | 2.1504          | 32.4479 | 11.1077 | 26.1274 | 26.1267   | 18.7975 |
| 2.2542        | 12.0  | 153036 | 2.1464          | 32.4059 | 11.1583 | 26.1111 | 26.1047   | 18.8042 |
| 2.2526        | 13.0  | 165789 | 2.1416          | 32.425  | 11.2178 | 26.1854 | 26.1795   | 18.7865 |
| 2.2374        | 14.0  | 178542 | 2.1372          | 32.299  | 11.1047 | 26.0495 | 26.0434   | 18.8016 |
| 2.2295        | 15.0  | 191295 | 2.1331          | 32.4283 | 11.2233 | 26.135  | 26.128    | 18.8004 |
| 2.2213        | 16.0  | 204048 | 2.1306          | 32.4948 | 11.2885 | 26.2607 | 26.2551   | 18.7854 |
| 2.1985        | 17.0  | 216801 | 2.1282          | 32.5872 | 11.3243 | 26.31   | 26.3062   | 18.7986 |
| 2.1993        | 18.0  | 229554 | 2.1245          | 32.6278 | 11.3196 | 26.3142 | 26.315    | 18.7809 |
| 2.2044        | 19.0  | 242307 | 2.1223          | 32.676  | 11.3871 | 26.356  | 26.3426   | 18.8007 |
| 2.2035        | 20.0  | 255060 | 2.1188          | 32.8736 | 11.4703 | 26.4901 | 26.4899   | 18.7863 |
| 2.1909        | 21.0  | 267813 | 2.1167          | 32.8288 | 11.4666 | 26.4992 | 26.4877   | 18.796  |
| 2.1835        | 22.0  | 280566 | 2.1141          | 32.9183 | 11.5267 | 26.5302 | 26.5338   | 18.8034 |
| 2.1845        | 23.0  | 293319 | 2.1127          | 32.7907 | 11.444  | 26.4614 | 26.459    | 18.8054 |
| 2.1725        | 24.0  | 306072 | 2.1109          | 32.8191 | 11.4973 | 26.5109 | 26.5012   | 18.7818 |
| 2.1805        | 25.0  | 318825 | 2.1082          | 32.7333 | 11.4325 | 26.4093 | 26.4028   | 18.7986 |
| 2.1661        | 26.0  | 331578 | 2.1063          | 32.8703 | 11.5443 | 26.5105 | 26.5101   | 18.7962 |
| 2.1606        | 27.0  | 344331 | 2.1048          | 32.884  | 11.558  | 26.5504 | 26.5465   | 18.7939 |
| 2.1508        | 28.0  | 357084 | 2.1032          | 32.9699 | 11.6036 | 26.6348 | 26.6266   | 18.7983 |
| 2.1479        | 29.0  | 369837 | 2.1019          | 32.8247 | 11.5812 | 26.5659 | 26.5595   | 18.7992 |
| 2.1363        | 30.0  | 382590 | 2.1019          | 32.9982 | 11.6801 | 26.6552 | 26.6497   | 18.797  |
| 2.1513        | 31.0  | 395343 | 2.0996          | 32.9903 | 11.6632 | 26.6579 | 26.6521   | 18.7911 |
| 2.1389        | 32.0  | 408096 | 2.0981          | 33.0195 | 11.7282 | 26.683  | 26.6757   | 18.7824 |
| 2.1421        | 33.0  | 420849 | 2.0968          | 32.9967 | 11.6949 | 26.6734 | 26.662    | 18.796  |
| 2.1545        | 34.0  | 433602 | 2.0954          | 33.0943 | 11.7329 | 26.7367 | 26.7295   | 18.7871 |
| 2.1459        | 35.0  | 446355 | 2.0949          | 33.1534 | 11.816  | 26.775  | 26.7716   | 18.7914 |
| 2.1364        | 36.0  | 459108 | 2.0933          | 33.0686 | 11.7418 | 26.7147 | 26.7066   | 18.7901 |
| 2.1194        | 37.0  | 471861 | 2.0928          | 33.1276 | 11.8268 | 26.7684 | 26.7626   | 18.802  |
| 2.1292        | 38.0  | 484614 | 2.0925          | 33.0462 | 11.7669 | 26.6798 | 26.6783   | 18.802  |
| 2.1317        | 39.0  | 497367 | 2.0913          | 33.1402 | 11.7889 | 26.7822 | 26.7824   | 18.7962 |
| 2.1176        | 40.0  | 510120 | 2.0907          | 33.1488 | 11.8001 | 26.7749 | 26.7615   | 18.7992 |
| 2.1318        | 41.0  | 522873 | 2.0899          | 33.0963 | 11.8162 | 26.7433 | 26.7325   | 18.7924 |
| 2.1052        | 42.0  | 535626 | 2.0899          | 33.0764 | 11.7624 | 26.7294 | 26.7238   | 18.7911 |
| 2.1267        | 43.0  | 548379 | 2.0891          | 33.1292 | 11.8029 | 26.7684 | 26.7693   | 18.7885 |
| 2.1211        | 44.0  | 561132 | 2.0894          | 33.09   | 11.7676 | 26.7418 | 26.7394   | 18.7853 |
| 2.1243        | 45.0  | 573885 | 2.0880          | 33.1449 | 11.7899 | 26.7725 | 26.7634   | 18.7946 |
| 2.0947        | 46.0  | 586638 | 2.0885          | 33.1548 | 11.8108 | 26.808  | 26.8003   | 18.7917 |
| 2.1246        | 47.0  | 599391 | 2.0881          | 33.148  | 11.8208 | 26.803  | 26.7961   | 18.7913 |
| 2.127         | 48.0  | 612144 | 2.0877          | 33.1935 | 11.8399 | 26.8209 | 26.8142   | 18.7925 |
| 2.1231        | 49.0  | 624897 | 2.0878          | 33.158  | 11.8159 | 26.7898 | 26.785    | 18.794  |
| 2.1296        | 50.0  | 637650 | 2.0881          | 33.1688 | 11.831  | 26.796  | 26.7931   | 18.7957 |

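The ROUGE columns above are on a 0–100 scale. As a sketch (with placeholder predictions and references), scores of this kind can be computed with the `rouge` metric shipped with `datasets`:

```python
from datasets import load_metric  # requires the `rouge_score` package

rouge = load_metric("rouge")

predictions = ["generated one-sentence summary ..."]  # placeholder model outputs
references = ["reference xsum summary ..."]           # placeholder gold summaries

result = rouge.compute(predictions=predictions, references=references, use_stemmer=True)
# Report the mid F-measure, scaled to 0-100 as in the table above.
print({key: round(value.mid.fmeasure * 100, 4) for key, value in result.items()})
```
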

### Framework versions

- Transformers 4.12.0.dev0
- Pytorch 1.10.0+cu113
- Datasets 1.14.0
- Tokenizers 0.10.3