agemagician committed
Commit 2b1b504
1 Parent(s): 96ff991

Update README.md

README.md CHANGED
---
license: apache-2.0
language:
- multilingual
- af
- am
- ar
- az
- be
- bg
- bn
- ca
- ceb
- co
- cs
- cy
- da
- de
- el
- en
- eo
- es
- et
- eu
- fa
- fi
- fil
- fr
- fy
- ga
- gd
- gl
- gu
- ha
- haw
- hi
- hmn
- ht
- hu
- hy
- ig
- is
- it
- iw
- ja
- jv
- ka
- kk
- km
- kn
- ko
- ku
- ky
- la
- lb
- lo
- lt
- lv
- mg
- mi
- mk
- ml
- mn
- mr
- ms
- mt
- my
- ne
- nl
- no
- ny
- pa
- pl
- ps
- pt
- ro
- ru
- sd
- si
- sk
- sl
- sm
- sn
- so
- sq
- sr
- st
- su
- sv
- sw
- ta
- te
- tg
- th
- tr
- uk
- und
- ur
- uz
- vi
- xh
- yi
- yo
- zh
- zu
datasets:
- mc4
---

# MLongT5 (transient-global attention, base-sized model)

MLongT5 is a text-to-text model pre-trained on a multilingual corpus (mC4). It was introduced in the paper [mLongT5: A Multilingual and Efficient Text-To-Text Transformer for Longer Sequences](https://arxiv.org/pdf/2305.11129.pdf) by Uthus et al. and first released in [the LongT5 repository](https://github.com/google-research/longt5). The model architecture and configuration can be found in the [Flaxformer repository](https://github.com/google/flaxformer), which builds on another Google research project, [T5X](https://github.com/google-research/t5x).

Disclaimer: The team releasing MLongT5 did not write a model card for this model, so this model card has been written by Ahmed Elnaggar.

## Model description

MLongT5 is an encoder-decoder transformer pre-trained in a text-to-text denoising generative setting ([PEGASUS-like generation pre-training](https://arxiv.org/pdf/1912.08777.pdf)). MLongT5 extends the [LongT5 model](https://arxiv.org/abs/2112.07916) and supports one of two efficient attention mechanisms: (1) local attention or (2) transient-global attention. These sparse attention patterns allow the model to handle long input sequences efficiently.
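
As a rough illustration of how the two attention variants are selected, the sketch below builds a small, randomly initialized LongT5 configuration with 🤗 Transformers; `encoder_attention_type` is the relevant `LongT5Config` parameter, and the tiny sizes used here are arbitrary placeholders rather than the settings of this checkpoint.

```python
from transformers import LongT5Config, LongT5Model

# Tiny placeholder sizes for a quick local experiment; the released checkpoint
# ships with its own configuration, so none of this is needed to load it.
config = LongT5Config(
    d_model=256,
    d_ff=512,
    num_layers=2,
    num_decoder_layers=2,
    num_heads=4,
    encoder_attention_type="transient-global",  # or "local"
)
model = LongT5Model(config)  # randomly initialized, for illustration only
print(model.config.encoder_attention_type)
```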

MLongT5 is particularly effective when fine-tuned for text generation tasks (summarization, question answering) that require handling long input sequences (up to 16,384 tokens).

## Intended uses & limitations

The model is mostly meant to be fine-tuned on a supervised dataset. See the [model hub](https://huggingface.co/models?search=mlongt5) to look for fine-tuned versions on a task that interests you.
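
As a minimal sketch of what such supervised fine-tuning could look like, the snippet below computes a sequence-to-sequence loss on a single toy (document, summary) pair with `LongT5ForConditionalGeneration`; it assumes this checkpoint is compatible with that class (as the official LongT5 checkpoints are), and a real setup would iterate over a full dataset, for example with the Trainer API.

```python
from transformers import T5Tokenizer, LongT5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("agemagician/mlong-t5-tglobal-base")
model = LongT5ForConditionalGeneration.from_pretrained("agemagician/mlong-t5-tglobal-base")

# Toy (document, summary) pair; a real dataset would provide many of these.
document = "A very long article that should be summarized ..."
summary = "A short summary."

inputs = tokenizer(document, max_length=16384, truncation=True, return_tensors="pt")
labels = tokenizer(text_target=summary, return_tensors="pt").input_ids

# One training step: the model returns the cross-entropy loss over the labels.
loss = model(**inputs, labels=labels).loss
loss.backward()
```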

### How to use

```python
import torch
from transformers import T5Tokenizer, LongT5Model

tokenizer = T5Tokenizer.from_pretrained("agemagician/mlong-t5-tglobal-base")
model = LongT5Model.from_pretrained("agemagician/mlong-t5-tglobal-base")

# Encoder input
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")

# The bare encoder-decoder model also needs decoder inputs; T5-family models
# use the pad token id as the decoder start token.
decoder_input_ids = torch.tensor([[tokenizer.pad_token_id]])

outputs = model(**inputs, decoder_input_ids=decoder_input_ids)

# Hidden states of the decoder's last layer
last_hidden_states = outputs.last_hidden_state
```
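
For actually generating text (e.g. summaries), a fine-tuned checkpoint would typically be loaded with `LongT5ForConditionalGeneration` instead; the sketch below is only illustrative, and the model id `your-finetuned-mlongt5-summarization` is a placeholder, not a real checkpoint.

```python
from transformers import T5Tokenizer, LongT5ForConditionalGeneration

# Placeholder id: substitute a real fine-tuned checkpoint from the model hub.
checkpoint = "your-finetuned-mlongt5-summarization"
tokenizer = T5Tokenizer.from_pretrained(checkpoint)
model = LongT5ForConditionalGeneration.from_pretrained(checkpoint)

article = "A very long multilingual document ..."
inputs = tokenizer(article, max_length=16384, truncation=True, return_tensors="pt")

# Beam-search decoding of a summary
summary_ids = model.generate(**inputs, max_new_tokens=128, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```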

### BibTeX entry and citation info

```bibtex
@misc{uthus2023mlongt5,
      title={mLongT5: A Multilingual and Efficient Text-To-Text Transformer for Longer Sequences},
      author={David Uthus and Santiago Ontañón and Joshua Ainslie and Mandy Guo},
      year={2023},
      eprint={2305.11129},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```