agemagician committed
Commit 2b1b504
1 Parent(s): 96ff991

Update README.md

README.md CHANGED
---
license: apache-2.0
language:
- multilingual
- af
- am
- ar
- az
- be
- bg
- bn
- ca
- ceb
- co
- cs
- cy
- da
- de
- el
- en
- eo
- es
- et
- eu
- fa
- fi
- fil
- fr
- fy
- ga
- gd
- gl
- gu
- ha
- haw
- hi
- hmn
- ht
- hu
- hy
- ig
- is
- it
- iw
- ja
- jv
- ka
- kk
- km
- kn
- ko
- ku
- ky
- la
- lb
- lo
- lt
- lv
- mg
- mi
- mk
- ml
- mn
- mr
- ms
- mt
- my
- ne
- nl
- no
- ny
- pa
- pl
- ps
- pt
- ro
- ru
- sd
- si
- sk
- sl
- sm
- sn
- so
- sq
- sr
- st
- su
- sv
- sw
- ta
- te
- tg
- th
- tr
- uk
- und
- ur
- uz
- vi
- xh
- yi
- yo
- zh
- zu
datasets:
- mc4
---

# MLongT5 (transient-global attention, base-sized model)

MLongT5 is a text-to-text model pre-trained on a multilingual corpus (mC4). It was introduced in the paper [mLongT5: A Multilingual and Efficient Text-To-Text Transformer for Longer Sequences](https://arxiv.org/pdf/2305.11129.pdf) by Uthus et al. and first released in [the LongT5 repository](https://github.com/google-research/longt5). The model architecture and configuration can be found in the [Flaxformer repository](https://github.com/google/flaxformer), which builds on another Google research project, [T5X](https://github.com/google-research/t5x).

Disclaimer: The team releasing MLongT5 did not write a model card for this model, so this model card has been written by Ahmed Elnaggar.

## Model description

MLongT5 is an encoder-decoder transformer pre-trained in a text-to-text denoising generative setting ([PEGASUS-like generation pre-training](https://arxiv.org/pdf/1912.08777.pdf)). MLongT5 extends the [LongT5 model](https://arxiv.org/abs/2112.07916) and supports one of two efficient attention mechanisms: (1) local attention or (2) transient-global attention. These sparse attention patterns allow the model to handle long input sequences efficiently.
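
As a rough illustration of how the two attention variants are selected, the sketch below builds a small, randomly initialized LongT5 configuration with 🤗 Transformers; `encoder_attention_type` is the relevant `LongT5Config` parameter, and the tiny sizes used here are arbitrary placeholders rather than the settings of this checkpoint.

```python
from transformers import LongT5Config, LongT5Model

# Tiny placeholder sizes for a quick local experiment; the released checkpoint
# ships with its own configuration, so none of this is needed to load it.
config = LongT5Config(
    d_model=256,
    d_ff=512,
    num_layers=2,
    num_decoder_layers=2,
    num_heads=4,
    encoder_attention_type="transient-global",  # or "local"
)
model = LongT5Model(config)  # randomly initialized, for illustration only
print(model.config.encoder_attention_type)
```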

MLongT5 is particularly effective when fine-tuned for text generation tasks (summarization, question answering) that require handling long input sequences (up to 16,384 tokens).

## Intended uses & limitations

The model is mostly meant to be fine-tuned on a supervised dataset. See the [model hub](https://huggingface.co/models?search=mlongt5) to look for fine-tuned versions on a task that interests you.
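
As a minimal sketch of what such supervised fine-tuning could look like, the snippet below computes a sequence-to-sequence loss on a single toy (document, summary) pair with `LongT5ForConditionalGeneration`; it assumes this checkpoint is compatible with that class (as the official LongT5 checkpoints are), and a real setup would iterate over a full dataset, for example with the Trainer API.

```python
from transformers import T5Tokenizer, LongT5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("agemagician/mlong-t5-tglobal-base")
model = LongT5ForConditionalGeneration.from_pretrained("agemagician/mlong-t5-tglobal-base")

# Toy (document, summary) pair; a real dataset would provide many of these.
document = "A very long article that should be summarized ..."
summary = "A short summary."

inputs = tokenizer(document, max_length=16384, truncation=True, return_tensors="pt")
labels = tokenizer(text_target=summary, return_tensors="pt").input_ids

# One training step: the model returns the cross-entropy loss over the labels.
loss = model(**inputs, labels=labels).loss
loss.backward()
```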

### How to use

```python
import torch
from transformers import T5Tokenizer, LongT5Model

tokenizer = T5Tokenizer.from_pretrained("agemagician/mlong-t5-tglobal-base")
model = LongT5Model.from_pretrained("agemagician/mlong-t5-tglobal-base")

# Encoder input
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")

# The bare encoder-decoder model also needs decoder inputs; T5-family models
# use the pad token id as the decoder start token.
decoder_input_ids = torch.tensor([[tokenizer.pad_token_id]])

outputs = model(**inputs, decoder_input_ids=decoder_input_ids)

# Hidden states of the decoder's last layer
last_hidden_states = outputs.last_hidden_state
```
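
For actually generating text (e.g. summaries), a fine-tuned checkpoint would typically be loaded with `LongT5ForConditionalGeneration` instead; the sketch below is only illustrative, and the model id `your-finetuned-mlongt5-summarization` is a placeholder, not a real checkpoint.

```python
from transformers import T5Tokenizer, LongT5ForConditionalGeneration

# Placeholder id: substitute a real fine-tuned checkpoint from the model hub.
checkpoint = "your-finetuned-mlongt5-summarization"
tokenizer = T5Tokenizer.from_pretrained(checkpoint)
model = LongT5ForConditionalGeneration.from_pretrained(checkpoint)

article = "A very long multilingual document ..."
inputs = tokenizer(article, max_length=16384, truncation=True, return_tensors="pt")

# Beam-search decoding of a summary
summary_ids = model.generate(**inputs, max_new_tokens=128, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```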

### BibTeX entry and citation info

```bibtex
@misc{uthus2023mlongt5,
      title={mLongT5: A Multilingual and Efficient Text-To-Text Transformer for Longer Sequences},
      author={David Uthus and Santiago Ontañón and Joshua Ainslie and Mandy Guo},
      year={2023},
      eprint={2305.11129},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```