---
license: apache-2.0
language:
- multilingual
- af
- am
- ar
- az
- be
- bg
- bn
- ca
- ceb
- co
- cs
- cy
- da
- de
- el
- en
- eo
- es
- et
- eu
- fa
- fi
- fil
- fr
- fy
- ga
- gd
- gl
- gu
- ha
- haw
- hi
- hmn
- ht
- hu
- hy
- ig
- is
- it
- iw
- ja
- jv
- ka
- kk
- km
- kn
- ko
- ku
- ky
- la
- lb
- lo
- lt
- lv
- mg
- mi
- mk
- ml
- mn
- mr
- ms
- mt
- my
- ne
- nl
- no
- ny
- pa
- pl
- ps
- pt
- ro
- ru
- sd
- si
- sk
- sl
- sm
- sn
- so
- sq
- sr
- st
- su
- sv
- sw
- ta
- te
- tg
- th
- tr
- uk
- und
- ur
- uz
- vi
- xh
- yi
- yo
- zh
- zu
datasets:
- mc4
---

# MLongT5 (transient-global attention, base-sized model)

MLongT5 model pre-trained on the multilingual mC4 corpus. The model was introduced in the paper [mLongT5: A Multilingual and Efficient Text-To-Text Transformer for Longer Sequences](https://arxiv.org/pdf/2305.11129.pdf) by Uthus et al. and first released in [the LongT5 repository](https://github.com/google-research/longt5). The model architecture and configuration can be found in the [Flaxformer repository](https://github.com/google/flaxformer), which uses another Google research project repository, [T5x](https://github.com/google-research/t5x).

Disclaimer: The team releasing MLongT5 did not write a model card for this model, so this model card has been written by Ahmed Elnaggar.

## Model description

The MLongT5 model is an encoder-decoder transformer pre-trained in a text-to-text denoising generative setting. According to the paper, it uses the pre-training tasks of [UL2](https://arxiv.org/abs/2205.05131) rather than the [Pegasus-like generation pre-training](https://arxiv.org/pdf/1912.08777.pdf) used for LongT5. MLongT5 is an extension of the [LongT5 model](https://arxiv.org/abs/2112.07916) and supports one of two efficient attention mechanisms: (1) local attention or (2) transient-global attention. This checkpoint uses transient-global attention. These sparse attention patterns allow the model to handle long input sequences efficiently.

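To make the efficiency claim above concrete, the following is a rough back-of-the-envelope sketch, not the model's actual implementation. It counts how many query-key pairs one encoder attention layer would score under full, local, and transient-global attention. The helper name `attention_pairs`, the local radius `r = 127`, and the global block size `k = 16` are assumptions chosen for illustration and are not stated in this model card. The count also ignores clipping at the sequence edges, so it is an upper bound.

```python
def attention_pairs(n, r=127, k=16):
    """Rough upper bound on query-key pairs scored per attention layer.

    n: input length in tokens
    r: local radius (assumed value, for illustration only)
    k: tokens per global block (assumed value, for illustration only)
    """
    window = min(n, 2 * r + 1)  # r neighbours on each side plus the token itself
    num_global = n // k         # simplified: one summary token per block of k tokens
    return {
        "full": n * n,                                  # every token attends to every token
        "local": n * window,                            # every token attends to its window
        "transient_global": n * (window + num_global),  # window plus all block summaries
    }


counts = attention_pairs(16384)
for name, pairs in counts.items():
    share = pairs / counts["full"]
    print(f"{name:>16}: {pairs:,} pairs ({share:.1%} of full attention)")
```

Under these assumed values, both sparse patterns score only a small fraction of the pairs that full attention would at 16,384 tokens. Transient-global attention costs somewhat more than purely local attention, and in exchange each token gets a coarse view of the whole input.
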
MLongT5 is particularly effective when fine-tuned for text generation tasks (summarization, question answering) that require handling long input sequences (up to 16,384 tokens).

## Intended uses & limitations

The model is mostly meant to be fine-tuned on a supervised dataset. See the [model hub](https://huggingface.co/models?search=mlongt5) to look for fine-tuned versions on a task that interests you.

### How to use

```python
from transformers import T5Tokenizer, LongT5Model

tokenizer = T5Tokenizer.from_pretrained("agemagician/mlong-t5-tglobal-base")
model = LongT5Model.from_pretrained("agemagician/mlong-t5-tglobal-base")

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
# The bare encoder-decoder model also needs decoder inputs for a forward pass.
decoder_input_ids = tokenizer("Hello", return_tensors="pt").input_ids

outputs = model(**inputs, decoder_input_ids=decoder_input_ids)
last_hidden_states = outputs.last_hidden_state  # hidden states of the decoder's last layer
```

### BibTeX entry and citation info

```bibtex
@misc{uthus2023mlongt5,
      title={mLongT5: A Multilingual and Efficient Text-To-Text Transformer for Longer Sequences},
      author={David Uthus and Santiago Ontañón and Joshua Ainslie and Mandy Guo},
      year={2023},
      eprint={2305.11129},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

> Created by [Ahmed Elnaggar/@Elnaggar_AI](https://twitter.com/Elnaggar_AI) | [LinkedIn](https://www.linkedin.com/in/prof-ahmed-elnaggar/)