
	---
	license: apache-2.0
	language:
	- multilingual
	- af
	- am
	- ar
	- az
	- be
	- bg
	- bn
	- ca
	- ceb
	- co
	- cs
	- cy
	- da
	- de
	- el
	- en
	- eo
	- es
	- et
	- eu
	- fa
	- fi
	- fil
	- fr
	- fy
	- ga
	- gd
	- gl
	- gu
	- ha
	- haw
	- hi
	- hmn
	- ht
	- hu
	- hy
	- ig
	- is
	- it
	- iw
	- ja
	- jv
	- ka
	- kk
	- km
	- kn
	- ko
	- ku
	- ky
	- la
	- lb
	- lo
	- lt
	- lv
	- mg
	- mi
	- mk
	- ml
	- mn
	- mr
	- ms
	- mt
	- my
	- ne
	- nl
	- no
	- ny
	- pa
	- pl
	- ps
	- pt
	- ro
	- ru
	- sd
	- si
	- sk
	- sl
	- sm
	- sn
	- so
	- sq
	- sr
	- st
	- su
	- sv
	- sw
	- ta
	- te
	- tg
	- th
	- tr
	- uk
	- und
	- ur
	- uz
	- vi
	- xh
	- yi
	- yo
	- zh
	- zu
	datasets:
	- mc4
	---

	# MLongT5 (transient-global attention, base-sized model)

	MLongT5 is a model pre-trained on a multilingual corpus (mC4). It was introduced in the paper [mLongT5: A Multilingual and Efficient Text-To-Text Transformer for Longer Sequences](https://arxiv.org/pdf/2305.11129.pdf) by Uthus et al. and first released in [the LongT5 repository](https://github.com/google-research/longt5). The model architecture and configuration can be found in the [Flaxformer repository](https://github.com/google/flaxformer), which uses another Google Research project, [T5x](https://github.com/google-research/t5x).

	Disclaimer: The team releasing MLongT5 did not write a model card for this model, so this model card has been written by Ahmed Elnaggar.

	## Model description
	MLongT5 is an encoder-decoder transformer pre-trained in a text-to-text denoising generative setting ([Pegasus-like generation pre-training](https://arxiv.org/pdf/1912.08777.pdf)). It extends the [LongT5 model](https://arxiv.org/abs/2112.07916) and supports one of two efficient attention mechanisms: (1) local attention or (2) transient-global attention. These sparse attention patterns allow the model to handle long input sequences efficiently.
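To make the two attention patterns concrete, here is a minimal, conceptual sketch in plain Python. It is not the model's implementation: the helper names `local_keys` and `global_blocks` are hypothetical, and the toy sizes are chosen only for illustration. Under local attention, each token attends to neighbors within a fixed radius. Transient-global attention additionally splits the input into fixed-size blocks, summarizes each block as one "global" token, and lets every token attend to those summaries as well.

```python
def local_keys(i, seq_len, radius):
    """Positions that token i can attend to under local attention."""
    return list(range(max(0, i - radius), min(seq_len, i + radius + 1)))


def global_blocks(seq_len, block_size):
    """Split positions into fixed-size blocks.

    In transient-global attention each block is summarized as one global
    token; how a partial final block is handled is a simplification here.
    """
    return [
        list(range(start, min(start + block_size, seq_len)))
        for start in range(0, seq_len, block_size)
    ]


# Toy sizes, chosen for illustration only (hypothetical values)
seq_len, radius, block_size = 12, 2, 4

# Token 5 sees its neighbors within the radius ...
print(local_keys(5, seq_len, radius))
# ... plus one summary token per block listed here
print(global_blocks(seq_len, block_size))
```

With these toy sizes, token 5 attends to five nearby positions and three block summaries instead of all twelve positions.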

	MLongT5 is particularly effective when fine-tuned for text generation tasks (summarization, question answering) that require handling long input sequences (up to 16,384 tokens).
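As a rough back-of-the-envelope check on why sparse attention matters at this length, the sketch below counts encoder attention scores for full self-attention versus a local-plus-global pattern at 16,384 tokens. The radius (127) and global block size (16) are assumptions for illustration and are not taken from this model card; check the model's configuration for the actual values. The function names are hypothetical, and the counts are conceptual rather than a measurement of the real block-wise implementation.

```python
def full_attention_scores(n):
    """Every token attends to every token."""
    return n * n


def tglobal_attention_scores(n, radius, block_size):
    """Local window per token plus one global token per block.

    The closed form for the local part assumes n >= radius; tokens near
    the sequence edges have truncated windows.
    """
    local = n * (2 * radius + 1) - radius * (radius + 1)
    num_blocks = -(-n // block_size)  # ceiling division
    return local + n * num_blocks


n = 16_384
radius, block_size = 127, 16  # assumed values, see the note above

full = full_attention_scores(n)
sparse = tglobal_attention_scores(n, radius, block_size)
print(f"full: {full:,}  sparse: {sparse:,}  ratio: {full / sparse:.1f}x")
```

Under these assumptions the sparse pattern needs about 21 million scores per attention head, compared with about 268 million for full attention.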

	## Intended uses & limitations

	The model is mostly meant to be fine-tuned on a supervised dataset. See the [model hub](https://huggingface.co/models?search=mlongt5) to look for fine-tuned versions on a task that interests you.

	### How to use

	```python
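	from transformers import T5Tokenizer, LongT5Model

	tokenizer = T5Tokenizer.from_pretrained("agemagician/mlong-t5-tglobal-base")
	model = LongT5Model.from_pretrained("agemagician/mlong-t5-tglobal-base")

	# Tokenize the encoder input
	inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
	# LongT5Model is an encoder-decoder, so the forward pass also needs decoder inputs;
	# the decoder text here is an arbitrary placeholder
	decoder_input_ids = tokenizer("Hello", return_tensors="pt").input_ids
	outputs = model(**inputs, decoder_input_ids=decoder_input_ids)

	# Decoder hidden states, shape (batch_size, decoder_sequence_length, hidden_size)
	last_hidden_states = outputs.last_hidden_state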
	```

	### BibTeX entry and citation info

	```bibtex
	@misc{uthus2023mlongt5,
	title={mLongT5: A Multilingual and Efficient Text-To-Text Transformer for Longer Sequences},
	author={David Uthus and Santiago Ontañón and Joshua Ainslie and Mandy Guo},
	year={2023},
	eprint={2305.11129},
	archivePrefix={arXiv},
	primaryClass={cs.CL}
	}
	```