---
license: apache-2.0
language:
- multilingual
- af
- am
- ar
- az
- be
- bg
- bn
- ca
- ceb
- co
- cs
- cy
- da
- de
- el
- en
- eo
- es
- et
- eu
- fa
- fi
- fil
- fr
- fy
- ga
- gd
- gl
- gu
- ha
- haw
- hi
- hmn
- ht
- hu
- hy
- ig
- is
- it
- iw
- ja
- jv
- ka
- kk
- km
- kn
- ko
- ku
- ky
- la
- lb
- lo
- lt
- lv
- mg
- mi
- mk
- ml
- mn
- mr
- ms
- mt
- my
- ne
- nl
- no
- ny
- pa
- pl
- ps
- pt
- ro
- ru
- sd
- si
- sk
- sl
- sm
- sn
- so
- sq
- sr
- st
- su
- sv
- sw
- ta
- te
- tg
- th
- tr
- uk
- und
- ur
- uz
- vi
- xh
- yi
- yo
- zh
- zu
datasets:
- mc4
---
# MLongT5 (transient-global attention, base-sized model)
MLongT5 is a model pre-trained on a multilingual corpus (mC4). It was introduced in the paper [mLongT5: A Multilingual and Efficient Text-To-Text Transformer for Longer Sequences](https://arxiv.org/pdf/2305.11129.pdf) by Uthus et al. and first released in [the LongT5 repository](https://github.com/google-research/longt5). The model architecture and configuration can be found in the [Flaxformer repository](https://github.com/google/flaxformer), which builds on another Google Research project, [T5x](https://github.com/google-research/t5x).
Disclaimer: The team releasing MLongT5 did not write a model card for this model so this model card has been written by Ahmed Elnaggar.
## Model description
MLongT5 is an encoder-decoder transformer pre-trained in a text-to-text denoising generative setting ([Pegasus-like generation pre-training](https://arxiv.org/pdf/1912.08777.pdf)). It extends the [LongT5 model](https://arxiv.org/abs/2112.07916) and supports one of two efficient attention mechanisms: (1) local attention or (2) transient-global attention. These sparse attention patterns let the model handle long input sequences efficiently.
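To make the two sparsity patterns concrete, here is a minimal sketch in plain Python. It assumes the description in the LongT5 paper: with local attention a token attends only to neighbours within a fixed radius, and with transient-global attention it additionally attends to one summary token per fixed-size block of the input. The function names and the toy sizes are hypothetical and chosen for illustration; this is not the Flaxformer or Transformers implementation.

```python
def local_keys(query, seq_len, radius):
    """Positions a query token may attend to under local attention."""
    return list(range(max(0, query - radius), min(seq_len, query + radius + 1)))


def global_blocks(seq_len, block_size):
    """Split positions into fixed-size blocks; each block would be
    summarised into a single global token that every query can attend to."""
    return [
        list(range(start, min(start + block_size, seq_len)))
        for start in range(0, seq_len, block_size)
    ]


# Toy sizes, assumed for illustration only.
seq_len, radius, block_size = 12, 2, 4

print(local_keys(5, seq_len, radius))      # [3, 4, 5, 6, 7]
print(local_keys(0, seq_len, radius))      # [0, 1, 2] (window clipped at the edge)
print(global_blocks(seq_len, block_size))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
```

Under this sketch, token 5 sees five local keys in local mode, and those five keys plus three block summaries in transient-global mode.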
MLongT5 is particularly effective when fine-tuned for text generation tasks (summarization, question answering) that require handling long input sequences (up to 16,384 tokens).
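A rough back-of-the-envelope count shows why sparse attention matters at this input length. The sketch below compares the number of query-key pairs for full attention against upper bounds for local and transient-global attention. The local radius of 127 and global block size of 16 are assumptions (they are the values I believe LongT5 configurations commonly use); check the checkpoint's own configuration before relying on them.

```python
def attention_pairs(seq_len, local_radius, global_block_size):
    """Upper bounds on query-key pairs per attention layer (edge clipping ignored)."""
    full = seq_len * seq_len
    local = seq_len * (2 * local_radius + 1)
    num_global_tokens = -(-seq_len // global_block_size)  # ceiling division
    tglobal = local + seq_len * num_global_tokens
    return full, local, tglobal


# Assumed values: 16,384 input tokens, radius 127, block size 16.
full, local, tglobal = attention_pairs(16_384, 127, 16)
print(f"full attention:   {full:,}")     # 268,435,456
print(f"local attention:  {local:,}")    # 4,177,920
print(f"transient-global: {tglobal:,}")  # 20,955,136
print(f"full / transient-global: {full / tglobal:.1f}x")  # 12.8x
```

The global term still grows quadratically with sequence length, but it is divided by the block size, which is what keeps the total manageable for long inputs.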
## Intended uses & limitations
The model is mostly meant to be fine-tuned on a supervised dataset. See the [model hub](https://huggingface.co/models?search=mlongt5) to look for fine-tuned versions on a task that interests you.
### How to use
The following shows how one can extract the last hidden representation for the model.
```python
from transformers import T5Tokenizer, LongT5Model
tokenizer = T5Tokenizer.from_pretrained("agemagician/mlong-t5-tglobal-base")
model = LongT5Model.from_pretrained("agemagician/mlong-t5-tglobal-base")
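inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
# LongT5Model is an encoder-decoder, so the forward pass also needs decoder inputs.
decoder_input_ids = tokenizer("Hello", return_tensors="pt").input_ids
outputs = model(**inputs, decoder_input_ids=decoder_input_ids)
# Last hidden state of the decoder.
last_hidden_states = outputs.last_hidden_state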
```
The following shows how one can predict masked passages using the different denoising strategies.
### S-Denoising
For *S-Denoising*, please make sure to prompt the text with the prefix `[S2S]` as shown below.
```python
from transformers import LongT5ForConditionalGeneration, T5Tokenizer
import torch
model = LongT5ForConditionalGeneration.from_pretrained("agemagician/mlong-t5-tglobal-base", low_cpu_mem_usage=True, torch_dtype=torch.bfloat16).to("cuda")
tokenizer = T5Tokenizer.from_pretrained("agemagician/mlong-t5-tglobal-base")
input_string = "[S2S] Mr. Dursley was the director of a firm called Grunnings, which made drills. He was a big, solid man with a bald head. Mrs. Dursley was thin and blonde and more than the usual amount of neck, which came in very useful as she spent so much of her time craning over garden fences, spying on the neighbours. The Dursleys had a small son called Dudley and in their opinion there was no finer boy anywhere <extra_id_0>"
inputs = tokenizer(input_string, return_tensors="pt").input_ids.to("cuda")
outputs = model.generate(inputs, max_length=200)
print(tokenizer.decode(outputs[0]))
```
### R-Denoising
For *R-Denoising*, please make sure to prompt the text with the prefix `[NLU]` as shown below.
```python
from transformers import LongT5ForConditionalGeneration, T5Tokenizer
import torch
model = LongT5ForConditionalGeneration.from_pretrained("agemagician/mlong-t5-tglobal-base", low_cpu_mem_usage=True, torch_dtype=torch.bfloat16).to("cuda")
tokenizer = T5Tokenizer.from_pretrained("agemagician/mlong-t5-tglobal-base")
input_string = "[NLU] Mr. Dursley was the director of a firm called <extra_id_0>, which made <extra_id_1>. He was a big, solid man with a bald head. Mrs. Dursley was thin and <extra_id_2> of neck, which came in very useful as she spent so much of her time <extra_id_3>. The Dursleys had a small son called Dudley and <extra_id_4>"
inputs = tokenizer(input_string, return_tensors="pt", add_special_tokens=False).input_ids.to("cuda")
outputs = model.generate(inputs, max_length=200)
print(tokenizer.decode(outputs[0]))
```
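The decoded output of a multi-sentinel prompt is typically a single string in which each `<extra_id_N>` is followed by the text predicted for that gap. The helper below is a minimal sketch for splitting such a string and substituting the spans back into the prompt. The function names are hypothetical, the sample `decoded` string is hand-written rather than real model output, and the `<pad>` and `</s>` markers are assumed to be the special tokens the tokenizer emits.

```python
import re

SENTINEL = re.compile(r"<extra_id_(\d+)>")


def split_sentinel_spans(decoded):
    """Map each sentinel index to the text generated after it."""
    cleaned = decoded.replace("<pad>", "").replace("</s>", "")
    # re.split with one capture group yields [prefix, id0, text0, id1, text1, ...]
    parts = SENTINEL.split(cleaned)
    return {int(parts[i]): parts[i + 1].strip() for i in range(1, len(parts) - 1, 2)}


def fill_sentinels(template, spans):
    """Substitute predicted spans into the masked input; unknown sentinels stay as-is."""
    return SENTINEL.sub(lambda m: spans.get(int(m.group(1)), m.group(0)), template)


# Hand-written example string for illustration, not real model output.
decoded = "<pad> <extra_id_0> Grunnings <extra_id_1> drills</s>"
spans = split_sentinel_spans(decoded)
print(spans)  # {0: 'Grunnings', 1: 'drills'}
print(fill_sentinels("a firm called <extra_id_0>, which made <extra_id_1>.", spans))
# a firm called Grunnings, which made drills.
```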
### X-Denoising
For *X-Denoising*, please make sure to prompt the text with the prefix `[NLG]` as shown below.
```python
from transformers import LongT5ForConditionalGeneration, T5Tokenizer
import torch
model = LongT5ForConditionalGeneration.from_pretrained("agemagician/mlong-t5-tglobal-base", low_cpu_mem_usage=True, torch_dtype=torch.bfloat16).to("cuda")
tokenizer = T5Tokenizer.from_pretrained("agemagician/mlong-t5-tglobal-base")
input_string = "[NLG] Mr. Dursley was the director of a firm called Grunnings, which made drills. He was a big, solid man with a bald head. Mrs. Dursley was thin and blonde and more than the usual amount of neck, which came in very useful as she spent so much of her time craning over garden fences, spying on the neighbours. The Dursleys had a small son called Dudley and in their opinion there was no finer boy anywhere. <extra_id_0>"
inputs = tokenizer(input_string, return_tensors="pt", add_special_tokens=False).input_ids.to("cuda")
outputs = model.generate(inputs, max_length=200)
print(tokenizer.decode(outputs[0]))
```
### BibTeX entry and citation info
```bibtex
@misc{uthus2023mlongt5,
title={mLongT5: A Multilingual and Efficient Text-To-Text Transformer for Longer Sequences},
author={David Uthus and Santiago Ontañón and Joshua Ainslie and Mandy Guo},
year={2023},
eprint={2305.11129},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
> Created by [Ahmed Elnaggar/@Elnaggar_AI](https://twitter.com/Elnaggar_AI) | [LinkedIn](https://www.linkedin.com/in/prof-ahmed-elnaggar/)