---
library_name: keras-hub
---
### Model Overview
⚠️ T5 is currently only available via the `keras-hub-nightly` package. Use `pip install keras-hub-nightly` to try this model.
T5 encoder-decoder backbone model.
T5 is an LLM pretrained on a mix of unsupervised and supervised tasks,
where each task is converted to a sequence-to-sequence format.
T5 works well on a variety of tasks out-of-the-box by prepending
various prefixes to the input sequence, e.g., for translation:
`"translate English to German: ..."`, for summarization:
`"summarize: ..."`.
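
The prefixing convention can be sketched in plain Python. This is only an illustration: the helper `add_task_prefix` and the dictionary keys are hypothetical names, not part of KerasHub, and only the two prefixes quoted above are included.

```python
# Task prefixes quoted in the text above. The dictionary keys are
# hypothetical labels chosen for this sketch.
TASK_PREFIXES = {
    "translate_en_de": "translate English to German: ",
    "summarize": "summarize: ",
}


def add_task_prefix(task, text):
    """Prepend the prefix registered for `task` to `text`."""
    if task not in TASK_PREFIXES:
        raise KeyError(f"Unknown task: {task!r}")
    # Strip surrounding whitespace so the prefix is followed by the content.
    return TASK_PREFIXES[task] + text.strip()


prompt = add_task_prefix("translate_en_de", "The house is wonderful.")
```

The resulting string is what would be tokenized and passed to the encoder; the model itself has no separate "task" argument.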
T5 was introduced in
[Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683).
The default constructor gives a fully customizable, randomly initialized T5
model with any number of layers, heads, and embedding dimensions. To load
preset architectures and weights, use the `from_preset` constructor.
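
A minimal sketch of the two construction paths, assuming `keras_hub` is importable (for example after installing `keras-hub-nightly`). The configuration values and the helper names `build_random_t5` and `load_pretrained_t5` are illustrative assumptions, not part of KerasHub; `t5_small_multi` is one of the presets listed on this card.

```python
# Hypothetical small configuration; the values are illustrative and do not
# correspond to any published preset.
TINY_T5_CONFIG = {
    "vocabulary_size": 1000,
    "num_layers": 2,
    "num_heads": 4,
    "hidden_dim": 64,  # must be divisible by num_heads
    "intermediate_dim": 128,
}


def build_random_t5(config=TINY_T5_CONFIG):
    """Return a randomly initialized T5 backbone built from `config`."""
    # Imported lazily so the config can be inspected without keras_hub.
    import keras_hub

    return keras_hub.models.T5Backbone(**config)


def load_pretrained_t5(preset="t5_small_multi"):
    """Return a T5 backbone with the preset architecture and weights."""
    import keras_hub

    return keras_hub.models.T5Backbone.from_preset(preset)
```

`build_random_t5()` needs no download, while `load_pretrained_t5()` fetches the preset files on first use.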
Disclaimer: Pre-trained models are provided on an "as is" basis, without
warranties or conditions of any kind.
## Links
* T5 Quickstart Notebook (coming soon)
* [T5 API Documentation](https://keras.io/keras_hub/api/models/t5/)
* [T5 Model Card](https://github.com/google-research/text-to-text-transfer-transformer/tree/main)
* [KerasHub Beginner Guide](https://keras.io/guides/keras_hub/getting_started/)
* [KerasHub Model Publishing Guide](https://keras.io/guides/keras_hub/upload/)
## Installation
Keras and KerasHub can be installed with:
```
pip install -U -q keras-hub
pip install -U -q keras
```
Jax, TensorFlow, and Torch come preinstalled in Kaggle Notebooks. For instructions on installing them in another environment see the [Keras Getting Started](https://keras.io/getting_started/) page.
## Presets
The following model checkpoints are provided by the Keras team. Full code examples for each are available below.
| Preset name | Parameters | Description |
|----------------|------------|--------------------------------------------------|
| t5_small_multi | 0 | 8-layer T5 model. Trained on the Colossal Clean Crawled Corpus (C4).|
| t5_base_multi| 0 | 12-layer T5 model. Trained on the Colossal Clean Crawled Corpus (C4). |
| t5_large_multi | 0 | 24-layer T5 model. Trained on the Colossal Clean Crawled Corpus (C4). |
| flan_small_multi | 0 | 8-layer T5 model. Trained on the Colossal Clean Crawled Corpus (C4). |
| flan_base_multi | 0 | 12-layer T5 model. Trained on the Colossal Clean Crawled Corpus (C4). |
| flan_large_multi | 0 | 24-layer T5 model. Trained on the Colossal Clean Crawled Corpus (C4). |
| t5_1.1_small | 60.51M | |
| t5_1.1_base | 247.58M | |
| t5_1.1_large | 750.25M | |
| t5_1.1_xl | 2.85B | |
| t5_1.1_xxl | 11.14B | |
__Arguments__
- __vocabulary_size__: int. The size of the token vocabulary.
- __num_layers__: int. The number of Transformer layers.
- __num_heads__: int. The number of attention heads for each Transformer layer.
The hidden size must be divisible by the number of attention heads.
- __hidden_dim__: int. The hidden size of the Transformer layers.
- __intermediate_dim__: int. The output dimension of the first Dense layer in
a two-layer feedforward network for each Transformer layer.
- __key_value_dim__: int. The dimension of each head of the key/value
projections in the multi-head attention layers. Defaults to
`hidden_dim / num_heads`.
- __dropout__: float. Dropout probability for the Transformer layers.
- __activation__: activation function (or activation string name). The
activation to be used in the inner dense blocks of the
Transformer layers. Defaults to `"relu"`.
- __use_gated_activation__: boolean. Whether to use activation gating in
the inner dense blocks of the Transformer layers.
The original T5 architecture didn't use gating, but more
recent versions do. Defaults to `True`.
- __layer_norm_epsilon__: float. Epsilon factor to be used in the
layer normalization layers in the Transformer layers.
- __tie_embedding_weights__: boolean. If `True`, the weights of the token
embedding and the weights projecting language model outputs from
`hidden_dim`
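
The relationship between `hidden_dim`, `num_heads`, and `key_value_dim` described in the arguments above can be checked with a small helper. This is a sketch of the documented rule only; `resolve_key_value_dim` is a hypothetical name and is not a KerasHub function.

```python
def resolve_key_value_dim(hidden_dim, num_heads, key_value_dim=None):
    """Return the per-head key/value dimension implied by the arguments.

    Mirrors the documented behavior: the hidden size must be divisible by
    the number of heads, and `key_value_dim` falls back to
    hidden_dim / num_heads when it is not given.
    """
    if hidden_dim % num_heads != 0:
        raise ValueError(
            f"hidden_dim ({hidden_dim}) must be divisible by "
            f"num_heads ({num_heads})"
        )
    if key_value_dim is None:
        return hidden_dim // num_heads
    return key_value_dim


default_dim = resolve_key_value_dim(hidden_dim=512, num_heads=8)
explicit_dim = resolve_key_value_dim(hidden_dim=512, num_heads=8, key_value_dim=32)
```

Here `default_dim` is 64 and `explicit_dim` is 32; a combination such as `hidden_dim=512, num_heads=7` raises a `ValueError`.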