---
library_name: keras-hub
license: apache-2.0
tags:
- text-classification
- keras
- text-generation
pipeline_tag: text-generation
---
### Model Overview

⚠️ T5 is currently only available via the `keras-hub-nightly` package. Use `pip install keras-hub-nightly` to try this model.

T5 encoder-decoder backbone model.

T5 is an LLM pretrained on a mix of unsupervised and supervised tasks, where each task is converted to a sequence-to-sequence format. T5 works well on a variety of tasks out of the box by prepending various prefixes to the input sequence, e.g., for translation: `"translate English to German: ..."`, for summarization: `"summarize: ..."`.

T5 was introduced in [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683).

The default constructor gives a fully customizable, randomly initialized T5 model with any number of layers, heads, and embedding dimensions. To load preset architectures and weights, use the `from_preset` constructor.

Disclaimer: Pre-trained models are provided on an "as is" basis, without warranties or conditions of any kind.

## Links

* [T5 Quickstart Notebook](coming soon)
* [T5 API Documentation](https://keras.io/keras_hub/api/models/t5/)
* [T5 Model Card](https://github.com/google-research/text-to-text-transfer-transformer/tree/main)
* [KerasHub Beginner Guide](https://keras.io/guides/keras_hub/getting_started/)
* [KerasHub Model Publishing Guide](https://keras.io/guides/keras_hub/upload/)

## Installation

Keras and KerasHub can be installed with:

```
pip install -U -q keras-hub
pip install -U -q keras
```

JAX, TensorFlow, and PyTorch come preinstalled in Kaggle Notebooks. For instructions on installing them in another environment, see the [Keras Getting Started](https://keras.io/getting_started/) page.

## Presets

The following model checkpoints are provided by the Keras team. Full code examples for each are available below.
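As a sketch of the text-to-text format described above, the snippet below prepends a task prefix to raw input text. The prefixes come from the T5 paper; the helper function itself is illustrative only and is not part of the KerasHub API.

```python
def add_task_prefix(task: str, text: str) -> str:
    """Format `text` for `task` in T5's text-to-text style.

    T5 selects the task purely via a plain-text prefix on the input
    sequence; no task-specific heads or architecture changes are needed.
    """
    prefixes = {
        # Prefixes as used in the original T5 paper.
        "translate_en_de": "translate English to German: ",
        "summarize": "summarize: ",
    }
    return prefixes[task] + text


print(add_task_prefix("summarize", "The quick brown fox..."))
# summarize: The quick brown fox...
```

The formatted string is then tokenized and fed to the encoder as-is; the decoder produces the answer as ordinary text.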
| Preset name | Parameters | Description |
|------------------|------------|--------------------------------------------------|
| t5_small_multi | 0 | 8-layer T5 model. Trained on the Colossal Clean Crawled Corpus (C4). |
| t5_base_multi | 0 | 12-layer T5 model. Trained on the Colossal Clean Crawled Corpus (C4). |
| t5_large_multi | 0 | 24-layer T5 model. Trained on the Colossal Clean Crawled Corpus (C4). |
| flan_small_multi | 0 | 8-layer T5 model. Trained on the Colossal Clean Crawled Corpus (C4). |
| flan_base_multi | 0 | 12-layer T5 model. Trained on the Colossal Clean Crawled Corpus (C4). |
| flan_large_multi | 0 | 24-layer T5 model. Trained on the Colossal Clean Crawled Corpus (C4). |
| t5_1.1_small | 60.51M | |
| t5_1.1_base | 247.58M | |
| t5_1.1_large | 750.25M | |
| t5_1.1_xl | 2.85B | |
| t5_1.1_xxl | 11.14B | |

__Arguments__

- __vocabulary_size__: int. The size of the token vocabulary.
- __num_layers__: int. The number of Transformer layers.
- __num_heads__: int. The number of attention heads for each Transformer layer. The hidden size must be divisible by the number of attention heads.
- __hidden_dim__: int. The hidden size of the Transformer layers.
- __intermediate_dim__: int. The output dimension of the first Dense layer in a two-layer feedforward network for each Transformer layer.
- __key_value_dim__: int. The dimension of each head of the key/value projections in the multi-head attention layers. Defaults to `hidden_dim / num_heads`.
- __dropout__: float. Dropout probability for the Transformer layers.
- __activation__: activation function (or activation string name). The activation to be used in the inner dense blocks of the Transformer layers. Defaults to `"relu"`.
- __use_gated_activation__: boolean. Whether to use activation gating in the inner dense blocks of the Transformer layers. The original T5 architecture didn't use gating, but more recent versions do. Defaults to `True`.
- __layer_norm_epsilon__: float.
Epsilon factor to be used in the layer normalization layers in the Transformer layers.
- __tie_embedding_weights__: boolean. If `True`, the weights of the token embedding and the weights projecting language model outputs from `hidden_dim` are tied.
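To illustrate how the arguments above fit together, the sketch below validates a backbone configuration: `hidden_dim` must be divisible by `num_heads`, and `key_value_dim` defaults to `hidden_dim / num_heads`. The helper and the "t5_small-like" shapes are illustrative assumptions, not part of the KerasHub API.

```python
def resolve_config(vocabulary_size, num_layers, num_heads, hidden_dim,
                   intermediate_dim, key_value_dim=None, dropout=0.1):
    """Hypothetical helper: check and fill in T5 backbone arguments."""
    # The hidden size must split evenly across attention heads.
    if hidden_dim % num_heads != 0:
        raise ValueError("hidden_dim must be divisible by num_heads")
    # key_value_dim defaults to the per-head dimension.
    if key_value_dim is None:
        key_value_dim = hidden_dim // num_heads
    return {
        "vocabulary_size": vocabulary_size,
        "num_layers": num_layers,
        "num_heads": num_heads,
        "hidden_dim": hidden_dim,
        "intermediate_dim": intermediate_dim,
        "key_value_dim": key_value_dim,
        "dropout": dropout,
    }


# A t5_small-like shape: hidden_dim=512 with 8 heads gives 64-dim heads.
cfg = resolve_config(32128, 8, 8, 512, 2048)
print(cfg["key_value_dim"])  # 64
```

The same resolved values would be passed as keyword arguments to the backbone's default constructor when building a randomly initialized model.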