	---
	library_name: keras-hub
	license: apache-2.0
	tags:
	- text-classification
	- keras
	pipeline_tag: text-generation
	---
	### Model Overview
	⚠️ T5 is currently only available via the `keras-hub-nightly` package. Use `pip install keras-hub-nightly` to try this model.

	T5 encoder-decoder backbone model.

	T5 is an LLM pretrained on a mix of unsupervised and supervised tasks,
	where each task is converted to a sequence-to-sequence format.
	T5 works well on a variety of tasks out-of-the-box by prepending
	various prefixes to the input sequence, e.g., for translation:
	`"translate English to German: ..."`, for summarization:
	`"summarize: ..."`.
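
As a rough illustration of the prefix convention described above, the sketch below builds text-to-text inputs in plain Python. The helper name `add_task_prefix` and the `TASK_PREFIXES` mapping are hypothetical and not part of KerasHub; only the two prefix strings come from this model card.

```python
# Hypothetical helper (not a KerasHub API): prepend a T5-style task prefix
# to raw input text before it is tokenized and passed to the model.
TASK_PREFIXES = {
    # The two prefixes quoted in this model card.
    "translate_en_de": "translate English to German: ",
    "summarize": "summarize: ",
}


def add_task_prefix(task: str, text: str) -> str:
    """Return `text` with the prefix registered for `task` prepended."""
    if task not in TASK_PREFIXES:
        raise KeyError(f"Unknown task {task!r}; expected one of {sorted(TASK_PREFIXES)}")
    # Strip surrounding whitespace so the prefix is followed by exactly one space.
    return TASK_PREFIXES[task] + text.strip()


prompt = add_task_prefix("translate_en_de", "Good morning.")
print(prompt)
```

The resulting string is what would be tokenized and fed to the encoder; whether a given checkpoint responds well to a particular prefix depends on how it was trained.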

	T5 was introduced in
	[Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683).

	The default constructor gives a fully customizable, randomly initialized T5
	model with any number of layers, heads, and embedding dimensions. To load
	preset architectures and weights, use the `from_preset` constructor.
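
Before building a custom model, it can help to sanity-check the hyperparameters against the constraints listed under Arguments: the hidden size must be divisible by the number of heads, and `key_value_dim` defaults to `hidden_dim / num_heads`. The following is a minimal sketch in plain Python; `T5ConfigSketch` is a hypothetical container for illustration, not a KerasHub class, and the numeric values are arbitrary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class T5ConfigSketch:
    """Hypothetical config holder mirroring a subset of the documented arguments."""

    vocabulary_size: int
    num_layers: int
    num_heads: int
    hidden_dim: int
    intermediate_dim: int
    key_value_dim: Optional[int] = None

    def __post_init__(self):
        # Documented constraint: hidden size must be divisible by the head count.
        if self.hidden_dim % self.num_heads != 0:
            raise ValueError(
                f"hidden_dim ({self.hidden_dim}) must be divisible by "
                f"num_heads ({self.num_heads})"
            )
        # Documented default: key_value_dim falls back to hidden_dim / num_heads.
        if self.key_value_dim is None:
            self.key_value_dim = self.hidden_dim // self.num_heads


# Arbitrary small values chosen only for illustration.
config = T5ConfigSketch(
    vocabulary_size=32000,
    num_layers=2,
    num_heads=4,
    hidden_dim=64,
    intermediate_dim=128,
)
print(config.key_value_dim)
```

The validated values could then be passed as keyword arguments to the model constructor; this sketch only checks them and does not build a model.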

	Disclaimer: Pre-trained models are provided on an "as is" basis, without
	warranties or conditions of any kind.

	## Links

	* T5 Quickstart Notebook (coming soon)
	* [T5 API Documentation](https://keras.io/keras_hub/api/models/t5/)
	* [T5 Model Card](https://github.com/google-research/text-to-text-transfer-transformer/tree/main)
	* [KerasHub Beginner Guide](https://keras.io/guides/keras_hub/getting_started/)
	* [KerasHub Model Publishing Guide](https://keras.io/guides/keras_hub/upload/)

	## Installation

	Keras and KerasHub can be installed with:

	```
	pip install -U -q keras-hub
	pip install -U -q keras
	```

	JAX, TensorFlow, and PyTorch come preinstalled in Kaggle Notebooks. For instructions on installing them in another environment, see the [Keras Getting Started](https://keras.io/getting_started/) page.

	## Presets

	The following model checkpoints are provided by the Keras team. Full code examples for each are available below.

	| Preset name | Parameters | Description |
	|------------------|------------|-----------------------------------------------------------------------|
	| t5_small_multi | 0 | 8-layer T5 model. Trained on the Colossal Clean Crawled Corpus (C4). |
	| t5_base_multi | 0 | 12-layer T5 model. Trained on the Colossal Clean Crawled Corpus (C4). |
	| t5_large_multi | 0 | 24-layer T5 model. Trained on the Colossal Clean Crawled Corpus (C4). |
	| flan_small_multi | 0 | 8-layer T5 model. Trained on the Colossal Clean Crawled Corpus (C4). |
	| flan_base_multi | 0 | 12-layer T5 model. Trained on the Colossal Clean Crawled Corpus (C4). |
	| flan_large_multi | 0 | 24-layer T5 model. Trained on the Colossal Clean Crawled Corpus (C4). |
	| t5_1.1_small | 60.51M | |
	| t5_1.1_base | 247.58M | |
	| t5_1.1_large | 750.25M | |
	| t5_1.1_xl | 2.85B | |
	| t5_1.1_xxl | 11.14B | |

	__Arguments__


	- __vocabulary_size__: int. The size of the token vocabulary.
	- __num_layers__: int. The number of Transformer layers.
	- __num_heads__: int. The number of attention heads for each Transformer
	layer. The hidden size must be divisible by the number of attention heads.
	- __hidden_dim__: int. The hidden size of the Transformer layers.
	- __intermediate_dim__: int. The output dimension of the first Dense layer in
	a two-layer feedforward network for each Transformer layer.
	- __key_value_dim__: int. The dimension of each head of the key/value
	projections in the multi-head attention layers. Defaults to
	`hidden_dim / num_heads`.
	- __dropout__: float. Dropout probability for the Transformer layers.
	- __activation__: activation function (or activation string name). The
	activation to be used in the inner dense blocks of the
	Transformer layers. Defaults to `"relu"`.
	- __use_gated_activation__: boolean. Whether to use activation gating in
	the inner dense blocks of the Transformer layers.
	The original T5 architecture didn't use gating, but more
	recent versions do. Defaults to `True`.
	- __layer_norm_epsilon__: float. Epsilon factor to be used in the
	layer normalization layers in the Transformer layers.
	- __tie_embedding_weights__: boolean. If `True`, the weights of the token
	embedding and the weights projecting language model outputs from
	`hidden_dim`