KerasHub
mattdangerw committed
Commit 0e603a8
1 Parent(s): d15c368

Update README.md with new model card content

Files changed (1)
  1. README.md +50 -17
README.md CHANGED
@@ -1,20 +1,53 @@
  ---
  library_name: keras-hub
  ---
- This is a [`T5` model](https://keras.io/api/keras_hub/models/t5) uploaded using the KerasHub library and can be used with JAX, TensorFlow, and PyTorch backends.
- Model config:
- * **name:** t5_backbone
- * **trainable:** True
- * **vocabulary_size:** 32128
- * **hidden_dim:** 2048
- * **intermediate_dim:** 5120
- * **num_layers:** 24
- * **num_heads:** 32
- * **activation:** gelu
- * **key_value_dim:** 64
- * **dropout:** 0.1
- * **use_gated_activation:** True
- * **layer_norm_epsilon:** 1e-06
- * **tie_embedding_weights:** False
-
- This model card has been generated automatically and should be completed by the model author. See [Model Cards documentation](https://huggingface.co/docs/hub/model-cards) for more information.
+ ### Model Overview
+ ⚠️ T5 is currently only available via the `keras-hub-nightly` package. Use `pip install keras-hub-nightly` to try this model.
+
+ T5 encoder-decoder backbone model.
+
+ T5 is an LLM pretrained on a mix of unsupervised and supervised tasks,
+ where each task is converted to a sequence-to-sequence format.
+ T5 works well on a variety of tasks out-of-the-box by prepending
+ various prefixes to the input sequence, e.g., for translation:
+ `"translate English to German: ..."`, for summarization:
+ `"summarize: ..."`.
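+
+ For illustration, a task prefix is nothing more than plain text prepended to the encoder input. The example strings below are made up and purely illustrative:
+
+ ```python
+ # Hypothetical prefixed inputs following the convention described above.
+ # The prefix selects the task; the text after it is the actual input.
+ # These strings would be tokenized and fed to the encoder.
+ translation_input = "translate English to German: The house is wonderful."
+ summarization_input = (
+     "summarize: KerasHub provides backbones, tokenizers, and task models "
+     "for many popular architectures."
+ )
+ ```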
+
+ T5 was introduced in
+ [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683).
+
+ The default constructor gives a fully customizable, randomly initialized T5
+ model with any number of layers, heads, and embedding dimensions. To load
+ preset architectures and weights, use the `from_preset` constructor.
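+
+ A minimal loading sketch follows. The preset name is a placeholder (substitute the actual preset or Hugging Face repo id for this checkpoint), and the input/output dictionary keys follow the usual KerasHub encoder-decoder convention; check the [T5 API docs](https://keras.io/api/keras_hub/models/t5) if they differ:
+
+ ```python
+ import numpy as np
+ import keras_hub
+
+ # Load a preset architecture and its pretrained weights.
+ # "<preset_or_repo_id>" is a placeholder, not an actual preset name.
+ backbone = keras_hub.models.T5Backbone.from_preset("<preset_or_repo_id>")
+
+ # The backbone maps token ids to per-token hidden states for the encoder
+ # and the decoder; it does not generate text on its own.
+ outputs = backbone(
+     {
+         "encoder_token_ids": np.array([[31, 32, 33, 1, 0, 0]]),
+         "encoder_padding_mask": np.array([[1, 1, 1, 1, 0, 0]]),
+         "decoder_token_ids": np.array([[0, 45, 46, 1]]),
+         "decoder_padding_mask": np.array([[1, 1, 1, 1]]),
+     }
+ )
+ # outputs["encoder_sequence_output"] and outputs["decoder_sequence_output"]
+ # have shape (batch_size, sequence_length, hidden_dim).
+ ```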
+
+ Disclaimer: Pre-trained models are provided on an "as is" basis, without
+ warranties or conditions of any kind.
+
+ __Arguments__ (a construction sketch using these arguments follows the list)
+
+ - __vocabulary_size__: int. The size of the token vocabulary.
+ - __num_layers__: int. The number of Transformer layers.
+ - __num_heads__: int. The number of attention heads for each Transformer.
+ The hidden size must be divisible by the number of attention heads.
+ - __hidden_dim__: int. The hidden size of the Transformer layers.
+ - __intermediate_dim__: int. The output dimension of the first Dense layer in
+ a two-layer feedforward network for each Transformer layer.
+ - __key_value_dim__: int. The dimension of each head of the key/value
+ projections in the multi-head attention layers. Defaults to
+ `hidden_dim / num_heads`.
+ - __dropout__: float. Dropout probability for the Transformer layers.
+ - __activation__: activation function (or activation string name). The
+ activation to be used in the inner dense blocks of the
+ Transformer layers. Defaults to `"relu"`.
+ - __use_gated_activation__: boolean. Whether to use activation gating in
+ the inner dense blocks of the Transformer layers.
+ The original T5 architecture didn't use gating, but more
+ recent versions do. Defaults to `True`.
+ - __layer_norm_epsilon__: float. Epsilon factor to be used in the
+ layer normalization layers in the Transformer layers.
+ - __tie_embedding_weights__: boolean. If `True`, the weights of the token
+ embedding and the weights projecting language model outputs from
+ `hidden_dim` are tied.
+
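+
+ As a sketch of how the arguments above fit together, the snippet below builds a randomly initialized backbone using the configuration values listed in the previous revision of this card (removed above). It constructs the architecture only and does not load pretrained weights; use `from_preset` for that.
+
+ ```python
+ import keras_hub
+
+ # Randomly initialized T5 backbone matching this checkpoint's configuration.
+ # Architecture only: no pretrained weights are loaded here.
+ backbone = keras_hub.models.T5Backbone(
+     vocabulary_size=32128,
+     num_layers=24,
+     num_heads=32,
+     hidden_dim=2048,
+     intermediate_dim=5120,
+     key_value_dim=64,
+     dropout=0.1,
+     activation="gelu",
+     use_gated_activation=True,
+     layer_norm_epsilon=1e-06,
+     tie_embedding_weights=False,
+ )
+ backbone.summary()
+ ```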