mattdangerw committed
Commit: 0e603a8
Parent(s): d15c368
Update README.md with new model card content
README.md
CHANGED
@@ -1,20 +1,53 @@
---
library_name: keras-hub
---

### Model Overview

⚠️ T5 is currently only available via the `keras-hub-nightly` package. Use `pip install keras-hub-nightly` to try this model.

T5 encoder-decoder backbone model.

T5 is an LLM pretrained on a mix of unsupervised and supervised tasks,
where each task is converted to a sequence-to-sequence format.
T5 works well on a variety of tasks out of the box by prepending
various prefixes to the input sequence, e.g., for translation:
`"translate English to German: ..."`; for summarization:
`"summarize: ..."`.

T5 was introduced in
[Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683).

The default constructor gives a fully customizable, randomly initialized T5
model with any number of layers, heads, and embedding dimensions. To load
preset architectures and weights, use the `from_preset` constructor.
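
For example, a minimal sketch of loading a preset backbone (the preset name here is an illustrative assumption; check the published T5 presets for exact identifiers):

```python
import keras_hub  # requires the keras-hub-nightly package, as noted above

# Load a preset T5 architecture with pretrained weights. The preset name is
# an assumption for this sketch; substitute one of the published T5 presets.
backbone = keras_hub.models.T5Backbone.from_preset("t5_small_multi")
backbone.summary()
```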

Disclaimer: Pre-trained models are provided on an "as is" basis, without
warranties or conditions of any kind.

__Arguments__

- __vocabulary_size__: int. The size of the token vocabulary.
- __num_layers__: int. The number of Transformer layers.
- __num_heads__: int. The number of attention heads for each Transformer layer.
  The hidden size must be divisible by the number of attention heads.
- __hidden_dim__: int. The hidden size of the Transformer layers.
- __intermediate_dim__: int. The output dimension of the first Dense layer in
  a two-layer feedforward network for each Transformer layer.
- __key_value_dim__: int. The dimension of each head of the key/value
  projections in the multi-head attention layers. Defaults to
  `hidden_dim / num_heads`.
- __dropout__: float. Dropout probability for the Transformer layers.
- __activation__: activation function (or activation string name). The
  activation to be used in the inner dense blocks of the
  Transformer layers. Defaults to `"relu"`.
- __use_gated_activation__: boolean. Whether to use activation gating in
  the inner dense blocks of the Transformer layers.
  The original T5 architecture didn't use gating, but more
  recent versions do. Defaults to `True`.
- __layer_norm_epsilon__: float. Epsilon factor to be used in the
  layer normalization layers in the Transformer layers.
- __tie_embedding_weights__: boolean. If `True`, the weights of the token
  embedding and the weights projecting language model outputs from
  `hidden_dim` are tied.
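
To make these arguments concrete, here is a minimal sketch that builds a small, randomly initialized backbone with the default constructor and runs dummy tokenized inputs through it. The sizes and the input feature names are illustrative assumptions, not a published configuration.

```python
import numpy as np
import keras_hub  # requires the keras-hub-nightly package

# A small, randomly initialized T5 backbone using the constructor arguments
# documented above. The sizes are illustrative, not a preset configuration.
backbone = keras_hub.models.T5Backbone(
    vocabulary_size=32128,
    num_layers=4,
    num_heads=4,
    hidden_dim=256,
    intermediate_dim=512,
    dropout=0.1,
    activation="relu",
    use_gated_activation=True,
    tie_embedding_weights=True,
)

# Run a batch of dummy tokenized inputs through the encoder-decoder backbone.
# The input names follow the usual keras-hub encoder-decoder convention and
# are an assumption of this sketch.
input_data = {
    "encoder_token_ids": np.ones((1, 12), dtype="int32"),
    "encoder_padding_mask": np.ones((1, 12), dtype="int32"),
    "decoder_token_ids": np.ones((1, 8), dtype="int32"),
    "decoder_padding_mask": np.ones((1, 8), dtype="int32"),
}
outputs = backbone(input_data)
```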