Update README.md
## UL2 Training Objective

We train GPT-JT using the UL2 training objective [1][2].
The original GPT-J uses a causal mask (as shown below left) for autoregressive generation, so each token can only see its previous context.
In order to fully leverage the context information, we continue to train GPT-J with the UL2 training objective and use a causal mask with a prefix (as shown below right): bidirectional attention over the prompt / input and causal attention during token generation.
Intuitively, being able to see the context bidirectionally might improve downstream tasks that require this information.
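To make the two masking schemes concrete, here is a minimal sketch, not part of the original README, that builds both masks. It assumes PyTorch, and the helper names `causal_mask` / `prefix_causal_mask` plus the 5-token / length-3-prefix example are purely illustrative.

```python
import torch

def causal_mask(seq_len: int) -> torch.Tensor:
    # Standard autoregressive mask: token i can only attend to tokens <= i
    # (a lower-triangular 0/1 matrix).
    return torch.tril(torch.ones(seq_len, seq_len)).bool()

def prefix_causal_mask(seq_len: int, prefix_len: int) -> torch.Tensor:
    # Causal mask with a prefix: the first `prefix_len` tokens (the prompt / input)
    # attend to each other bidirectionally, while the remaining tokens stay causal.
    mask = causal_mask(seq_len)
    mask[:prefix_len, :prefix_len] = True
    return mask

print(causal_mask(5).int())                       # plain causal mask
print(prefix_causal_mask(5, prefix_len=3).int())  # causal mask with a length-3 prefix
```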
$$
\begin{bmatrix}