Update README.md
## UL2 Training Objective

We train GPT-JT using the UL2 training objective [1][2].
The original GPT-J uses a causal mask (as shown below left) for autoregressive generation, so each token can only see its previous context.
In order to fully leverage the context information, we continue to train GPT-J with the UL2 training objective and use a causal mask with a prefix (as shown below right): bidirectional attention over the prompt / input and causal attention during token generation.
Intuitively, being able to see the context bidirectionally might improve downstream tasks that require this information.
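To make the two masking schemes concrete, here is a minimal sketch, not part of the original README, that builds both masks. It assumes PyTorch, and the helper names `causal_mask` / `prefix_causal_mask` plus the 5-token / length-3-prefix example are purely illustrative.

```python
import torch

def causal_mask(seq_len: int) -> torch.Tensor:
    # Standard autoregressive mask: token i can only attend to tokens <= i
    # (a lower-triangular 0/1 matrix).
    return torch.tril(torch.ones(seq_len, seq_len)).bool()

def prefix_causal_mask(seq_len: int, prefix_len: int) -> torch.Tensor:
    # Causal mask with a prefix: the first `prefix_len` tokens (the prompt / input)
    # attend to each other bidirectionally, while the remaining tokens stay causal.
    mask = causal_mask(seq_len)
    mask[:prefix_len, :prefix_len] = True
    return mask

print(causal_mask(5).int())                       # plain causal mask
print(prefix_causal_mask(5, prefix_len=3).int())  # causal mask with a length-3 prefix
```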
$$
\begin{bmatrix}