Commit 9cbcd8e by juewang (1 parent: 98f4ed2)

Update README.md

Files changed (1): README.md +3 -3
README.md CHANGED
@@ -110,9 +110,9 @@ model = AutoModelForCausalLM.from_pretrained("togethercomputer/GPT-JT-6B-v1")
  ## UL2 Training Objective
 
  We train GPT-JT using UL2 training objective [1][2].
- The original GPT-J uses causal mask (as shown in the lower left) to perform autoregressive generation.So for each token, it can only see its previous context.
- In order to fully leverage the context information, we continue training GPT-J with UL2 training objectives, and uses causal mask with prefix (as shown in the lower right) -- using bidirectional attention for the prompt / input and causal attention for token generation.
- Intuitively, being able to see context bidirectionally might improve downstream tasks that requires this information.
+ The original GPT-J uses causal mask (as shown below left) for autoregressive generation. So for each token, it can only see its previous context.
+ In order to fully leverage the context information, we continue to train GPT-J with UL2 training objectives, and uses causal mask with prefix (as shown below right) -- using bidirectional attention for the prompt / input and causal attention for token generation.
+ Intuitively, being able to see context bidirectionally might improve downstream tasks that require this information.
 
  $$
  \begin{bmatrix}