> With a new decentralized training algorithm, we fine-tuned GPT-J (6B) on 3.53 billion tokens, resulting in GPT-JT (6B), a model that outperforms many 100B+ parameter models on classification benchmarks.

We incorporated a collection of open techniques and datasets to build GPT-JT:

- GPT-JT is a fork of [GPT-J (6B)](https://huggingface.co/EleutherAI/gpt-j-6B), which was created by [EleutherAI](https://www.eleuther.ai);
- We used [UL2](https://github.com/google-research/google-research/tree/master/ul2)'s training objective, allowing the model to see bidirectional context of the prompt;
- The model was trained on a large collection of diverse data, including [Chain-of-Thought (CoT)](https://ai.googleblog.com/2022/05/language-models-perform-reasoning-via.html), the [Public Pool of Prompts (P3) dataset](https://huggingface.co/datasets/bigscience/P3), and the [Natural-Instructions (NI) dataset](https://github.com/allenai/natural-instructions).

With the help of the techniques mentioned above, GPT-JT significantly improves the performance of classification tasks over the original GPT-J, and even outperforms most 100B+ parameter models!
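Because GPT-JT is a standard decoder-only checkpoint fine-tuned from GPT-J, classification tasks can be posed to it as plain text-generation prompts. Below is a minimal sketch using the Hugging Face `transformers` library; the model ID `togethercomputer/GPT-JT-6B-v1` and the example sentiment prompt are assumptions for illustration, not prescribed usage.

```python
# Minimal sketch: prompting GPT-JT for a zero-shot classification-style task.
# The model ID below is assumed; replace it with the actual Hub repository name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "togethercomputer/GPT-JT-6B-v1"  # assumed Hub ID for GPT-JT (6B)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Frame the classification task as text completion, as is typical for
# instruction-tuned decoder-only models.
prompt = (
    "Classify the sentiment of the following review as positive or negative.\n"
    "Review: The battery lasts all day and the screen is gorgeous.\n"
    "Sentiment:"
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=5, do_sample=False)

# Print only the newly generated tokens (the predicted label).
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```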