seonghyeonye committed on
Commit
6d785d6
·
1 Parent(s): 77a37a0

Update README.md

Files changed (1)
  1. README.md +9 -13
README.md CHANGED
@@ -1,7 +1,7 @@
1
  **Official repository**: [seonghyeonye/Flipped-Learning](https://github.com/seonghyeonye/Flipped-Learning)
2
  # Model Description
3
  FLIPPED uses a unique meta-learning method to achieve zero-shot task generalization on classification tasks expressed as natural language prompts, outperforming GPT-3 and T0-11B on many tasks at a 4x smaller scale.
4
- It is a series of encoder-decoder model trained on a numerous classification dataset. We show inputs and its corresponding outputs of each instances in each dataset to FLIPPED, and train it to generate its possible instruction. We add unlikelyhood loss in order **not** to generate the instruction when given the same input, but a wrong output. To obtain FLIPPED, we fine-tune a T5 model in a given scale on a multitask mixture covering many different classification NLP tasks.
5
  # Intended uses
6
  You can use the models to perform inference on tasks by specifying your input-output NLP query in the form "input: {input}\noutput: {output}", and the model will predict the instruction. For example, you can try
7
  *"input: <extra_id_0> this is the best cast iron skillet you will ever buy<extra_id_1>\noutput: Positive"*
@@ -28,12 +28,12 @@ We also provide a quick [Jupyter Notebook](https://github.com/seonghyeonye/Flipp
28
 
29
  # Training procedure
30
  FLIPPED models are based on [T5](https://huggingface.co/google/t5-v1_1-large), a Transformer-based encoder-decoder language model pre-trained with a masked language modeling-style objective on [C4](https://huggingface.co/datasets/c4).
31
- At a high level, the input text along with output label is fed to the encoder and the instruction text is produced by the decoder. The model is fine-tuned to autoregressively generate the target. We also feed input text along with a wrong input, adding an unlikelyhood loss in order not to make model produce the proper instruction in that case. Here are our training details.
32
  Training details:
33
  - Fine-tuning steps: 5'000
34
- - Input sequence length: 384(512 for 3B)
35
  - Target sequence length: 64
36
- - Batch size: 1
37
  - Optimizer: Adafactor
38
  - Learning rate: 5e-5
39
  - Dropout: 0.1
@@ -82,14 +82,10 @@ We evaluate the robustness of models on following datasets with changing the out
82
  The template name we used can be found in the [promptsource template library](https://github.com/bigscience-workshop/promptsource/tree/main/promptsource/templates).
83
  # BibTeX entry and citation info
84
  ```bibtex
85
- @misc{https://doi.org/10.48550/arxiv.2210.02969,
86
- doi = {10.48550/ARXIV.2210.02969},
87
- url = {https://arxiv.org/abs/2210.02969},
88
- author = {Ye, Seonghyeon and Kim, Doyoung and Jang, Joel and Shin, Joongbo and Seo, Minjoon},
89
- keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
90
- title = {Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners},
91
- publisher = {arXiv},
92
- year = {2022},
93
- copyright = {Creative Commons Attribution 4.0 International}
94
  }
95
  ```
 
1
  **Official repository**: [seonghyeonye/Flipped-Learning](https://github.com/seonghyeonye/Flipped-Learning)
2
  # Model Description
3
  FLIPPED uses a unique meta-learning method to achieve zero-shot task generalization on classification tasks expressed as natural language prompts, outperforming GPT-3 and T0-11B on many tasks at a 4x smaller scale.
4
+ It is a series of encoder-decoder models trained on numerous classification datasets. We show FLIPPED the input and corresponding output of each instance in each dataset, and train it to generate a possible instruction. We add an unlikelihood loss so that the model does **not** generate the instruction when given the same input but a wrong output. To obtain FLIPPED, we fine-tune a T5 model of a given scale on a multitask mixture covering many different classification NLP tasks.
5
  # Intended uses
6
  You can use the models to perform inference on tasks by specifying your input-output NLP query in the form "input: {input}\noutput: {output}", and the model will predict the instruction. For example, you can try
7
  *"input: <extra_id_0> this is the best cast iron skillet you will ever buy<extra_id_1>\noutput: Positive"*
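  A minimal sketch of building such a query in Python, assuming the sentinel tokens `<extra_id_0>` and `<extra_id_1>` wrap the input exactly as in the example above. `format_flipped_query` is a hypothetical helper name, not part of the repository:

```python
def format_flipped_query(text: str, label: str) -> str:
    # Assumption: the sentinel tokens surround the input text exactly as in
    # the README example; the label follows on a new line after "output: ".
    return f"input: <extra_id_0> {text}<extra_id_1>\noutput: {label}"


query = format_flipped_query(
    "this is the best cast iron skillet you will ever buy", "Positive"
)
print(query)
```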
 
28
 
29
  # Training procedure
30
  FLIPPED models are based on [T5](https://huggingface.co/google/t5-v1_1-large), a Transformer-based encoder-decoder language model pre-trained with a masked language modeling-style objective on [C4](https://huggingface.co/datasets/c4).
31
+ At a high level, the input text along with the output label is fed to the encoder and the instruction text is produced by the decoder. The model is fine-tuned to autoregressively generate the target. We also feed the input text along with a wrong output, adding an unlikelihood loss so that the model does not produce the proper instruction in that case. Here are our training details.
32
  Training details:
33
  - Fine-tuning steps: 5'000
34
+ - Input sequence length: 384
35
  - Target sequence length: 64
36
+ - Batch size: 240
37
  - Optimizer: Adafactor
38
  - Learning rate: 5e-5
39
  - Dropout: 0.1
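
  The combined objective described in the training procedure can be sketched as follows. This is an illustration under stated assumptions, not the authors' training code: `likelihood_loss`, `unlikelihood_loss`, `flipped_objective`, the weight `ul_weight`, and the per-token averaging are all hypothetical choices, and the inputs are the probabilities the model assigns to each instruction token.

```python
import math


def likelihood_loss(token_probs, eps=1e-8):
    # Negative log-likelihood of the instruction tokens, averaged per token.
    # Used when the input is paired with its correct output.
    return -sum(math.log(max(p, eps)) for p in token_probs) / len(token_probs)


def unlikelihood_loss(token_probs, eps=1e-8):
    # Penalizes high probability on the instruction tokens: -log(1 - p).
    # Used when the input is paired with a wrong output.
    return -sum(math.log(max(1.0 - p, eps)) for p in token_probs) / len(token_probs)


def flipped_objective(correct_probs, wrong_probs, ul_weight=1.0):
    # Assumption: the two terms are simply summed with a scalar weight.
    return likelihood_loss(correct_probs) + ul_weight * unlikelihood_loss(wrong_probs)


# Instruction-token probabilities given the correct output (high is good)
# and given a wrong output (low is good).
loss = flipped_objective([0.9, 0.8], [0.1, 0.2])
print(loss)
```

  The likelihood term is small when the instruction is probable given the correct output, and the unlikelihood term is small when the same instruction is improbable given a wrong output.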
 
82
  The template name we used can be found in the [promptsource template library](https://github.com/bigscience-workshop/promptsource/tree/main/promptsource/templates).
83
  # BibTeX entry and citation info
84
  ```bibtex
85
+ @article{ye2022guess,
86
+ title={Guess the Instruction! Making Language Models Stronger Zero-Shot Learners},
87
+ author={Ye, Seonghyeon and Kim, Doyoung and Jang, Joel and Shin, Joongbo and Seo, Minjoon},
88
+ journal={arXiv preprint arXiv:2210.02969},
89
+ year={2022}
 
 
 
 
90
  }
91
  ```