AlexWortega committed on
Commit eba8f13
1 Parent(s): d39ec88

Update README.md

Files changed (1)
  1. README.md +1 -36
README.md CHANGED
@@ -64,28 +64,7 @@ alt="drawing" width="600"/>
  9. [Model Card Authors](#model-card-authors)
 
  # TL;DR
-
- If you already know T5, FLAN-T5 is just better at everything. For the same number of parameters, these models have been fine-tuned on more than 1000 additional tasks covering also more languages.
- As mentioned in the first few lines of the abstract :
- > Flan-PaLM 540B achieves state-of-the-art performance on several benchmarks, such as 75.2% on five-shot MMLU. We also publicly release Flan-T5 checkpoints,1 which achieve strong few-shot performance even compared to much larger models, such as PaLM 62B. Overall, instruction finetuning is a general method for improving the performance and usability of pretrained language models.
-
- **Disclaimer**: Content from **this** model card has been written by the Hugging Face team, and parts of it were copy pasted from the [T5 model card](https://huggingface.co/t5-large).
-
- # Model Details
-
- ## Model Description
-
-
- - **Model type:** Language model
- - **Language(s) (NLP):** English, Spanish, Japanese, Persian, Hindi, French, Chinese, Bengali, Gujarati, German, Telugu, Italian, Arabic, Polish, Tamil, Marathi, Malayalam, Oriya, Panjabi, Portuguese, Urdu, Galician, Hebrew, Korean, Catalan, Thai, Dutch, Indonesian, Vietnamese, Bulgarian, Filipino, Central Khmer, Lao, Turkish, Russian, Croatian, Swedish, Yoruba, Kurdish, Burmese, Malay, Czech, Finnish, Somali, Tagalog, Swahili, Sinhala, Kannada, Zhuang, Igbo, Xhosa, Romanian, Haitian, Estonian, Slovak, Lithuanian, Greek, Nepali, Assamese, Norwegian
- - **License:** Apache 2.0
- - **Related Models:** [All FLAN-T5 Checkpoints](https://huggingface.co/models?search=flan-t5)
- - **Original Checkpoints:** [All Original FLAN-T5 Checkpoints](https://github.com/google-research/t5x/blob/main/docs/models.md#flan-t5-checkpoints)
- - **Resources for more information:**
- - [Research paper](https://arxiv.org/pdf/2210.11416.pdf)
- - [GitHub Repo](https://github.com/google-research/t5x)
- - [Hugging Face FLAN-T5 Docs (Similar to T5) ](https://huggingface.co/docs/transformers/model_doc/t5)
-
+ later
  # Usage
 
  Find below some example scripts on how to use the model in `transformers`:
@@ -244,17 +223,3 @@ The information below in this section are copied from the model's [official mode
 
  copyright = {Creative Commons Attribution 4.0 International}
  }
- ```
- ## Model Recycling
-
- [Evaluation on 36 datasets](https://ibm.github.io/model-recycling/model_gain_chart?avg=9.16&mnli_lp=nan&20_newsgroup=3.34&ag_news=1.49&amazon_reviews_multi=0.21&anli=13.91&boolq=16.75&cb=23.12&cola=9.97&copa=34.50&dbpedia=6.90&esnli=5.37&financial_phrasebank=18.66&imdb=0.33&isear=1.37&mnli=11.74&mrpc=16.63&multirc=6.24&poem_sentiment=14.62&qnli=3.41&qqp=6.18&rotten_tomatoes=2.98&rte=24.26&sst2=0.67&sst_5bins=5.44&stsb=20.68&trec_coarse=3.95&trec_fine=10.73&tweet_ev_emoji=13.39&tweet_ev_emotion=4.62&tweet_ev_hate=3.46&tweet_ev_irony=9.04&tweet_ev_offensive=1.69&tweet_ev_sentiment=0.75&wic=14.22&wnli=9.44&wsc=5.53&yahoo_answers=4.14&model_name=google%2Fflan-t5-base&base_name=google%2Ft5-v1_1-base) using AlexWortega/Flan_base_translated as a base model yields average score of 77.98 in comparison to 68.82 by google/t5-v1_1-base.
-
- The model is ranked 1st among all tested models for the google/t5-v1_1-base architecture as of 06/02/2023
- Results:
-
- | 20_newsgroup | ag_news | amazon_reviews_multi | anli | boolq | cb | cola | copa | dbpedia | esnli | financial_phrasebank | imdb | isear | mnli | mrpc | multirc | poem_sentiment | qnli | qqp | rotten_tomatoes | rte | sst2 | sst_5bins | stsb | trec_coarse | trec_fine | tweet_ev_emoji | tweet_ev_emotion | tweet_ev_hate | tweet_ev_irony | tweet_ev_offensive | tweet_ev_sentiment | wic | wnli | wsc | yahoo_answers |
- |---------------:|----------:|-----------------------:|--------:|--------:|--------:|--------:|-------:|----------:|--------:|-----------------------:|-------:|--------:|--------:|--------:|----------:|-----------------:|--------:|--------:|------------------:|--------:|--------:|------------:|--------:|--------------:|------------:|-----------------:|-------------------:|----------------:|-----------------:|---------------------:|---------------------:|--------:|-------:|--------:|----------------:|
- | 86.2188 | 89.6667 | 67.12 | 51.9688 | 82.3242 | 78.5714 | 80.1534 | 75 | 77.6667 | 90.9507 | 85.4 | 93.324 | 72.425 | 87.2457 | 89.4608 | 62.3762 | 82.6923 | 92.7878 | 89.7724 | 89.0244 | 84.8375 | 94.3807 | 57.2851 | 89.4759 | 97.2 | 92.8 | 46.848 | 80.2252 | 54.9832 | 76.6582 | 84.3023 | 70.6366 | 70.0627 | 56.338 | 53.8462 | 73.4 |
-
-
- For more information, see: [Model Recycling](https://ibm.github.io/model-recycling/)
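
The usage scripts that the card's `# Usage` section points to are elided from this diff. As a minimal sketch only (not the card's own example), loading a FLAN-T5-style checkpoint with `transformers` might look like the following; the checkpoint id `google/flan-t5-base` is an assumption taken from the evaluation notes in the removed section, and `AlexWortega/Flan_base_translated` could be substituted if that repository is the intended model.

```python
# Hedged sketch only: the model card's own usage scripts are not shown in this diff.
# The checkpoint id below is an assumption; swap in "AlexWortega/Flan_base_translated"
# if that is the repository being documented.
from transformers import AutoTokenizer, T5ForConditionalGeneration

checkpoint = "google/flan-t5-base"  # assumption, not confirmed by this diff
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = T5ForConditionalGeneration.from_pretrained(checkpoint)

# Encode a prompt, generate, and decode the output.
inputs = tokenizer("Translate English to German: How old are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```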