AlexWortega committed
Commit • eba8f13
1 Parent(s): d39ec88
Update README.md

README.md CHANGED
@@ -64,28 +64,7 @@ alt="drawing" width="600"/>
 9. [Model Card Authors](#model-card-authors)
 
 # TL;DR
-
-If you already know T5, FLAN-T5 is just better at everything. For the same number of parameters, these models have been fine-tuned on more than 1,000 additional tasks, also covering more languages.
-As mentioned in the first few lines of the abstract:
-> Flan-PaLM 540B achieves state-of-the-art performance on several benchmarks, such as 75.2% on five-shot MMLU. We also publicly release Flan-T5 checkpoints, which achieve strong few-shot performance even compared to much larger models, such as PaLM 62B. Overall, instruction finetuning is a general method for improving the performance and usability of pretrained language models.
-
-**Disclaimer**: Content from **this** model card has been written by the Hugging Face team, and parts of it were copy-pasted from the [T5 model card](https://huggingface.co/t5-large).
-
-# Model Details
-
-## Model Description
-
-
-- **Model type:** Language model
-- **Language(s) (NLP):** English, Spanish, Japanese, Persian, Hindi, French, Chinese, Bengali, Gujarati, German, Telugu, Italian, Arabic, Polish, Tamil, Marathi, Malayalam, Oriya, Panjabi, Portuguese, Urdu, Galician, Hebrew, Korean, Catalan, Thai, Dutch, Indonesian, Vietnamese, Bulgarian, Filipino, Central Khmer, Lao, Turkish, Russian, Croatian, Swedish, Yoruba, Kurdish, Burmese, Malay, Czech, Finnish, Somali, Tagalog, Swahili, Sinhala, Kannada, Zhuang, Igbo, Xhosa, Romanian, Haitian, Estonian, Slovak, Lithuanian, Greek, Nepali, Assamese, Norwegian
-- **License:** Apache 2.0
-- **Related Models:** [All FLAN-T5 Checkpoints](https://huggingface.co/models?search=flan-t5)
-- **Original Checkpoints:** [All Original FLAN-T5 Checkpoints](https://github.com/google-research/t5x/blob/main/docs/models.md#flan-t5-checkpoints)
-- **Resources for more information:**
-  - [Research paper](https://arxiv.org/pdf/2210.11416.pdf)
-  - [GitHub Repo](https://github.com/google-research/t5x)
-  - [Hugging Face FLAN-T5 Docs (similar to T5)](https://huggingface.co/docs/transformers/model_doc/t5)
-
+later
 # Usage
 
 Find below some example scripts on how to use the model in `transformers`:
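The example scripts that the context line above refers to fall outside this hunk, so the diff does not show them. For orientation, here is a minimal sketch of the standard `transformers` seq2seq pattern such a script would follow; the model id `AlexWortega/Flan_base_translated` is assumed from the Model Recycling text in the second hunk, and the prompt and generation settings are illustrative, not the card's actual example.

```python
# Minimal sketch (not the card's actual script): load a FLAN-T5-style
# seq2seq checkpoint with the standard transformers API and run generation.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Assumed model id, taken from the Model Recycling text in this card.
model_id = "AlexWortega/Flan_base_translated"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Illustrative instruction-style prompt; FLAN-T5 models are steered by
# plain-text task descriptions rather than task-specific heads.
inputs = tokenizer("Translate English to German: How old are you?",
                   return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```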
@@ -244,17 +223,3 @@ The information below in this section are copied from the model's [official mode
 
 copyright = {Creative Commons Attribution 4.0 International}
 }
-```
-## Model Recycling
-
-[Evaluation on 36 datasets](https://ibm.github.io/model-recycling/model_gain_chart?avg=9.16&mnli_lp=nan&20_newsgroup=3.34&ag_news=1.49&amazon_reviews_multi=0.21&anli=13.91&boolq=16.75&cb=23.12&cola=9.97&copa=34.50&dbpedia=6.90&esnli=5.37&financial_phrasebank=18.66&imdb=0.33&isear=1.37&mnli=11.74&mrpc=16.63&multirc=6.24&poem_sentiment=14.62&qnli=3.41&qqp=6.18&rotten_tomatoes=2.98&rte=24.26&sst2=0.67&sst_5bins=5.44&stsb=20.68&trec_coarse=3.95&trec_fine=10.73&tweet_ev_emoji=13.39&tweet_ev_emotion=4.62&tweet_ev_hate=3.46&tweet_ev_irony=9.04&tweet_ev_offensive=1.69&tweet_ev_sentiment=0.75&wic=14.22&wnli=9.44&wsc=5.53&yahoo_answers=4.14&model_name=google%2Fflan-t5-base&base_name=google%2Ft5-v1_1-base) using AlexWortega/Flan_base_translated as a base model yields an average score of 77.98, compared to 68.82 for google/t5-v1_1-base.
-
-The model is ranked 1st among all tested models for the google/t5-v1_1-base architecture as of 06/02/2023.
-Results:
-
-| 20_newsgroup | ag_news | amazon_reviews_multi | anli | boolq | cb | cola | copa | dbpedia | esnli | financial_phrasebank | imdb | isear | mnli | mrpc | multirc | poem_sentiment | qnli | qqp | rotten_tomatoes | rte | sst2 | sst_5bins | stsb | trec_coarse | trec_fine | tweet_ev_emoji | tweet_ev_emotion | tweet_ev_hate | tweet_ev_irony | tweet_ev_offensive | tweet_ev_sentiment | wic | wnli | wsc | yahoo_answers |
-|---------------:|----------:|-----------------------:|--------:|--------:|--------:|--------:|-------:|----------:|--------:|-----------------------:|-------:|--------:|--------:|--------:|----------:|-----------------:|--------:|--------:|------------------:|--------:|--------:|------------:|--------:|--------------:|------------:|-----------------:|-------------------:|----------------:|-----------------:|---------------------:|---------------------:|--------:|-------:|--------:|----------------:|
-| 86.2188 | 89.6667 | 67.12 | 51.9688 | 82.3242 | 78.5714 | 80.1534 | 75 | 77.6667 | 90.9507 | 85.4 | 93.324 | 72.425 | 87.2457 | 89.4608 | 62.3762 | 82.6923 | 92.7878 | 89.7724 | 89.0244 | 84.8375 | 94.3807 | 57.2851 | 89.4759 | 97.2 | 92.8 | 46.848 | 80.2252 | 54.9832 | 76.6582 | 84.3023 | 70.6366 | 70.0627 | 56.338 | 53.8462 | 73.4 |
-
-
-For more information, see: [Model Recycling](https://ibm.github.io/model-recycling/)