AlexWortega committed
Commit • eba8f13
1 Parent(s): d39ec88
Update README.md

README.md CHANGED
@@ -64,28 +64,7 @@ alt="drawing" width="600"/>
 9. [Model Card Authors](#model-card-authors)
 
 # TL;DR
-
-If you already know T5, FLAN-T5 is just better at everything. For the same number of parameters, these models have been fine-tuned on more than 1,000 additional tasks, also covering more languages.
-As mentioned in the first few lines of the abstract:
-> Flan-PaLM 540B achieves state-of-the-art performance on several benchmarks, such as 75.2% on five-shot MMLU. We also publicly release Flan-T5 checkpoints, which achieve strong few-shot performance even compared to much larger models, such as PaLM 62B. Overall, instruction finetuning is a general method for improving the performance and usability of pretrained language models.
-
-**Disclaimer**: Content from **this** model card has been written by the Hugging Face team, and parts of it were copy-pasted from the [T5 model card](https://huggingface.co/t5-large).
-
-# Model Details
-
-## Model Description
-
-
-- **Model type:** Language model
-- **Language(s) (NLP):** English, Spanish, Japanese, Persian, Hindi, French, Chinese, Bengali, Gujarati, German, Telugu, Italian, Arabic, Polish, Tamil, Marathi, Malayalam, Oriya, Panjabi, Portuguese, Urdu, Galician, Hebrew, Korean, Catalan, Thai, Dutch, Indonesian, Vietnamese, Bulgarian, Filipino, Central Khmer, Lao, Turkish, Russian, Croatian, Swedish, Yoruba, Kurdish, Burmese, Malay, Czech, Finnish, Somali, Tagalog, Swahili, Sinhala, Kannada, Zhuang, Igbo, Xhosa, Romanian, Haitian, Estonian, Slovak, Lithuanian, Greek, Nepali, Assamese, Norwegian
-- **License:** Apache 2.0
-- **Related Models:** [All FLAN-T5 Checkpoints](https://huggingface.co/models?search=flan-t5)
-- **Original Checkpoints:** [All Original FLAN-T5 Checkpoints](https://github.com/google-research/t5x/blob/main/docs/models.md#flan-t5-checkpoints)
-- **Resources for more information:**
-  - [Research paper](https://arxiv.org/pdf/2210.11416.pdf)
-  - [GitHub Repo](https://github.com/google-research/t5x)
-  - [Hugging Face FLAN-T5 Docs (similar to T5)](https://huggingface.co/docs/transformers/model_doc/t5)
-
+later
 # Usage
 
 Find below some example scripts on how to use the model in `transformers`:
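The example scripts that the context line above refers to fall outside this hunk, so the diff does not show them. For orientation, here is a minimal sketch of the standard `transformers` seq2seq pattern such a script would follow; the model id `AlexWortega/Flan_base_translated` is assumed from the Model Recycling text in the second hunk, and the prompt and generation settings are illustrative, not the card's actual example.

```python
# Minimal sketch (not the card's actual script): load a FLAN-T5-style
# seq2seq checkpoint with the standard transformers API and run generation.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Assumed model id, taken from the Model Recycling text in this card.
model_id = "AlexWortega/Flan_base_translated"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Illustrative instruction-style prompt; FLAN-T5 models are steered by
# plain-text task descriptions rather than task-specific heads.
inputs = tokenizer("Translate English to German: How old are you?",
                   return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```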
@@ -244,17 +223,3 @@ The information below in this section are copied from the model's [official mode
 
 copyright = {Creative Commons Attribution 4.0 International}
 }
-```
-## Model Recycling
-
-[Evaluation on 36 datasets](https://ibm.github.io/model-recycling/model_gain_chart?avg=9.16&mnli_lp=nan&20_newsgroup=3.34&ag_news=1.49&amazon_reviews_multi=0.21&anli=13.91&boolq=16.75&cb=23.12&cola=9.97&copa=34.50&dbpedia=6.90&esnli=5.37&financial_phrasebank=18.66&imdb=0.33&isear=1.37&mnli=11.74&mrpc=16.63&multirc=6.24&poem_sentiment=14.62&qnli=3.41&qqp=6.18&rotten_tomatoes=2.98&rte=24.26&sst2=0.67&sst_5bins=5.44&stsb=20.68&trec_coarse=3.95&trec_fine=10.73&tweet_ev_emoji=13.39&tweet_ev_emotion=4.62&tweet_ev_hate=3.46&tweet_ev_irony=9.04&tweet_ev_offensive=1.69&tweet_ev_sentiment=0.75&wic=14.22&wnli=9.44&wsc=5.53&yahoo_answers=4.14&model_name=google%2Fflan-t5-base&base_name=google%2Ft5-v1_1-base) using AlexWortega/Flan_base_translated as a base model yields an average score of 77.98, compared to 68.82 for google/t5-v1_1-base.
-
-The model is ranked 1st among all tested models for the google/t5-v1_1-base architecture as of 06/02/2023.
-Results:
-
-| 20_newsgroup | ag_news | amazon_reviews_multi | anli | boolq | cb | cola | copa | dbpedia | esnli | financial_phrasebank | imdb | isear | mnli | mrpc | multirc | poem_sentiment | qnli | qqp | rotten_tomatoes | rte | sst2 | sst_5bins | stsb | trec_coarse | trec_fine | tweet_ev_emoji | tweet_ev_emotion | tweet_ev_hate | tweet_ev_irony | tweet_ev_offensive | tweet_ev_sentiment | wic | wnli | wsc | yahoo_answers |
-|---------------:|----------:|-----------------------:|--------:|--------:|--------:|--------:|-------:|----------:|--------:|-----------------------:|-------:|--------:|--------:|--------:|----------:|-----------------:|--------:|--------:|------------------:|--------:|--------:|------------:|--------:|--------------:|------------:|-----------------:|-------------------:|----------------:|-----------------:|---------------------:|---------------------:|--------:|-------:|--------:|----------------:|
-| 86.2188 | 89.6667 | 67.12 | 51.9688 | 82.3242 | 78.5714 | 80.1534 | 75 | 77.6667 | 90.9507 | 85.4 | 93.324 | 72.425 | 87.2457 | 89.4608 | 62.3762 | 82.6923 | 92.7878 | 89.7724 | 89.0244 | 84.8375 | 94.3807 | 57.2851 | 89.4759 | 97.2 | 92.8 | 46.848 | 80.2252 | 54.9832 | 76.6582 | 84.3023 | 70.6366 | 70.0627 | 56.338 | 53.8462 | 73.4 |
-
-
-For more information, see: [Model Recycling](https://ibm.github.io/model-recycling/)