Commit 3a5144f by gsarti · verified · 1 Parent(s): 81ba648

Update README.md

  1. README.md +21 -9
README.md CHANGED
@@ -15,9 +15,9 @@ thumbnail: https://gsarti.com/publication/it5/featured.png

  The [IT5](https://huggingface.co/models?search=it5) model family represents the first effort in pretraining large-scale sequence-to-sequence transformer models for the Italian language, following the approach adopted by the original [T5 model](https://github.com/google-research/text-to-text-transfer-transformer).

- This model is released as part of the project ["IT5: Large-Scale Text-to-Text Pretraining for Italian Language Understanding and Generation"](https://arxiv.org/abs/2203.03759), by [Gabriele Sarti](https://gsarti.com/) and [Malvina Nissim](https://malvinanissim.github.io/) with the support of [Huggingface](https://discuss.huggingface.co/t/open-to-the-community-community-week-using-jax-flax-for-nlp-cv/7104) and with TPU usage sponsored by Google's [TPU Research Cloud](https://sites.research.google/trc/). All the training was conducted on a single TPU3v8-VM machine on Google Cloud. Refer to the Tensorboard tab of the repository for an overview of the training process.
+ This model is released as part of the project ["IT5: Text-to-Text Pretraining for Italian Language Understanding and Generation"](https://aclanthology.org/2024.lrec-main.823/), by [Gabriele Sarti](https://gsarti.com/) and [Malvina Nissim](https://malvinanissim.github.io/) with the support of [Huggingface](https://discuss.huggingface.co/t/open-to-the-community-community-week-using-jax-flax-for-nlp-cv/7104) and with TPU usage sponsored by Google's [TPU Research Cloud](https://sites.research.google/trc/). All the training was conducted on a single TPU3v8-VM machine on Google Cloud. Refer to the Tensorboard tab of the repository for an overview of the training process.

- *TThe inference widget is deactivated because the model needs a task-specific seq2seq fine-tuning on a downstream task to be useful in practice. The models in the [`it5`](https://huggingface.co/it5) organization provide some examples of this model fine-tuned on various downstream task.*
+ *TThe inference widget is deactivated because the model needs a task-specific seq2seq fine-tuning on a downstream task to be useful in practice.*

  ## Model variants

@@ -76,12 +76,24 @@ For problems or updates on this model, please contact [[email protected]
  ## Citation Information

  ```bibtex
- @article{sarti-nissim-2022-it5,
- title={IT5: Large-scale Text-to-text Pretraining for Italian Language Understanding and Generation},
- author={Sarti, Gabriele and Nissim, Malvina},
- journal={ArXiv preprint 2203.03759},
- url={https://arxiv.org/abs/2203.03759},
- year={2022},
- month={mar}
+ @inproceedings{sarti-nissim-2024-it5-text,
+ title = "{IT}5: Text-to-text Pretraining for {I}talian Language Understanding and Generation",
+ author = "Sarti, Gabriele and
+ Nissim, Malvina",
+ editor = "Calzolari, Nicoletta and
+ Kan, Min-Yen and
+ Hoste, Veronique and
+ Lenci, Alessandro and
+ Sakti, Sakriani and
+ Xue, Nianwen",
+ booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
+ month = may,
+ year = "2024",
+ address = "Torino, Italia",
+ publisher = "ELRA and ICCL",
+ url = "https://aclanthology.org/2024.lrec-main.823",
+ pages = "9422--9433",
+ abstract = "We introduce IT5, the first family of encoder-decoder transformer models pretrained specifically on Italian. We document and perform a thorough cleaning procedure for a large Italian corpus and use it to pretrain four IT5 model sizes. We then introduce the ItaGen benchmark, which includes a broad range of natural language understanding and generation tasks for Italian, and use it to evaluate the performance of IT5 models and multilingual baselines. We find monolingual IT5 models to provide the best scale-to-performance ratio across tested models, consistently outperforming their multilingual counterparts and setting a new state-of-the-art for Italian language generation.",
  }
+
  ```