---
tags:
- text2text-generation
- definition-modeling
metrics:
- rouge
- bleu
- bert-f1
model-index:
- name: flan-t5-definition-en-large
  results: []
language:
- en
widget:
- text: "He ate a sweet apple. What is the definition of apple?"
  example_title: "Definition generation"
- text: "The paper contains a number of original ideas about color perception. What is the definition of original?"
  example_title: "Definition generation"
license: cc-by-sa-4.0
datasets:
- marksverdhei/wordnet-definitions-en-2021
---
|
|
|
|
|
# FLAN-T5-Definition Large |
|
|
|
This model is a version of [FLAN-T5 Large](https://huggingface.co/google/flan-t5-large) fine-tuned on a dataset of English definitions and usage examples.
|
|
|
It generates definitions of English words in context. |
|
Its input is a usage example followed by the instruction question "What is the definition of TARGET_WORD?".
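The following is a minimal sketch of how such a prompt could be built and passed to the model with the Hugging Face `transformers` library. It assumes `transformers` and `torch` are installed and uses the checkpoint id `ltg/flan-t5-definition-en-large` linked under Sizes; the helper names `build_prompt` and `define`, the greedy decoding, and the `max_new_tokens=32` limit are illustrative choices, not settings documented for this model.

```python
"""Minimal sketch: querying FLAN-T5-Definition Large with Hugging Face transformers.

Assumptions: `transformers` and `torch` are installed, and the checkpoint id
`ltg/flan-t5-definition-en-large` (linked in this card) is reachable.
"""


def build_prompt(usage_example: str, target_word: str) -> str:
    """Append the instruction question to the usage example, as in the widget examples."""
    return f"{usage_example.strip()} What is the definition of {target_word}?"


def define(usage_example: str, target_word: str,
           model_id: str = "ltg/flan-t5-definition-en-large",
           max_new_tokens: int = 32) -> str:
    """Generate a contextual definition of `target_word` (downloads the model on first use)."""
    # Imported lazily so that build_prompt() works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    inputs = tokenizer(build_prompt(usage_example, target_word), return_tensors="pt")
    # Greedy decoding with an illustrative length cap; not a documented setting.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

For example, `define("He ate a sweet apple.", "apple")` downloads the checkpoint on first use and returns the generated definition as a string. The sketch reloads the model on every call for brevity; in practice you would load the tokenizer and model once and reuse them.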
|
|
|
This project is a collaboration between the [Dialogue Modelling Group](https://dmg-illc.github.io/dmg/) at the University of Amsterdam and the [Language Technology Group](https://www.mn.uio.no/ifi/english/research/groups/ltg/) at the University of Oslo.
|
|
|
## Sizes
|
- [FLAN-T5-Definition Base (250M parameters)](https://huggingface.co/ltg/flan-t5-definition-en-base) |
|
- [FLAN-T5-Definition Large (780M parameters)](https://huggingface.co/ltg/flan-t5-definition-en-large) |
|
- [FLAN-T5-Definition XL (3B parameters)](https://huggingface.co/ltg/flan-t5-definition-en-xl) |
|
|
|
## Model description |
|
|
|
Details are given in the paper [`Interpretable Word Sense Representations via Definition Generation: The Case of Semantic Change Analysis`](https://arxiv.org/abs/2305.11993) (ACL 2023) by Mario Giulianelli, Iris Luden, Raquel Fernández and Andrey Kutuzov.
|
|
|
## Intended uses & limitations |
|
|
|
The model is intended for research purposes, as a source of contextualized dictionary-like lexical definitions. |
|
|
|
The fine-tuning datasets were limited to English. Although the original FLAN-T5 is a multilingual model, we did not thoroughly evaluate this model's ability to generate definitions in languages other than English.
|
|
|
Generated definitions may contain biases and stereotypes inherited from the underlying language model.
|
|
|
## Training and evaluation data |
|
|
|
Three datasets were used to fine-tune the model: |
|
- *WordNet* ([Ishiwatari et al., NAACL 2019](https://aclanthology.org/N19-1350/)), also [available on HF](https://huggingface.co/datasets/marksverdhei/wordnet-definitions-en-2021) |
|
- *Oxford dictionary or CHA* ([Gadetsky et al., ACL 2018](https://aclanthology.org/P18-2043/)) |
|
- English subset of *CodWoE* ([Mickus et al., SemEval 2022](https://aclanthology.org/2022.semeval-1.1/)) |
|
|
|
FLAN-T5-Definition Large achieves the following results on the WordNet test set: |
|
- BLEU: 14.37 |
|
- ROUGE-L: 33.74 |
|
- BERT-F1: 88.21 |
|
|
|
FLAN-T5-Definition Large achieves the following results on the Oxford dictionary test set: |
|
- BLEU: 10.90 |
|
- ROUGE-L: 30.05 |
|
- BERT-F1: 87.44 |
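ROUGE-L scores a generated definition by the longest common subsequence (LCS) of tokens it shares with the reference definition. The sketch below computes a sentence-level ROUGE-L F1 on lowercased whitespace tokens as an illustration of the metric only. The card does not state which scorer, tokenizer, stemming, or corpus-level aggregation produced the reported numbers, so values from this function are not expected to match them exactly; BLEU and BERT-F1 are not covered.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            # Extend the diagonal on a match, otherwise carry the best neighbour.
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """Sentence-level ROUGE-L F1 on lowercased whitespace tokens, scaled to 0-100."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 100 * 2 * precision * recall / (precision + recall)
```

For instance, the candidate "a round fruit" against the reference "the round fruit of a tree" shares the subsequence "round fruit" (length 2), giving precision 2/3, recall 1/3 and an F1 of about 44.4. A test-set score would then be some aggregate (for example the mean) of such per-example values, which is again an assumption here.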
|
|
|
## Training procedure |
|
|
|
FLAN-T5 Large was fine-tuned in sequence-to-sequence mode on examples of contextualized dictionary definitions.
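Sequence-to-sequence fine-tuning requires each dictionary entry to be turned into an input string and a target string. The following is a minimal sketch, assuming the training inputs use the same usage-example-plus-question prompt as at inference time and that the target is the definition text itself; the `DefinitionRecord` class and its field names are hypothetical, since each of the three datasets has its own schema.

```python
from dataclasses import dataclass


@dataclass
class DefinitionRecord:
    """One dictionary entry (hypothetical field names, not the datasets' own schema)."""
    word: str
    example: str
    definition: str


def to_seq2seq_pair(record: DefinitionRecord) -> dict[str, str]:
    """Map an entry to an (input, target) pair for sequence-to-sequence training."""
    # Assumed prompt: usage example followed by the instruction question.
    source = f"{record.example.strip()} What is the definition of {record.word}?"
    return {"input_text": source, "target_text": record.definition.strip()}
```

The resulting `input_text` would be tokenized as the encoder input and `target_text` as the decoder labels.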
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training: |
|
- learning_rate: 5e-05 |
|
- train_batch_size: 8 |
|
- eval_batch_size: 16 |
|
- seed: 42 |
|
- distributed_type: multi-GPU |
|
- num_devices: 8 |
|
- total_train_batch_size: 64 |
|
- total_eval_batch_size: 128 |
|
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 |
|
- lr_scheduler_type: linear |
|
- num_epochs: 15.0 |
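The sketch below restates these hyperparameters as keyword arguments for `transformers.Seq2SeqTrainingArguments` and checks two consequences: the effective batch size (8 per device × 8 devices = 64) and the linearly decaying learning rate. It assumes no gradient accumulation (consistent with the listed total of 64) and no warmup, since none is listed; `STEPS_PER_EPOCH = 2740` is taken from the training results table. The helpers `effective_batch_size` and `linear_lr` are illustrative and reimplement the arithmetic without calling the library.

```python
# Hyperparameters from the list above, keyed by Seq2SeqTrainingArguments parameter names.
TRAINING_KWARGS = {
    "learning_rate": 5e-05,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 16,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 15.0,
}
NUM_DEVICES = 8
STEPS_PER_EPOCH = 2740  # optimizer steps at epoch 1.0 in the training results table


def effective_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Examples consumed per optimizer step under data parallelism."""
    return per_device * num_devices * grad_accum_steps


def linear_lr(step: int, total_steps: int, peak_lr: float, warmup_steps: int = 0) -> float:
    """Learning rate under optional linear warmup, then linear decay to zero."""
    if warmup_steps and step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
```

One could pass the dictionary as `Seq2SeqTrainingArguments(output_dir="out", **TRAINING_KWARGS)`, where `"out"` is a placeholder; the eight-device data parallelism comes from the launcher (for example `torchrun`), not from these arguments. The Trainer's default optimizer is AdamW, which with its default `weight_decay` of 0.0 behaves like the Adam configuration listed above.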
|
|
|
### Training results |
|
|
|
| Training Loss | Epoch | Step | Validation Loss | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum | Gen Len |
|:-------------:|:-----:|:-----:|:---------------:|:-------:|:-------:|:-------:|:----------:|:-------:|
| 2.1769 | 1.0 | 2740 | 1.9050 | 28.7222 | 9.1873 | 26.6888 | 26.6937 | 11.3429 |
| 1.9408 | 2.0 | 5480 | 1.8151 | 29.8799 | 10.2327 | 27.7947 | 27.8044 | 11.4165 |
| 1.8124 | 3.0 | 8220 | 1.7608 | 30.9845 | 10.9982 | 28.8059 | 28.8131 | 11.5310 |
| 1.7118 | 4.0 | 10960 | 1.7229 | 31.6943 | 11.7412 | 29.4967 | 29.5319 | 11.7037 |
| 1.6286 | 5.0 | 13700 | 1.6937 | 32.5839 | 12.2431 | 30.1799 | 30.206 | 11.7784 |
| 1.5597 | 6.0 | 16440 | 1.6748 | 32.9915 | 12.8514 | 30.7016 | 30.7145 | 11.5974 |
| 1.4982 | 7.0 | 19180 | 1.6578 | 33.2157 | 13.1389 | 30.9428 | 30.9519 | 11.3580 |
| 1.4468 | 8.0 | 21920 | 1.6473 | 33.6146 | 13.5922 | 31.3001 | 31.3235 | 11.5724 |
| 1.4022 | 9.0 | 24660 | 1.6384 | 34.1711 | 14.1117 | 31.7951 | 31.8066 | 11.7389 |
| 1.364 | 10.0 | 27400 | 1.6337 | 34.5489 | 14.5012 | 32.1329 | 32.1446 | 11.6659 |
| 1.3321 | 11.0 | 30140 | 1.6291 | 34.7133 | 14.7297 | 32.3042 | 32.314 | 11.8003 |
| 1.3054 | 12.0 | 32880 | 1.6267 | 34.9411 | 15.0282 | 32.5335 | 32.5451 | 11.7619 |
| 1.2845 | 13.0 | 35620 | 1.6262 | 35.1648 | 15.2154 | 32.7387 | 32.742 | 11.8317 |
| 1.2699 | 14.0 | 38360 | 1.6257 | 35.2849 | 15.3109 | 32.8508 | 32.853 | 11.8168 |
| 1.2595 | 15.0 | 41100 | 1.6273 | 35.2224 | 15.2781 | 32.7718 | 32.7826 | 11.7971 |
|
|
|
|
|
### Framework versions |
|
|
|
- Transformers 4.23.1 |
|
- Pytorch 1.12.1+rocm5.1.1 |
|
- Datasets 2.4.0 |
|
- Tokenizers 0.12.1 |
|
|