FLAN-T5-Definition Large

This model is a version of FLAN-T5 Large finetuned on a dataset of English definitions and usage examples.

It generates definitions of English words in context. Its input is the usage example and the instruction question "What is the definition of TARGET_WORD?"
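The input format described above can be sketched as follows. This is a minimal illustration, not the authors' reference code: the helper names `build_prompt` and `generate_definition` are hypothetical, the order (usage example first, then the question, joined by a single space) is an assumption, and `ltg/flan-t5-definition-en-large` is assumed to be the Hugging Face repository name for this model. The generation settings are illustrative defaults only.

```python
def build_prompt(usage_example: str, target_word: str) -> str:
    """Join a usage example and the instruction question into one input string."""
    # Assumption: the example comes first, then the question, separated by a space.
    return f"{usage_example.strip()} What is the definition of {target_word}?"


def generate_definition(
    usage_example: str,
    target_word: str,
    model_id: str = "ltg/flan-t5-definition-en-large",  # assumed repository name
) -> str:
    """Generate one definition; downloads the checkpoint on first use."""
    # Imported lazily so that build_prompt works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    inputs = tokenizer(
        build_prompt(usage_example, target_word),
        return_tensors="pt",
        truncation=True,
        max_length=256,  # illustrative input limit, not taken from the card
    )
    # max_new_tokens=60 is an illustrative cap on the definition length.
    output_ids = model.generate(**inputs, max_new_tokens=60)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    # Prints the prompt only; call generate_definition(...) to run the model.
    print(build_prompt("She sat on the bank of the river.", "bank"))
```

Keeping prompt construction separate from model loading makes it easy to inspect the exact string sent to the model before spending time on a download.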

This project is a collaboration between the Dialogue Modelling Group at the University of Amsterdam and the Language Technology Group at the University of Oslo.

Model description

For details, see the paper Interpretable Word Sense Representations via Definition Generation: The Case of Semantic Change Analysis (ACL 2023) by Mario Giulianelli, Iris Luden, Raquel Fernandez and Andrey Kutuzov.

Intended uses & limitations

The model is intended for research purposes, as a source of contextualized dictionary-like lexical definitions.

The fine-tuning datasets were limited to English. Although the original FLAN-T5 is a multilingual model, we did not thoroughly evaluate its ability to generate definitions in languages other than English.

Generated definitions may reflect biases and stereotypes inherited from the underlying language model.

Training and evaluation data

Three datasets were used to fine-tune the model:

WordNet (Ishiwatari et al., NAACL 2019), also available on Hugging Face
Oxford dictionary or CHA (Gadetsky et al., ACL 2018)
English subset of CodWoE (Mickus et al., SemEval 2022)

FLAN-T5-Definition Large achieves the following results on the WordNet test set:

BLEU: 14.37
ROUGE-L: 33.74
BERT-F1: 88.21

FLAN-T5-Definition Large achieves the following results on the Oxford dictionary test set:

BLEU: 10.90
ROUGE-L: 30.05
BERT-F1: 87.44
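For readers unfamiliar with ROUGE-L, the sketch below shows how an F1-style ROUGE-L score can be computed from the longest common subsequence (LCS) of a reference and a generated definition. It is an illustration under simplifying assumptions (lowercasing and whitespace tokenization, balanced F1), not the scoring setup used to produce the numbers above, which this card does not specify. The function names are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """Balanced F1 over LCS-based precision and recall, in the range 0..1."""
    # Assumption: simple lowercasing and whitespace tokenization.
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    score = rouge_l_f1("a sloping land beside a river", "land beside a body of water")
    print(f"ROUGE-L F1: {100 * score:.2f}")  # scaled to 0..100 like the scores above
```

Established scoring packages typically add stemming and other normalization, so their values can differ from this simplified version.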

Training procedure

FLAN-T5 Large was fine-tuned in a sequence-to-sequence mode on examples of contextualized dictionary definitions.

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 16
seed: 42
distributed_type: multi-GPU
num_devices: 8
total_train_batch_size: 64
total_eval_batch_size: 128
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 15.0
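The relationship between the per-device and total batch sizes, and the shape of a linear learning-rate schedule, can be checked with a few lines of arithmetic. The sketch below assumes no gradient accumulation and no warmup, since neither is listed above, and takes the total step count (41100) from the last row of the training results table. `linear_lr` is a hypothetical helper, not a library function.

```python
PER_DEVICE_TRAIN_BATCH = 8
PER_DEVICE_EVAL_BATCH = 16
NUM_DEVICES = 8
LEARNING_RATE = 5e-05
TOTAL_STEPS = 41100  # 15 epochs x 2740 steps per epoch, from the training log

# Assumption: no gradient accumulation, so total = per-device size x devices.
total_train_batch = PER_DEVICE_TRAIN_BATCH * NUM_DEVICES
total_eval_batch = PER_DEVICE_EVAL_BATCH * NUM_DEVICES


def linear_lr(step: int, base_lr: float = LEARNING_RATE,
              total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate under linear decay to zero, assuming no warmup."""
    remaining = max(0, total_steps - step)
    return base_lr * remaining / total_steps


if __name__ == "__main__":
    print("total_train_batch_size:", total_train_batch)
    print("total_eval_batch_size:", total_eval_batch)
    for step in (0, TOTAL_STEPS // 2, TOTAL_STEPS):
        print(f"step {step}: lr = {linear_lr(step):.2e}")
```

Under these assumptions the learning rate falls to half its initial value at the midpoint of training and reaches zero at the final step.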

Training results

Training Loss	Epoch	Step	Validation Loss	Rouge1	Rouge2	Rougel	Rougelsum	Gen Len
2.1769	1.0	2740	1.9050	28.7222	9.1873	26.6888	26.6937	11.3429
1.9408	2.0	5480	1.8151	29.8799	10.2327	27.7947	27.8044	11.4165
1.8124	3.0	8220	1.7608	30.9845	10.9982	28.8059	28.8131	11.5310
1.7118	4.0	10960	1.7229	31.6943	11.7412	29.4967	29.5319	11.7037
1.6286	5.0	13700	1.6937	32.5839	12.2431	30.1799	30.206	11.7784
1.5597	6.0	16440	1.6748	32.9915	12.8514	30.7016	30.7145	11.5974
1.4982	7.0	19180	1.6578	33.2157	13.1389	30.9428	30.9519	11.3580
1.4468	8.0	21920	1.6473	33.6146	13.5922	31.3001	31.3235	11.5724
1.4022	9.0	24660	1.6384	34.1711	14.1117	31.7951	31.8066	11.7389
1.364	10.0	27400	1.6337	34.5489	14.5012	32.1329	32.1446	11.6659
1.3321	11.0	30140	1.6291	34.7133	14.7297	32.3042	32.314	11.8003
1.3054	12.0	32880	1.6267	34.9411	15.0282	32.5335	32.5451	11.7619
1.2845	13.0	35620	1.6262	35.1648	15.2154	32.7387	32.742	11.8317
1.2699	14.0	38360	1.6257	35.2849	15.3109	32.8508	32.853	11.8168
1.2595	15.0	41100	1.6273	35.2224	15.2781	32.7718	32.7826	11.7971

Framework versions

Transformers 4.23.1
PyTorch 1.12.1+rocm5.1.1
Datasets 2.4.0
Tokenizers 0.12.1

Citation

@inproceedings{giulianelli-etal-2023-interpretable,
    title = "Interpretable Word Sense Representations via Definition Generation: The Case of Semantic Change Analysis",
    author = "Giulianelli, Mario  and
      Luden, Iris  and
      Fernandez, Raquel  and
      Kutuzov, Andrey",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-long.176",
    doi = "10.18653/v1/2023.acl-long.176",
    pages = "3130--3148",
    abstract = "We propose using automatically generated natural language definitions of contextualised word usages as interpretable word and word sense representations. Given a collection of usage examples for a target word, and the corresponding data-driven usage clusters (i.e., word senses), a definition is generated for each usage with a specialised Flan-T5 language model, and the most prototypical definition in a usage cluster is chosen as the sense label. We demonstrate how the resulting sense labels can make existing approaches to semantic change analysis more interpretable, and how they can allow users {---} historical linguists, lexicographers, or social scientists {---} to explore and intuitively explain diachronic trajectories of word meaning. Semantic change analysis is only one of many possible applications of the {`}definitions as representations{'} paradigm. Beyond being human-readable, contextualised definitions also outperform token or usage sentence embeddings in word-in-context semantic similarity judgements, making them a new promising type of lexical representation for NLP.",
}
