FLAN-T5-Definition Large (ltg/flan-t5-definition-en-large)

This model is a version of FLAN-T5 Large fine-tuned on datasets of English definitions and usage examples.

It generates definitions of English words in context. Its input consists of the usage example and the instruction question "What is the definition of TARGET_WORD?", where TARGET_WORD is the word to be defined.
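The card does not spell out exactly how the usage example and the question are concatenated. A minimal sketch of building the model input, assuming the usage example comes first and the two parts are joined by a single space; the helper name `build_prompt` is hypothetical. The resulting string would then be tokenized and passed to the model's `generate` method.

```python
def build_prompt(usage_example: str, target_word: str) -> str:
    """Combine a usage example with the definition question (assumed layout)."""
    question = f"What is the definition of {target_word}?"
    # Assumption: example first, then the question, separated by one space.
    return f"{usage_example.strip()} {question}"


prompt = build_prompt("The bank raised its interest rates.", "bank")
print(prompt)
# The bank raised its interest rates. What is the definition of bank?
```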

This project is a collaboration between the Dialogue Modelling Group at the University of Amsterdam and the Language Technology Group at the University of Oslo.

Sizes:

Model description

See details in the paper Interpretable Word Sense Representations via Definition Generation: The Case of Semantic Change Analysis (ACL 2023) by Mario Giulianelli, Iris Luden, Raquel Fernandez and Andrey Kutuzov.

Intended uses & limitations

The model is intended for research purposes, as a source of contextualized dictionary-like lexical definitions.

The fine-tuning datasets were limited to English. Although the original FLAN-T5 is a multilingual model, we did not thoroughly evaluate this model's ability to generate definitions in languages other than English.

Generated definitions can contain all sorts of biases and stereotypes, stemming from the underlying language model.

Training and evaluation data

Three datasets were used to fine-tune the model:

FLAN-T5-Definition Large achieves the following results on the WordNet test set:

  • BLEU: 14.37
  • ROUGE-L: 33.74
  • BERT-F1: 88.21

FLAN-T5-Definition Large achieves the following results on the Oxford dictionary test set:

  • BLEU: 10.90
  • ROUGE-L: 30.05
  • BERT-F1: 87.44
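ROUGE-L is an F-measure based on the longest common subsequence (LCS) between a generated and a reference definition. To make the metric concrete, here is a simplified sentence-level sketch. It assumes lowercased whitespace tokenization and no stemming, and returns a value in [0, 1], so it will not reproduce the reported scores, which were presumably computed with a standard evaluation package and scaled to 0-100.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    # prev[j] holds the LCS length of the tokens of `a` seen so far and b[:j].
    prev = [0] * (len(b) + 1)
    for token_a in a:
        curr = [0] * (len(b) + 1)
        for j, token_b in enumerate(b, start=1):
            if token_a == token_b:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """Sentence-level ROUGE-L F1 on lowercased whitespace tokens (simplified)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)


score = rouge_l_f1("a financial institution", "an institution for financial services")
print(round(score, 2))  # 0.25
```

Here the LCS has length 1 because "financial" and "institution" appear in opposite orders, so precision is 1/3, recall is 1/5, and F1 is 0.25.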

Training procedure

FLAN-T5 Large was fine-tuned in a sequence-to-sequence mode on examples of contextualized dictionary definitions.
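In sequence-to-sequence fine-tuning, each training example is a (source, target) pair: the encoder reads the usage example together with the question, and the decoder is trained to emit the definition. A minimal sketch of that mapping, assuming a hypothetical record schema (`DefinitionRecord` with `word`, `example` and `definition` fields) and a space-joined prompt with the example first; the actual preprocessing scripts may differ.

```python
from dataclasses import dataclass


@dataclass
class DefinitionRecord:
    """One dictionary entry (hypothetical schema): target word, usage example, gloss."""
    word: str
    example: str
    definition: str


def to_seq2seq_pair(record: DefinitionRecord) -> tuple[str, str]:
    """Map a record to (source, target) strings for sequence-to-sequence training."""
    # Source: usage example followed by the instruction question (assumed layout).
    source = f"{record.example.strip()} What is the definition of {record.word}?"
    # Target: the dictionary definition the decoder learns to generate.
    target = record.definition.strip()
    return source, target


records = [
    DefinitionRecord("bank", "She sat on the bank of the river.", "sloping land beside a body of water"),
]
pairs = [to_seq2seq_pair(r) for r in records]
print(pairs[0][0])  # She sat on the bank of the river. What is the definition of bank?
```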

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 64
  • total_eval_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 15.0
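The totals follow from the per-device values: with 8 devices and no gradient accumulation, 8 × 8 = 64 training examples and 16 × 8 = 128 evaluation examples per step. The sketch below reproduces that arithmetic and the linear learning-rate decay. It assumes zero warmup steps (none are listed), which would correspond to Transformers' `get_linear_schedule_with_warmup` with `num_warmup_steps=0`, and takes 2,740 optimizer steps per epoch from the Step column of the training results table.

```python
train_batch_size = 8    # per device
eval_batch_size = 16    # per device
num_devices = 8
learning_rate = 5e-05
num_epochs = 15
steps_per_epoch = 2740  # read off the "Step" column of the training results table

# Effective batch sizes under plain data parallelism (no gradient accumulation assumed).
total_train_batch_size = train_batch_size * num_devices
total_eval_batch_size = eval_batch_size * num_devices
total_steps = steps_per_epoch * num_epochs


def linear_lr(step: int, total: int = total_steps, base_lr: float = learning_rate) -> float:
    """Linear decay from base_lr to 0 over `total` steps, assuming zero warmup."""
    return base_lr * max(0.0, (total - step) / total)


print(total_train_batch_size, total_eval_batch_size, total_steps)  # 64 128 41100
print(linear_lr(20550))  # 2.5e-05, i.e. half the initial rate at the midpoint
```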

Training results

| Training loss | Epoch | Step | Validation loss | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum | Gen Len |
|---|---|---|---|---|---|---|---|---|
| 2.1769 | 1.0 | 2740 | 1.9050 | 28.7222 | 9.1873 | 26.6888 | 26.6937 | 11.3429 |
| 1.9408 | 2.0 | 5480 | 1.8151 | 29.8799 | 10.2327 | 27.7947 | 27.8044 | 11.4165 |
| 1.8124 | 3.0 | 8220 | 1.7608 | 30.9845 | 10.9982 | 28.8059 | 28.8131 | 11.5310 |
| 1.7118 | 4.0 | 10960 | 1.7229 | 31.6943 | 11.7412 | 29.4967 | 29.5319 | 11.7037 |
| 1.6286 | 5.0 | 13700 | 1.6937 | 32.5839 | 12.2431 | 30.1799 | 30.206 | 11.7784 |
| 1.5597 | 6.0 | 16440 | 1.6748 | 32.9915 | 12.8514 | 30.7016 | 30.7145 | 11.5974 |
| 1.4982 | 7.0 | 19180 | 1.6578 | 33.2157 | 13.1389 | 30.9428 | 30.9519 | 11.3580 |
| 1.4468 | 8.0 | 21920 | 1.6473 | 33.6146 | 13.5922 | 31.3001 | 31.3235 | 11.5724 |
| 1.4022 | 9.0 | 24660 | 1.6384 | 34.1711 | 14.1117 | 31.7951 | 31.8066 | 11.7389 |
| 1.364 | 10.0 | 27400 | 1.6337 | 34.5489 | 14.5012 | 32.1329 | 32.1446 | 11.6659 |
| 1.3321 | 11.0 | 30140 | 1.6291 | 34.7133 | 14.7297 | 32.3042 | 32.314 | 11.8003 |
| 1.3054 | 12.0 | 32880 | 1.6267 | 34.9411 | 15.0282 | 32.5335 | 32.5451 | 11.7619 |
| 1.2845 | 13.0 | 35620 | 1.6262 | 35.1648 | 15.2154 | 32.7387 | 32.742 | 11.8317 |
| 1.2699 | 14.0 | 38360 | 1.6257 | 35.2849 | 15.3109 | 32.8508 | 32.853 | 11.8168 |
| 1.2595 | 15.0 | 41100 | 1.6273 | 35.2224 | 15.2781 | 32.7718 | 32.7826 | 11.7971 |
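In the training results, validation loss bottoms out at epoch 14 (1.6257) and rises slightly at epoch 15 (1.6273) while training loss keeps falling, and validation ROUGE-L also peaks at epoch 14. A small sketch for reading this off the table, with values copied from the Epoch, Validation loss and ROUGE-L columns; it says nothing about which checkpoint was published, which the card does not state.

```python
# (epoch, validation loss, ROUGE-L) copied from the training results table.
results = [
    (1, 1.9050, 26.6888), (2, 1.8151, 27.7947), (3, 1.7608, 28.8059),
    (4, 1.7229, 29.4967), (5, 1.6937, 30.1799), (6, 1.6748, 30.7016),
    (7, 1.6578, 30.9428), (8, 1.6473, 31.3001), (9, 1.6384, 31.7951),
    (10, 1.6337, 32.1329), (11, 1.6291, 32.3042), (12, 1.6267, 32.5335),
    (13, 1.6262, 32.7387), (14, 1.6257, 32.8508), (15, 1.6273, 32.7718),
]

# Epoch with the lowest validation loss.
best_epoch, best_loss, _ = min(results, key=lambda row: row[1])
# Epoch with the highest validation ROUGE-L.
best_rouge_epoch = max(results, key=lambda row: row[2])[0]

print(best_epoch, best_loss)  # 14 1.6257
print(best_rouge_epoch)       # 14
```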

Framework versions

  • Transformers 4.23.1
  • PyTorch 1.12.1+rocm5.1.1
  • Datasets 2.4.0
  • Tokenizers 0.12.1

Citation

@inproceedings{giulianelli-etal-2023-interpretable,
    title = "Interpretable Word Sense Representations via Definition Generation: The Case of Semantic Change Analysis",
    author = "Giulianelli, Mario  and
      Luden, Iris  and
      Fernandez, Raquel  and
      Kutuzov, Andrey",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-long.176",
    doi = "10.18653/v1/2023.acl-long.176",
    pages = "3130--3148",
    abstract = "We propose using automatically generated natural language definitions of contextualised word usages as interpretable word and word sense representations. Given a collection of usage examples for a target word, and the corresponding data-driven usage clusters (i.e., word senses), a definition is generated for each usage with a specialised Flan-T5 language model, and the most prototypical definition in a usage cluster is chosen as the sense label. We demonstrate how the resulting sense labels can make existing approaches to semantic change analysis more interpretable, and how they can allow users {---} historical linguists, lexicographers, or social scientists {---} to explore and intuitively explain diachronic trajectories of word meaning. Semantic change analysis is only one of many possible applications of the {`}definitions as representations{'} paradigm. Beyond being human-readable, contextualised definitions also outperform token or usage sentence embeddings in word-in-context semantic similarity judgements, making them a new promising type of lexical representation for NLP.",
}
Model size: 783M parameters (Safetensors, tensor type F32)
