---
tags:
- text2text-generation
- definition-modeling
metrics:
- rouge
model-index:
- name: mt0-definition-en-xl
  results: []
language:
- en
widget:
- text: "He ate a sweet apple. What is the definition of apple?"
  example_title: "Definition generation"
- text: "The paper contains a number of original ideas about color perception. What is the definition of original?"
  example_title: "Definition generation"
license: cc-by-sa-4.0
datasets:
- marksverdhei/wordnet-definitions-en-2021
---
# mT0-Definition-En XL
This model is a version of [mT0 XL](https://huggingface.co/bigscience/mt0-xl) fine-tuned on a dataset of English definitions and usage examples.
It generates definitions of English words in context.
Its input is a usage example followed by the instruction question "What is the definition of TARGET_WORD?"
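
The input format described above can be sketched as follows. This is a minimal illustration and not the authors' inference script: the helper names `build_prompt` and `generate_definition` are hypothetical, and the generation settings (`max_new_tokens`, `num_beams`) are assumptions chosen for the example.

```python
def build_prompt(usage_example: str, target_word: str) -> str:
    """Join a usage example and the instruction question into one input string."""
    return f"{usage_example.strip()} What is the definition of {target_word}?"


def generate_definition(
    usage_example: str,
    target_word: str,
    model_name: str = "ltg/mt0-definition-en-xl",
) -> str:
    """Load the model and generate a definition for target_word in context."""
    # Imported lazily so build_prompt can be used without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    inputs = tokenizer(build_prompt(usage_example, target_word), return_tensors="pt")
    # Illustrative generation settings (assumed, not taken from the model card).
    output_ids = model.generate(**inputs, max_new_tokens=60, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    # Downloads the XL checkpoint on first use, which needs several GB of disk and memory.
    print(generate_definition("He ate a sweet apple.", "apple"))
```

The prompt produced by `build_prompt` matches the widget examples in the metadata of this card.
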
## Models for other languages:
- English: [mT0-Definition-En XL](https://huggingface.co/ltg/mt0-definition-en-xl)
- Norwegian: [mT0-Definition-No XL](https://huggingface.co/ltg/mt0-definition-no-xl)
- Russian: [mT0-Definition-Ru XL](https://huggingface.co/ltg/mt0-definition-ru-xl)
## Model description
For details, see the paper [Enriching Word Usage Graphs with Cluster Definitions](https://aclanthology.org/2024.lrec-main.546/) (LREC-COLING 2024) by
Mariia Fedorova, Andrey Kutuzov, Nikolay Arefyev and Dominik Schlechtweg.
## Intended uses & limitations
The model is intended for research purposes, as a source of contextualized dictionary-like lexical definitions.
Generated definitions can contain all sorts of biases and stereotypes, stemming from the underlying language model.
## Training and evaluation data
Three datasets were used to fine-tune the model:
- *WordNet* ([Ishiwatari et al., NAACL 2019](https://aclanthology.org/N19-1350/)), also [available on HF](https://huggingface.co/datasets/marksverdhei/wordnet-definitions-en-2021)
- *Oxford dictionary or CHA* ([Gadetsky et al., ACL 2018](https://aclanthology.org/P18-2043/))
- English subset of *CODWOE* ([Mickus et al., SemEval 2022](https://aclanthology.org/2022.semeval-1.1/))
## Training results
mT0-Definition-En XL achieves the following results on the concatenated validation sets from WordNet and the Oxford dictionary:
- Loss: 1.7210
- Rouge1: 41.5067
- Rouge2: 23.7149
- RougeL: 39.138
- RougeLsum: 39.1647
- Gen Len: 15.1578
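
To give an intuition for the ROUGE scores above, here is a simplified sketch of ROUGE-1 F1 as unigram overlap between a generated and a reference definition. It is only an illustration: the function name `rouge1_f1` is hypothetical, tokenization is plain whitespace splitting, and the reported numbers were presumably computed with a standard ROUGE implementation (scaled to 0-100), whose preprocessing may differ.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens or not ref_tokens:
        return 0.0
    # Clipped overlap: each unigram counts at most as often as it occurs in both texts.
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Toy example (not from the evaluation data): 2 shared unigrams,
# precision 2/2, recall 2/3.
score = rouge1_f1("a fruit", "a round fruit")
print(f"{100 * score:.1f}")
```
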
## Training procedure
mT0-Definition-En XL was fine-tuned in a sequence-to-sequence mode on examples of contextualized dictionary definitions.
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 20.0
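
The total batch sizes listed above follow from the per-device values. As a quick check, assuming the usual convention that the effective training batch is the per-device batch times the number of devices times the gradient accumulation steps (evaluation does not use accumulation):

```python
# Values copied from the hyperparameter list above.
train_batch_size = 4
eval_batch_size = 4
num_devices = 8
gradient_accumulation_steps = 4

# Effective number of examples per optimizer step.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
# Evaluation batches are only spread across devices.
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size, total_eval_batch_size)
```
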
### Framework versions
- Transformers 4.30.2
- PyTorch 1.13.1+rocm5.2
- Datasets 2.12.0
- Tokenizers 0.12.1
## Citation
```
@inproceedings{kutuzov-etal-2024-enriching-word,
title = "Enriching Word Usage Graphs with Cluster Definitions",
author = "Kutuzov, Andrey and
Fedorova, Mariia and
Schlechtweg, Dominik and
Arefyev, Nikolay",
editor = "Calzolari, Nicoletta and
Kan, Min-Yen and
Hoste, Veronique and
Lenci, Alessandro and
Sakti, Sakriani and
Xue, Nianwen",
booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
month = may,
year = "2024",
address = "Torino, Italia",
publisher = "ELRA and ICCL",
url = "https://aclanthology.org/2024.lrec-main.546",
pages = "6189--6198",
abstract = "We present a dataset of word usage graphs (WUGs), where the existing WUGs for multiple languages are enriched with cluster labels functioning as sense definitions. They are generated from scratch by fine-tuned encoder-decoder language models. The conducted human evaluation has shown that these definitions match the existing clusters in WUGs better than the definitions chosen from WordNet by two baseline systems. At the same time, the method is straightforward to use and easy to extend to new languages. The resulting enriched datasets can be extremely helpful for moving on to explainable semantic change modeling.",
}
```