---
license: apache-2.0
language:
  - en
tags:
  - word_sense_disambiguation
library_name: transformers
datasets:
  - SemCor
  - WordNet
  - WSD_Evaluation_Framework
metrics:
  - f1
---

# Semantic Specialization for Knowledge-based Word Sense Disambiguation
* This repository contains the trained model (projection heads) and the sense/context embeddings used to train and evaluate it.
* If you want to learn how to use these files, please refer to the [semantic_specialization_for_wsd](https://github.com/s-mizuki-nlp/semantic_specialization_for_wsd) repository.
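
As a quick way to see what the checkpoint file contains, here is a minimal inspection sketch. This is not the official loader (that lives in the repository above); it only assumes a standard PyTorch Lightning checkpoint layout, where the model weights sit under a `state_dict` key.

```python
# Minimal inspection sketch; assumes a standard PyTorch Lightning checkpoint.
import torch

# weights_only=False is needed on recent PyTorch versions because the
# checkpoint stores non-tensor training state alongside the weights.
ckpt = torch.load("checkpoints/baseline/last.ckpt", map_location="cpu",
                  weights_only=False)
print(sorted(ckpt.keys()))  # Lightning typically stores state_dict, optimizer state, etc.
for name, tensor in ckpt["state_dict"].items():
    print(name, tuple(tensor.shape))
```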

## Trained Model (Projection Heads)
* File: checkpoints/baseline/last.ckpt
* This is one of the trained models used to report the main results (Table 2 in [Mizuki and Okazaki, EACL2023]); five runs were performed in total.
* The main hyperparameters used for training are listed below; a short code sketch after the table illustrates how they fit together:

| Argument name                                                  | Value                      | Description                                                                        |
|----------------------------------------------------------------|----------------------------|------------------------------------------------------------------------------------|
| max_epochs                                                     | 15                         | Maximum number of training epochs                                                  |
| cfg_similarity_class.temperature ($\beta^{-1}$)                | 0.015625 (=1/64)           | Temperature parameter for the contrastive loss                                     |
| batch_size ($N_B$)                                             | 256                        | Number of samples in each batch for the attract-repel and self-training objectives |
| coef_max_pool_margin_loss ($\alpha$)                           | 0.2                        | Coefficient for the self-training loss                                             |
| cfg_gloss_projection_head.n_layer                              | 2                          | Number of FFNN layers for the projection heads                                     |
| cfg_gloss_projection_head.max_l2_norm_ratio ($\epsilon$)       | 0.015                      | Hyperparameter for the distance constraint integrated in the projection heads      |
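
As an illustration of how the projection-head hyperparameters fit together, here is a minimal sketch of a 2-layer FFNN head with a distance constraint. The residual formulation and the reading of `max_l2_norm_ratio` ($\epsilon$) as a cap on the L2 norm of the shift relative to the input norm are assumptions made for illustration only; the actual architecture and the contrastive/self-training losses are implemented in the GitHub repository linked above.

```python
# Illustrative sketch only -- not the official implementation.
import torch
import torch.nn as nn

class ProjectionHead(nn.Module):
    def __init__(self, dim: int = 1024, n_layer: int = 2,
                 max_l2_norm_ratio: float = 0.015):
        super().__init__()
        layers = []
        for _ in range(n_layer):
            layers += [nn.Linear(dim, dim), nn.ReLU()]
        self.ffnn = nn.Sequential(*layers[:-1])  # no activation after the last layer
        self.max_l2_norm_ratio = max_l2_norm_ratio

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        shift = self.ffnn(x)
        # Assumed distance constraint: cap ||shift|| at epsilon * ||x|| so the
        # projected embedding stays close to the original BERT embedding.
        max_norm = self.max_l2_norm_ratio * x.norm(dim=-1, keepdim=True)
        scale = torch.clamp(max_norm / (shift.norm(dim=-1, keepdim=True) + 1e-12),
                            max=1.0)
        return x + scale * shift

head = ProjectionHead()           # dim=1024 matches bert-large-cased
out = head(torch.randn(4, 1024))  # projected embeddings, shape (4, 1024)
```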

## Sense/Context Embeddings
* Directory: `data/bert_embeddings/`
* Sense embeddings: `bert-large-cased_WordNet_Gloss_Corpus.hdf5`
* Context embeddings for the self-training objective: `bert-large-cased_SemCor.hdf5`
* Context embeddings for evaluating the WSD task: `bert-large-cased_WSDEval-ALL.hdf5`
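
To peek inside these files before wiring up the official data loaders, the following minimal sketch uses `h5py`. The internal group/dataset names are not documented on this card, so the sketch only enumerates them:

```python
# Minimal inspection sketch; assumes only that the file is standard HDF5.
import h5py

path = "data/bert_embeddings/bert-large-cased_WordNet_Gloss_Corpus.hdf5"
with h5py.File(path, "r") as f:
    # Recursively print every group/dataset name and, for datasets, the shape.
    f.visititems(lambda name, obj: print(name, getattr(obj, "shape", "")))
```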

## Reference

```bibtex
@inproceedings{Mizuki:EACL2023,
    title     = "Semantic Specialization for Knowledge-based Word Sense Disambiguation",
    author    = "Mizuki, Sakae and Okazaki, Naoaki",
    booktitle = "Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume",
    series    = "EACL",
    month     = may,
    year      = "2023",
    address   = "Dubrovnik, Croatia",
    publisher = "Association for Computational Linguistics",
    pages     = "3449--3462",
}
```

* An [arXiv version](https://arxiv.org/abs/2304.11340) is also available.