---
language:
- mt
datasets:
- MLRS/korpus_malti
model-index:
- name: BERTu
  results:
  - task:
      type: dependency-parsing
      name: Dependency Parsing
    dataset:
      type: universal_dependencies
      args: mt_mudt
      name: Maltese Universal Dependencies Treebank (MUDT)
    metrics:
    - type: uas
      value: 92.31
      name: Unlabelled Attachment Score
    - type: las
      value: 88.14
      name: Labelled Attachment Score
  - task:
      type: part-of-speech-tagging
      name: Part-of-Speech Tagging
    dataset:
      type: mlrs_pos
      name: MLRS POS dataset
    metrics:
    - type: accuracy
      value: 98.58
      name: UPOS Accuracy
      args: upos
    - type: accuracy
      value: 98.54
      name: XPOS Accuracy
      args: xpos
  - task:
      type: named-entity-recognition
      name: Named Entity Recognition
    dataset:
      type: wikiann
      name: WikiAnn (Maltese)
      args: mt
    metrics:
    - type: f1
      args: span
      value: 86.77
      name: Span-based F1
  - task:
      type: sentiment-analysis
      name: Sentiment Analysis
    dataset:
      type: mt-sentiment-analysis
      name: Maltese Sentiment Analysis Dataset
    metrics:
    - type: f1
      args: macro
      value: 78.96
      name: Macro-averaged F1
license: cc-by-nc-sa-4.0
widget:
- text: "Malta hija gżira fil-[MASK]."
---
# BERTu
A Maltese monolingual model pre-trained from scratch on the Korpus Malti v4.0 using the BERT (base) architecture.
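
As a BERT-style model, BERTu can be queried through masked-token prediction, as the widget sentence in the metadata suggests. The sketch below is a minimal illustration, not an official usage recipe: it assumes the `transformers` library is installed and that the checkpoint is published on the Hugging Face Hub under the repository id `MLRS/BERTu` (an assumption; this card does not state the id). The helper names `fill_mask` and `top_tokens` are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed Hub repository id; this card does not state it explicitly.
MODEL_ID = "MLRS/BERTu"


def fill_mask(text: str, model_id: str = MODEL_ID) -> List[Dict]:
    """Run masked-token prediction on `text`, which must contain one [MASK]."""
    # Imported lazily so the helper below can be used without transformers.
    from transformers import pipeline

    unmasker = pipeline("fill-mask", model=model_id)
    return unmasker(text)


def top_tokens(predictions: List[Dict], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (token, score) pairs from fill-mask output.

    Each prediction is expected to be a dict with "token_str" and "score"
    keys, which is the shape returned by the fill-mask pipeline.
    """
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [(p["token_str"].strip(), round(p["score"], 4)) for p in ranked[:k]]
```

For example, `top_tokens(fill_mask("Malta hija gżira fil-[MASK]."))` would return the three most likely completions for the widget sentence. The first call downloads the model weights, so it needs network access. Note that the license below restricts use to non-commercial purposes.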
## License
This work is licensed under a
[Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License][cc-by-nc-sa].
Permissions beyond the scope of this license may be available at [https://mlrs.research.um.edu.mt/](https://mlrs.research.um.edu.mt/).

[![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]

[cc-by-nc-sa]: http://creativecommons.org/licenses/by-nc-sa/4.0/
[cc-by-nc-sa-image]: https://licensebuttons.net/l/by-nc-sa/4.0/88x31.png
## Citation
This work was first presented in [Pre-training Data Quality and Quantity for a Low-Resource Language: New Corpus and BERT Models for Maltese](https://aclanthology.org/2022.deeplo-1.10/).
Cite it as follows:
```bibtex
@inproceedings{BERTu,
    title = "Pre-training Data Quality and Quantity for a Low-Resource Language: New Corpus and {BERT} Models for {M}altese",
    author = "Micallef, Kurt  and
      Gatt, Albert  and
      Tanti, Marc  and
      van der Plas, Lonneke  and
      Borg, Claudia",
    booktitle = "Proceedings of the Third Workshop on Deep Learning for Low-Resource Natural Language Processing",
    month = jul,
    year = "2022",
    address = "Hybrid",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.deeplo-1.10",
    doi = "10.18653/v1/2022.deeplo-1.10",
    pages = "90--101",
}
```