---
language:
- mt
datasets:
- MLRS/korpus_malti
model-index:
- name: mBERTu
  results:
  - task:
      type: dependency-parsing
      name: Dependency Parsing
    dataset:
      type: universal_dependencies
      args: mt_mudt
      name: Maltese Universal Dependencies Treebank (MUDT)
    metrics:
    - type: uas
      value: 92.10
      name: Unlabelled Attachment Score
    - type: las
      value: 87.87
      name: Labelled Attachment Score
  - task:
      type: part-of-speech-tagging
      name: Part-of-Speech Tagging
    dataset:
      type: mlrs_pos
      name: MLRS POS dataset
    metrics:
    - type: accuracy
      value: 98.66
      name: UPOS Accuracy
      args: upos
    - type: accuracy
      value: 98.58
      name: XPOS Accuracy
      args: xpos
  - task:
      type: named-entity-recognition
      name: Named Entity Recognition
    dataset:
      type: wikiann
      name: WikiAnn (Maltese)
      args: mt
    metrics:
    - type: f1
      args: span
      value: 86.60
      name: Span-based F1
  - task:
      type: sentiment-analysis
      name: Sentiment Analysis
    dataset:
      type: mt-sentiment-analysis
      name: Maltese Sentiment Analysis Dataset
    metrics:
    - type: f1
      args: macro
      value: 76.79
      name: Macro-averaged F1
license: cc-by-nc-sa-4.0
widget:
- text: "Malta huwa pajjiż fl-[MASK]."
---

# mBERTu

A multilingual model for Maltese, pre-trained on the Korpus Malti v4.0 using multilingual BERT (mBERT) as the initial checkpoint.
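
Since mBERTu is a BERT-style masked language model, it can be queried directly with the Hugging Face `transformers` fill-mask pipeline. The sketch below reuses the widget example above; the Hub ID `MLRS/mBERTu` is an assumption (the same organisation hosts the `MLRS/korpus_malti` dataset) and may need adjusting.

```python
from transformers import pipeline

# Hub ID assumed to be "MLRS/mBERTu"; adjust if the published name differs.
fill_mask = pipeline("fill-mask", model="MLRS/mBERTu")

# The widget example from this card: "Malta is a country in the [MASK]."
for prediction in fill_mask("Malta huwa pajjiż fl-[MASK]."):
    # Each prediction is a dict with the filled-in token ("token_str"),
    # its probability ("score"), and the completed sentence ("sequence").
    print(f"{prediction['token_str']}\t{prediction['score']:.4f}")
```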

## License

This work is licensed under a
[Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License][cc-by-nc-sa].
Permissions beyond the scope of this license may be available at [https://mlrs.research.um.edu.mt/](https://mlrs.research.um.edu.mt/).

[![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]

[cc-by-nc-sa]: http://creativecommons.org/licenses/by-nc-sa/4.0/
[cc-by-nc-sa-image]: https://licensebuttons.net/l/by-nc-sa/4.0/88x31.png

## Citation

This work was first presented in [Pre-training Data Quality and Quantity for a Low-Resource Language: New Corpus and BERT Models for Maltese](https://arxiv.org/abs/2205.10517).
Cite it as follows:

```bibtex
@inproceedings{BERTu,
    title = {Pre-training Data Quality and Quantity for a Low-Resource Language: New Corpus and {BERT} Models for {M}altese},
    author = {Micallef, Kurt and
              Gatt, Albert and
              Tanti, Marc and
              van der Plas, Lonneke and
              Borg, Claudia},
    booktitle = {Proceedings of the 3rd Workshop on Deep Learning for Low-Resource NLP (DeepLo 2022)},
    day = {14},
    month = {07},
    year = {2022},
    address = {Seattle, Washington},
    publisher = {Association for Computational Linguistics},
}
```