XLM-RoBERTa (base) Middle High German Charter Masked Language Model
This model is a fine-tuned version of xlm-roberta-base on Middle High German (gmh; ISO 639-2; c. 1050–1500) charters from the monasterium.net dataset.
Model description
Please refer to this model together with the XLM-RoBERTa (base-sized model) card or the paper Unsupervised Cross-lingual Representation Learning at Scale by Conneau et al. for additional information.
Intended uses & limitations
This model can be used for masked language modeling, i.e., fill-mask tasks.
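A minimal fill-mask sketch using the transformers pipeline; the model id follows the citation URL below, and the example sentence is only an illustrative Middle High German phrase:

```python
from transformers import pipeline

# Load the fine-tuned checkpoint for fill-mask inference.
fill_mask = pipeline(
    "fill-mask",
    model="atzenhofer/xlm-roberta-base-mhg-charter-mlm",
)

# XLM-RoBERTa uses "<mask>" as its mask token.
predictions = fill_mask("wir <mask> von gotes gnaden")
for p in predictions:
    print(p["token_str"], round(p["score"], 4))
```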
Training and evaluation data
The model was fine-tuned using the Middle High German Monasterium charters. It was trained on a Tesla V100-SXM2-16GB GPU.
Training hyperparameters
The following hyperparameters were used during training:
- num_train_epochs: 15
- learning_rate: 2e-5
- weight_decay: 0.01
- train_batch_size: 16
- eval_batch_size: 16
- num_proc: 4
- block_size: 256
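A sketch of how these hyperparameters map onto a standard transformers Trainer configuration; dataset preparation (tokenizing and grouping the charters into chunks of block_size with num_proc workers) is omitted, and the per-epoch evaluation setting is an assumption based on the results table below:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="xlm-roberta-base-mhg-charter-mlm",
    num_train_epochs=15,
    learning_rate=2e-5,
    weight_decay=0.01,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    evaluation_strategy="epoch",  # assumed: validation loss reported once per epoch
)
```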
Training results
| Epoch | Training Loss | Validation Loss |
|---|---|---|
1 | 2.423800 | 2.025645 |
2 | 1.876500 | 1.700380 |
3 | 1.702100 | 1.565900 |
4 | 1.582400 | 1.461868 |
5 | 1.506000 | 1.393849 |
6 | 1.407300 | 1.359359 |
7 | 1.385400 | 1.317869 |
8 | 1.336700 | 1.285630 |
9 | 1.301300 | 1.246812 |
10 | 1.273500 | 1.219290 |
11 | 1.245600 | 1.198312 |
12 | 1.225800 | 1.198695 |
13 | 1.214100 | 1.194895 |
14 | 1.209500 | 1.177452 |
15 | 1.200300 | 1.177396 |
Perplexity: 3.25
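For reference, the reported perplexity is the exponential of the final validation loss:

```python
import math

# Perplexity = exp(validation loss); with the final loss of ~1.1774
# this reproduces the reported value of 3.25.
final_eval_loss = 1.177396
print(round(math.exp(final_eval_loss), 2))  # 3.25
```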
Updates
- 2023-03-30: Upload
Citation
Please cite the following when using this model.
@misc{xlm-roberta-base-mhg-charter-mlm,
  title     = {xlm-roberta-base-mhg-charter-mlm},
  author    = {Atzenhofer-Baumgartner, Florian},
  year      = {2023},
  url       = {https://huggingface.co/atzenhofer/xlm-roberta-base-mhg-charter-mlm},
  publisher = {Hugging Face}
}