XLM-RoBERTa (base) Middle High German Charter Masked Language Model

This model is a fine-tuned version of xlm-roberta-base on Middle High German (gmh; ISO 639-2; c. 1050–1500) charters of the monasterium.net data set.

Model description

For additional information on the base model, please refer to the XLM-RoBERTa (base-sized model) card or the paper Unsupervised Cross-lingual Representation Learning at Scale by Conneau et al.

Intended uses & limitations

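This model can be used for masked-token prediction (fill-mask), i.e., predicting a token that has been replaced by the mask token in a Middle High German text. It was fine-tuned on charters only, so predictions on other text types may be less reliable.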
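A minimal usage sketch, assuming the `transformers` library with a PyTorch backend is installed. The model identifier is taken from the citation URL on this card, and `<mask>` is the mask token of the XLM-RoBERTa tokenizer. The helper names `mask_word` and `predict_masked` and the example sentence are hypothetical and serve only as illustration.

```python
MODEL_ID = "atzenhofer/xlm-roberta-base-mhg-charter-mlm"  # from the citation URL
MASK_TOKEN = "<mask>"  # mask token used by the XLM-RoBERTa tokenizer


def mask_word(sentence: str, target: str) -> str:
    """Replace the first occurrence of `target` with the mask token."""
    if target not in sentence:
        raise ValueError(f"{target!r} does not occur in the sentence")
    return sentence.replace(target, MASK_TOKEN, 1)


def predict_masked(text: str, top_k: int = 5):
    """Return (token, score) pairs for the masked position.

    The import is done lazily so that the helper above can be used
    without transformers installed. Calling this downloads the model.
    """
    from transformers import pipeline

    fill = pipeline("fill-mask", model=MODEL_ID)
    return [(p["token_str"], p["score"]) for p in fill(text, top_k=top_k)]


# Illustrative sentence (not taken from the training data).
masked = mask_word("allen den die disen brief sehent", "brief")
print(masked)
```

Calling `predict_masked(masked)` would then return the highest-scoring candidate tokens for the masked position.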

Training and evaluation data

The model was fine-tuned using the Middle High German Monasterium charters. It was trained on a Tesla V100-SXM2-16GB GPU.

Training hyperparameters

The following hyperparameters were used during training:

  • num_train_epochs: 15
  • learning_rate: 2e-5
  • weight_decay: 0.01
  • train_batch_size: 16
  • eval_batch_size: 16
  • num_proc: 4
  • block_size: 256
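The card does not include the training script, so the following is only a sketch of how these values might be used. It assumes that the batch sizes are per-device values for `transformers.TrainingArguments` and that `block_size` refers to the common masked-language-modeling preprocessing step of concatenating tokenized texts and cutting them into fixed-length blocks. The names `TRAINING_KWARGS` and `group_texts` are illustrative, not taken from the original setup.

```python
from itertools import chain

BLOCK_SIZE = 256  # block_size from the list above

# Keyword arguments in the naming used by transformers.TrainingArguments.
# Mapping train/eval batch size to per-device values is an assumption.
TRAINING_KWARGS = {
    "num_train_epochs": 15,
    "learning_rate": 2e-5,
    "weight_decay": 0.01,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 16,
}


def group_texts(examples, block_size=BLOCK_SIZE):
    """Concatenate tokenized examples and split them into equal blocks.

    `examples` maps a column name (e.g. "input_ids") to a list of token
    lists. The remainder that does not fill a whole block is dropped.
    """
    concatenated = {k: list(chain.from_iterable(v)) for k, v in examples.items()}
    total = len(next(iter(concatenated.values())))
    total = (total // block_size) * block_size
    return {
        k: [tokens[i:i + block_size] for i in range(0, total, block_size)]
        for k, tokens in concatenated.items()
    }


# Small demonstration with a block size of 4 instead of 256.
demo = group_texts({"input_ids": [[1, 2, 3], [4, 5], [6, 7, 8, 9]]}, block_size=4)
print(demo)
```

Under these assumptions, the dictionary could be unpacked into `TrainingArguments(output_dir=..., **TRAINING_KWARGS)`, and the grouping function applied with `dataset.map(group_texts, batched=True, num_proc=4)`.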

Training results

Epoch  Training Loss  Validation Loss
1      2.423800       2.025645
2      1.876500       1.700380
3      1.702100       1.565900
4      1.582400       1.461868
5      1.506000       1.393849
6      1.407300       1.359359
7      1.385400       1.317869
8      1.336700       1.285630
9      1.301300       1.246812
10     1.273500       1.219290
11     1.245600       1.198312
12     1.225800       1.198695
13     1.214100       1.194895
14     1.209500       1.177452
15     1.200300       1.177396

Perplexity: 3.25
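The reported perplexity is consistent with the exponential of the validation loss after epoch 15. The card does not state how the value was computed, so the short check below rests on the assumption that the loss is the mean cross-entropy in nats.

```python
import math

# Validation loss after epoch 15, taken from the results above.
final_validation_loss = 1.177396

# Perplexity as the exponential of the mean cross-entropy loss.
perplexity = math.exp(final_validation_loss)
print(f"{perplexity:.2f}")
```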

Updates

  • 2023-03-30: Upload

Citation

Please cite the following entry when using this model.

@misc{xlm-roberta-base-mhg-charter-mlm,
  title     = {xlm-roberta-base-mhg-charter-mlm},
  author    = {Atzenhofer-Baumgartner, Florian},
  year      = {2023},
  url       = {https://huggingface.co/atzenhofer/xlm-roberta-base-mhg-charter-mlm},
  publisher = {Hugging Face}
}