German-English Code-Switching Identification

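This is the TongueSwitcher BERT model, fine-tuned for German-English code-switching identification. It was introduced in the paper "TongueSwitcher: Fine-Grained Identification of German-English Code-Switching" (Sterner and Teufel, 2023). The model is case-sensitive.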
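Code-switching identification assigns a language to each token, and switch points fall wherever that language changes. The sketch below post-processes per-token tags into language spans and switch-point indices. It assumes word-level tags are already available. The tag names "DE" and "EN" are hypothetical placeholders, not this model's actual label set, and the helper functions are illustrative only. They are not part of any published API.

```python
from itertools import groupby


def language_spans(tokens, tags):
    """Group consecutive tokens that share a tag into (tag, text) spans."""
    spans = []
    for tag, group in groupby(zip(tokens, tags), key=lambda pair: pair[1]):
        words = [token for token, _ in group]
        spans.append((tag, " ".join(words)))
    return spans


def switch_points(tags):
    """Return every index i where tags[i] differs from tags[i - 1]."""
    return [i for i in range(1, len(tags)) if tags[i] != tags[i - 1]]


# Hypothetical tagged sentence; "DE"/"EN" are placeholder labels.
tokens = ["Das", "war", "so", "last", "minute"]
tags = ["DE", "DE", "DE", "EN", "EN"]

print(language_spans(tokens, tags))  # [('DE', 'Das war so'), ('EN', 'last minute')]
print(switch_points(tags))           # [3]
```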

Overview

  • Initialized language model: german-english-code-switching-bert
  • Training data: The Denglish Corpus
  • Infrastructure: 1x Nvidia A100 GPU
  • Published: 16 October 2023

Hyperparameters

batch_size = 16
epochs = 3
n_steps = 789
max_seq_len = 512
learning_rate = 3e-5
weight_decay = 0.01
seed = 2021
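The hyperparameters above can be gathered into a small config object, which also makes their implications easy to check. The sketch below makes two assumptions that the card does not state. First, it treats n_steps as the total number of optimizer steps across all epochs. Second, it assumes there is no gradient accumulation. Under those assumptions, 789 / 3 = 263 steps per epoch, which at a batch size of 16 implies roughly 4,193 to 4,208 training sequences. The class name FinetuneConfig is hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FinetuneConfig:
    """Hypothetical container for the hyperparameters listed in this card."""
    batch_size: int = 16
    epochs: int = 3
    n_steps: int = 789
    max_seq_len: int = 512
    learning_rate: float = 3e-5
    weight_decay: float = 0.01
    seed: int = 2021

    def steps_per_epoch(self) -> float:
        # Assumption: n_steps counts optimizer steps summed over all epochs.
        return self.n_steps / self.epochs

    def train_size_range(self) -> tuple:
        # Assumption: no gradient accumulation; the last batch of an epoch may be partial,
        # so k batches cover between (k - 1) * batch_size + 1 and k * batch_size examples.
        k = math.ceil(self.steps_per_epoch())
        return ((k - 1) * self.batch_size + 1, k * self.batch_size)


cfg = FinetuneConfig()
print(cfg.steps_per_epoch())   # 263.0
print(cfg.train_size_range())  # (4193, 4208)
```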

Authors

  • Igor Sterner: is473 [at] cam.ac.uk
  • Simone Teufel: sht25 [at] cam.ac.uk

BibTeX entry and citation info

@inproceedings{sterner2023tongueswitcher,
  author    = {Igor Sterner and Simone Teufel},
  title     = {TongueSwitcher: Fine-Grained Identification of German-English Code-Switching},
  booktitle = {Sixth Workshop on Computational Approaches to Linguistic Code-Switching},
  publisher = {Empirical Methods in Natural Language Processing},
  year      = {2023},
}