igorsterner committed on
Commit
cd3bc7e
1 Parent(s): f852680

Upload README.md

Files changed (1)
  1. README.md +59 -0
README.md ADDED
@@ -0,0 +1,59 @@
+ ---
+ language:
+ - multilingual
+ - en
+ - de
+ license: mit
+ widget:
+ - text: "ich glaub ich muss echt rewatchen like i [MASK] so empty was soll ich denn jetzt machen"
+   example_title: "Example 1"
+ - text: "I don't get [MASK] er damit erreichen will."
+   example_title: "Example 2"
+ - text: "Sagt ein(e) Head(in) [MASK] research! Researchen Sie mal ein bisschen mehr."
+   example_title: "Example 3"
+ ---
+
+ # German-English Code-Switching BERT
+
+ A BERT-based model trained with masked language modelling on a large corpus of German-English code-switching. It was introduced in [this paper](). The model is case-sensitive.
+
+ ## Overview
+ - **Initialized from:** bert-base-multilingual-cased
+ - **Training data:** The TongueSwitcher Corpus
+ - **Infrastructure:** 4x Nvidia A100 GPUs
+ - **Published:** 16 October 2023
+
+ ## Hyperparameters
+
+ ```
+ batch_size = 32
+ n_steps = 191,950
+ max_seq_len = 512
+ learning_rate = 1e-4
+ weight_decay = 0.01
+ Adam beta = (0.9, 0.999)
+ lr_schedule = LinearWarmup
+ num_warmup_steps = 10,000
+ seed = 2021
+ ```
+
+ ## Performance
+
+ During training, we monitored the evaluation loss on the TongueSwitcher dev set.
+
+ ![dev loss](loss.png)
+
+ ## Authors
+ - Igor Sterner: `is473 [at] cam.ac.uk`
+ - Simone Teufel: `sht25 [at] cam.ac.uk`
+
+ ### BibTeX entry and citation info
+
+ ```bibtex
+ @inproceedings{sterner2023tongueswitcher,
+   author = {Igor Sterner and Simone Teufel},
+   title = {TongueSwitcher: Fine-Grained Identification of German-English Code-Switching},
+   booktitle = {Sixth Workshop on Computational Approaches to Linguistic Code-Switching},
+   year = {2023},
+ }
+ ```
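
The `lr_schedule = LinearWarmup`, `num_warmup_steps = 10,000`, `n_steps = 191,950` and `learning_rate = 1e-4` entries in the README's hyperparameter block can be illustrated with a minimal sketch of how such a schedule might compute the learning rate at a given step. The README does not say what happens after warmup; the linear decay to zero used here is an assumption (it is the common pairing for BERT-style training), and the function name `linear_warmup_lr` is hypothetical rather than taken from the authors' training code.

```python
def linear_warmup_lr(step, base_lr=1e-4, num_warmup_steps=10_000, n_steps=191_950):
    """Return the learning rate at optimizer step `step` (0-indexed).

    Warmup: the rate rises linearly from 0 to `base_lr` over
    `num_warmup_steps` steps (values taken from the README).
    After warmup: ASSUMPTION, not stated in the README: the rate
    decays linearly to 0 at `n_steps`.
    """
    if step < num_warmup_steps:
        # Linear ramp from 0 up to base_lr
        return base_lr * step / num_warmup_steps
    # Assumed linear decay; clamp so steps past n_steps give 0
    remaining = max(0, n_steps - step)
    return base_lr * remaining / (n_steps - num_warmup_steps)


if __name__ == "__main__":
    for s in (0, 5_000, 10_000, 100_000, 191_950):
        print(s, linear_warmup_lr(s))
```

Under these assumptions the rate peaks at `learning_rate` exactly when warmup ends, so roughly the first 5% of the 191,950 steps run below the nominal rate.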