scheiblr committed
Commit 1ba3dea
Parent(s): 2e09651

Update README.md

Files changed (1):
  1. README.md (+28, -21)
README.md CHANGED
@@ -1,3 +1,12 @@
 # GottBERT: A pure German language model

  GottBERT is the first German-only RoBERTa model, pre-trained on the German portion of the first released OSCAR dataset. This model aims to provide enhanced natural language processing (NLP) performance for the German language across various tasks, including Named Entity Recognition (NER), text classification, and natural language inference (NLI). GottBERT has been developed in two versions: a **base model** and a **large model**, tailored specifically for German-language tasks.
@@ -49,30 +58,28 @@ Metrics:


 Details:
- - Unless stated otherwise, results refer to the best checkpoint, selected by perplexity. $\text{†}$ denotes the last checkpoint at 100k optimization steps.
- - The model from our [pre-print](https://arxiv.org/abs/2012.02110v1) was moved from uklfr/gottbert-base to [tum/gottbert_base_last]().
- - $\mathrm{f}$ stands for filtered and marks the models trained on the filtered OSCAR portion.
 - **bold** values indicate the best performing model within one architecture (base, large), <ins>underscored</ins> values the second best.


- | Model | NLI | GermEval 14 | CoNLL | GermEval 2018 Coarse | GermEval 2018 Fine | 10kGNAD |
- |--------------------------------------------|--------------|-----------------|----------|-----------|---------|------------|
- | $\mathrm{GottBERT}_{\mathrm{base}}$ | 80.82 | 87.55 | <ins>85.93</ins> | 78.17 | 53.30 | 89.64 |
- | $\mathrm{GottBERT}_{\mathrm{base}}^{\text{†}}$ | 81.04 | 87.48 | 85.61 | <ins>78.18</ins> | **53.92** | 90.27 |
- | $^{\mathrm{f}}\mathrm{GottBERT}_{\mathrm{base}}$ | 80.56 | <ins>87.57</ins> | **86.14** | **78.65** | 52.82 | 89.79 |
- | $^{\mathrm{f}}\mathrm{GottBERT}_{\mathrm{base}}^{\text{†}}$ | 80.74 | **87.59** | 85.66 | 78.08 | 52.39 | 89.92 |
- | $\mathrm{GELECTRA}_{\mathrm{base}}$ | **81.70** | 86.91 | 85.37 | 77.26 | 50.07 | 89.02 |
- | $\mathrm{GBERT}_{\mathrm{base}}$ | 80.06 | 87.24 | 85.16 | 77.37 | 51.51 | **90.30** |
- | $\mathrm{dbmdzBERT}$ | 68.12 | 86.82 | 85.15 | 77.46 | 52.07 | **90.34** |
- | $\mathrm{GermanBERT}$ | 78.16 | 86.53 | 83.87 | 74.81 | 47.78 | 90.18 |
- | $\mathrm{XLM\text{-}R}_{\mathrm{base}}$ | 79.76 | 86.14 | 84.46 | 77.13 | 50.54 | 89.81 |
- | $\mathrm{mBERT}$ | 77.03 | 86.67 | 83.18 | 73.54 | 48.32 | 88.90 |
- | $\mathrm{GottBERT}_{\mathrm{large}}$ | 82.46 | 88.20 | <ins>86.78</ins> | 79.40 | 54.61 | 90.24 |
- | $^{\mathrm{f}}\mathrm{GottBERT}_{\mathrm{large}}$ | 83.31 | 88.13 | 86.30 | 79.32 | 54.70 | 90.31 |
- | $\mathrm{GottBERT}_{\mathrm{large}}^{\text{†}}$ | 82.79 | <ins>88.27</ins> | 86.28 | 78.96 | 54.72 | 90.17 |
- | $\mathrm{GELECTRA}_{\mathrm{large}}$ | **86.33** | <ins>88.72</ins> | <ins>86.78</ins> | **81.28** | <ins>56.17</ins> | **90.97** |
- | $\mathrm{GBERT}_{\mathrm{large}}$ | <ins>84.21</ins> | <ins>88.72</ins> | **87.19** | <ins>80.84</ins> | **57.37** | <ins>90.74</ins> |
- | $\mathrm{XLM\text{-}R}_{\mathrm{large}}$ | 84.07 | **88.83** | 86.54 | 79.05 | 55.06 | 90.17 |
 

 ## Model Architecture
 - **Base Model**: 12 layers, 125M parameters, 52k token vocabulary.

+ ---
+ license: mit
+ language:
+ - de
+ tags:
+ - RoBERTa
+ - GottBERT
+ - BERT
+ ---
  # GottBERT: A pure German language model
 
  GottBERT is the first German-only RoBERTa model, pre-trained on the German portion of the first released OSCAR dataset. This model aims to provide enhanced natural language processing (NLP) performance for the German language across various tasks, including Named Entity Recognition (NER), text classification, and natural language inference (NLI). GottBERT has been developed in two versions: a **base model** and a **large model**, tailored specifically for German-language tasks.
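
For orientation, here is a minimal usage sketch with the Hugging Face `transformers` fill-mask pipeline, assuming the checkpoints linked in the results table below (e.g. `TUM/GottBERT_base_last`) expose the standard `transformers` interface; the example sentence is purely illustrative.

```python
# Minimal sketch: load a GottBERT checkpoint via the transformers fill-mask pipeline.
# The model ID is taken from the repositories linked in the results table below.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="TUM/GottBERT_base_last")

# RoBERTa-style models use "<mask>" as the mask token; read it from the tokenizer to be safe.
masked = f"Die Hauptstadt von Deutschland ist {fill_mask.tokenizer.mask_token}."

for pred in fill_mask(masked):
    print(f"{pred['token_str']:>12}  {pred['score']:.3f}")
```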
 


 Details:
 - **bold** values indicate the best performing model within one architecture (base, large), <ins>underscored</ins> values the second best.


+ | Model | NLI Accuracy | GermEval\_14 F1 | CoNLL F1 | GermEval 2018 Coarse F1 | GermEval 2018 Fine F1 | 10kGNAD F1 |
+ |-------------------------------------|--------------|----------------|----------|-----------|---------|------------|
+ | [GottBERT_base_best](https://huggingface.co/TUM/GottBERT_base_best) | 80.82 | 87.55 | <ins>85.93</ins> | 78.17 | 53.30 | 89.64 |
+ | [GottBERT_base_last](https://huggingface.co/TUM/GottBERT_base_last) | 81.04 | 87.48 | 85.61 | <ins>78.18</ins> | **53.92** | 90.27 |
+ | [GottBERT_filtered_base_best](https://huggingface.co/TUM/GottBERT_filtered_base_best) | 80.56 | <ins>87.57</ins> | **86.14** | **78.65** | 52.82 | 89.79 |
+ | [GottBERT_filtered_base_last](https://huggingface.co/TUM/GottBERT_filtered_base_last) | 80.74 | **87.59** | 85.66 | 78.08 | 52.39 | 89.92 |
+ | GELECTRA_base | **81.70** | 86.91 | 85.37 | 77.26 | 50.07 | 89.02 |
+ | GBERT_base | 80.06 | 87.24 | 85.16 | 77.37 | 51.51 | **90.30** |
+ | dbmdzBERT | 68.12 | 86.82 | 85.15 | 77.46 | 52.07 | **90.34** |
+ | GermanBERT | 78.16 | 86.53 | 83.87 | 74.81 | 47.78 | 90.18 |
+ | XLM-R_base | 79.76 | 86.14 | 84.46 | 77.13 | 50.54 | 89.81 |
+ | mBERT | 77.03 | 86.67 | 83.18 | 73.54 | 48.32 | 88.90 |
+ | [GottBERT_large](https://huggingface.co/TUM/GottBERT_large) | 82.46 | 88.20 | <ins>86.78</ins> | 79.40 | 54.61 | 90.24 |
+ | [GottBERT_filtered_large_best](https://huggingface.co/TUM/GottBERT_filtered_large_best) | 83.31 | 88.13 | 86.30 | 79.32 | 54.70 | 90.31 |
+ | [GottBERT_filtered_large_last](https://huggingface.co/TUM/GottBERT_filtered_large_last) | 82.79 | <ins>88.27</ins> | 86.28 | 78.96 | 54.72 | 90.17 |
+ | GELECTRA_large | **86.33** | <ins>88.72</ins> | <ins>86.78</ins> | **81.28** | <ins>56.17</ins> | **90.97** |
+ | GBERT_large | <ins>84.21</ins> | <ins>88.72</ins> | **87.19** | <ins>80.84</ins> | **57.37** | <ins>90.74</ins> |
+ | XLM-R_large | 84.07 | **88.83** | 86.54 | 79.05 | 55.06 | 90.17 |
+
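
For the downstream tasks reported above (NER, classification, NLI), a hypothetical fine-tuning starting point could look as follows. This is only a sketch, not the evaluation setup behind the table: the checkpoint ID comes from the links above, while the token-classification head and `num_labels=9` (CoNLL-2003-style BIO tags) are illustrative assumptions.

```python
# Hypothetical fine-tuning skeleton for an NER-style task (NOT the setup used for the table).
# Checkpoint ID taken from the table above; num_labels=9 is an illustrative assumption.
from transformers import AutoModelForTokenClassification, AutoTokenizer

model_id = "TUM/GottBERT_filtered_base_best"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id, num_labels=9)
# From here, a standard transformers Trainer loop over a German NER dataset would apply.
```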

 ## Model Architecture
  - **Base Model**: 12 layers, 125M parameters, 52k token vocabulary.