Update README.md
README.md CHANGED
@@ -1,3 +1,12 @@
+---
+license: mit
+language:
+- de
+tags:
+- RoBERTa
+- GottBERT
+- BERT
+---
# GottBERT: A pure German language model

GottBERT is the first German-only RoBERTa model, pre-trained on the German portion of the first released OSCAR dataset. This model aims to provide enhanced natural language processing (NLP) performance for the German language across various tasks, including Named Entity Recognition (NER), text classification, and natural language inference (NLI). GottBERT has been developed in two versions: a **base model** and a **large model**, tailored specifically for German-language tasks.

@@ -49,30 +58,28 @@ Metrics:

Details:
-- If nothing is stated, the best checkpoint is referred to based on perplexity. $\text{†}$ denotes the last checkpoint at 100k optimization steps.
-- The model from our [pre-print](https://arxiv.org/abs/2012.02110v1) was moved from uklfr/gottbert-base to [tum/gottbert_base_last]().
-- $\mathrm{f}$ stands for filtered and marks the models trained on the filtered OSCAR portion.
- **bold** values indicate the best performing model within one architecture (base, large), <ins>underscored</ins> values the second best.

+| Model | Accuracy NLI | GermEval\_14 F1 | CoNLL F1 | Coarse F1 | Fine F1 | 10kGNAD F1 |
+|-------------------------------------|--------------|----------------|----------|-----------|---------|------------|
+| [GottBERT_base_best](https://huggingface.co/TUM/GottBERT_base_best) | 80.82 | 87.55 | <ins>85.93</ins> | 78.17 | 53.30 | 89.64 |
+| [GottBERT_base_last](https://huggingface.co/TUM/GottBERT_base_last) | 81.04 | 87.48 | 85.61 | <ins>78.18</ins> | **53.92** | 90.27 |
+| [GottBERT_filtered_base_best](https://huggingface.co/TUM/GottBERT_filtered_base_best) | 80.56 | <ins>87.57</ins> | **86.14** | **78.65** | 52.82 | 89.79 |
+| [GottBERT_filtered_base_last](https://huggingface.co/TUM/GottBERT_filtered_base_last) | 80.74 | **87.59** | 85.66 | 78.08 | 52.39 | 89.92 |
+| GELECTRA_base | **81.70** | 86.91 | 85.37 | 77.26 | 50.07 | 89.02 |
+| GBERT_base | 80.06 | 87.24 | 85.16 | 77.37 | 51.51 | **90.30** |
+| dbmdzBERT | 68.12 | 86.82 | 85.15 | 77.46 | 52.07 | **90.34** |
+| GermanBERT | 78.16 | 86.53 | 83.87 | 74.81 | 47.78 | 90.18 |
+| XLM-R_base | 79.76 | 86.14 | 84.46 | 77.13 | 50.54 | 89.81 |
+| mBERT | 77.03 | 86.67 | 83.18 | 73.54 | 48.32 | 88.90 |
+| [GottBERT_large](https://huggingface.co/TUM/GottBERT_large) | 82.46 | 88.20 | <ins>86.78</ins> | 79.40 | 54.61 | 90.24 |
+| [GottBERT_filtered_large_best](https://huggingface.co/TUM/GottBERT_filtered_large_best) | 83.31 | 88.13 | 86.30 | 79.32 | 54.70 | 90.31 |
+| [GottBERT_filtered_large_last](https://huggingface.co/TUM/GottBERT_filtered_large_last) | 82.79 | <ins>88.27</ins> | 86.28 | 78.96 | 54.72 | 90.17 |
+| GELECTRA_large | **86.33** | <ins>88.72</ins> | <ins>86.78</ins> | **81.28** | <ins>56.17</ins> | **90.97** |
+| GBERT_large | <ins>84.21</ins> | <ins>88.72</ins> | **87.19** | <ins>80.84</ins> | **57.37** | <ins>90.74</ins> |
+| XLM-R_large | 84.07 | **88.83** | 86.54 | 79.05 | 55.06 | 90.17 |
+

## Model Architecture
- **Base Model**: 12 layers, 125M parameters, 52k token vocabulary.
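
For readers who want to try one of the checkpoints linked in the results table, here is a minimal usage sketch. It assumes the `TUM/GottBERT_base_last` repository loads as a standard RoBERTa-style masked language model through the `transformers` library; the German example sentence is purely illustrative.

```python
# Minimal sketch, assuming a standard RoBERTa checkpoint layout for
# TUM/GottBERT_base_last (any repo id from the results table should work the same way).
from transformers import AutoConfig, AutoTokenizer, pipeline

model_id = "TUM/GottBERT_base_last"

# Sanity-check the architecture described above: the base model is stated to have
# 12 hidden layers and a roughly 52k token vocabulary.
config = AutoConfig.from_pretrained(model_id)
print(config.num_hidden_layers, config.vocab_size)

# Masked-token prediction; RoBERTa-style tokenizers expose the mask token explicitly.
tokenizer = AutoTokenizer.from_pretrained(model_id)
fill_mask = pipeline("fill-mask", model=model_id, tokenizer=tokenizer)
for prediction in fill_mask(f"Die Hauptstadt von Deutschland ist {tokenizer.mask_token}."):
    print(prediction["token_str"], round(prediction["score"], 3))
```

Fine-tuning for the benchmarks in the table (NLI, GermEval, CoNLL, 10kGNAD) would follow the usual sequence- or token-classification recipes on top of the same checkpoint rather than the fill-mask head.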