ikim-uk-essen/geberta-base

Fill-Mask

Transformers

PyTorch

Safetensors

deberta-v2

amindada committed on Jul 21, 2023

Commit

77a04ac

1 Parent(s): eda3324

Update README.md

Files changed (1)

README.md +39 -2

@@ -4,11 +4,23 @@
 {}
 ---
-# Model Card for Model ID
 <!-- Provide a quick summary of what the model is/does. -->
 GeBERTa is a set of German DeBERTa models developed in a joint effort between the University of Florida, NVIDIA, and IKIM.
-The models range in size from 122M to 750M parameters. The pre-training dataset consists of documents from different domains:
 | Domain | Dataset | Data Size | #Docs | #Tokens |
 | -------- | ----------- | --------- | ------ | ------- |
@@ -29,7 +41,32 @@ The models range in size from 122M to 750M parameters. The pre-training dataset
 | - | Total | 167GB | 116,079,769 | 35.8B |

 {}
 ---
+# GeBERTa
 <!-- Provide a quick summary of what the model is/does. -->
 GeBERTa is a set of German DeBERTa models developed in a joint effort between the University of Florida, NVIDIA, and IKIM.
+The models range in size from 122M to 750M parameters.
+## Model details
+The models follow the DeBERTa-v2 architecture and use SentencePiece tokenizers. The base and large models use a 50k-token vocabulary,
+while the xlarge model uses a 128k-token vocabulary. All models were trained with a batch size of 2k for a maximum of 1 million steps
+and have a maximum sequence length of 512 tokens.
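As a usage illustration for the model details above, here is a minimal sketch of querying the base model through the Transformers `fill-mask` pipeline. The helper names `with_mask` and `top_predictions`, the `<mask>` placeholder convention, and the example sentence are assumptions made for this sketch and are not part of the model card; the model ID is taken from the repository name.

```python
def with_mask(text, mask_token, placeholder="<mask>"):
    """Swap a hypothetical placeholder for the tokenizer's real mask token."""
    return text.replace(placeholder, mask_token)


def top_predictions(text, model_id="ikim-uk-essen/geberta-base", k=3):
    """Return the k most likely fillers as (token, score) pairs.

    Calling this downloads the model from the Hugging Face Hub.
    """
    # Imported lazily so that defining the helper does not itself
    # require the transformers package (a choice made for this sketch).
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=model_id)
    # Use the tokenizer's own mask token instead of hard-coding one.
    masked = with_mask(text, fill_mask.tokenizer.mask_token)
    return [(p["token_str"], p["score"]) for p in fill_mask(masked, top_k=k)]


if __name__ == "__main__":
    for token, score in top_predictions("Die Hauptstadt von Deutschland ist <mask>."):
        print(f"{token}\t{score:.3f}")
```

Reading the mask token from the tokenizer keeps the sketch independent of which special-token string the checkpoint actually uses.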
+## Dataset
+The pre-training dataset consists of documents from different domains:
 | Domain | Dataset | Data Size | #Docs | #Tokens |
 | -------- | ----------- | --------- | ------ | ------- |
 | - | Total | 167GB | 116,079,769 | 35.8B |
+## Benchmark
+In a comprehensive benchmark, we evaluated existing German models and our own. The benchmark included a variety of task types, such as question answering,
+classification, and named entity recognition (NER). In addition, we introduced a new task focused on hate speech detection, using two existing datasets.
+When a dataset provided training, development, and test sets, we used them as given.
+Otherwise, we randomly split the data into 80% for training, 10% for validation, and 10% for testing.
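The random 80/10/10 split described above could be sketched as follows. This is only an illustration: the function name `split_80_10_10`, the fixed seed, and the absence of stratification are assumptions, since the model card does not say how the split was implemented.

```python
import random


def split_80_10_10(examples, seed=0):
    """Shuffle the examples and split them into train/validation/test parts."""
    items = list(examples)
    # A seeded generator makes the split reproducible (the seed is an assumption).
    random.Random(seed).shuffle(items)
    n_train = len(items) * 8 // 10
    n_val = len(items) // 10
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # the remainder, roughly 10%
    return train, val, test


if __name__ == "__main__":
    train, val, test = split_80_10_10(range(1000))
    print(len(train), len(val), len(test))
```

Integer arithmetic is used for the split sizes so that rounding never drops or duplicates an example; any leftover items end up in the test part.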
+The following table presents the F1 scores:
+|         Model         |   [GE14](https://huggingface.co/datasets/germeval_14)   |  [GQuAD](https://huggingface.co/datasets/deepset/germanquad)  |   [GE18](https://huggingface.co/datasets/philschmid/germeval18)   |    TS    |   [GGP](https://github.com/JULIELab/GGPOnc)   |  GRAS<sup>1</sup>  |    [JS](https://github.com/JULIELab/jsyncc)    |  [DROC](https://gitlab2.informatik.uni-wuerzburg.de/kallimachos/DROC-Release)  |  Avg   |
+|:---------------------:|:--------:|:----------:|:--------:|:--------:|:-------:|:------:|:--------:|:------:|:------:|
+|     gbert-base        | 87.10±0.12 | 72.19±0.82 | 51.27±1.4 | 72.34±0.48 | 78.17±0.25 | 62.90±0.01 | 77.18±3.34 | 88.03±0.20 | 73.65±0.50 |
+|   gelectra-base   | 86.19±0.5 | 74.09±0.70 | 48.02±1.80 | 70.62±0.44 | 77.53±0.11 | 65.97±0.01 | 71.17±2.94 | 88.06±0.37 | 72.71±0.66 |
+|  gottbert  | 87.15±0.19 | 72.76±0.378 | 51.12±1.20 | 74.25±0.80 | **78.18**±0.11 | 65.71±0.01 | 74.60±4.75 | 88.61±0.23 | 74.05±0.51 |
+| geberta-base | **88.06**±0.22 | **78.54**±0.32 | **53.16**±1.39 | **74.83**±0.36 | 78.13±0.15 | **68.37**±1.11 | **81.85**±5.23 | **89.14**±0.32 | **76.51**±0.32 |
+<sup>1</sup>Not yet published, but described in the [MedBERT.de paper](https://arxiv.org/abs/2303.08179).
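The Avg column in the table above appears to be the unweighted mean of the eight per-task F1 scores. The sketch below recomputes it from the table values; the dictionary layout and the name `macro_average` are assumptions for illustration, and the model card does not state how Avg or its ± value was derived.

```python
# Per-task F1 scores copied from the table, in column order:
# GE14, GQuAD, GE18, TS, GGP, GRAS, JS, DROC.
F1_SCORES = {
    "gbert-base": [87.10, 72.19, 51.27, 72.34, 78.17, 62.90, 77.18, 88.03],
    "gelectra-base": [86.19, 74.09, 48.02, 70.62, 77.53, 65.97, 71.17, 88.06],
    "gottbert": [87.15, 72.76, 51.12, 74.25, 78.18, 65.71, 74.60, 88.61],
    "geberta-base": [88.06, 78.54, 53.16, 74.83, 78.13, 68.37, 81.85, 89.14],
}

# Avg column as reported in the table.
REPORTED_AVG = {
    "gbert-base": 73.65,
    "gelectra-base": 72.71,
    "gottbert": 74.05,
    "geberta-base": 76.51,
}


def macro_average(scores):
    """Unweighted mean over tasks."""
    return sum(scores) / len(scores)


if __name__ == "__main__":
    for model, scores in F1_SCORES.items():
        print(f"{model}: computed {macro_average(scores):.2f}, "
              f"reported {REPORTED_AVG[model]:.2f}")
```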
+## Publication
+A publication will follow soon.
+## Contact
+<[email protected]>