Commit e22d013 by amindada · 1 Parent(s): 75aa218

Create README.md

Files changed (1)
  1. README.md +39 -0
README.md ADDED
@@ -0,0 +1,39 @@
+ ---
+ # For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
+ # Doc / guide: https://huggingface.co/docs/hub/model-cards
+ {}
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+ Developed in a joint effort between the University of Florida, NVIDIA, and IKIM, GeBERTa is a series of German DeBERTa models ranging from 122M to 750M
+ parameters. The pre-training dataset consists of documents from different domains:
+
+ | Category | Source Data | Data Size | #Docs | #Tokens |
+ | -------- | ----------- | --------- | ------ | ------- |
+ | Formal | Wikipedia | 9GB | 2,665,357 | 1.9B |
+ | Formal | News | 28GB | 12,305,326 | 6.1B |
+ | Formal | GC4 | 90GB | 31,669,772 | 19.4B |
+ | Informal | Reddit 2019-2023 (GER) | 5.8GB | 15,036,592 | 1.3B |
+ | Informal | Holiday Reviews | 2GB | 4,876,405 | 428M |
+ | Legal | OpenLegalData: German cases and laws | 5.4GB | 308,228 | 1B |
+ | Medical | Charité doctoral theses abstracts | 28MB | 16,947 | 5M |
+ | Medical | Flexikon | 106MB | 74,136 | 23M |
+ | Medical | NTS of Animal Experiments | 24MB | 50,310 | 4M |
+ | Medical | German Guideline Program in Oncology | 13MB | 4,348 | 3M |
+ | Medical | Springer Abstract | 79MB | 34,035 | 15M |
+ | Medical | CC medical texts (GER) | 3.6GB | 2,000,000 | 682M |
+ | Medical | Medicine Dissertations | 1.4GB | 14,496 | 295M |
+ | Medical | PubMed abstracts | 8.5GB | 21,044,382 | 1.7B |
+ | Medical | MIMIC-III | 2.6GB | 24,221,834 | 695M |
+ | Medical | PMC-Patients-ReCDS | 2.1GB | 1,743,344 | 414M |
+ | Literature | German Fiction | 1.1GB | 3,219 | 243M |
+ | Literature | English books | 7.1GB | 11,038 | 1.6B |
+ | - | Total | 167GB | 116,079,769 | 35.8B |
+
+
+
+
+
+
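The Total row of the table added in this README can be cross-checked against the individual rows. The following is a minimal sketch, not part of the commit: the row values are copied from the table, the names `ROWS` and `totals` are hypothetical, and it assumes 1 GB = 1000 MB when converting the smaller corpora.

```python
# Rows copied from the README table:
# (source, size in GB, number of documents, tokens in millions).
# Assumption: MB values are converted with 1 GB = 1000 MB.
ROWS = [
    ("Wikipedia", 9.0, 2_665_357, 1_900),
    ("News", 28.0, 12_305_326, 6_100),
    ("GC4", 90.0, 31_669_772, 19_400),
    ("Reddit 2019-2023 (GER)", 5.8, 15_036_592, 1_300),
    ("Holiday Reviews", 2.0, 4_876_405, 428),
    ("OpenLegalData: German cases and laws", 5.4, 308_228, 1_000),
    ("Charité doctoral theses abstracts", 0.028, 16_947, 5),
    ("Flexikon", 0.106, 74_136, 23),
    ("NTS of Animal Experiments", 0.024, 50_310, 4),
    ("German Guideline Program in Oncology", 0.013, 4_348, 3),
    ("Springer Abstract", 0.079, 34_035, 15),
    ("CC medical texts (GER)", 3.6, 2_000_000, 682),
    ("Medicine Dissertations", 1.4, 14_496, 295),
    ("PubMed abstracts", 8.5, 21_044_382, 1_700),
    ("MIMIC-III", 2.6, 24_221_834, 695),
    ("PMC-Patients-ReCDS", 2.1, 1_743_344, 414),
    ("German Fiction", 1.1, 3_219, 243),
    ("English books", 7.1, 11_038, 1_600),
]


def totals(rows):
    """Sum size (GB), document count, and token count (millions) over all rows."""
    size_gb = sum(row[1] for row in rows)
    docs = sum(row[2] for row in rows)
    tokens_m = sum(row[3] for row in rows)
    return size_gb, docs, tokens_m


size_gb, docs, tokens_m = totals(ROWS)
print(f"size: about {round(size_gb)} GB")
print(f"docs: {docs:,}")
print(f"tokens: about {round(tokens_m / 1000, 1)}B")
```

Because the per-row sizes and token counts in the table are already rounded, only the document count can be expected to match the Total row exactly; the other two columns should agree after rounding.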