etzion committed on
Commit
331ca5a
verified
1 Parent(s): 2b2e5c5

Update README.md

Files changed (1)
  1. README.md +33 -28
README.md CHANGED
@@ -30,33 +30,6 @@ Here you can find a description of each of our models. Each row contains the mod
  | **DICTA_smart_NER** | Training the [DICTA-ner](https://huggingface.co/dicta-il/dictabert-ner) model on the IAHALT + Name-Sentences[1] + Entity-Injection[2] [dataset](https://huggingface.co/datasets/FusioNER/Smart_Injection) | [FusioNER/DICTA_Smart](https://huggingface.co/FusioNER/DICTA_Smart) | IAHALT | [FusioNER/Smart_Injection](https://huggingface.co/datasets/FusioNER/Smart_Injection) | [DICTA-ner](https://huggingface.co/dicta-il/dictabert-ner) | classic[4]
  | **DICTA_Large_Smart** | Training the [DICTA Large](https://huggingface.co/dicta-il/dictabert-large) model on the IAHALT + Name-Sentences[1] + Entity-Injection[2] [dataset](https://huggingface.co/datasets/FusioNER/Smart_Injection) | [FusioNER/Dicta_Large_Smart](https://huggingface.co/FusioNER/Dicta_Large_Smart) | IAHALT | [FusioNER/Smart_Injection](https://huggingface.co/datasets/FusioNER/Smart_Injection) | [DICTA Large](https://huggingface.co/dicta-il/dictabert-large) | classic[4]
  | **TEC_NER** | Basic technology NER model | model path | TEC_NER | https://huggingface.co/datasets/FusioNER/tec_ner/tree/main | base model | technology
- [1] **Name-Sentences**: Adding sentences to the corpus that contain only the entity we want the network to learn.
-
- [2] **Entity-Injection**: Replacing a tagged entity in the original corpus with a new entity. Using this method, the model can learn new entities (not new labels!) that it did not extract before.
-
- [3] **BI-BI Problem**: Building a training corpus in which entities of the same type appear in sequence, labeled as continuations of one another.
-
- For example, the text "讛讗专讬 驻讜讟专 讜专讜谉 讜讜讬讝诇讬" ("Harry Potter and Ron Weasley") would be tagged as a **SINGLE** entity. That problem prevents the model from extracting entities correctly.
-
- [4] **Classic**: The classic NER types:
-
- | entity type | full name | examples |
- |:-----------:|:----------| --------:|
- | **PER** | Person | 讗讚讜诇祝 讛讬讟诇专, 专讜讚讜诇祝 讛住, 诪专讚讻讬 讗谞讬诇讘讬抓 |
- | **GPE** | Geopolitical Entity | 讙专诪谞讬讛, 驻讜诇讬谉, 讘专诇讬谉, 讜讜专砖讛 |
- | **LOC** | Location | 诪讝专讞 讗讬专讜驻讛, 讗讙谉 讛讬诐 讛转讬讻讜谉, 讛讙诇讬诇 |
- | **FAC** | Facility | 讗讜讜砖讜讜讬抓, 诪讙讚诇讬 讛转讗讜诪讬诐, 谞转讘"讙 2000, 专讞讜讘 拽驻诇谉 |
- | **ORG** | Organization | 讛诪驻诇讙讛 讛谞讗爪讬转, 讞讘专转 讙讜讙诇, 诪诪砖诇转 讞讜祝 讛砖谞讛讘 |
- | **TIMEX** | Time Expression | 1945, 砖谞转 1993, 讬讜诐 讛砖讜讗讛, 砖谞讜转 讛-90 |
- | **EVE** | Event | 讛砖讜讗讛, 诪诇讞诪转 讛注讜诇诐 讛砖谞讬讬讛, 砖诇讟讜谉 讛讗驻专讟讛讬讬讚 |
- | **TTL** | Title | 驻讬讛专专, 拽讬住专, 诪谞讻"诇 |
- | **ANG** | Language | 注讘专讬转, 注专讘讬转, 讙专诪谞讬转 |
- | **DUC** | Product | 驻讬讬住讘讜拽, F-16, 转谞讜讘讛 |
- | **WOA** | Work of Art | 讚讜"讞 诪讘拽专 讛诪讚讬谞讛, 注讬转讜谉 讛讗专抓, 讛讗专讬 驻讜讟专, 转讬拽 2000 |
- | **MISC** | Miscellaneous | 拽讜专讜谞讛, 讛转讜 讛讬专讜拽, 诪讚诇讬转 讝讛讘, 讘讬讟拽讜讬谉 |
-
-
-
 
  # Results
  We test our models on the **IAHALT test set**. We also evaluate other models, such as [DictaBert](https://huggingface.co/dicta-il/dictabert) and [HeBert](https://huggingface.co/avichr/heBERT). These are the performance results:
@@ -79,7 +52,6 @@ We test our models on the **IAHALT test set**. We also evaluate other models, suc
 
  According to the results, we recommend using the [**DICTA_Small_Smart**](https://huggingface.co/FusioNER/Dicta_Small_Smart) model.
 
-
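The snippet below is a minimal usage sketch, not part of the original README: it assumes the recommended checkpoint loads with the standard Hugging Face token-classification pipeline and emits the BIO labels listed in footnote [4].

```python
# Hedged sketch: assumes FusioNER/Dicta_Small_Smart is a standard
# token-classification checkpoint; adjust if the repository differs.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="FusioNER/Dicta_Small_Smart",
    aggregation_strategy="simple",  # merge word pieces into whole entities
)

text = "讛讗专讬 驻讜讟专 讜专讜谉 讜讜讬讝诇讬 谞住注讜 诇诇讜谞讚讜谉"  # "Harry Potter and Ron Weasley traveled to London"
for entity in ner(text):
    print(entity["entity_group"], entity["word"], f"{entity['score']:.3f}")
```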
  # Hebrew NLP models
  The following table lists Hebrew NLP models:
 
@@ -90,4 +62,37 @@ The following table lists Hebrew NLP models:
  | dicta-il/dictabert-large | [https://huggingface.co/dicta-il/dictabert-large](https://huggingface.co/dicta-il/dictabert-large) | Shaltiel Shmidman and Avi Shmidman and Moshe Koppel |
  | avichr/heBERT | [https://huggingface.co/avichr/heBERT](https://huggingface.co/avichr/heBERT) | Avihay Chriqui and Inbal Yahav |
 
+ # Footnotes
+
+ #### [1] **Name-Sentences**:
+ Adding sentences to the corpus that contain only the entity we want the network to learn.
+
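A minimal sketch of the idea, under an assumed corpus format of parallel token/tag lists with BIO labels; this is not the project's actual code.

```python
# Hedged sketch: append "name-sentences" that contain only the target entity,
# so the model sees the new name with an unambiguous label.
def add_name_sentences(corpus, names, label="PER"):
    for name in names:
        tokens = name.split()
        tags = [f"B-{label}"] + [f"I-{label}"] * (len(tokens) - 1)
        corpus.append({"tokens": tokens, "tags": tags})
    return corpus

corpus = []
add_name_sentences(corpus, ["讛讗专讬 驻讜讟专", "专讜谉 讜讜讬讝诇讬"])
# -> [{'tokens': ['讛讗专讬', '驻讜讟专'], 'tags': ['B-PER', 'I-PER']}, ...]
```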
+ #### [2] **Entity-Injection**:
+ Replacing a tagged entity in the original corpus with a new entity.
+ Using this method, the model can learn new entities (not new labels!) that it did not extract before.
+
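A minimal sketch of entity injection under the same assumed token/tag format; the span-finding logic and the example names are illustrative only.

```python
# Hedged sketch: swap an already-tagged entity span for a new surface form,
# keeping the BIO tags consistent with the new number of tokens.
def inject_entity(tokens, tags, new_entity, label="PER"):
    start = tags.index(f"B-{label}")          # first span with this label
    end = start + 1
    while end < len(tags) and tags[end] == f"I-{label}":
        end += 1
    new_tokens = new_entity.split()
    new_tags = [f"B-{label}"] + [f"I-{label}"] * (len(new_tokens) - 1)
    return (tokens[:start] + new_tokens + tokens[end:],
            tags[:start] + new_tags + tags[end:])

tokens = ["讛讗专讬", "驻讜讟专", "谞住注", "诇诇讜谞讚讜谉"]
tags   = ["B-PER", "I-PER", "O", "B-GPE"]
print(inject_entity(tokens, tags, "讛专诪讬讜谞讬 讙专讬讬谞讙'专"))
```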
+ #### [3] **BI-BI Problem**:
+ Building a training corpus in which entities of the same type appear in sequence, labeled as continuations of one another.
+ For example, the text "讛讗专讬 驻讜讟专 讜专讜谉 讜讜讬讝诇讬" ("Harry Potter and Ron Weasley") would be tagged as a **SINGLE** entity.
+ This problem prevents the model from extracting entities correctly.
+
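In BIO terms (tokenization assumed for illustration), the problematic labeling and the desired one look like this:

```python
tokens = ["讛讗专讬", "驻讜讟专", "讜专讜谉", "讜讜讬讝诇讬"]
# BI-BI problem: both names collapsed into ONE person entity.
bad_tags  = ["B-PER", "I-PER", "I-PER", "I-PER"]
# Desired labeling: two separate person entities.
good_tags = ["B-PER", "I-PER", "B-PER", "I-PER"]
```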
+ #### [4] **Classic**:
+
+ The classic NER types:
+
+ | entity type | full name | examples |
+ |:-----------:|:----------| --------:|
+ | **PER** | Person | 讗讚讜诇祝 讛讬讟诇专, 专讜讚讜诇祝 讛住, 诪专讚讻讬 讗谞讬诇讘讬抓 |
+ | **GPE** | Geopolitical Entity | 讙专诪谞讬讛, 驻讜诇讬谉, 讘专诇讬谉, 讜讜专砖讛 |
+ | **LOC** | Location | 诪讝专讞 讗讬专讜驻讛, 讗讙谉 讛讬诐 讛转讬讻讜谉, 讛讙诇讬诇 |
+ | **FAC** | Facility | 讗讜讜砖讜讜讬抓, 诪讙讚诇讬 讛转讗讜诪讬诐, 谞转讘"讙 2000, 专讞讜讘 拽驻诇谉 |
+ | **ORG** | Organization | 讛诪驻诇讙讛 讛谞讗爪讬转, 讞讘专转 讙讜讙诇, 诪诪砖诇转 讞讜祝 讛砖谞讛讘 |
+ | **TIMEX** | Time Expression | 1945, 砖谞转 1993, 讬讜诐 讛砖讜讗讛, 砖谞讜转 讛-90 |
+ | **EVE** | Event | 讛砖讜讗讛, 诪诇讞诪转 讛注讜诇诐 讛砖谞讬讬讛, 砖诇讟讜谉 讛讗驻专讟讛讬讬讚 |
+ | **TTL** | Title | 驻讬讛专专, 拽讬住专, 诪谞讻"诇 |
+ | **ANG** | Language | 注讘专讬转, 注专讘讬转, 讙专诪谞讬转 |
+ | **DUC** | Product | 驻讬讬住讘讜拽, F-16, 转谞讜讘讛 |
+ | **WOA** | Work of Art | 讚讜"讞 诪讘拽专 讛诪讚讬谞讛, 注讬转讜谉 讛讗专抓, 讛讗专讬 驻讜讟专, 转讬拽 2000 |
+ | **MISC** | Miscellaneous | 拽讜专讜谞讛, 讛转讜 讛讬专讜拽, 诪讚诇讬转 讝讛讘, 讘讬讟拽讜讬谉 |
+
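A hedged sketch of how this inventory could be turned into a BIO tag set for a model config; the exact label strings used by the released checkpoints are an assumption here.

```python
# Hedged sketch: BIO tag set derived from the entity types listed above.
ENTITY_TYPES = ["PER", "GPE", "LOC", "FAC", "ORG", "TIMEX",
                "EVE", "TTL", "ANG", "DUC", "WOA", "MISC"]
LABELS = ["O"] + [f"{prefix}-{t}" for t in ENTITY_TYPES for prefix in ("B", "I")]
id2label = dict(enumerate(LABELS))
label2id = {label: i for i, label in id2label.items()}
```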
  **MIT License**