Update README.md

README.md

Here you can find a description of each of our models.

| **DICTA_smart_NER** | Training the [DICTA-ner](https://huggingface.co/dicta-il/dictabert-ner) model on the IAHALT + Name-Sentences[1] + Entity-Injection[2] [dataset](https://huggingface.co/datasets/FusioNER/Smart_Injection) | [FusioNER/DICTA_Smart](https://huggingface.co/FusioNER/DICTA_Smart) | IAHALT | [FusioNER/Smart_Injection](https://huggingface.co/datasets/FusioNER/Smart_Injection) | [DICTA-ner](https://huggingface.co/dicta-il/dictabert-ner) | classic[4] |
| **DICTA_Large_Smart** | Training the [DICTA Large](https://huggingface.co/dicta-il/dictabert-large) model on the IAHALT + Name-Sentences[1] + Entity-Injection[2] [dataset](https://huggingface.co/datasets/FusioNER/Smart_Injection) | [FusioNER/Dicta_Large_Smart](https://huggingface.co/FusioNER/Dicta_Large_Smart) | IAHALT | [FusioNER/Smart_Injection](https://huggingface.co/datasets/FusioNER/Smart_Injection) | [DICTA Large](https://huggingface.co/dicta-il/dictabert-large) | classic[4] |
| **TEC_NER** | Basic technology NER model | model path | TEC_NER | [FusioNER/tec_ner](https://huggingface.co/datasets/FusioNER/tec_ner/tree/main) | base model | technology |

# Results
We evaluate our models on the **IAHALT test set**, and also compare other models, such as [DictaBert](https://huggingface.co/dicta-il/dictabert) and [HeBert](https://huggingface.co/avichr/heBERT). These are the performance results:

According to the results, we recommend using the [**DICTA_Small_Smart**](https://huggingface.co/FusioNER/Dicta_Small_Smart) model.

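As a quick start, here is a minimal usage sketch for the recommended model, assuming it ships a standard Hugging Face token-classification head (the label names and scores depend on the model's own config):

```python
# Minimal sketch, assuming FusioNER/Dicta_Small_Smart exposes a standard
# token-classification head; entity labels come from the model's config.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="FusioNER/Dicta_Small_Smart",
    aggregation_strategy="simple",  # merge word pieces into whole entities
)

text = "讛讗专讬 驻讜讟专 讜专讜谉 讜讜讬讝诇讬 谞驻讙砖讜 讘诇讜谞讚讜谉"  # "Harry Potter and Ron Weasley met in London"
for entity in ner(text):
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
```
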
# Hebrew NLP models
The table below lists Hebrew NLP models:

| dicta-il/dictabert-large | [https://huggingface.co/dicta-il/dictabert-large](https://huggingface.co/dicta-il/dictabert-large) | Shaltiel Shmidman and Avi Shmidman and Moshe Koppel |
| avichr/heBERT | [https://huggingface.co/avichr/heBERT](https://huggingface.co/avichr/heBERT) | Avihay Chriqui and Inbal Yahav |
# Footnotes
#### [1] **Name-Sentences**:
Adding sentences to the corpus that contain only the entity we want the model to learn.
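A minimal sketch of this augmentation, assuming a simple BIO tagging scheme (the entity list here is hypothetical):

```python
# Hypothetical sketch of Name-Sentence augmentation: each added "sentence"
# contains nothing but the entity string the model should learn.
def make_name_sentences(entities, label="PER"):
    examples = []
    for entity in entities:
        tokens = entity.split()
        tags = ["B-" + label] + ["I-" + label] * (len(tokens) - 1)
        examples.append({"tokens": tokens, "tags": tags})
    return examples

# hypothetical target entities
print(make_name_sentences(["讛讗专讬 驻讜讟专", "专讜谉 讜讜讬讝诇讬"]))
```
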
#### [2] **Entity-Injection**:
Replacing a tagged entity in the original corpus with a new entity. Using this method, the model can learn new entities (not new labels!) that it could not extract before.
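A minimal sketch of the idea, assuming BIO-tagged sentences (the helper and example below are hypothetical, not the dataset's actual build script):

```python
import random

# Hypothetical sketch of Entity-Injection: replace one tagged entity span
# in a BIO-tagged sentence with a new surface form, keeping the same label.
def inject_entity(tokens, tags, new_entity):
    starts = [i for i, tag in enumerate(tags) if tag.startswith("B-")]
    if not starts:
        return tokens, tags
    start = random.choice(starts)
    end = start + 1
    while end < len(tags) and tags[end].startswith("I-"):
        end += 1
    label = tags[start][2:]
    new_tokens = new_entity.split()
    new_tags = ["B-" + label] + ["I-" + label] * (len(new_tokens) - 1)
    return tokens[:start] + new_tokens + tokens[end:], tags[:start] + new_tags + tags[end:]

tokens = ["讛讗专讬", "驻讜讟专", "讟住", "诇诇讜谞讚讜谉"]
tags = ["B-PER", "I-PER", "O", "O"]
print(inject_entity(tokens, tags, "讛专诪讬讜谞讬 讙专讬讬谞讙'专"))
```
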
#### [3] **BI-BI Problem**:
Building a training corpus in which entities of the same type appear in sequence, labeled as continuations of one another. For example, the text "讛讗专讬 驻讜讟专 讜专讜谉 讜讜讬讝诇讬" ("Harry Potter and Ron Weasley") would be tagged as a **SINGLE** entity. This problem prevents the model from extracting entities correctly.
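In BIO terms, the contrast looks roughly like this (the tags below are a hypothetical illustration):

```python
# Hypothetical BIO tags for "讛讗专讬 驻讜讟专 讜专讜谉 讜讜讬讝诇讬" ("Harry Potter and Ron Weasley").
tokens = ["讛讗专讬", "驻讜讟专", "讜专讜谉", "讜讜讬讝诇讬"]

# BI-BI problem: the second person continues the first span, so both
# names collapse into a single PER entity.
bi_bi_tags = ["B-PER", "I-PER", "I-PER", "I-PER"]

# Intended labeling: each person opens its own span with a fresh B- tag.
correct_tags = ["B-PER", "I-PER", "B-PER", "I-PER"]
```
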
#### [4] **Classic**:
The classic NER types:

| entity type | full name | examples |
|:-----------:|:----------| --------:|
| **PER** | Person | 讗讚讜诇祝 讛讬讟诇专, 专讜讚讜诇祝 讛住, 诪专讚讻讬 讗谞讬诇讘讬抓 |
| **GPE** | Geopolitical Entity | 讙专诪谞讬讛, 驻讜诇讬谉, 讘专诇讬谉, 讜讜专砖讛 |
| **LOC** | Location | 诪讝专讞 讗讬专讜驻讛, 讗讙谉 讛讬诐 讛转讬讻讜谉, 讛讙诇讬诇 |
| **FAC** | Facility | 讗讜讜砖讜讜讬抓, 诪讙讚诇讬 讛转讗讜诪讬诐, 谞转讘"讙 2000, 专讞讜讘 拽驻诇谉 |
| **ORG** | Organization | 讛诪驻诇讙讛 讛谞讗爪讬转, 讞讘专转 讙讜讙诇, 诪诪砖诇转 讞讜祝 讛砖谞讛讘 |
| **TIMEX** | Time Expression | 1945, 砖谞转 1993, 讬讜诐 讛砖讜讗讛, 砖谞讜转 讛-90 |
| **EVE** | Event | 讛砖讜讗讛, 诪诇讞诪转 讛注讜诇诐 讛砖谞讬讬讛, 砖诇讟讜谉 讛讗驻专讟讛讬讬讚 |
| **TTL** | Title | 驻讬讛专专, 拽讬住专, 诪谞讻"诇 |
| **ANG** | Language | 注讘专讬转, 注专讘讬转, 讙专诪谞讬转 |
| **DUC** | Product | 驻讬讬住讘讜拽, F-16, 转谞讜讘讛 |
| **WOA** | Work of Art | 讚讜"讞 诪讘拽专 讛诪讚讬谞讛, 注讬转讜谉 讛讗专抓, 讛讗专讬 驻讜讟专, 转讬拽 2000 |
| **MISC** | Miscellaneous | 拽讜专讜谞讛, 讛转讜 讛讬专讜拽, 诪讚诇讬转 讝讛讘, 讘讬讟拽讜讬谉 |
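For configuring training or evaluation against these types, here is a sketch of the corresponding BIO label map (the exact label strings the models use are an assumption):

```python
# Sketch of a BIO label map for the classic types above; the exact label
# strings in the FusioNER model configs are an assumption, not confirmed.
ENTITY_TYPES = ["PER", "GPE", "LOC", "FAC", "ORG", "TIMEX",
                "EVE", "TTL", "ANG", "DUC", "WOA", "MISC"]

labels = ["O"] + [f"{p}-{t}" for t in ENTITY_TYPES for p in ("B", "I")]
id2label = dict(enumerate(labels))
label2id = {label: idx for idx, label in id2label.items()}
```
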
**MIT License**
|