# Fusion NER Models

Here you can find the NER models of the Fusion project!

# Table of contents:
1. [**NER-Models**](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#ner-models)
2. [**Results**](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#results)
3. [**Hebrew NLP models**](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#hebrew-nlp-models)
4. [**Footnotes**](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#footnotes)

# NER Models:

Below is a description of each of our models. Each row gives the model nickname, a training description, the model path (with link), the source datasets, a link to the training dataset, the base model, the entity types, and the trainer.

|model name | model description | model path | datasets | link to dataset | base model | entity types | trainer |
|:----------|:------------------|:-----------|:--------:|:----------------|:----------:| :----------- | :-----: |
| **Basic** | Basic training on IAHALT | [FusioNER/Basic_IAHALT](https://huggingface.co/FusioNER/Basic_IAHALT) | IAHALT | [FusioNER/Basic](https://huggingface.co/datasets/FusioNER/Basic) | [HeRo](https://huggingface.co/HeNLP/HeRo) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **Vitaly** | Vitaly training on IAHALT (with [BI-BI problem](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#3-bi-bi-problem)) | [FusioNER/Vitaly_NER](https://huggingface.co/FusioNER/Vitaly_NER) | IAHALT | [FusioNER/Vitaly](https://huggingface.co/datasets/FusioNER/Vitaly) | [HeRo](https://huggingface.co/HeNLP/HeRo) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | Vitaly |
| **Name-Sentences** | Training on IAHALT + [Name-Sentences](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#1-name-sentences) | [FusioNER/Name-Sentences](https://huggingface.co/FusioNER/Name-Sentences) | IAHALT | [FusioNER/Name_Sentences](https://huggingface.co/datasets/FusioNER/Name_Sentences) | [HeRo](https://huggingface.co/HeNLP/HeRo) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **Entity-Injection** | Training on IAHALT + [Entity-Injection](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#2-entity-injection) | [FusioNER/Entity-Injection](https://huggingface.co/FusioNER/Entity-Injection) | IAHALT | [FusioNER/Entity_Injection](https://huggingface.co/datasets/FusioNER/Entity_Injection) | [HeRo](https://huggingface.co/HeNLP/HeRo) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **Smart_Injection** | Training on IAHALT + [Name-Sentences](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#1-name-sentences) + [Entity-Injection](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#2-entity-injection) | [FusioNER/Smart_Injection](https://huggingface.co/FusioNER/Smart_Injection) | IAHALT | [FusioNER/Smart_Injection](https://huggingface.co/datasets/FusioNER/Smart_Injection) | [HeRo](https://huggingface.co/HeNLP/HeRo) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **NEMO** | Basic training on NEMO dataset| [FusioNER/Nemo](https://huggingface.co/FusioNER/Nemo) | NEMO | [FusioNER/NEMO](https://huggingface.co/datasets/FusioNER/NEMO) | [HeRo](https://huggingface.co/HeNLP/HeRo) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **IAHALT_and_NEMO** | Basic training on IAHALT + NEMO | [FusioNER/IAHALT_and_NEMO](https://huggingface.co/FusioNER/IAHALT_and_NEMO) | IAHALT + NEMO | [FusioNER/IAHALT_and_NEMO](https://huggingface.co/datasets/FusioNER/IAHALT_and_NEMO) | [HeRo](https://huggingface.co/HeNLP/HeRo) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) |  [Etzion](https://huggingface.co/etzion) |
| **IAHALT_and_NEMO_PP** | Training on IAHALT + NEMO + [Name-Sentences](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#1-name-sentences) + [Entity-Injection](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#2-entity-injection) | [FusioNER/IAHALT_and_NEMO_and_PP](https://huggingface.co/FusioNER/IAHALT_and_NEMO_and_PP) | IAHALT + NEMO | [FusioNER/IAHALT_and_NEMO_PP](https://huggingface.co/datasets/FusioNER/IAHALT_and_NEMO_PP) | [HeRo](https://huggingface.co/HeNLP/HeRo) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) |  [Etzion](https://huggingface.co/etzion) |
| **Animals** | Training on IAHALT + [Entity-Injection](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#2-entity-injection) (injecting animal names as PER entities) | [FusioNER/Animals](https://huggingface.co/FusioNER/Animals) | IAHALT | [FusioNER/Animals](https://huggingface.co/datasets/FusioNER/Animals) | [HeRo](https://huggingface.co/HeNLP/HeRo) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **PRS-Injection** | Training on IAHALT + [Entity-Injection](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#2-entity-injection) (of PRS names as PER entities) | [FusioNER/PRS-Injection](https://huggingface.co/FusioNER/PRS-Injection) | IAHALT | [FusioNER/PRS_locations](https://huggingface.co/datasets/FusioNER/PRS_locations) | [HeRo](https://huggingface.co/HeNLP/HeRo) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **DICTA_Basic** | Training the [DICTA](https://huggingface.co/dicta-il/dictabert) model on the [basic](https://huggingface.co/datasets/FusioNER/Basic) IAHALT dataset | [FusioNER/Dicta_Small_Basic](https://huggingface.co/FusioNER/Dicta_Small_Basic) | IAHALT | [FusioNER/Basic](https://huggingface.co/datasets/FusioNER/Basic) | [DICTA](https://huggingface.co/dicta-il/dictabert) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **DICTA_Small_Smart** | Training the [DICTA](https://huggingface.co/dicta-il/dictabert) model on the IAHALT + [Name-Sentences](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#1-name-sentences) + [Entity-Injection](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#2-entity-injection) [dataset](https://huggingface.co/datasets/FusioNER/Smart_Injection) | [FusioNER/Dicta_Small_Smart](https://huggingface.co/FusioNER/Dicta_Small_Smart) | IAHALT | [FusioNER/Smart_Injection](https://huggingface.co/datasets/FusioNER/Smart_Injection) | [DICTA](https://huggingface.co/dicta-il/dictabert) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **DICTA_basic_NER** | Training the [DICTA-ner](https://huggingface.co/dicta-il/dictabert-ner) model on the [basic](https://huggingface.co/datasets/FusioNER/Basic) IAHALT dataset| [FusioNER/DICTA_basic](https://huggingface.co/FusioNER/DICTA_basic) | IAHALT | [FusioNER/Basic](https://huggingface.co/datasets/FusioNER/Basic) | [DICTA-ner](https://huggingface.co/dicta-il/dictabert-ner) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **DICTA_smart_NER** | Training the [DICTA-ner](https://huggingface.co/dicta-il/dictabert-ner) model on the IAHALT + [Name-Sentences](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#1-name-sentences) + [Entity-Injection](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#2-entity-injection) [dataset](https://huggingface.co/datasets/FusioNER/Smart_Injection) | [FusioNER/DICTA_Smart](https://huggingface.co/FusioNER/DICTA_Smart) | IAHALT | [FusioNER/Smart_Injection](https://huggingface.co/datasets/FusioNER/Smart_Injection) | [DICTA-ner](https://huggingface.co/dicta-il/dictabert-ner) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **DICTA_Large_Smart** | Training the [DICTA Large](https://huggingface.co/dicta-il/dictabert-large) model on the IAHALT + [Name-Sentences](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#1-name-sentences) + [Entity-Injection](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#2-entity-injection) [dataset](https://huggingface.co/datasets/FusioNER/Smart_Injection) | [FusioNER/Dicta_Large_Smart](https://huggingface.co/FusioNER/Dicta_Large_Smart) | IAHALT | [FusioNER/Smart_Injection](https://huggingface.co/datasets/FusioNER/Smart_Injection) | [DICTA Large](https://huggingface.co/dicta-il/dictabert-large) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **TEC_NER** | Basic technology NER model | [FusioNER/tec_ner](https://huggingface.co/FusioNER/tec_ner) | TEC_NER | [FusioNER/tec_ner](https://huggingface.co/datasets/FusioNER/tec_ner) | base model | TEC | [Yehoshua](https://huggingface.co/yehoshuadiller) |

# Results
We evaluated our models on the **IAHALT test set**, along with other models such as [DictaBert](https://huggingface.co/dicta-il/dictabert) and [HeBert](https://huggingface.co/avichr/heBERT). The results, sorted by ascending F1 score, are:

| Model name | Precision | Recall | F1 - Score  | Time (in seconds) |
| :--------- | :-------: | :----: | :---------: | :---------------: |
| [**IAHALT_and_NEMO_PP**](https://huggingface.co/FusioNER/IAHALT_and_NEMO_and_PP) | 0.714 | 0.353 | 0.461 | 83.128 |
| [**HeBert**](https://huggingface.co/avichr/heBERT) | 0.574 | 0.474 | 0.494 | 86.483 |
| [**NEMO**](https://huggingface.co/FusioNER/Nemo) | 0.553 | 0.51 | 0.525 | 81.422 |
| [**IAHALT_and_NEMO**](https://huggingface.co/FusioNER/IAHALT_and_NEMO) | 0.692 | 0.678 | 0.684 | 83.702 |
| [**Vitaly**](https://huggingface.co/FusioNER/Vitaly_NER) | 0.883 | 0.794 | 0.836 | 83.773 |
| [**DictaBert**](https://huggingface.co/dicta-il/dictabert) | 0.916 | 0.834 | 0.872 | **70.465** |
| [**DICTA_large**](https://huggingface.co/dicta-il/dictabert-large) | **0.917** | 0.845 | 0.879 | 206.251 |
| [**Name-Sentences**](https://huggingface.co/FusioNER/Name-Sentences) | 0.895 | 0.865 | 0.879 | 82.674 |
| [**Basic**](https://huggingface.co/FusioNER/Basic_IAHALT) | 0.897 | 0.866 | 0.881 | 84.479 |
| [**Smart_Injection**](https://huggingface.co/FusioNER/Smart_Injection) | 0.898 | 0.867 | 0.881 | 82.253 |
| [**DICTA_Basic**](https://huggingface.co/FusioNER/Dicta_Small_Basic) | 0.903 | **0.875** | 0.888 | **69.419** |
| [**DICTA_Large_Smart**](https://huggingface.co/FusioNER/Dicta_Large_Smart) | 0.904 | **0.875** | **0.889** | 204.324 |
| [**DICTA_Small_Smart**](https://huggingface.co/FusioNER/Dicta_Small_Smart) | 0.904 | **0.875** | **0.889** | **70.29** |
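
This README does not specify exactly how the scores above were computed. As a point of reference, a common NER convention is entity-level, exact-match scoring: a predicted entity counts as correct only if both its span and its type match the gold annotation. The sketch below implements micro-averaged scoring of this kind in plain Python, assuming IOB2 tags (`B-PER`, `I-PER`, `O`); the function names are illustrative, and the table's figures may come from a different averaging scheme.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, int, str]  # (sentence index, start token, end token exclusive, type)


def extract_spans(tags: List[str], sent_idx: int = 0) -> Set[Span]:
    """Collect entity spans from an IOB2 tag sequence (B-X starts, I-X continues)."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last open span
        if tag.startswith("I-") and etype == tag[2:]:
            continue  # continuation of the currently open entity
        if etype is not None:
            spans.add((sent_idx, start, i, etype))
            start, etype = None, None
        if tag.startswith(("B-", "I-")):
            # a stray I-X with no matching open entity is treated as a new entity
            start, etype = i, tag[2:]
    return spans


def prf1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged, exact-match entity precision, recall and F1."""
    gold_spans: Set[Span] = set()
    pred_spans: Set[Span] = set()
    for k, (g, p) in enumerate(zip(gold, pred)):
        gold_spans |= extract_spans(g, k)
        pred_spans |= extract_spans(p, k)
    tp = len(gold_spans & pred_spans)
    precision = tp / len(pred_spans) if pred_spans else 0.0
    recall = tp / len(gold_spans) if gold_spans else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Libraries such as `seqeval` provide the same kind of entity-level scoring out of the box.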

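Based on these results, we recommend the [**DICTA_Small_Smart**](https://huggingface.co/FusioNER/Dicta_Small_Smart) model: it shares the best F1 score (0.889) with DICTA_Large_Smart while running roughly three times faster (70.29 s vs. 204.324 s).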
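
A minimal sketch for running the recommended model with the Hugging Face `transformers` token-classification pipeline. It assumes that the `FusioNER/Dicta_Small_Smart` repository loads as a standard token-classification checkpoint; `load_ner` and `group_entities` are hypothetical helper names used only for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def load_ner(model_id: str = "FusioNER/Dicta_Small_Smart"):
    """Build a token-classification pipeline (downloads the model on first call)."""
    # Imported lazily so group_entities() can be used without transformers installed.
    from transformers import pipeline

    # aggregation_strategy="simple" merges sub-word tokens into whole entity spans.
    return pipeline("token-classification", model=model_id, aggregation_strategy="simple")


def group_entities(entities: List[dict]) -> Dict[str, List[str]]:
    """Group aggregated pipeline output by its 'entity_group' (e.g. PER, GPE)."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for ent in entities:
        grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)


# Usage (needs network access and the transformers package):
#   ner = load_ner()
#   print(group_entities(ner("הארי פוטר ורון וויזלי")))  # "Harry Potter and Ron Weasley"
```

With `aggregation_strategy="simple"`, each result is a dict with `entity_group`, `score`, `word`, `start` and `end`, so grouping by `entity_group` gives a quick per-type view of the extracted entities.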

# Hebrew NLP models
The following table lists the Hebrew NLP models referenced in this README:

| Model name | Link | Creator |
|:-----------|:-----|:--------|
| HeNLP/HeRo | [https://huggingface.co/HeNLP/HeRo](https://huggingface.co/HeNLP/HeRo) | Vitaly Shalumov and Harel Haskey |
| dicta-il/dictabert | [https://huggingface.co/dicta-il/dictabert](https://huggingface.co/dicta-il/dictabert) | Shaltiel Shmidman, Avi Shmidman, and Moshe Koppel |
| dicta-il/dictabert-large | [https://huggingface.co/dicta-il/dictabert-large](https://huggingface.co/dicta-il/dictabert-large) | Shaltiel Shmidman, Avi Shmidman, and Moshe Koppel |
| avichr/heBERT | [https://huggingface.co/avichr/heBERT](https://huggingface.co/avichr/heBERT) | Avihay Chriqui and Inbal Yahav |

# Footnotes

#### [1] **Name-Sentences**: 
Adding sentences to the training corpus that contain only the entity we want the model to learn.
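
A minimal sketch of the idea, assuming the corpus is stored as (tokens, IOB2 tags) pairs. The helper names `make_name_sentence` and `add_name_sentences` and the whitespace tokenization are illustrative assumptions, not the project's actual code.

```python
from typing import List, Tuple

Sentence = Tuple[List[str], List[str]]  # (tokens, IOB2 tags)


def make_name_sentence(entity: str, label: str) -> Sentence:
    """Turn a bare entity string into a sentence containing only that entity."""
    tokens = entity.split()  # naive whitespace tokenization (an assumption)
    if not tokens:
        raise ValueError("entity must contain at least one token")
    tags = [f"B-{label}"] + [f"I-{label}"] * (len(tokens) - 1)
    return tokens, tags


def add_name_sentences(corpus: List[Sentence], entities: List[str], label: str) -> List[Sentence]:
    """Return a new corpus with one name-only sentence appended per entity."""
    return corpus + [make_name_sentence(e, label) for e in entities]
```

For example, `make_name_sentence("Mordechai Anielewicz", "PER")` yields the tokens `["Mordechai", "Anielewicz"]` tagged `["B-PER", "I-PER"]`.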

#### [2] **Entity-Injection**: 
Replacing a tagged entity in the original corpus with a new entity while keeping its label. 
With this method, the model can learn new entities (not new labels!) that it did not extract before.
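
A minimal sketch of entity injection, assuming (tokens, IOB2 tags) sentences. It mirrors how the **Animals** model injects animal names as PER entities, but the helper names and the random choice of which entity to replace are illustrative assumptions.

```python
import random
from typing import List, Optional, Tuple

Sentence = Tuple[List[str], List[str]]  # (tokens, IOB2 tags)


def find_spans(tags: List[str], label: str) -> List[Tuple[int, int]]:
    """Return (start, end) token spans of entities of type `label` (end exclusive)."""
    spans, i = [], 0
    while i < len(tags):
        if tags[i] == f"B-{label}":
            j = i + 1
            while j < len(tags) and tags[j] == f"I-{label}":
                j += 1
            spans.append((i, j))
            i = j
        else:
            i += 1
    return spans


def inject_entity(sentence: Sentence, label: str, new_entity: List[str],
                  rng: Optional[random.Random] = None) -> Sentence:
    """Replace one `label` entity with `new_entity`, keeping the label unchanged."""
    if not new_entity:
        raise ValueError("new_entity must contain at least one token")
    tokens, tags = sentence
    spans = find_spans(tags, label)
    if not spans:
        return sentence  # no entity of this type to replace
    rng = rng or random.Random()
    start, end = rng.choice(spans)
    new_tags = [f"B-{label}"] + [f"I-{label}"] * (len(new_entity) - 1)
    return (tokens[:start] + new_entity + tokens[end:],
            tags[:start] + new_tags + tags[end:])
```

Because only the surface tokens change while the tag type stays the same, the model sees new names in familiar contexts without any new label being introduced.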

#### [3] **BI-BI Problem**: 
When the training corpus is built, consecutive entities of the same type are sometimes labeled as continuations of one another. 
For example, the text "הארי פוטר ורון וויזלי" ("Harry Potter and Ron Weasley") would be tagged as a **single** entity instead of two. 
This problem prevents the model from extracting entities correctly.
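
To make the problem concrete, the sketch below decodes IOB2 tags into entities for the example sentence, once with the merged tagging described above and once with the intended B-I, B-I tagging. The tokenization (the conjunction ו stays attached to רון) and both tag sequences are illustrative assumptions.

```python
from typing import List, Tuple


def decode(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Decode IOB2 tags into (entity text, entity type) pairs."""
    entities, current, etype = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and etype == tag[2:]:
            current.append(token)  # continue the open entity
            continue
        if current:
            entities.append((" ".join(current), etype))
        current, etype = ([token], tag[2:]) if tag != "O" else ([], None)
    if current:
        entities.append((" ".join(current), etype))
    return entities


tokens = ["הארי", "פוטר", "ורון", "וויזלי"]  # "Harry Potter and Ron Weasley"
bi_bi_problem = ["B-PER", "I-PER", "I-PER", "I-PER"]  # decodes to one merged entity
correct = ["B-PER", "I-PER", "B-PER", "I-PER"]  # decodes to two separate entities
```

Decoding `bi_bi_problem` returns a single PER entity covering all four tokens, while `correct` returns "הארי פוטר" and "ורון וויזלי" as two PER entities.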

#### [4] **Classic**: 

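The classic NER entity types:

| entity type | full name | examples | English translation |
|:-----------:|:----------| --------:|:--------------------|
| **PER** | Person | אדולף היטלר, רודולף הס, מרדכי אנילביץ | Adolf Hitler, Rudolf Hess, Mordechai Anielewicz |
| **GPE** | Geopolitical Entity | גרמניה, פולין, ברלין, וורשה | Germany, Poland, Berlin, Warsaw |
| **LOC** | Location | מזרח אירופה, אגן הים התיכון, הגליל | Eastern Europe, the Mediterranean Basin, the Galilee |
| **FAC** | Facility | אוושוויץ, מגדלי התאומים, נתב"ג 2000, רחוב קפלן | Auschwitz, the Twin Towers, Natbag 2000 (Ben Gurion Airport), Kaplan Street |
| **ORG** | Organization | המפלגה הנאצית, חברת גוגל, ממשלת חוף השנהב | the Nazi Party, Google (the company), the Government of the Ivory Coast |
| **TIMEX** | Time Expression | 1945, שנת 1993, יום השואה, שנות ה-90 | 1945, the year 1993, Holocaust Remembrance Day, the '90s |
| **EVE** | Event | השואה, מלחמת העולם השנייה, שלטון האפרטהייד | the Holocaust, World War II, the apartheid regime |
| **TTL** | Title | פיהרר, קיסר, מנכ"ל | Führer, emperor, CEO |
| **ANG** | Language | עברית, ערבית, גרמנית | Hebrew, Arabic, German |
| **DUC** | Product | פייסבוק, F-16, תנובה | Facebook, F-16, Tnuva |
| **WOA** | Work of Art | דו"ח מבקר המדינה, עיתון הארץ, הארי פוטר, תיק 2000 | the State Comptroller's Report, the Haaretz newspaper, Harry Potter, Case 2000 |
| **MISC** | Miscellaneous | קורונה, התו הירוק, מדלית זהב, ביטקוין | Corona, the Green Pass, gold medal, Bitcoin |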
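
Token-classification models typically encode these types as IOB2 labels: one `O` label plus a `B-` and an `I-` label per type. The sketch below builds such a label map for the 12 classic types; the label order here is an assumption, so check a model's own `config.id2label` before relying on specific ids.

```python
from typing import Dict, List, Tuple

CLASSIC_TYPES = ["PER", "GPE", "LOC", "FAC", "ORG", "TIMEX",
                 "EVE", "TTL", "ANG", "DUC", "WOA", "MISC"]


def build_label_maps(types: List[str]) -> Tuple[Dict[int, str], Dict[str, int]]:
    """Return (id2label, label2id) for an IOB2 scheme over `types`."""
    labels = ["O"] + [f"{prefix}-{t}" for t in types for prefix in ("B", "I")]
    id2label = dict(enumerate(labels))
    label2id = {label: i for i, label in id2label.items()}
    return id2label, label2id


# 1 "O" label + 2 labels per type -> 25 labels for the 12 classic types
id2label, label2id = build_label_maps(CLASSIC_TYPES)
```

When fine-tuning, these dicts can be passed as the `id2label` and `label2id` arguments of `AutoModelForTokenClassification.from_pretrained` so that predictions come back with readable label names.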

# Datasets for English NER (used to clean wrongly tagged entities in English text):
- [**ontonotes5**](https://huggingface.co/datasets/tner/ontonotes5)
- [**conll2003**](https://huggingface.co/datasets/eriktks/conll2003)



**MIT License**