# Fusion NER Models
This page lists the NER models of the Fusion project.
# Table of contents:
1. [**NER-Models**](
2. [**Results**](
3. [**Hebrew NLP models**](
4. [**Footnotes**](
# NER Models:
Below is a description of each of our models. Each row contains the model nickname, training description, model path (with link), source dataset (with link), base model, entity types and trainer.
|model name | model description | model path | datasets | link to dataset | base model | entity types | trainer |
|:----------|:------------------|:-----------|:--------:|:----------------|:----------:| :----------- | :-----: |
| **Basic** | Basic training on IAHALT | [FusioNER/Basic_IAHALT]( | IAHALT | [FusioNER/Basic]( | [HeRo]( | [classic[4]]( | [Etzion]( |
| **Vitaly** | Vitaly training on IAHALT (with [BI-BI problem]( | [FusioNER/Vitaly_NER]( | IAHALT | [FusioNER/Vitaly]( | [HeRo]( | [classic[4]]( | [Vitaly]() |
| **Name-Sentences** | Training on IAHALT + [Name-Sentences]( | [FusioNER/Name-Sentences]( | IAHALT | [FusioNER/Name_Sentences]( | [HeRo]( | [classic[4]]( | [Etzion]( |
| **Entity-Injection** | Training on IAHALT + [Entity-Injection]( | [FusioNER/Entity-Injection]( | IAHALT | [FusioNER/Entity_Injection]( | [HeRo]( | [classic[4]]( | [Etzion]( |
| **Smart_Injection** | Training on IAHALT + [Name-Sentences]( + [Entity-Injection]( | [FusioNER/Smart_Injection]( | IAHALT | [FusioNER/Smart_Injection]( | [HeRo]( | [classic[4]]( | [Etzion]( |
| **NEMO** | Basic training on NEMO dataset| [FusioNER/Nemo]( | NEMO | [FusioNER/NEMO]( | [HeRo]( | [classic[4]]( | [Etzion]( |
| **IAHALT_and_NEMO** | Basic training on IAHALT + NEMO | [FusioNER/IAHALT_and_NEMO]( | IAHALT + NEMO | [FusioNER/IAHALT_and_NEMO]( | [HeRo]( | [classic[4]]( | [Etzion]( |
| **IAHALT_and_NEMO_PP** | Training on IAHALT + NEMO + [Name-Sentences]( + [Entity-Injection]( | [FusioNER/IAHALT_and_NEMO_and_PP]( | IAHALT + NEMO | [FusioNER/IAHALT_and_NEMO_PP]( | [HeRo]( | [classic[4]]( | [Etzion]( |
| **Animals** | Training on IAHALT + [Entity-Injection]( (of animals names as PER entities) | [FusioNER/Animals]( | IAHALT | [FusioNER/Animals]( | [HeRo]( | [classic[4]]( | [Etzion]( |
| **PRS-Injection** | Training on IAHALT + [Entity-Injection]( (of PRS names as PER entities) | [FusioNER/PRS-Injection]( | IAHALT | [FusioNER/PRS_locations]( | [HeRo]( | [classic[4]]( | [Etzion]( |
| **DICTA_Basic** | Training the [DICTA]( model on the [basic]( IAHALT dataset | [FusioNER/Dicta_Small_Basic]( | IAHALT | [FusioNER/Smart_Injection]( | [DICTA]( | [classic[4]]( | [Etzion]( |
| **DICTA_Small_Smart** | Training the [DICTA]( model on IAHALT + [Name-Sentences]( + [Entity-Injection](] [dataset]( | [FusioNER/Dicta_Small_Smart]( | IAHALT | [FusioNER/Smart_Injection]( | [DICTA]( | [classic[4]]( | [Etzion]( |
| **DICTA_basic_NER** | Training the [DICTA-ner]( model on the [basic]( IAHALT dataset| [FusioNER/DICTA_basic]( | IAHALT | [FusioNER/Basic]( | [DICTA-ner]( | [classic[4]]( | [Etzion]( |
| **DICTA_smart_NER** | Training the [DICTA-ner]( model on IAHALT + [Name-Sentences]( + [Entity-Injection](] [dataset]( | [FusioNER/DICTA_Smart]( | IAHALT | [FusioNER/Smart_Injection]( | [DICTA-ner]( | [classic[4]]( | [Etzion]( |
| **DICTA_Large_Smart** | Training the [DICTA Large]( model on IAHALT + [Name-Sentences]( + [Entity-Injection](] [dataset]( | [FusioNER/Dicta_Large_Smart]( | IAHALT | [FusioNER/Smart_Injection]( | [DICTA Large]( | [classic[4]]( | [Etzion]( |
| **TEC_NER** | Basic technology NER model | [FusioNER/tec_ner]( | TEC_NER | [FusioNER/tec_ner]( | base model | TEC | [Yehoshua]( |
# Results
We evaluated our models on the **IAHALT test set**. We also evaluated other models, such as [DictaBert]( and [HeBert]( The performance results are:
| Model name | Precision | Recall | F1 score | Time (in seconds) |
| :--------- | :-------: | :----: | :---------: | :---------------: |
| [**IAHALT_and_NEMO_PP**]( | 0.714 | 0.353 | 0.461 | 83.128 |
| [**HeBert**]( | 0.574 | 0.474 | 0.494 | 86.483 |
| [**NEMO**]( | 0.553 | 0.51 | 0.525 | 81.422 |
| [**IAHALT_and_NEMO**]( | 0.692 | 0.678 | 0.684 | 83.702 |
| [**Vitaly**]( | 0.883 | 0.794 | 0.836 | 83.773 |
| [**DictaBert**]( | 0.916 | 0.834 | 0.872 | **70.465** |
| [**DICTA_large**]( | **0.917** | 0.845 | 0.879 | 206.251 |
| [**Name-Sentences**]( | 0.895 | 0.865 | 0.879 | 82.674 |
| [**Basic**](FusioNER/Basic_IAHALT) | 0.897 | 0.866 | 0.881 | 84.479 |
| [**Smart_Injection**]( | 0.898 | 0.867 | 0.881 | 82.253 |
| [**DICTA_Basic**]( | 0.903 | **0.875** | 0.888 | **69.419** |
| [**DICTA_Large_Smart**]( | 0.904 | **0.875** | **0.889** | 204.324 |
| [**DICTA_Small_Smart**]( | 0.904 | **0.875** | **0.889** | **70.29** |
Based on these results, we recommend using the [**DICTA_Small_Smart**]( model.
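A minimal sketch of how the recommended model might be loaded and applied. It assumes the model path `FusioNER/Dicta_Small_Smart` from the models table is a Hugging Face Hub id that works with the `transformers` token-classification pipeline; the helper names `load_ner` and `tag_text` are hypothetical and not part of this project.

```python
MODEL_ID = "FusioNER/Dicta_Small_Smart"  # model path from the table; assumed to be a Hub id


def load_ner(model_id=MODEL_ID):
    """Build a token-classification pipeline (downloads the model on first use)."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import pipeline

    # aggregation_strategy="simple" merges word pieces into whole entity spans.
    return pipeline("token-classification", model=model_id, aggregation_strategy="simple")


def tag_text(text, ner):
    """Return (entity type, surface text) pairs for one input string."""
    return [(entity["entity_group"], entity["word"]) for entity in ner(text)]


# Usage (requires network access and the transformers package):
#   ner = load_ner()
#   print(tag_text("<Hebrew sentence>", ner))
```

`tag_text` accepts any callable that returns pipeline-style dictionaries, so the same code can be used to compare several models from the table.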
# Hebrew NLP models
The following table lists Hebrew NLP models:
| Model name | Link | Creator |
| :--------- | :--- | :------ |
| HeNLP/HeRo | [](HeNLP/HeRo) | Vitaly Shalumov and Harel Haskey |
| dicta-il/dictabert | []( | Shaltiel Shmidman and Avi Shmidman and Moshe Koppel |
| dicta-il/dictabert-large | []( | Shaltiel Shmidman and Avi Shmidman and Moshe Koppel |
| avichr/heBERT | []( | Avihay Chriqui and Inbal Yahav |
# Footnotes
#### [1] **Name-Sentences**:
Add to the corpus sentences that consist only of the entity we want the network to learn.
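The idea can be sketched as follows. This is an illustration only, assuming whitespace tokenization and BIO tags; the function name `name_sentences` is hypothetical and this is not necessarily how the project builds its corpus.

```python
def name_sentences(names, label):
    """Build one training sentence per name, consisting only of that entity."""
    examples = []
    for name in names:
        tokens = name.split()
        if not tokens:
            continue  # skip empty names
        # First token opens the entity, the remaining tokens continue it.
        tags = [f"B-{label}"] + [f"I-{label}"] * (len(tokens) - 1)
        examples.append((tokens, tags))
    return examples


extra = name_sentences(["Harry Potter", "Hermione"], "PER")
print(extra)
# [(['Harry', 'Potter'], ['B-PER', 'I-PER']), (['Hermione'], ['B-PER'])]
```

The resulting examples would be appended to the original training corpus.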
#### [2] **Entity-Injection**:
Replace a tagged entity in the original corpus with a new entity.
With this method, the model can learn new entities (not new labels!) that it did not extract before.
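A minimal sketch of such a replacement, assuming BIO-tagged, whitespace-tokenized sentences and non-empty replacement names. The function name `inject_entities` and the random choice of the replacement are assumptions made for illustration, not the project's actual procedure.

```python
import random


def inject_entities(tokens, tags, label, new_entities, rng=random):
    """Replace every `label` span in a BIO-tagged sentence with a new entity."""
    out_tokens, out_tags = [], []
    i = 0
    while i < len(tokens):
        if tags[i] == f"B-{label}":
            # Find the end of the original span: B-label followed by I-label tokens.
            j = i + 1
            while j < len(tokens) and tags[j] == f"I-{label}":
                j += 1
            # Insert the new entity and re-tag it with the same label.
            new_tokens = rng.choice(new_entities).split()
            out_tokens.extend(new_tokens)
            out_tags.extend([f"B-{label}"] + [f"I-{label}"] * (len(new_tokens) - 1))
            i = j
        else:
            # Everything outside the targeted spans is copied unchanged.
            out_tokens.append(tokens[i])
            out_tags.append(tags[i])
            i += 1
    return out_tokens, out_tags


tokens = ["Harry", "Potter", "visited", "London"]
tags = ["B-PER", "I-PER", "O", "B-GPE"]
print(inject_entities(tokens, tags, "PER", ["Rex"]))
# (['Rex', 'visited', 'London'], ['B-PER', 'O', 'B-GPE'])
```

The label stays the same while the surface form changes, which is how, for example, animal names can be introduced as PER entities.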
#### [3] **BI-BI Problem**:
This problem arises when the training corpus is built so that consecutive entities of the same type are labeled as continuations of one another.
For example, the text "הארי פוטר ורון וויזלי" ("Harry Potter and Ron Weasley") would be tagged as a **SINGLE** entity.
This prevents the model from extracting entities correctly.
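The effect can be illustrated with a small BIO decoder. This is a sketch assuming whitespace tokens and standard `B-`/`I-` tags; `bio_spans` is a hypothetical helper, not code from this project.

```python
def bio_spans(tokens, tags):
    """Decode BIO tags into (label, text) spans."""
    spans, current = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always closes the open span and starts a new one.
            if current:
                spans.append(current)
            current = [tag[2:], [token]]
        elif tag.startswith("I-") and current and current[0] == tag[2:]:
            current[1].append(token)
        else:
            # "O" (or an I- tag with no matching open span) closes the span.
            if current:
                spans.append(current)
            current = None
    if current:
        spans.append(current)
    return [(label, " ".join(words)) for label, words in spans]


tokens = "הארי פוטר ורון וויזלי".split()

# Second entity labeled as a continuation: one merged PER span.
merged = bio_spans(tokens, ["B-PER", "I-PER", "I-PER", "I-PER"])
# Second entity opened with its own B- tag (the "BI-BI" pattern): two PER spans.
separate = bio_spans(tokens, ["B-PER", "I-PER", "B-PER", "I-PER"])

print(len(merged), len(separate))  # 1 2
```

With the first tagging, the two names are indistinguishable from a single four-token person name.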
#### [4] **Classic**:
The classic NER types:
| entity type | full name | examples |
|:-----------:|:----------| --------:|
| **PER** | Person | אדולף היטלר, רודולף הס, מרדכי אנילביץ |
| **GPE** | Geopolitical Entity | גרמניה, פולין, ברלין, וורשה |
| **LOC** | Location | מזרח אירופה, אגן הים התיכון, הגליל |
| **FAC** | Facility | אוושוויץ, מגדלי התאומים, נתב"ג 2000, רחוב קפלן |
| **ORG** | Organization | המפלגה הנאצית, חברת גוגל, ממשלת חוף השנהב |
| **TIMEX** | Time Expression | 1945, שנת 1993, יום השואה, שנות ה-90 |
| **EVE** | Event | השואה, מלחמת העולם השנייה, שלטון האפרטהייד |
| **TTL** | Title | פיהרר, קיסר, מנכ"ל |
| **ANG** | Language | עברית, ערבית, גרמנית |
| **DUC** | Product | פייסבוק, F-16, תנובה |
| **WOA** | Work of Art | דו"ח מבקר המדינה, עיתון הארץ, הארי פוטר, תיק 2000 |
| **MISC** | Miscellaneous | קורונה, התו הירוק, מדלית זהב, ביטקוין |
# Datasets for English NER (for cleaning wrong entities in English texts):
- [**ontonotes5**](
- [**conll2003**](
