# Fusion NER Models

Here you can find the NER models for the Fusion project.

# Table of contents:

1. [**NER Models**](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#ner-models)
2. [**Results**](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#results)
3. [**Hebrew NLP models**](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#hebrew-nlp-models)
4. [**Footnotes**](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#footnotes)

# NER Models:

Below is a description of each of our models. Each row lists the model nickname, a training description, the model path (link), the source datasets (with a link), the base model, the entity types, and the trainer.

| Model name | Model description | Model path | Datasets | Link to dataset | Base model | Entity types | Trainer |
|:----------|:------------------|:-----------|:--------:|:----------------|:----------:|:-----------|:-----:|
| **Basic** | Basic training on IAHALT | [FusioNER/Basic_IAHALT](https://huggingface.co/FusioNER/Basic_IAHALT) | IAHALT | [FusioNER/Basic](https://huggingface.co/datasets/FusioNER/Basic) | [HeRo](https://huggingface.co/HeNLP/HeRo) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **Vitaly** | Vitaly training on IAHALT (with the [BI-BI problem](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#3-bi-bi-problem)) | [FusioNER/Vitaly_NER](https://huggingface.co/FusioNER/Vitaly_NER) | IAHALT | [FusioNER/Vitaly](https://huggingface.co/datasets/FusioNER/Vitaly) | [HeRo](https://huggingface.co/HeNLP/HeRo) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | Vitaly |
| **Name-Sentences** | Training on IAHALT + [Name-Sentences](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#1-name-sentences) | [FusioNER/Name-Sentences](https://huggingface.co/FusioNER/Name-Sentences) | IAHALT | [FusioNER/Name_Sentences](https://huggingface.co/datasets/FusioNER/Name_Sentences) | [HeRo](https://huggingface.co/HeNLP/HeRo) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **Entity-Injection** | Training on IAHALT + [Entity-Injection](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#2-entity-injection) | [FusioNER/Entity-Injection](https://huggingface.co/FusioNER/Entity-Injection) | IAHALT | [FusioNER/Entity_Injection](https://huggingface.co/datasets/FusioNER/Entity_Injection) | [HeRo](https://huggingface.co/HeNLP/HeRo) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **Smart_Injection** | Training on IAHALT + [Name-Sentences](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#1-name-sentences) + [Entity-Injection](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#2-entity-injection) | [FusioNER/Smart_Injection](https://huggingface.co/FusioNER/Smart_Injection) | IAHALT | [FusioNER/Smart_Injection](https://huggingface.co/datasets/FusioNER/Smart_Injection) | [HeRo](https://huggingface.co/HeNLP/HeRo) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **NEMO** | Basic training on the NEMO dataset | [FusioNER/Nemo](https://huggingface.co/FusioNER/Nemo) | NEMO | [FusioNER/NEMO](https://huggingface.co/datasets/FusioNER/NEMO) | [HeRo](https://huggingface.co/HeNLP/HeRo) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **IAHALT_and_NEMO** | Basic training on IAHALT + NEMO | [FusioNER/IAHALT_and_NEMO](https://huggingface.co/FusioNER/IAHALT_and_NEMO) | IAHALT + NEMO | [FusioNER/IAHALT_and_NEMO](https://huggingface.co/datasets/FusioNER/IAHALT_and_NEMO) | [HeRo](https://huggingface.co/HeNLP/HeRo) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **IAHALT_and_NEMO_PP** | Training on IAHALT + NEMO + [Name-Sentences](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#1-name-sentences) + [Entity-Injection](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#2-entity-injection) | [FusioNER/IAHALT_and_NEMO_and_PP](https://huggingface.co/FusioNER/IAHALT_and_NEMO_and_PP) | IAHALT + NEMO | [FusioNER/IAHALT_and_NEMO_PP](https://huggingface.co/datasets/FusioNER/IAHALT_and_NEMO_PP) | [HeRo](https://huggingface.co/HeNLP/HeRo) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **Animals** | Training on IAHALT + [Entity-Injection](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#2-entity-injection) (of animal names as PER entities) | [FusioNER/Animals](https://huggingface.co/FusioNER/Animals) | IAHALT | [FusioNER/Animals](https://huggingface.co/datasets/FusioNER/Animals) | [HeRo](https://huggingface.co/HeNLP/HeRo) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **PRS-Injection** | Training on IAHALT + [Entity-Injection](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#2-entity-injection) (of PRS names as PER entities) | [FusioNER/PRS-Injection](https://huggingface.co/FusioNER/PRS-Injection) | IAHALT | [FusioNER/PRS_locations](https://huggingface.co/datasets/FusioNER/PRS_locations) | [HeRo](https://huggingface.co/HeNLP/HeRo) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **DICTA_Basic** | Training the [DICTA](https://huggingface.co/dicta-il/dictabert) model on the [basic](https://huggingface.co/datasets/FusioNER/Basic) IAHALT dataset | [FusioNER/Dicta_Small_Basic](https://huggingface.co/FusioNER/Dicta_Small_Basic) | IAHALT | [FusioNER/Basic](https://huggingface.co/datasets/FusioNER/Basic) | [DICTA](https://huggingface.co/dicta-il/dictabert) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **DICTA_Small_Smart** | Training the [DICTA](https://huggingface.co/dicta-il/dictabert) model on IAHALT + [Name-Sentences](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#1-name-sentences) + [Entity-Injection](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#2-entity-injection) ([dataset](https://huggingface.co/datasets/FusioNER/Smart_Injection)) | [FusioNER/Dicta_Small_Smart](https://huggingface.co/FusioNER/Dicta_Small_Smart) | IAHALT | [FusioNER/Smart_Injection](https://huggingface.co/datasets/FusioNER/Smart_Injection) | [DICTA](https://huggingface.co/dicta-il/dictabert) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **DICTA_basic_NER** | Training the [DICTA-ner](https://huggingface.co/dicta-il/dictabert-ner) model on the [basic](https://huggingface.co/datasets/FusioNER/Basic) IAHALT dataset | [FusioNER/DICTA_basic](https://huggingface.co/FusioNER/DICTA_basic) | IAHALT | [FusioNER/Basic](https://huggingface.co/datasets/FusioNER/Basic) | [DICTA-ner](https://huggingface.co/dicta-il/dictabert-ner) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **DICTA_smart_NER** | Training the [DICTA-ner](https://huggingface.co/dicta-il/dictabert-ner) model on IAHALT + [Name-Sentences](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#1-name-sentences) + [Entity-Injection](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#2-entity-injection) ([dataset](https://huggingface.co/datasets/FusioNER/Smart_Injection)) | [FusioNER/DICTA_Smart](https://huggingface.co/FusioNER/DICTA_Smart) | IAHALT | [FusioNER/Smart_Injection](https://huggingface.co/datasets/FusioNER/Smart_Injection) | [DICTA-ner](https://huggingface.co/dicta-il/dictabert-ner) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **DICTA_Large_Smart** | Training the [DICTA Large](https://huggingface.co/dicta-il/dictabert-large) model on IAHALT + [Name-Sentences](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#1-name-sentences) + [Entity-Injection](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#2-entity-injection) ([dataset](https://huggingface.co/datasets/FusioNER/Smart_Injection)) | [FusioNER/Dicta_Large_Smart](https://huggingface.co/FusioNER/Dicta_Large_Smart) | IAHALT | [FusioNER/Smart_Injection](https://huggingface.co/datasets/FusioNER/Smart_Injection) | [DICTA Large](https://huggingface.co/dicta-il/dictabert-large) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **TEC_NER** | Basic technology NER model | [FusioNER/tec_ner](https://huggingface.co/FusioNER/tec_ner) | TEC_NER | [FusioNER/tec_ner](https://huggingface.co/datasets/FusioNER/tec_ner) | base model | TEC | [Yehoshua](https://huggingface.co/yehoshuadiller) |

# Results

We evaluated our models on the **IAHALT test set**, together with other models such as [DictaBert](https://huggingface.co/dicta-il/dictabert) and [HeBert](https://huggingface.co/avichr/heBERT). The performance results are listed below, sorted by F1 score in ascending order:

| Model name | Precision | Recall | F1 score | Time (in seconds) |
| :--------- | :-------: | :----: | :---------: | :---------------: |
| [**IAHALT_and_NEMO_PP**](https://huggingface.co/FusioNER/IAHALT_and_NEMO_and_PP) | 0.714 | 0.353 | 0.461 | 83.128 |
| [**HeBert**](https://huggingface.co/avichr/heBERT) | 0.574 | 0.474 | 0.494 | 86.483 |
| [**NEMO**](https://huggingface.co/FusioNER/Nemo) | 0.553 | 0.51 | 0.525 | 81.422 |
| [**IAHALT_and_NEMO**](https://huggingface.co/FusioNER/IAHALT_and_NEMO) | 0.692 | 0.678 | 0.684 | 83.702 |
| [**Vitaly**](https://huggingface.co/FusioNER/Vitaly_NER) | 0.883 | 0.794 | 0.836 | 83.773 |
| [**DictaBert**](https://huggingface.co/dicta-il/dictabert) | 0.916 | 0.834 | 0.872 | **70.465** |
| [**DICTA_large**](https://huggingface.co/dicta-il/dictabert-large) | **0.917** | 0.845 | 0.879 | 206.251 |
| [**Name-Sentences**](https://huggingface.co/FusioNER/Name-Sentences) | 0.895 | 0.865 | 0.879 | 82.674 |
| [**Basic**](https://huggingface.co/FusioNER/Basic_IAHALT) | 0.897 | 0.866 | 0.881 | 84.479 |
| [**Smart_Injection**](https://huggingface.co/FusioNER/Smart_Injection) | 0.898 | 0.867 | 0.881 | 82.253 |
| [**DICTA_Basic**](https://huggingface.co/FusioNER/Dicta_Small_Basic) | 0.903 | **0.875** | 0.888 | **69.419** |
| [**DICTA_Large_Smart**](https://huggingface.co/FusioNER/Dicta_Large_Smart) | 0.904 | **0.875** | **0.889** | 204.324 |
| [**DICTA_Small_Smart**](https://huggingface.co/FusioNER/Dicta_Small_Smart) | 0.904 | **0.875** | **0.889** | **70.29** |
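
This README does not state how the scores above were averaged. For some rows the F1 score is not the harmonic mean of the listed precision and recall (for example, 0.714 and 0.353 give about 0.472, not 0.461), which would be consistent with averaging per-type scores. The sketch below shows one such scheme, macro-averaged exact-match scoring, purely as an assumption: the function name `macro_scores` and the span format are hypothetical and are not taken from the Fusion evaluation code.

```python
from collections import Counter


def macro_scores(gold, pred):
    """Macro-averaged entity-level precision, recall and F1.

    `gold` and `pred` are sets of (sentence_id, start, end, entity_type)
    spans (an assumed format); a prediction is correct only on exact match.
    """
    tp = Counter(span[3] for span in pred & gold)   # correct predictions per type
    fp = Counter(span[3] for span in pred - gold)   # spurious predictions per type
    fn = Counter(span[3] for span in gold - pred)   # missed gold entities per type
    rows = []
    for t in sorted(set(tp) | set(fp) | set(fn)):
        p = tp[t] / (tp[t] + fp[t]) if tp[t] + fp[t] else 0.0
        r = tp[t] / (tp[t] + fn[t]) if tp[t] + fn[t] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        rows.append((p, r, f))
    n = len(rows) or 1  # avoid dividing by zero when there are no entities
    return tuple(sum(row[i] for row in rows) / n for i in range(3))


# Toy data: PER has precision 1.0 / recall 0.5, GPE has precision 0.5 / recall 1.0.
gold = {(0, 0, 1, "PER"), (0, 2, 3, "PER"), (0, 4, 5, "GPE")}
pred = {(0, 0, 1, "PER"), (0, 4, 5, "GPE"), (0, 6, 7, "GPE")}
precision, recall, f1 = macro_scores(gold, pred)
print(round(precision, 3), round(recall, 3), round(f1, 3))  # 0.75 0.75 0.667
```

In this toy case the macro F1 (0.667) is lower than the harmonic mean of the macro precision and recall (0.75), the same kind of gap that appears in some rows of the table.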
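
According to these results, we recommend using the [**DICTA_Small_Smart**](https://huggingface.co/FusioNER/Dicta_Small_Smart) model: it ties for the best F1 score (0.889) while running about three times faster than DICTA_Large_Smart (70.29 vs. 204.324 seconds).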
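
A minimal sketch of running the recommended model, assuming the repository loads as a standard token-classification checkpoint through the Hugging Face `transformers` pipeline (not confirmed by this README). The helper names `load_tagger` and `extract_entities` and the `min_score` threshold are hypothetical.

```python
def load_tagger(model_id="FusioNER/Dicta_Small_Smart"):
    """Build a token-classification pipeline (downloads the model on first use)."""
    # Imported lazily so the rest of the module works without transformers installed.
    from transformers import pipeline

    # aggregation_strategy="simple" merges word pieces into whole entity spans.
    return pipeline(
        "token-classification", model=model_id, aggregation_strategy="simple"
    )


def extract_entities(tagger, text, min_score=0.5):
    """Return (entity_text, entity_type, score) triples at or above `min_score`."""
    return [
        (ent["word"], ent["entity_group"], float(ent["score"]))
        for ent in tagger(text)
        if ent["score"] >= min_score
    ]


# Example usage (requires network access and `pip install transformers torch`):
#   tagger = load_tagger()
#   print(extract_entities(tagger, "<Hebrew sentence here>"))
```

Any other model path from the NER Models table can be passed as `model_id`, under the same assumption about the checkpoint format.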

# Hebrew NLP models

The table below lists the Hebrew NLP models used or compared in this project:

| Model name | Link | Creator |
|:-----------|:-----|:--------|
| HeNLP/HeRo | [https://huggingface.co/HeNLP/HeRo](https://huggingface.co/HeNLP/HeRo) | Vitaly Shalumov and Harel Haskey |
| dicta-il/dictabert | [https://huggingface.co/dicta-il/dictabert](https://huggingface.co/dicta-il/dictabert) | Shaltiel Shmidman and Avi Shmidman and Moshe Koppel |
| dicta-il/dictabert-large | [https://huggingface.co/dicta-il/dictabert-large](https://huggingface.co/dicta-il/dictabert-large) | Shaltiel Shmidman and Avi Shmidman and Moshe Koppel |
| avichr/heBERT | [https://huggingface.co/avichr/heBERT](https://huggingface.co/avichr/heBERT) | Avihay Chriqui and Inbal Yahav |

# Footnotes

#### [1] **Name-Sentences**:

We add to the training corpus sentences that consist solely of the entity we want the model to learn.
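
As an illustration only, a name-sentence can be thought of as a training example whose every token belongs to the entity. The sketch below assumes whitespace tokenization and BIO tags, neither of which is specified in this README; the function name `name_sentence` is hypothetical.

```python
def name_sentence(entity_text, label):
    """Build a training example that contains only the given entity.

    Returns (tokens, tags) with BIO tags; assumes whitespace tokenization.
    """
    tokens = entity_text.split()
    if not tokens:
        raise ValueError("entity_text must contain at least one token")
    tags = [f"B-{label}"] + [f"I-{label}"] * (len(tokens) - 1)
    return tokens, tags


# A two-token person name becomes a two-token sentence tagged B-PER, I-PER.
print(name_sentence("Ron Weasley", "PER"))
# (['Ron', 'Weasley'], ['B-PER', 'I-PER'])
```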

#### [2] **Entity-Injection**:

We replace a tagged entity in the original corpus with a new entity, keeping the original label.

With this method, the model can learn new entities (not new labels!) that it did not extract before.
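
A rough sketch of the replacement step, again assuming BIO-tagged, whitespace-tokenized sentences (an assumption, not the documented dataset format). The function `inject_entity` is hypothetical; it swaps the first entity of a given label for a new one and keeps the rest of the sentence and its tags unchanged.

```python
def inject_entity(tokens, tags, label, new_entity):
    """Replace the first `label` entity in a BIO-tagged sentence with `new_entity`."""
    new_tokens = new_entity.split()
    if not new_tokens:
        raise ValueError("new_entity must contain at least one token")
    tokens, tags = list(tokens), list(tags)
    try:
        start = tags.index(f"B-{label}")
    except ValueError:
        return tokens, tags  # no entity of this type: leave the sentence as is
    end = start + 1
    while end < len(tags) and tags[end] == f"I-{label}":
        end += 1  # extend over the continuation tokens of the same entity
    new_tags = [f"B-{label}"] + [f"I-{label}"] * (len(new_tokens) - 1)
    return tokens[:start] + new_tokens + tokens[end:], tags[:start] + new_tags + tags[end:]


tokens = ["Harry", "Potter", "visited", "Berlin"]
tags = ["B-PER", "I-PER", "O", "B-GPE"]
print(inject_entity(tokens, tags, "PER", "Rex"))
# (['Rex', 'visited', 'Berlin'], ['B-PER', 'O', 'B-GPE'])
```

The context around the entity is preserved, so the model sees the new name in a position where the original label already applies.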

#### [3] **BI-BI Problem**:

The training corpus is built so that entities of the same type that appear in sequence are labeled as continuations of one another.

For example, the text "הארי פוטר ורון וויזלי" (Harry Potter and Ron Weasley) would be tagged as a **SINGLE** entity.

This problem prevents the model from extracting entities correctly.
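
To make the effect concrete, the sketch below decodes BIO tags into entity spans and compares a correct tagging with a BI-BI style tagging. The BIO scheme, the four-token split, and the English stand-in tokens are assumptions for illustration; `bio_to_spans` is a hypothetical helper.

```python
def bio_to_spans(tags):
    """Decode BIO tags into (start, end, entity_type) spans, end exclusive."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last span
        continues = start is not None and tag == f"I-{etype}"
        if start is not None and not continues:
            spans.append((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, etype = i, tag[2:]
    return spans


tokens = ["Harry", "Potter", "Ron", "Weasley"]  # English stand-ins for the example
correct = ["B-PER", "I-PER", "B-PER", "I-PER"]  # two separate people
bi_bi = ["B-PER", "I-PER", "I-PER", "I-PER"]    # second name labeled as a continuation

print(bio_to_spans(correct))  # [(0, 2, 'PER'), (2, 4, 'PER')]
print(bio_to_spans(bi_bi))    # [(0, 4, 'PER')]
```

With the second tagging, the two names collapse into one four-token entity, which is the behavior described above.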

#### [4] **Classic**:

The classic NER types:

| Entity type | Full name | Examples |
|:-----------:|:----------| --------:|
| **PER** | Person | אדולף היטלר, רודולף הס, מרדכי אנילביץ |
| **GPE** | Geopolitical Entity | גרמניה, פולין, ברלין, וורשה |
| **LOC** | Location | מזרח אירופה, אגן הים התיכון, הגליל |
| **FAC** | Facility | אוושוויץ, מגדלי התאומים, נתב"ג 2000, רחוב קפלן |
| **ORG** | Organization | המפלגה הנאצית, חברת גוגל, ממשלת חוף השנהב |
| **TIMEX** | Time Expression | 1945, שנת 1993, יום השואה, שנות ה-90 |
| **EVE** | Event | השואה, מלחמת העולם השנייה, שלטון האפרטהייד |
| **TTL** | Title | פיהרר, קיסר, מנכ"ל |
| **ANG** | Language | עברית, ערבית, גרמנית |
| **DUC** | Product | פייסבוק, F-16, תנובה |
| **WOA** | Work of Art | דו"ח מבקר המדינה, עיתון הארץ, הארי פוטר, תיק 2000 |
| **MISC** | Miscellaneous | קורונה, התו הירוק, מדלית זהב, ביטקוין |

# Datasets for English NER (for cleaning wrong entities in English texts):

- [**ontonotes5**](https://huggingface.co/datasets/tner/ontonotes5)
- [**conll2003**](https://huggingface.co/datasets/eriktks/conll2003)

**MIT License**