# Fusion NER Models
This page lists the NER models of the Fusion project.
# Table of contents:
1. [**NER-Models**](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#ner-models)
2. [**Results**](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#results)
3. [**Hebrew NLP models**](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#hebrew-nlp-models)
4. [**Footnotes**](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#footnotes)
# NER Models:
This table describes each of our models. Each row gives the model nickname, a description of its training, the model path (link), the source dataset (with a link), the base model, the entity types, and the trainer.
|model name | model description | model path | datasets | link to dataset | base model | entity types | trainer |
|:----------|:------------------|:-----------|:--------:|:----------------|:----------:| :----------- | :-----: |
| **Basic** | Basic training on IAHALT | [FusioNER/Basic_IAHALT](https://huggingface.co/FusioNER/Basic_IAHALT) | IAHALT | [FusioNER/Basic](https://huggingface.co/datasets/FusioNER/Basic) | [HeRo](https://huggingface.co/HeNLP/HeRo) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **Vitaly** | Vitaly training on IAHALT (with [BI-BI problem](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#3-bi-bi-problem)) | [FusioNER/Vitaly_NER](https://huggingface.co/FusioNER/Vitaly_NER) | IAHALT | [FusioNER/Vitaly](https://huggingface.co/datasets/FusioNER/Vitaly) | [HeRo](https://huggingface.co/HeNLP/HeRo) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | Vitaly |
| **Name-Sentences** | Training on IAHALT + [Name-Sentences](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#1-name-sentences) | [FusioNER/Name-Sentences](https://huggingface.co/FusioNER/Name-Sentences) | IAHALT | [FusioNER/Name_Sentences](https://huggingface.co/datasets/FusioNER/Name_Sentences) | [HeRo](https://huggingface.co/HeNLP/HeRo) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **Entity-Injection** | Training on IAHALT + [Entity-Injection](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#2-entity-injection) | [FusioNER/Entity-Injection](https://huggingface.co/FusioNER/Entity-Injection) | IAHALT | [FusioNER/Entity_Injection](https://huggingface.co/datasets/FusioNER/Entity_Injection) | [HeRo](https://huggingface.co/HeNLP/HeRo) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **Smart_Injection** | Training on IAHALT + [Name-Sentences](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#1-name-sentences) + [Entity-Injection](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#2-entity-injection) | [FusioNER/Smart_Injection](https://huggingface.co/FusioNER/Smart_Injection) | IAHALT | [FusioNER/Smart_Injection](https://huggingface.co/datasets/FusioNER/Smart_Injection) | [HeRo](https://huggingface.co/HeNLP/HeRo) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **NEMO** | Basic training on NEMO dataset| [FusioNER/Nemo](https://huggingface.co/FusioNER/Nemo) | NEMO | [FusioNER/NEMO](https://huggingface.co/datasets/FusioNER/NEMO) | [HeRo](https://huggingface.co/HeNLP/HeRo) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **IAHALT_and_NEMO** | Basic training on IAHALT + NEMO | [FusioNER/IAHALT_and_NEMO](https://huggingface.co/FusioNER/IAHALT_and_NEMO) | IAHALT + NEMO | [FusioNER/IAHALT_and_NEMO](https://huggingface.co/datasets/FusioNER/IAHALT_and_NEMO) | [HeRo](https://huggingface.co/HeNLP/HeRo) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **IAHALT_and_NEMO_PP** | Training on IAHALT + NEMO + [Name-Sentences](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#1-name-sentences) + [Entity-Injection](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#2-entity-injection) | [FusioNER/IAHALT_and_NEMO_and_PP](https://huggingface.co/FusioNER/IAHALT_and_NEMO_and_PP) | IAHALT + NEMO | [FusioNER/IAHALT_and_NEMO_PP](https://huggingface.co/datasets/FusioNER/IAHALT_and_NEMO_PP) | [HeRo](https://huggingface.co/HeNLP/HeRo) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **Animals** | Training on IAHALT + [Entity-Injection](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#2-entity-injection) (of animal names as PER entities) | [FusioNER/Animals](https://huggingface.co/FusioNER/Animals) | IAHALT | [FusioNER/Animals](https://huggingface.co/datasets/FusioNER/Animals) | [HeRo](https://huggingface.co/HeNLP/HeRo) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **PRS-Injection** | Training on IAHALT + [Entity-Injection](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#2-entity-injection) (of PRS names as PER entities) | [FusioNER/PRS-Injection](https://huggingface.co/FusioNER/PRS-Injection) | IAHALT | [FusioNER/PRS_locations](https://huggingface.co/datasets/FusioNER/PRS_locations) | [HeRo](https://huggingface.co/HeNLP/HeRo) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **DICTA_Basic** | Training the [DICTA](https://huggingface.co/dicta-il/dictabert) model on the [basic](https://huggingface.co/datasets/FusioNER/Basic) IAHALT dataset | [FusioNER/Dicta_Small_Basic](https://huggingface.co/FusioNER/Dicta_Small_Basic) | IAHALT | [FusioNER/Basic](https://huggingface.co/datasets/FusioNER/Basic) | [DICTA](https://huggingface.co/dicta-il/dictabert) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **DICTA_Small_Smart** | Training the [DICTA](https://huggingface.co/dicta-il/dictabert) model on IAHALT + [Name-Sentences](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#1-name-sentences) + [Entity-Injection](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#2-entity-injection) ([dataset](https://huggingface.co/datasets/FusioNER/Smart_Injection)) | [FusioNER/Dicta_Small_Smart](https://huggingface.co/FusioNER/Dicta_Small_Smart) | IAHALT | [FusioNER/Smart_Injection](https://huggingface.co/datasets/FusioNER/Smart_Injection) | [DICTA](https://huggingface.co/dicta-il/dictabert) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **DICTA_basic_NER** | Training the [DICTA-ner](https://huggingface.co/dicta-il/dictabert-ner) model on the [basic](https://huggingface.co/datasets/FusioNER/Basic) IAHALT dataset| [FusioNER/DICTA_basic](https://huggingface.co/FusioNER/DICTA_basic) | IAHALT | [FusioNER/Basic](https://huggingface.co/datasets/FusioNER/Basic) | [DICTA-ner](https://huggingface.co/dicta-il/dictabert-ner) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **DICTA_smart_NER** | Training the [DICTA-ner](https://huggingface.co/dicta-il/dictabert-ner) model on IAHALT + [Name-Sentences](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#1-name-sentences) + [Entity-Injection](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#2-entity-injection) ([dataset](https://huggingface.co/datasets/FusioNER/Smart_Injection)) | [FusioNER/DICTA_Smart](https://huggingface.co/FusioNER/DICTA_Smart) | IAHALT | [FusioNER/Smart_Injection](https://huggingface.co/datasets/FusioNER/Smart_Injection) | [DICTA-ner](https://huggingface.co/dicta-il/dictabert-ner) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **DICTA_Large_Smart** | Training the [DICTA Large](https://huggingface.co/dicta-il/dictabert-large) model on IAHALT + [Name-Sentences](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#1-name-sentences) + [Entity-Injection](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#2-entity-injection) ([dataset](https://huggingface.co/datasets/FusioNER/Smart_Injection)) | [FusioNER/Dicta_Large_Smart](https://huggingface.co/FusioNER/Dicta_Large_Smart) | IAHALT | [FusioNER/Smart_Injection](https://huggingface.co/datasets/FusioNER/Smart_Injection) | [DICTA Large](https://huggingface.co/dicta-il/dictabert-large) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **TEC_NER** | Basic technology NER model | [FusioNER/tec_ner](https://huggingface.co/FusioNER/tec_ner) | TEC_NER | [FusioNER/tec_ner](https://huggingface.co/datasets/FusioNER/tec_ner) | base model | TEC | [Yehoshua](https://huggingface.co/yehoshuadiller) |
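As a minimal usage sketch, any model path from the table can probably be loaded with the Hugging Face `transformers` pipeline. This assumes the checkpoints are standard token-classification models with a bundled tokenizer, which this README does not confirm; the helper name `load_ner` is hypothetical.

```python
def load_ner(model_id: str = "FusioNER/Dicta_Small_Smart"):
    """Build a token-classification pipeline for one of the model paths above.

    Assumption: the checkpoint ships a token-classification head and a
    tokenizer that the transformers pipeline API can load directly.
    """
    # Imported lazily so this module can be imported without transformers.
    from transformers import pipeline

    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",  # merge word pieces into whole entities
    )
```

Calling `load_ner()("הארי פוטר ביקר את גרמניה")` would then return a list of dictionaries with keys such as `entity_group`, `word`, and `score`. The first call downloads the model weights.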
# Results
We evaluated our models on the **IAHALT test set**, alongside other models such as [DictaBert](https://huggingface.co/dicta-il/dictabert) and [HeBert](https://huggingface.co/avichr/heBERT). The performance results are:
| Model name | Precision | Recall | F1 score | Time (in seconds) |
| :--------- | :-------: | :----: | :---------: | :---------------: |
| [**IAHALT_and_NEMO_PP**](https://huggingface.co/FusioNER/IAHALT_and_NEMO_and_PP) | 0.714 | 0.353 | 0.461 | 83.128 |
| [**HeBert**](https://huggingface.co/avichr/heBERT) | 0.574 | 0.474 | 0.494 | 86.483 |
| [**NEMO**](https://huggingface.co/FusioNER/Nemo) | 0.553 | 0.51 | 0.525 | 81.422 |
| [**IAHALT_and_NEMO**](https://huggingface.co/FusioNER/IAHALT_and_NEMO) | 0.692 | 0.678 | 0.684 | 83.702 |
| [**Vitaly**](https://huggingface.co/FusioNER/Vitaly_NER) | 0.883 | 0.794 | 0.836 | 83.773 |
| [**DictaBert**](https://huggingface.co/dicta-il/dictabert) | 0.916 | 0.834 | 0.872 | **70.465** |
| [**DICTA_large**](https://huggingface.co/dicta-il/dictabert-large) | **0.917** | 0.845 | 0.879 | 206.251 |
| [**Name-Sentences**](https://huggingface.co/FusioNER/Name-Sentences) | 0.895 | 0.865 | 0.879 | 82.674 |
| [**Basic**](https://huggingface.co/FusioNER/Basic_IAHALT) | 0.897 | 0.866 | 0.881 | 84.479 |
| [**Smart_Injection**](https://huggingface.co/FusioNER/Smart_Injection) | 0.898 | 0.867 | 0.881 | 82.253 |
| [**DICTA_Basic**](https://huggingface.co/FusioNER/Dicta_Small_Basic) | 0.903 | **0.875** | 0.888 | **69.419** |
| [**DICTA_Large_Smart**](https://huggingface.co/FusioNER/Dicta_Large_Smart) | 0.904 | **0.875** | **0.889** | 204.324 |
| [**DICTA_Small_Smart**](https://huggingface.co/FusioNER/Dicta_Small_Smart) | 0.904 | **0.875** | **0.889** | **70.29** |
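Based on these results, we recommend the [**DICTA_Small_Smart**](https://huggingface.co/FusioNER/Dicta_Small_Smart) model: it ties for the best F1 score (0.889) and runs in about a third of the time of DICTA_Large_Smart (70.29 vs. 204.324 seconds).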
# Hebrew NLP models
The following table lists Hebrew NLP models:
| Model name | Link | Creator |
|:-----------|:-----|:--------|
| HeNLP/HeRo | [https://huggingface.co/HeNLP/HeRo](https://huggingface.co/HeNLP/HeRo) | Vitaly Shalumov and Harel Haskey |
| dicta-il/dictabert | [https://huggingface.co/dicta-il/dictabert](https://huggingface.co/dicta-il/dictabert) | Shaltiel Shmidman and Avi Shmidman and Moshe Koppel |
| dicta-il/dictabert-large | [https://huggingface.co/dicta-il/dictabert-large](https://huggingface.co/dicta-il/dictabert-large) | Shaltiel Shmidman and Avi Shmidman and Moshe Koppel |
| avichr/heBERT | [https://huggingface.co/avichr/heBERT](https://huggingface.co/avichr/heBERT) | Avihay Chriqui and Inbal Yahav |
# Footnotes
#### [1] **Name-Sentences**:
Adds sentences to the corpus that consist solely of the entity we want the network to learn.
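A minimal sketch of how such sentences could be generated, assuming whitespace tokenization and BIO tags (neither is stated in this README). The function name `name_sentences` is hypothetical.

```python
def name_sentences(names, label):
    """Turn each entity name into a standalone, fully tagged sentence.

    Assumption: tokens are split on whitespace and tagged with the BIO scheme.
    """
    sentences = []
    for name in names:
        tokens = name.split()
        if not tokens:
            continue  # skip empty strings
        # First token opens the entity, the rest continue it.
        tags = [f"B-{label}"] + [f"I-{label}"] * (len(tokens) - 1)
        sentences.append((tokens, tags))
    return sentences


# Two extra training sentences, each containing nothing but a PER entity.
extra = name_sentences(["הארי פוטר", "רון וויזלי"], "PER")
```

The resulting `(tokens, tags)` pairs can be appended to the original training corpus.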
#### [2] **Entity-Injection**:
Replaces a tagged entity in the original corpus with a new entity.
With this method, the model can learn new entities (not new labels!) that it did not extract before.
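The replacement step might look like the following sketch. It assumes BIO-tagged, whitespace-tokenized sentences; the helper names `spans` and `inject` are hypothetical and not taken from the project's code.

```python
import random


def spans(tags):
    """Return (start, end, label) for every BIO entity; end is exclusive."""
    found, start = [], None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        inside = start is not None and tag == f"I-{tags[start][2:]}"
        if start is not None and not inside:
            found.append((start, i, tags[start][2:]))
            start = None
        if tag.startswith("B-"):
            start = i
    return found


def inject(tokens, tags, label, new_entity, rng=random):
    """Replace one randomly chosen entity of type `label` with `new_entity`."""
    candidates = [s for s in spans(tags) if s[2] == label]
    if not candidates:
        return tokens, tags  # nothing to replace in this sentence
    start, end, _ = rng.choice(candidates)
    new_tokens = new_entity.split()
    new_tags = [f"B-{label}"] + [f"I-{label}"] * (len(new_tokens) - 1)
    return (tokens[:start] + new_tokens + tokens[end:],
            tags[:start] + new_tags + tags[end:])


tokens = ["הארי", "פוטר", "ביקר", "את", "גרמניה"]
tags = ["B-PER", "I-PER", "O", "O", "B-GPE"]
new_tokens, new_tags = inject(tokens, tags, "PER", "מרדכי אנילביץ")
```

The sentence context and the label stay the same; only the surface form of the entity changes, which is why the model learns new entities rather than new labels.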
#### [3] **BI-BI Problem**:
The training corpus is built so that consecutive entities of the same type are labeled as continuations of one another.
For example, the text "הארי פוטר ורון וויזלי" ("Harry Potter and Ron Weasley") would be tagged as a **SINGLE** entity.
This prevents the model from extracting entities correctly.
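To illustrate, here is a small sketch that decodes entities from two taggings of the same text, assuming the BIO scheme and whitespace tokens (an assumption, as is the helper name `decode_entities`). With the intended tags, two PER entities come out; with the problematic tags, they collapse into one.

```python
def decode_entities(tokens, tags):
    """Group BIO-tagged tokens into (text, label) entities."""
    entities, current, label = [], [], None
    for token, tag in zip(tokens, tags):
        # A B- tag, or an I- tag of a different type, starts a new entity.
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            if current:
                entities.append((" ".join(current), label))
            current, label = [token], tag[2:]
        elif tag.startswith("I-"):
            current.append(token)  # continue the open entity
        else:
            if current:
                entities.append((" ".join(current), label))
            current, label = [], None
    if current:
        entities.append((" ".join(current), label))
    return entities


tokens = ["הארי", "פוטר", "ורון", "וויזלי"]
intended = ["B-PER", "I-PER", "B-PER", "I-PER"]  # two separate people
merged = ["B-PER", "I-PER", "I-PER", "I-PER"]    # second name tagged as a continuation

two_people = decode_entities(tokens, intended)
one_blob = decode_entities(tokens, merged)
```

Here `two_people` contains two entities, while `one_blob` contains a single four-token entity.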
#### [4] **Classic**:
The classic NER types:
| entity type | full name | examples |
|:-----------:|:----------| --------:|
| **PER** | Person | אדולף היטלר, רודולף הס, מרדכי אנילביץ |
| **GPE** | Geopolitical Entity | גרמניה, פולין, ברלין, וורשה |
| **LOC** | Location | מזרח אירופה, אגן הים התיכון, הגליל |
| **FAC** | Facility | אוושוויץ, מגדלי התאומים, נתב"ג 2000, רחוב קפלן |
| **ORG** | Organization | המפלגה הנאצית, חברת גוגל, ממשלת חוף השנהב |
| **TIMEX** | Time Expression | 1945, שנת 1993, יום השואה, שנות ה-90 |
| **EVE** | Event | השואה, מלחמת העולם השנייה, שלטון האפרטהייד |
| **TTL** | Title | פיהרר, קיסר, מנכ"ל |
| **ANG** | Language | עברית, ערבית, גרמנית |
| **DUC** | Product | פייסבוק, F-16, תנובה |
| **WOA** | Work of Art | דו"ח מבקר המדינה, עיתון הארץ, הארי פוטר, תיק 2000 |
| **MISC** | Miscellaneous | קורונה, התו הירוק, מדלית זהב, ביטקוין |
# Datasets for English NER (for cleaning wrong entities in English texts):
- [**ontonotes5**](https://huggingface.co/datasets/tner/ontonotes5)
- [**conll2003**](https://huggingface.co/datasets/eriktks/conll2003)
**MIT License**