---
language:
- ar
thumbnail: url to a thumbnail used in social sharing
tags:
- ner
- token-classification
- Arabic-NER
metrics:
- accuracy
- f1
- precision
- recall
widget:
- text: النجم محمد صلاح لاعب المنتخب المصري يعيش في مصر بالتحديد من نجريج, الشرقية
  example_title: Mohamed Salah
- text: انا ساكن في حدايق الزتون و بدرس في جامعه عين شمس
  example_title: Egyptian Dialect
- text: يقع نهر الأمازون في قارة أمريكا الجنوبية
  example_title: Standard Arabic
datasets:
- Fine-grained-Arabic-Named-Entity-Corpora
pipeline_tag: token-classification
---

# Arabic Named Entity Recognition

This project aims to enrich Arabic Named Entity Recognition (ANER). Arabic is a difficult language to process and poses many challenges.

We built a model based on AraBERT that supports 50 entity types.

# Paper

The system is described in detail in this paper: https://arxiv.org/abs/2308.14669

# Dataset

- [Fine-grained Arabic Named Entity Corpora](https://fsalotaibi.kau.edu.sa/Pages-Arabic-NE-Corpora.aspx)

# Evaluation results

The model achieves the following results:

| Dataset | Recall | Precision | F1 |
|:-------:|:------:|:---------:|:--:|
| WikiFANE Gold | 87.0 | 90.5 | 88.7 |
| NewsFANE Gold | 78.1 | 77.4 | 77.7 |

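
As a quick sanity check, the reported F1 values are consistent with the harmonic mean of the reported precision and recall. The snippet below is only an illustrative sketch: it assumes the scores are on a 0-100 scale and that F1 was derived from these aggregate values, neither of which is stated here. The helper name `f1_score` is hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same scale as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (recall, precision, reported F1) copied from the evaluation results.
reported = {
    "WikiFANE Gold": (87.0, 90.5, 88.7),
    "NewsFANE Gold": (78.1, 77.4, 77.7),
}

for dataset, (recall, precision, f1) in reported.items():
    computed = f1_score(precision, recall)
    print(f"{dataset}: computed F1 = {computed:.1f}, reported F1 = {f1}")
```

Rounded to one decimal place, both computed values match the reported F1 scores.
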

# Usage

The model is available on the Hugging Face model page under the name [boda/ANER](https://huggingface.co/boda/ANER). Checkpoints are currently available only in PyTorch.

### Use in Python:

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Load the tokenizer and the fine-tuned token-classification model.
tokenizer = AutoTokenizer.from_pretrained("boda/ANER")
model = AutoModelForTokenClassification.from_pretrained("boda/ANER")
```
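
Once the tokenizer and model are loaded, one way to run inference is the `pipeline` helper from `transformers`. The following is a minimal sketch rather than the authors' documented procedure: the helper names `load_ner` and `extract_entities` are hypothetical, and `aggregation_strategy="simple"` is an assumed choice for merging sub-word pieces into entity spans. The label names you get back depend on the model's configuration.

```python
def load_ner(model_name="boda/ANER"):
    """Build a token-classification pipeline (downloads the checkpoint on first use)."""
    from transformers import (
        AutoModelForTokenClassification,
        AutoTokenizer,
        pipeline,
    )

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForTokenClassification.from_pretrained(model_name)
    # Assumption: "simple" aggregation merges sub-word pieces into whole
    # entity spans instead of returning one prediction per token.
    return pipeline(
        "token-classification",
        model=model,
        tokenizer=tokenizer,
        aggregation_strategy="simple",
    )


def extract_entities(ner, text, min_score=0.0):
    """Return (entity text, label, score) tuples for spans scoring at least min_score."""
    return [
        (span["word"], span["entity_group"], round(float(span["score"]), 3))
        for span in ner(text)
        if span["score"] >= min_score
    ]


# Example usage (needs network access to download the checkpoint):
# ner = load_ner()
# for word, label, score in extract_entities(ner, "يقع نهر الأمازون في قارة أمريكا الجنوبية"):
#     print(word, label, score)
```

Keeping `extract_entities` independent of the model makes it easy to filter low-confidence spans with `min_score` or to swap in a different pipeline.
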

# Acknowledgments

Thanks to [AraBERT](https://github.com/aub-mind/arabert) for providing the Arabic BERT model, which we used as the base model for our work.

We would also like to thank [Prof. Fahd Saleh S Alotaibi](https://fsalotaibi.kau.edu.sa/Pages-Arabic-NE-Corpora.aspx) of the Faculty of Computing and Information Technology, King Abdulaziz University, for providing the dataset we used to train our model.

# Contacts

**Abdelrahman Atef**

- [LinkedIn](https://linkedin.com/in/boda-sadalla)
- [GitHub](https://github.com/BodaSadalla98)
- <[email protected]>