Arabic Named Entity Recognition

This project aims to enrich Arabic Named Entity Recognition (ANER). Arabic is a challenging language to process, and fine-grained entity recognition for it remains difficult. We built a model based on AraBERT that supports 50 entity types.

Paper:

The paper describing the system, with full details, is available at: https://arxiv.org/abs/2308.14669

Dataset

Evaluation results

The model achieves the following results:

Dataset          Recall   Precision   F1
WikiFANE Gold    87.0     90.5        88.7
NewsFANE Gold    78.1     77.4        77.7
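As a quick sanity check on these figures, the sketch below recomputes F1 from the reported precision and recall. It assumes F1 here is the usual harmonic mean of the two; the averaging scheme used in the evaluation is not stated on this page, and the function name `f1_score` is only illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same scale as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the results reported above (percentages).
reported = {
    "WikiFANE Gold": {"recall": 87.0, "precision": 90.5, "f1": 88.7},
    "NewsFANE Gold": {"recall": 78.1, "precision": 77.4, "f1": 77.7},
}

for name, m in reported.items():
    computed = f1_score(m["precision"], m["recall"])
    print(f"{name}: computed F1 = {computed:.1f}, reported F1 = {m['f1']}")
```

Under that assumption, the recomputed values agree with the reported F1 scores to one decimal place.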

Usage

The model is available on the Hugging Face Hub under the name boda/ANER. Checkpoints are currently available only in PyTorch format.

Use in Python:

from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("boda/ANER")

model = AutoModelForTokenClassification.from_pretrained("boda/ANER")
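Loading the tokenizer and model is only the first step. The following is a minimal sketch of how tagging a sentence might look with the generic `pipeline` API from transformers. The helper names `tag_text` and `format_entities` and the `min_score` threshold are hypothetical, and the sketch assumes that the "simple" aggregation strategy suits this model's label scheme and that no extra text preprocessing is needed, neither of which is confirmed here.

```python
def tag_text(text, model_name="boda/ANER"):
    """Run the token-classification pipeline on a single string."""
    # Imported here so the helper below works without transformers installed.
    from transformers import pipeline

    ner = pipeline(
        "token-classification",
        model=model_name,
        aggregation_strategy="simple",  # merge word pieces into entity spans
    )
    return ner(text)


def format_entities(results, min_score=0.5):
    """Keep (text, label) pairs whose confidence is at least min_score."""
    # With aggregation enabled, each result is a dict that includes
    # the keys "word", "entity_group" and "score".
    return [
        (r["word"], r["entity_group"])
        for r in results
        if r["score"] >= min_score
    ]
```

For example, `format_entities(tag_text("ولد نجيب محفوظ في القاهرة"))` would download the checkpoint on first use and return the detected spans with their labels. The label names come from the model's own configuration.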

Acknowledgments

Thanks to the AraBERT team for providing the Arabic BERT model, which we used as the base model for our work.

We would also like to thank Prof. Fahd Saleh S Alotaibi of the Faculty of Computing and Information Technology, King Abdulaziz University, for providing the dataset we used to train our model.

Contacts

Abdelrahman Atef
