AREEj: Arabic Relation Extraction with Evidence

You can use AREEj to extract relations from Arabic documents. Each document can contain multiple relations, and each relation has six elements: the source, the target, the named entity of each, the relation type between them, and the evidence. The evidence serves two purposes: it improves the relation extraction task, and it explains the model's predictions. You can also use the evidence as an edge between the related entities.
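
To make the six-element structure concrete, here is a minimal sketch of how one extracted relation could be held in Python and turned into a graph edge labelled with its evidence. The class name, field names, and the example values are assumptions written by hand for illustration; they are not the model's actual output format or label set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """One relation with the six elements described above (field names are hypothetical)."""
    source: str
    source_entity: str   # named-entity label of the source (assumed to be a tag such as ORG)
    target: str
    target_entity: str   # named-entity label of the target
    relation_type: str
    evidence: str

    def as_edge(self):
        """Return (source, target, attributes) so the evidence can label a graph edge."""
        return (
            self.source,
            self.target,
            {"relation": self.relation_type, "evidence": self.evidence},
        )


# Hand-written illustrative values, NOT produced by the model.
example = Relation(
    source="المركز العربي للأبحاث ودراسة السياسات",
    source_entity="ORG",
    target="الدوحة",
    target_entity="LOC",
    relation_type="located in",
    evidence="في الدوحة",
)

print(example.as_edge())
```

A tuple of this shape can be passed to most graph libraries that accept `(node, node, attribute_dict)` edges.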

AREEj was introduced in the paper "AREEj: Arabic Relation Extraction with Evidence", published in the Proceedings of The Second Arabic Natural Language Processing Conference.

How to use

Install the dependencies:

pip install "transformers[torch]" datasets evaluate sentencepiece

Then load the model and generate a prediction:

from transformers import MBartTokenizer, MBartForConditionalGeneration
import torch

tokenizer = MBartTokenizer.from_pretrained('dru-ac/AREEj', max_length=1024)
model = MBartForConditionalGeneration.from_pretrained('dru-ac/AREEj')
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model.to(device)

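def generate_prediction(input_text):
    # Tokenize the document; truncate explicitly so long inputs stay within 1024 tokens.
    input_ids = tokenizer.encode(
        input_text, return_tensors="pt", truncation=True, max_length=1024
    ).to(device)
    # Inference only, so gradient tracking is disabled.
    with torch.no_grad():
        output = model.generate(
            input_ids,
            # Start decoding with the Arabic language code.
            decoder_start_token_id=tokenizer.lang_code_to_id['ar_AR'],
        )

    # Special tokens are kept in the decoded string.
    prediction = tokenizer.decode(output[0], skip_special_tokens=False)

    return prediction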

input_text = 'تأسس المركز العربي للأبحاث ودراسة السياسات في عام 2010 في الدوحة في قطر'
prediction = generate_prediction(input_text)
print('Prediction:', prediction)
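
Because the snippet above decodes with `skip_special_tokens=False`, the printed string will likely still contain generic mBART sequence tokens alongside the extracted content. The following is a minimal sketch of a clean-up step, assuming the generic tokens are `<s>`, `</s>`, `<pad>`, and the `ar_AR` language code. The helper name `strip_generic_tokens` is hypothetical, and any task-specific markers the model emits are deliberately left untouched, since their format is not described here.

```python
import re

# Generic mBART sequence tokens (an assumption); task-specific markers are not listed on purpose.
GENERIC_TOKENS = ("<s>", "</s>", "<pad>", "ar_AR")


def strip_generic_tokens(prediction: str) -> str:
    """Remove generic sequence tokens and collapse the whitespace left behind."""
    for token in GENERIC_TOKENS:
        prediction = prediction.replace(token, " ")
    return re.sub(r"\s+", " ", prediction).strip()
```

Calling `strip_generic_tokens(prediction)` on the decoded string leaves the remaining text and markers ready for whatever parsing your application needs.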

If you use the code or model, please cite this work:

@inproceedings{mraikhat-etal-2024-areej,
    title = "{AREE}j: {A}rabic Relation Extraction with Evidence",
    author = "Mraikhat, Osama  and
      Hamoud, Hadi  and
      Zaraket, Fadi",
    editor = "Habash, Nizar  and
      Bouamor, Houda  and
      Eskander, Ramy  and
      Tomeh, Nadi  and
      Abu Farha, Ibrahim  and
      Abdelali, Ahmed  and
      Touileb, Samia  and
      Hamed, Injy  and
      Onaizan, Yaser  and
      Alhafni, Bashar  and
      Antoun, Wissam  and
      Khalifa, Salam  and
      Haddad, Hatem  and
      Zitouni, Imed  and
      AlKhamissi, Badr  and
      Almatham, Rawan  and
      Mrini, Khalil",
    booktitle = "Proceedings of The Second Arabic Natural Language Processing Conference",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.arabicnlp-1.6",
    pages = "67--72",
    abstract = "Relational entity extraction is key in building knowledge graphs. A relational entity has a source, a tail and a type. In this paper, we consider Arabic text and introduce evidence enrichment which intuitively informs models for better predictions. Relational evidence is an expression in the text that explains how sources and targets relate. It also provides hints from which models learn. This paper augments the existing relational extraction dataset with evidence annotation to its 2.9-million Arabic relations. We leverage the augmented dataset to build {AREE}j, a relation extraction with evidence model from Arabic documents. The evidence augmentation model we constructed to complete the dataset achieved .82 F1-score (.93 precision, .73 recall). The target outperformed SOTA mREBEL with .72 F1-score (.78 precision, .66 recall).",
}

License

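This model is licensed under the CC BY-SA 4.0 license (Creative Commons Attribution-ShareAlike 4.0 International). The text of the license can be found at https://creativecommons.org/licenses/by-sa/4.0/legalcode.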
