AREEj: Arabic Relation Extraction with Evidence
You can use AREEj to extract relations from Arabic documents. Each document can contain multiple relations, and each relation has six elements: the source entity, the target entity, the named-entity type of each, the relation type between them, and the evidence. The evidence serves two purposes: it improves the relation extraction task, and it explains the model's predictions. You can also use it as an edge between the related entities.
AREEj was introduced in the paper "AREEj: Arabic Relation Extraction with Evidence", published in the Proceedings of The Second Arabic Natural Language Processing Conference.
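The six-element relation described above can be pictured as a small record, with the evidence attached to the edge between the two entities. The following is a minimal sketch, not the model's actual output format: the `Relation` class, its field names, the `build_graph` helper, the type labels, and the example values are all hypothetical and written by hand for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """One extracted relation (field names are illustrative assumptions)."""
    source: str         # source entity text
    source_type: str    # named-entity type of the source
    target: str         # target entity text
    target_type: str    # named-entity type of the target
    relation_type: str  # relation between source and target
    evidence: str       # text span that explains the relation


def build_graph(relations):
    """Group relations by source; each edge carries its type and evidence."""
    graph = defaultdict(list)
    for rel in relations:
        graph[rel.source].append((rel.target, rel.relation_type, rel.evidence))
    return dict(graph)


# Hand-written example (NOT model output); labels are placeholders.
example = Relation(
    source="المركز العربي للأبحاث ودراسة السياسات",
    source_type="ORG",
    target="الدوحة",
    target_type="LOC",
    relation_type="location of formation",
    evidence="تأسس",
)

graph = build_graph([example])
```

Keeping the evidence on the edge means a knowledge graph built this way retains, for each link, the text that justifies it.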
How to use
pip install "transformers[torch]" datasets evaluate sentencepiece
from transformers import MBartTokenizer, MBartForConditionalGeneration
import torch
tokenizer = MBartTokenizer.from_pretrained('dru-ac/AREEj', max_length=1024)
model = MBartForConditionalGeneration.from_pretrained('dru-ac/AREEj')
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)
def generate_prediction(input_text):
    input_ids = tokenizer.encode(input_text, return_tensors="pt").to(device)
    with torch.no_grad():
        output = model.generate(
            input_ids,
            decoder_start_token_id=tokenizer.lang_code_to_id['ar_AR'],
        )
    prediction = tokenizer.decode(output[0], skip_special_tokens=False)
    return prediction
input_text = 'تأسس المركز العربي للأبحاث ودراسة السياسات في عام 2010 في الدوحة في قطر'
prediction = generate_prediction(input_text)
print('Prediction:', prediction)
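To process several documents at once, the single-input function above can be extended to work in batches. This is a sketch under assumptions rather than the authors' recommended setup: `generate_predictions` is a hypothetical helper, the `batch_size` and `max_new_tokens` defaults are arbitrary choices, and the 1024-token truncation limit simply mirrors the `max_length` passed to the tokenizer above. It expects the `tokenizer` and `model` objects that were already loaded.

```python
def generate_predictions(texts, tokenizer, model, batch_size=8,
                         max_length=1024, max_new_tokens=256):
    """Run generation over a list of texts in fixed-size batches.

    batch_size and max_new_tokens are illustrative defaults (assumptions),
    not values taken from the AREEj paper or model card.
    """
    predictions = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        # Pad to the longest text in the batch; truncate overly long inputs.
        inputs = tokenizer(
            batch,
            return_tensors="pt",
            padding=True,
            truncation=True,
            max_length=max_length,
        ).to(model.device)
        output = model.generate(
            **inputs,
            decoder_start_token_id=tokenizer.lang_code_to_id["ar_AR"],
            max_new_tokens=max_new_tokens,
        )
        # Special tokens are kept, as in the single-input example.
        predictions.extend(
            tokenizer.batch_decode(output, skip_special_tokens=False)
        )
    return predictions


# Usage, assuming `tokenizer` and `model` were loaded as shown above:
# predictions = generate_predictions([input_text], tokenizer, model)
```

Because `skip_special_tokens=False` is kept, shorter outputs in a batch will include padding tokens in the decoded string, so they may need to be stripped before further processing.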
If you use the code or model, please reference this work in your paper:
@inproceedings{mraikhat-etal-2024-areej,
title = "{AREE}j: {A}rabic Relation Extraction with Evidence",
author = "Mraikhat, Osama and
Hamoud, Hadi and
Zaraket, Fadi",
editor = "Habash, Nizar and
Bouamor, Houda and
Eskander, Ramy and
Tomeh, Nadi and
Abu Farha, Ibrahim and
Abdelali, Ahmed and
Touileb, Samia and
Hamed, Injy and
Onaizan, Yaser and
Alhafni, Bashar and
Antoun, Wissam and
Khalifa, Salam and
Haddad, Hatem and
Zitouni, Imed and
AlKhamissi, Badr and
Almatham, Rawan and
Mrini, Khalil",
booktitle = "Proceedings of The Second Arabic Natural Language Processing Conference",
month = aug,
year = "2024",
address = "Bangkok, Thailand",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.arabicnlp-1.6",
pages = "67--72",
abstract = "Relational entity extraction is key in building knowledge graphs. A relational entity has a source, a tail and a type. In this paper, we consider Arabic text and introduce evidence enrichment which intuitively informs models for better predictions. Relational evidence is an expression in the text that explains how sources and targets relate. It also provides hints from which models learn. This paper augments the existing relational extraction dataset with evidence annotation to its 2.9-million Arabic relations. We leverage the augmented dataset to build {AREE}j, a relation extraction with evidence model from Arabic documents. The evidence augmentation model we constructed to complete the dataset achieved .82 F1-score (.93 precision, .73 recall). The target outperformed SOTA mREBEL with .72 F1-score (.78 precision, .66 recall).",
}
License
This model is licensed under the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.