---
license: cc-by-sa-4.0
language:
- ar
tags:
- relation-extraction
- evidence-extraction
- seq2seq
datasets:
- dru-ac/ArSRED
---

# AREEj: Arabic Relation Extraction with Evidence

You can use AREEj to extract relations from Arabic documents. A document can contain multiple relations. Each relation has six elements: the source entity, the target entity, the named-entity type of each, the relation type between them, and the evidence.

The evidence serves two purposes: it improves relation extraction, and it explains the model's predictions. You can also use the evidence as an edge between the related entities.

AREEj was introduced in the paper [AREEj: Arabic Relation Extraction with Evidence](https://aclanthology.org/2024.arabicnlp-1.6/), published in the Proceedings of The Second Arabic Natural Language Processing Conference.

### How to use

```
pip install transformers datasets evaluate transformers[torch]
pip install sentencepiece
```

```python
from transformers import MBartTokenizer, MBartForConditionalGeneration
import torch

# Load the AREEj tokenizer and model from the Hugging Face Hub
tokenizer = MBartTokenizer.from_pretrained('dru-ac/AREEj', max_length=1024)
model = MBartForConditionalGeneration.from_pretrained('dru-ac/AREEj')

# Run on the GPU when one is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

def generate_prediction(input_text):
    input_ids = tokenizer.encode(input_text, return_tensors="pt").to(device)
    with torch.no_grad():
        output = model.generate(
            input_ids,
            # Start decoding with the Arabic language-code token
            decoder_start_token_id=tokenizer.lang_code_to_id['ar_AR'],
        )
    # skip_special_tokens=False keeps the special tokens in the decoded output
    prediction = tokenizer.decode(output[0], skip_special_tokens=False)
    return prediction

# "The Arab Center for Research and Policy Studies was founded in 2010 in Doha, Qatar"
input_text = 'تأسس المركز العربي للأبحاث ودراسة السياسات في عام 2010 في الدوحة في قطر'
prediction = generate_prediction(input_text)
print('Prediction:', prediction)
```

### Citation

If you use the code or model, please cite this work:

```
@inproceedings{mraikhat-etal-2024-areej,
    title = "{AREE}j: {A}rabic Relation Extraction with Evidence",
    author = "Mraikhat, Osama and
      Hamoud, Hadi and
      Zaraket, Fadi",
    editor = "Habash, Nizar and
      Bouamor, Houda and
      Eskander, Ramy and
      Tomeh, Nadi and
      Abu Farha, Ibrahim and
      Abdelali, Ahmed and
      Touileb, Samia and
      Hamed, Injy and
      Onaizan, Yaser and
      Alhafni, Bashar and
      Antoun, Wissam and
      Khalifa, Salam and
      Haddad, Hatem and
      Zitouni, Imed and
      AlKhamissi, Badr and
      Almatham, Rawan and
      Mrini, Khalil",
    booktitle = "Proceedings of The Second Arabic Natural Language Processing Conference",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.arabicnlp-1.6",
    pages = "67--72",
    abstract = "Relational entity extraction is key in building knowledge graphs. A relational entity has a source, a tail and a type. In this paper, we consider Arabic text and introduce evidence enrichment which intuitively informs models for better predictions. Relational evidence is an expression in the text that explains how sources and targets relate. This paper augments the existing relational extraction dataset with evidence annotation to its 2.9-million Arabic relations. We leverage the augmented dataset to build , a relation extraction with evidence model from Arabic documents. The evidence augmentation model we constructed to complete the dataset achieved .82 F1-score (.93 precision, .73 recall). The target outperformed SOTA mREBEL with .72 F1-score (.78 precision, .66 recall).",
}
```

### License

This model is licensed under the CC BY-SA 4.0 license. The text of the license can be found [here](https://creativecommons.org/licenses/by-sa/4.0/).