Best Practices and Lessons Learned on Synthetic Data for Language Models Paper • 2404.07503 • Published Apr 11, 2024 • 29
Leveraging Corpus Metadata to Detect Template-based Translation: An Exploratory Case Study of the Egyptian Arabic Wikipedia Edition Paper • 2404.00565 • Published Mar 31, 2024 • 6
Dar Datasets Collection datasets uploaded by https://github.com/ARBML/dar • 200 items • Updated Aug 22, 2024 • 9
Arabic Synonym BERT-based Adversarial Examples for Text Classification Paper • 2402.03477 • Published Feb 5, 2024 • 2
CIDAR: Culturally Relevant Instruction Dataset For Arabic Paper • 2402.03177 • Published Feb 5, 2024 • 6