---
library_name: transformers
tags:
- readability
license: mit
base_model:
- CAMeL-Lab/bert-base-arabic-camelbert-msa
pipeline_tag: text-classification
---

# CAMeLBERT+Word+CE Readability Model

## Model description
**CAMeLBERT+Word+CE** is an Arabic readability assessment model built by fine-tuning the **CAMeLBERT-msa** model with a cross-entropy (**CE**) loss.

For fine-tuning, we used the **Word** input variant of [BAREC-Corpus-v1.0](https://huggingface.co/datasets/CAMeL-Lab/BAREC-Corpus-v1.0).

Our fine-tuning procedure and the hyperparameters we used can be found in our paper *"[A Large and Balanced Corpus for Fine-grained Arabic Readability Assessment](https://arxiv.org/abs/2502.13520)."*
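
For orientation only, below is a minimal sketch of this kind of fine-tuning setup using the Hugging Face `Trainer`. It is not the paper's training script: the label count, split name, column names, and hyperparameters are placeholders; see the paper and the dataset card for the actual configuration.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

base = "CAMeL-Lab/bert-base-arabic-camelbert-msa"
tokenizer = AutoTokenizer.from_pretrained(base)

# With integer labels, AutoModelForSequenceClassification trains with a
# standard cross-entropy loss -- the "CE" in the model name.
# NOTE: 19 is an assumed label count; verify it against the dataset card.
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=19)

dataset = load_dataset("CAMeL-Lab/BAREC-Corpus-v1.0")

# "text" and "label" are placeholder column names; check the dataset card for
# the fields holding the Word input variant and the 0-based level labels.
def preprocess(batch):
    enc = tokenizer(batch["text"], truncation=True, max_length=512)
    enc["labels"] = batch["label"]
    return enc

tokenized = dataset.map(preprocess, batched=True)

# Placeholder hyperparameters; the values actually used are in the paper.
args = TrainingArguments(output_dir="camelbert-word-ce",
                         learning_rate=3e-5,
                         num_train_epochs=3,
                         per_device_train_batch_size=32)

trainer = Trainer(model=model,
                  args=args,
                  train_dataset=tokenized["train"],
                  data_collator=DataCollatorWithPadding(tokenizer))
trainer.train()
```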
## Intended uses

You can use the CAMeLBERT+Word+CE model as part of a `transformers` text-classification pipeline.
## How to use

To use the model with a `transformers` pipeline:

```python
>>> from transformers import pipeline
>>> readability = pipeline("text-classification", model="CAMeL-Lab/readability-camelbert-word-CE")
>>> text = 'و قال له انه يحب اكل الطعام بكثره'
>>> # Labels have the form "LABEL_<i>", where <i> is a 0-based class index;
>>> # stripping the "LABEL_" prefix and adding 1 gives the readability level.
>>> readability_level = int(readability(text)[0]['label'][6:]) + 1
>>> print("readability level: {}".format(readability_level))
readability level: 10
```
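
If you need levels for many sentences at once, or want the raw logits, you can also call the model directly instead of going through the pipeline. A minimal sketch, assuming the same default `LABEL_<i>` id-to-label mapping used in the pipeline example above:

```python
>>> import torch
>>> from transformers import AutoTokenizer, AutoModelForSequenceClassification
>>> model_id = "CAMeL-Lab/readability-camelbert-word-CE"
>>> tokenizer = AutoTokenizer.from_pretrained(model_id)
>>> model = AutoModelForSequenceClassification.from_pretrained(model_id)
>>> sentences = ['و قال له انه يحب اكل الطعام بكثره']
>>> inputs = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)
>>> with torch.no_grad():
...     logits = model(**inputs).logits
>>> # Class indices are 0-based, so add 1 to obtain the readability level.
>>> levels = (logits.argmax(dim=-1) + 1).tolist()
>>> print(levels)
[10]
```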
## Citation

```bibtex
@inproceedings{elmadani-etal-2025-readability,
    title = "A Large and Balanced Corpus for Fine-grained Arabic Readability Assessment",
    author = "Elmadani, Khalid N. and
      Habash, Nizar and
      Taha-Thomure, Hanada",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics"
}
```