---
language:
- ar
metrics:
- bleu
- accuracy
library_name: transformers
pipeline_tag: text-classification
tags:
- t5
- Classification
- ArabicT5
- Text Classification
widget:
- example_title: >
الديني
- text: >
الحمد لله رب العالمين والصلاة والسلام على سيد المرسلين نبينا محمد وآله وصحبه أجمعين،وبعد:فإنه يجب على العبد أن يتجنب الذنوب كلها دقها وجلها صغيرها وكبيرها وأن يتعاهد نفسه بالتوبة الصادقة والإنابة إلى ربه. قال تعالى: (وَتُوبُوا إِلَى اللَّهِ جَمِيعًا أَيُّهَا الْمُؤْمِنُونَ لَعَلَّكُمْ تُفْلِحُونَ)النور 31.
---
## Arabic text classification using deep learning (ArabicT5)
## Our experiment
- The category mapping
category_mapping = {
'Politics':1,
'Finance':2,
'Medical':3,
'Sports':4,
'Culture':5,
'Tech':6,
'Religion':7
}
- Training parameters
| Parameter | Value |
| :-------------------: | :-----------:|
| Training batch size | `8` |
| Evaluation batch size | `8` |
| Learning rate | `1e-4` |
| Max input length | `200` |
| Max target length | `3` |
| Number of workers | `4` |
| Epochs | `2` |
- Results
| Metric | Value |
| :---------------------: | :-----------: |
| Validation loss | `0.0479` |
| Accuracy | `96.49%` |
| BLEU | `96.49%` |
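
The card does not say how the accuracy figure was computed. Since the model generates its label as text, one plausible reading is exact match between the decoded prediction and the gold label string. A minimal sketch under that assumption follows; `exact_match_accuracy` is a hypothetical helper, and the sample labels are made up for illustration rather than taken from the evaluation set.

```python
def exact_match_accuracy(predictions, references):
    """Fraction of predictions whose text equals the reference label text."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    # Compare after stripping whitespace, since decoded text may carry padding spaces.
    matches = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return matches / len(references)


# Illustrative labels only (not real evaluation data).
preds = ["5", "1", "7", "2"]
golds = ["5", "1", "7", "3"]
print(f"{exact_match_accuracy(preds, golds):.2%}")  # 75.00%
```
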
## SANAD: Single-label Arabic News Articles Dataset for automatic text categorization
- Paper
[https://www.researchgate.net/publication/333605992_SANAD_Single-Label_Arabic_News_Articles_Dataset_for_Automatic_Text_Categorization]
- Dataset
[https://data.mendeley.com/datasets/57zpx667y9/2]
## Arabic text classification using deep learning models
- Paper
[https://www.sciencedirect.com/science/article/abs/pii/S0306457319303413]
- Their experiment
"Our experimental results showed that all models did very well on SANAD corpus with a minimum accuracy of 93.43%, achieved by CGRU, and top performance of 95.81%, achieved by HANGRU."
| Model | Accuracy |
| :---------------------: | :---------------------: |
| CGRU | 93.43% |
| HANGRU | 95.81% |
## Example usage
```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Load the fine-tuned classifier and its tokenizer from the Hugging Face Hub.
model_name = "Hezam/ArabicT5_Classification"
model = T5ForConditionalGeneration.from_pretrained(model_name)
tokenizer = T5Tokenizer.from_pretrained(model_name)

text = "الزين فيك القناه الاولي المغربيه الزين فيك القناه الاولي المغربيه اخبارنا المغربيه متابعه تفاجا زوار موقع القناه الاولي المغربي"

# Tokenize with the maximum input length listed in the training parameters (200).
tokens = tokenizer(
    text,
    max_length=200,
    truncation=True,
    padding="max_length",
    return_tensors="pt",
)

# Generate the label; max_length=3 matches the maximum target length from training.
output = model.generate(tokens["input_ids"], max_length=3, length_penalty=10)
output = [
    tokenizer.decode(ids, skip_special_tokens=True, clean_up_tokenization_spaces=True)
    for ids in output
]
print(output)
```
```bash
['5']
```
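
The generated string appears to be the numeric label from the category mapping listed under "Our experiment". A minimal sketch, assuming that is the case, for turning it back into a category name; `ID_TO_CATEGORY` and `label_to_category` are hypothetical helper names, not part of the model or of `transformers`.

```python
# Category mapping copied from the "Our experiment" section of this card.
category_mapping = {
    "Politics": 1,
    "Finance": 2,
    "Medical": 3,
    "Sports": 4,
    "Culture": 5,
    "Tech": 6,
    "Religion": 7,
}

# Assumption: the model emits the numeric id as a string, e.g. "5".
ID_TO_CATEGORY = {str(idx): name for name, idx in category_mapping.items()}


def label_to_category(label: str) -> str:
    """Return the category name for a generated label, or "Unknown" if unmapped."""
    return ID_TO_CATEGORY.get(label.strip(), "Unknown")


# Using the output shown above.
print([label_to_category(label) for label in ["5"]])  # ['Culture']
```

Falling back to `"Unknown"` rather than raising keeps the helper usable if the model ever generates text outside the seven labels.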