---
language:
- ar
metrics:
- bleu
- accuracy
library_name: transformers
pipeline_tag: text-classification
tags:
- t5
- Classification
- ArabicT5
- Text Classification
widget:
- text: >
    الحمد لله رب العالمين والصلاة والسلام على سيد المرسلين نبينا محمد وآله وصحبه أجمعين،وبعد:فإنه يجب على العبد أن يتجنب الذنوب كلها دقها وجلها صغيرها وكبيرها وأن يتعاهد نفسه بالتوبة الصادقة والإنابة إلى ربه. قال تعالى: (وَتُوبُوا إِلَى اللَّهِ جَمِيعًا أَيُّهَا الْمُؤْمِنُونَ لَعَلَّكُمْ تُفْلِحُونَ)النور 31.
  example_title: الديني
---

# Arabic text classification using deep learning (ArabicT5)

- SANAD: Single-label Arabic News Articles Dataset for automatic text categorization
- Paper: https://www.researchgate.net/publication/333605992_SANAD_Single-Label_Arabic_News_Articles_Dataset_for_Automatic_Text_Categorization
- Dataset: https://data.mendeley.com/datasets/57zpx667y9/2

## Their experiment

Paper: https://www.sciencedirect.com/science/article/abs/pii/S0306457319303413

> "Our experimental results showed that all models did very well on SANAD corpus with a minimum accuracy of 93.43%, achieved by CGRU, and top performance of 95.81%, achieved by HANGRU."

|         Model           |         Accuracy        | 
| :---------------------: | :---------------------: | 
|           CGRU          |          93.43%         |   
|          HANGRU         |          95.81%         | 

## Our experiment

## The category mapping

The model generates the class as a short numeric label, following this mapping:

```python
category_mapping = {
    'Politics': 1,
    'Finance': 2,
    'Medical': 3,
    'Sports': 4,
    'Culture': 5,
    'Tech': 6,
    'Religion': 7
}
```
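Since the model emits the numeric label as a string, a small inverse lookup can turn predictions back into category names. This is a minimal sketch, not part of the released code; `id_to_category` is an illustrative name:

```python
# Illustrative helper (not part of the released code): invert the
# mapping so generated label strings decode to category names.
id_to_category = {str(v): k for k, v in category_mapping.items()}

print(id_to_category["5"])  # -> Culture
```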
  
## Training parameters

|       Parameter       |    Value     |
| :-------------------: | :----------: |
|  Training batch size  |     `8`      |
| Evaluation batch size |     `8`      |
|     Learning rate     |    `1e-4`    |
|   Max length input    |    `200`     |
|   Max length target   |     `3`      |
|   Number of workers   |     `4`      |
|        Epochs         |     `2`      |
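The training script itself is not published; as a hedged sketch, the table above could map onto Hugging Face `Seq2SeqTrainingArguments` roughly as follows. The `output_dir` name is hypothetical, and the two max-length settings are applied at tokenization time rather than here:

```python
from transformers import Seq2SeqTrainingArguments

# Illustrative only -- not the authors' actual configuration.
training_args = Seq2SeqTrainingArguments(
    output_dir="arabict5-classification",  # hypothetical output path
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    learning_rate=1e-4,
    num_train_epochs=2,
    dataloader_num_workers=4,
)
```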

## Results

|     Metric      |    Value     |
| :-------------: | :----------: |
| Validation loss |   `0.0479`   |
|    Accuracy     |   `96.49%`   |
|      BLEU       |   `96.49%`   |
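Since the labels are generated as text, accuracy here amounts to an exact string match between the generated label and the gold label. A minimal sketch with hypothetical predictions (the authors' evaluation script is not published):

```python
# Hypothetical model outputs and gold labels, for illustration only.
predictions = ["5", "7", "1", "4"]
references  = ["5", "7", "2", "4"]

# Exact-match accuracy over the generated label strings.
accuracy = sum(p == r for p, r in zip(predictions, references)) / len(references)
print(f"accuracy = {accuracy:.2%}")  # -> accuracy = 75.00%
```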

## Example usage
```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

model_name = "Hezam/ArabicT5_Classification"
model = T5ForConditionalGeneration.from_pretrained(model_name)
tokenizer = T5Tokenizer.from_pretrained(model_name)

text = "الزين فيك القناه الاولي المغربيه الزين فيك القناه الاولي المغربيه اخبارنا المغربيه  متابعه تفاجا زوار موقع القناه الاولي المغربي"

# Tokenize with the same limits used in training (max input length 200).
tokens = tokenizer(text,
                   max_length=200,
                   truncation=True,
                   padding="max_length",
                   return_tensors="pt")

# Generate the class label as text (max target length 3, as in training).
output = model.generate(tokens["input_ids"],
                        attention_mask=tokens["attention_mask"],
                        max_length=3,
                        length_penalty=10)

# Decode the generated ids into label strings.
output = [tokenizer.decode(ids,
                           skip_special_tokens=True,
                           clean_up_tokenization_spaces=True)
          for ids in output]
output

```
```bash
['5']
```

The generated label `5` corresponds to `Culture` in the category mapping above.